Computational Analysis and Deep Learning for Medical Care. Group of authors
parameters of AlexNet.
Table 1.2 Each column indicates which feature maps in S2 are combined by the units in a particular feature map of C3 [1].
S2 \ C3 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
0 | X |   |   |   | X | X | X |   |   | X | X | X | X |   | X | X |
1 | X | X |   |   |   | X | X | X |   |   | X | X | X | X |   | X |
2 | X | X | X |   |   |   | X | X | X |   |   | X |   | X | X | X |
3 |   | X | X | X |   |   | X | X | X | X |   |   | X |   | X | X |
4 |   |   | X | X | X |   |   | X | X | X | X |   | X | X |   | X |
5 |   |   |   | X | X | X |   |   | X | X | X | X |   | X | X | X |
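The connection scheme of Table 1.2 can be checked with a short script. The sketch below lists, for each C3 feature map, the S2 feature maps it reads from, following the scheme given in [1], and counts the trainable parameters of C3. The 5 × 5 kernel size and the single bias per C3 map are assumptions taken from the LeNet-5 description in [1], not from the table itself, and the names `C3_INPUTS`, `c3_parameter_count`, and `uses_per_s2_map` are illustrative.

```python
# S2 feature maps (0-5) combined by each C3 feature map (0-15), as in Table 1.2.
C3_INPUTS = [
    # C3 maps 0-5: three contiguous S2 maps each
    (0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 0), (5, 0, 1),
    # C3 maps 6-11: four contiguous S2 maps each
    (0, 1, 2, 3), (1, 2, 3, 4), (2, 3, 4, 5),
    (3, 4, 5, 0), (4, 5, 0, 1), (5, 0, 1, 2),
    # C3 maps 12-14: four non-contiguous S2 maps each
    (0, 1, 3, 4), (1, 2, 4, 5), (0, 2, 3, 5),
    # C3 map 15: all six S2 maps
    (0, 1, 2, 3, 4, 5),
]


def c3_parameter_count(kernel_size=5):
    """Trainable parameters of C3, assuming one k x k kernel per
    (S2 map, C3 map) connection and one bias per C3 map."""
    weights = sum(len(inputs) * kernel_size * kernel_size for inputs in C3_INPUTS)
    biases = len(C3_INPUTS)
    return weights + biases


def uses_per_s2_map():
    """Number of C3 maps that read from each S2 map (row sums of the table)."""
    return [sum(s in inputs for inputs in C3_INPUTS) for s in range(6)]
```

Under these assumptions, every S2 map feeds 10 of the 16 C3 maps, and C3 has 60 × 25 weights plus 16 biases.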
Figure 1.2 Architecture of AlexNet.
First Layer: AlexNet accepts a 227 × 227 × 3 RGB image as input, which is fed to the first convolutional layer with 96 kernels (filters) of size 11 × 11 × 3 and a stride of 4. The output is 96 feature maps of size 55 × 55. The next layer is a max-pooling (sub-sampling) layer with a window size of 3 × 3 and a stride of 2, which produces an output of size 27 × 27 × 96.
Second Layer: The second convolutional layer filters the 27 × 27 × 96 input with 256 kernels of size 5 × 5 and a stride of 1 pixel. It is followed by a max-pooling layer with filter size 3 × 3 and a stride of 2, which reduces the output to 256 feature maps of size 13 × 13.
Third, Fourth, and Fifth Layers: The third, fourth, and fifth convolutional layers use a filter size of 3 × 3 and a stride of 1. The third and fourth convolutional layers have 384 feature maps each, and the fifth has 256. They are followed by a max-pooling layer with filter size 3 × 3 and a stride of 2, which produces 256 feature maps of size 6 × 6.
Sixth Layer: The 6 × 6 × 256 output is flattened into a fully connected layer with 9,216 neurons (feature maps of size 1 × 1).
Seventh and Eighth Layers: The seventh and eighth layers are fully connected layers with 4,096 neurons each.
Output Layer: The output layer uses the softmax activation and has 1,000 units, one per class.
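The spatial sizes quoted for each layer follow from the usual output-size rule, floor((n + 2p − k) / s) + 1, for an input of side n, kernel size k, stride s, and padding p. The sketch below traces the shapes through the convolutional and pooling layers. The padding values (2 for the second convolutional layer, 1 for the third to fifth) are assumptions needed to reproduce the stated sizes and are not given in the text; the names `out_size`, `ALEXNET_LAYERS`, and `trace_shapes` are illustrative.

```python
def out_size(n, kernel, stride, padding=0):
    """Output side length of a convolution or pooling window."""
    return (n + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels; None means pooling,
# which keeps the channel count). Padding values are assumptions.
ALEXNET_LAYERS = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, None),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, None),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool5", 3, 2, 0, None),
]


def trace_shapes(size=227, channels=3):
    """Return (name, height, width, channels) after each layer."""
    shapes = []
    for name, kernel, stride, padding, out_channels in ALEXNET_LAYERS:
        size = out_size(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, size, size, channels))
    return shapes


def flattened_size():
    """Number of values fed to the first fully connected layer."""
    _, height, width, channels = trace_shapes()[-1]
    return height * width * channels
```

Calling `trace_shapes()` yields 55 × 55 × 96 after the first convolution, 27 × 27 × 96 and 13 × 13 × 256 after the first two pooling layers, and 6 × 6 × 256 after the last one, which flattens to 9,216 values.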
1.2.3 ZFNet
The architecture of ZFNet, introduced by Zeiler [3], is the same as that of AlexNet, except that the first convolutional layer uses a smaller 7 × 7 kernel with a stride of 2. This reduction allows the network to retain more features and to reach better hyper-parameters, at some cost in computational efficiency. The number of filters in the third, fourth, and fifth convolutional layers is increased to 512, 1024, and 512, respectively. A new visualization technique, deconvolution, which maps features back to pixels, is used to analyze the feature maps of the first and second layers.
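The effect of the smaller kernel and stride in the first layer can be illustrated numerically. The sketch below compares an 11 × 11, stride-4 filter with a 7 × 7, stride-2 filter on the same input. It assumes a 227 × 227 × 3 input and no padding for both cases, which is a simplification for comparison and not necessarily the exact ZFNet configuration; the function names are illustrative.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Output side length of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def first_layer_summary(kernel, stride, input_size=227, in_channels=3):
    """Per-filter statistics of a first convolutional layer.

    Assumes a square input and no padding (illustrative only).
    """
    side = conv_output_size(input_size, kernel, stride)
    weights = kernel * kernel * in_channels
    positions = side * side
    return {
        "weights_per_filter": weights,
        "output_side": side,
        "positions_per_map": positions,
        # multiply-accumulate operations needed to compute one feature map
        "macs_per_map": weights * positions,
    }


alexnet_first = first_layer_summary(kernel=11, stride=4)
zfnet_first = first_layer_summary(kernel=7, stride=2)
```

Under these assumptions, the smaller filter has fewer weights (147 instead of 363) but is evaluated at about four times as many positions (111 × 111 instead of 55 × 55), so each feature map is denser and costs more operations to compute.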
Table 1.3 AlexNet layer details.
Sl. no. | Layer | Kernel size | Stride | Activation shape | Weights | Bias | # Parameters | Activation | # Connections |