A Guide to Convolutional Neural Networks for Computer Vision. Salman Khan
5.1.4 Unsupervised Pre-training
5.1.5 Xavier Initialization
5.1.6 ReLU Aware Scaled Initialization
5.1.7 Layer-sequential Unit Variance
5.1.8 Supervised Pre-training
5.2 Regularization of CNN
5.2.1 Data Augmentation
5.2.2 Dropout
5.2.3 Drop-connect
5.2.4 Batch Normalization
5.2.5 Ensemble Model Averaging
5.2.6 The ℓ2 Regularization
5.2.7 The ℓ1 Regularization
5.2.8 Elastic Net Regularization
5.2.9 Max-norm Constraints
5.2.10 Early Stopping
5.3 Gradient-based CNN Learning
5.3.1 Batch Gradient Descent
5.3.2 Stochastic Gradient Descent
5.3.3 Mini-batch Gradient Descent
5.4 Neural Network Optimizers
5.4.1 Momentum
5.4.2 Nesterov Momentum
5.4.3 Adaptive Gradient
5.4.4 Adaptive Delta
5.4.5 RMSprop
5.4.6 Adaptive Moment Estimation
5.5 Gradient Computation in CNNs
5.5.1 Analytical Differentiation
5.5.2 Numerical Differentiation
5.5.3 Symbolic Differentiation
5.5.4 Automatic Differentiation
5.6 Understanding CNN through Visualization
5.6.1 Visualizing Learned Weights
5.6.2 Visualizing Activations
5.6.3 Visualizations based on Gradients
6 Examples of CNN Architectures
6.1 LeNet
6.2 AlexNet
6.3 Network in Network
6.4 VGGnet
6.5 GoogLeNet
6.6 ResNet
6.7 ResNeXt
6.8 FractalNet
6.9 DenseNet
7 Applications of CNNs in Computer Vision
7.1 Image Classification
7.1.1 PointNet
7.2 Object Detection and Localization
7.2.1 Region-based CNN
7.2.2 Fast R-CNN
7.2.3 Region Proposal Network (RPN)
7.3 Semantic Segmentation
7.3.1 Fully Convolutional Network (FCN)
7.3.2 Deep Deconvolution Network (DDN)
7.3.3 DeepLab
7.4 Scene Understanding
7.4.1 DeepContext
7.4.2 Learning Rich Features from RGB-D Images
7.4.3 PointNet for Scene Understanding
7.5 Image Generation
7.5.1 Generative Adversarial Networks (GANs)
7.5.2 Deep Convolutional Generative Adversarial Networks (DCGANs)
7.5.3 Super Resolution Generative Adversarial Network (SRGAN)
7.6 Video-based Action Recognition
7.6.1 Action Recognition from Still Video Frames
7.6.2 Two-stream CNNs
7.6.3 Long-term Recurrent Convolutional Network (LRCN)
8 Deep Learning Tools and Libraries
8.1 Caffe
8.2 TensorFlow
8.3 MatConvNet
8.4 Torch7
8.5 Theano
8.6 Keras
8.7 Lasagne
8.8 Marvin
8.9 Chainer
8.10 PyTorch
Preface
The primary goal of this book is to provide a comprehensive treatment of convolutional neural networks (CNNs) from the perspective of computer vision. To this end, the book covers basic, intermediate, and advanced topics relating to both the theoretical and practical aspects of the subject.
This book is organized into nine chapters. The first chapter introduces the computer vision and machine learning disciplines and presents their most relevant application domains. This sets up the platform for the main subject of this book, “Deep Learning”, which is first defined in the latter part of the first chapter. The second chapter serves as background material and presents the hand-crafted features and classifiers that have remained popular in computer vision during the last two decades. These include feature descriptors such as the Scale-Invariant Feature Transform (SIFT), Histogram of Oriented Gradients (HOG), and Speeded-Up Robust Features (SURF), and classifiers such as Support Vector Machines (SVM) and Random Decision Forests (RDF).
Chapter 3 describes neural networks and covers preliminary concepts related to their architecture, basic building blocks, and learning algorithms. Chapter 4 builds on this and serves as a thorough introduction to CNN architecture. It covers CNN layers, including the basic ones (e.g., sub-sampling, convolution) as well as more advanced ones (e.g., pyramid pooling, spatial transformer). Chapter 5 comprehensively presents techniques to learn and regularize CNN parameters. It also provides tools to visualize and understand the learned parameters.