A Guide to Convolutional Neural Networks for Computer Vision. Salman Khan
libraries that are commonly used in computer vision. Further, this text describes and discusses case studies related to the application of CNNs in computer vision, including image classification, object detection, semantic segmentation, scene understanding, and image generation.
This book is ideal for undergraduate and graduate students, since no prior background knowledge in the field is required to follow the material, and for new researchers, developers, engineers, and practitioners who want to gain a quick understanding of CNN models.
KEYWORDS
deep learning, computer vision, convolutional neural networks, perception, back-propagation, feed-forward networks, image classification, action recognition, object detection, object tracking, video processing, semantic segmentation, scene understanding, 3D processing
SK: To my parents and my wife Nusrat
HR: To my father Shirzad, my mother Rahimeh, and my wife Shahla
AS: To my parents, my wife Maleeha, and our children Abiya, Maryam, and Muhammad. Thanks for always being there for me.
MB: To my parents: Mostefa and Rabia Bennamoun and to my nuclear family: Leila, Miriam, Basheer, and Rayaane Bennamoun
Contents
1.1.2 Image Processing vs. Computer Vision
2.1 Importance of Features and Classifiers
2.2 Traditional Feature Descriptors
2.2.1 Histogram of Oriented Gradients (HOG)
2.2.2 Scale-invariant Feature Transform (SIFT)
2.2.3 Speeded-up Robust Features (SURF)
2.2.4 Limitations of Traditional Hand-engineered Features
2.3 Machine Learning Classifiers
2.3.1 Support Vector Machine (SVM)
3.3.2 Parameter Learning
3.4 Link with Biological Vision
3.4.1 Biological Neuron
3.4.2 Computational Model of a Neuron
3.4.3 Artificial vs. Biological Neuron
4 Convolutional Neural Network
4.2 Network Layers
4.2.1 Pre-processing
4.2.2 Convolutional Layers
4.2.3 Pooling Layers
4.2.4 Nonlinearity
4.2.5 Fully Connected Layers
4.2.6 Transposed Convolution Layer
4.2.7 Region of Interest Pooling
4.2.8 Spatial Pyramid Pooling Layer
4.2.9 Vector of Locally Aggregated Descriptors Layer
4.2.10 Spatial Transformer Layer
4.3 CNN Loss Functions
4.3.1 Cross-entropy Loss
4.3.2 SVM Hinge Loss
4.3.3 Squared Hinge Loss
4.3.4 Euclidean Loss
4.3.5 The ℓ1 Error
4.3.6 Contrastive Loss
4.3.7 Expectation Loss
4.3.8 Structural Similarity Measure
5.1 Weight Initialization
5.1.1 Gaussian Random Initialization
5.1.2 Uniform Random Initialization
5.1.3 Orthogonal Random Initialization