Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic
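+ 1)/∂(vec(wi)T) and ∂vec(xi + 1)/∂(vec(xi)T), we can easily get Eq. (3.79). The terms $\partial\,\mathrm{vec}(\mathbf{x}^{i+1})/\partial(\mathrm{vec}(\mathbf{w}^i)^T)$ and $\partial\,\mathrm{vec}(\mathbf{x}^{i+1})/\partial(\mathrm{vec}(\mathbf{x}^i)^T)$ are much easier to compute than directly computing $\partial z/\partial(\mathrm{vec}(\mathbf{w}^i)^T)$ and $\partial z/\partial(\mathrm{vec}(\mathbf{x}^i)^T)$, because $\mathbf{x}^i$ is directly related to $\mathbf{x}^{i+1}$ through a function with parameters $\mathbf{w}^i$. The details of these partial derivatives are discussed in the following sections.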
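The chain-rule decomposition described above can be checked numerically on a toy layer. The sketch below is only an illustration: it assumes a hypothetical linear layer $\mathbf{x}^{i+1} = W\mathbf{x}^i$ with $\mathbf{w}^i = \mathrm{vec}(W)$ and a hypothetical squared-norm loss $z$, neither of which comes from the text. It multiplies the upstream gradient $\partial z/\partial(\mathrm{vec}(\mathbf{x}^{i+1})^T)$ by the two local Jacobians and compares the results with central finite differences. Here vec(·) stacks columns, so the identity $\mathrm{vec}(W\mathbf{x}) = (\mathbf{x}^T \otimes I)\,\mathrm{vec}(W)$ gives the parameter Jacobian.

```python
import numpy as np

# Hypothetical layer for illustration: x^{i+1} = W x^i, with parameters w^i = vec(W).
# Hypothetical loss: z = 0.5 * ||x^{i+1}||^2.
rng = np.random.default_rng(0)
W = rng.standard_normal((3, 4))
x = rng.standard_normal(4)


def forward(W, x):
    return W @ x


def loss(y):
    return 0.5 * float(y @ y)


y = forward(W, x)

# Upstream gradient dz/d(vec(x^{i+1})^T) for this loss, stored as a length-3 row vector
dz_dy = y

# Local (easy) Jacobians of this single layer
dy_dx = W                               # d vec(x^{i+1}) / d(vec(x^i)^T), shape (3, 4)
dy_dw = np.kron(x[None, :], np.eye(3))  # d vec(x^{i+1}) / d(vec(W)^T), shape (3, 12),
                                        # from vec(W x) = (x^T kron I) vec(W) with column stacking

# Chain rule: upstream gradient times local Jacobian
dz_dw = dz_dy @ dy_dw                   # shape (12,)
dz_dx = dz_dy @ dy_dx                   # shape (4,)


def numeric_grad(fun, v, eps=1e-6):
    """Central finite differences, used only to check the chain-rule result."""
    g = np.zeros_like(v)
    for k in range(v.size):
        e = np.zeros_like(v)
        e[k] = eps
        g[k] = (fun(v + e) - fun(v - e)) / (2 * eps)
    return g


# Perturb vec(W) (column-major flattening) and x^i directly
num_dw = numeric_grad(lambda wv: loss(forward(wv.reshape(3, 4, order="F"), x)),
                      W.flatten(order="F"))
num_dx = numeric_grad(lambda xv: loss(forward(W, xv)), x)
print(np.allclose(dz_dw, num_dw, atol=1e-6), np.allclose(dz_dx, num_dx, atol=1e-6))
```

Only the local Jacobians depend on the layer; the upstream factor $\partial z/\partial(\mathrm{vec}(\mathbf{x}^{i+1})^T)$ is shared by both products, which is what makes layer-by-layer back propagation efficient.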
3.6.2 Layers in CoNN
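Suppose we are considering the l‐th layer, whose inputs form an order‐3 tensor $\mathbf{x}^l$ with $\mathbf{x}^l \in \mathbb{R}^{H^l \times W^l \times D^l}$. We denote the output of this layer by $\mathbf{y}$, which is also the input $\mathbf{x}^{l+1}$ of the next layer.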
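The Rectified Linear Unit (ReLU) layer: A ReLU layer does not change the size of the input; that is, $\mathbf{x}^l$ and $\mathbf{y}$ share the same size. The ReLU can be regarded as a truncation performed individually on every element of the input:

$$y_{i,j,d} = \max\{0,\; x^l_{i,j,d}\},$$

with $0 \le i < H^l = H^{l+1}$, $0 \le j < W^l = W^{l+1}$, and $0 \le d < D^l = D^{l+1}$.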
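A minimal NumPy sketch of this element-wise truncation, assuming the $H^l \times W^l \times D^l$ input is stored as an array of shape `(H, W, D)`. The helper `relu_backward` is a hypothetical addition (the text defers derivatives to later sections); it uses the common convention that the derivative is 1 where the input is positive and 0 elsewhere, including at 0, where ReLU is not differentiable.

```python
import numpy as np


def relu_forward(x):
    """Element-wise ReLU: y[i, j, d] = max(0, x[i, j, d]); y has the same shape as x."""
    return np.maximum(0.0, x)


def relu_backward(x, dz_dy):
    """Gradient of z with respect to x, given dz/dy of the same shape.

    Passes the upstream gradient through where x > 0 and blocks it elsewhere
    (convention: derivative 0 at x == 0).
    """
    return dz_dy * (x > 0)


# Toy order-3 input with H^l = 2, W^l = 2, D^l = 2
x = np.array([[[-1.0, 2.0], [0.5, -3.0]],
              [[4.0, -0.2], [0.0, 1.5]]])
y = relu_forward(x)
print(y.shape)  # same shape as x
```

Because the operation acts on each element independently, no spatial or channel dimension changes, which is why $H^{l+1} = H^l$, $W^{l+1} = W^l$, and $D^{l+1} = D^l$ for this layer.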
The convolution layer: Figure 3.23 illustrates the convolution of an input image (a 3 × 4 matrix) with a 2 × 2 convolution kernel. For order‐3 tensors, the convolution operation is defined similarly. Figure 3.24 illustrates an RGB image with three channels (shown as black, light gray, and gray) and three kernels. Suppose the input in the l‐th layer is an order‐3 tensor of size $H^l \times W^l \times D^l$. A convolution kernel is also an order‐3 tensor, of size $H \times W \times D^l$. When we overlap the kernel on top of the input tensor at the spatial location (0, 0, 0), we compute the products of the corresponding elements in all $D^l$ channels and sum these $HWD^l$ products to get the convolution result at this spatial location. We then move the kernel from top to bottom and from left to right to complete the convolution. A convolution layer usually uses multiple convolution kernels. Assuming $D$ kernels are used and each kernel has spatial span $H \times W$, we denote all the kernels as $\mathbf{f}$, an order‐4 tensor in $\mathbb{R}^{H \times W \times D^l \times D}$.
Stride is another important concept in convolution. At the bottom of Figure 3.23, we convolve the kernel with the input at every possible spatial location, which corresponds to the stride $s = 1$. If $s > 1$, however, every movement of the kernel skips $s - 1$ pixel locations (i.e., the convolution is performed once every $s$ pixels, both horizontally and vertically). In this section, we consider the simple case in which the stride is 1 and no padding is used. Hence, the output $\mathbf{y}$ (or $\mathbf{x}^{l+1}$) satisfies

$$\mathbf{y} \in \mathbb{R}^{H^{l+1} \times W^{l+1} \times D^{l+1}}, \qquad H^{l+1} = H^l - H + 1, \quad W^{l+1} = W^l - W + 1, \quad D^{l+1} = D. \tag{3.80}$$

Figure 3.23 Illustration of the convolution operation. If we overlap the convolution kernel on top of the input image, we can compute the product between the numbers at the same location in the kernel and the input, and we get a single number by summing these products together. For example, if we overlap the kernel with the top‐left region in the input, the convolution result at that spatial location is 1 × 1 + 1 × 4 + 1 × 2 + 1 × 5 = 12. (for more details see the color figure in the bins).
Figure 3.24 RGB image/three channels and three kernels. (for more details see the color figure in the bins).
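The convolution layer described above can be sketched directly with loops: each output element is the sum of the $HWD^l$ products between one kernel and the input window beneath it. The function name `conv_forward`, the `(H, W, D^l, D)` array layout for $\mathbf{f}$, and the `stride` argument are illustrative assumptions. For $s > 1$ the sketch uses the common no-padding output size $\lfloor (H^l - H)/s \rfloor + 1$, which reduces to Eq. (3.80) when $s = 1$.

```python
import numpy as np


def conv_forward(x, f, stride=1):
    """Direct (loop-based) convolution of an order-3 input with an order-4 kernel bank.

    x: input of shape (H_l, W_l, D_l)
    f: kernels of shape (H, W, D_l, D); f[:, :, :, d] is the d-th kernel
    No padding is used. With stride=1 the output shape is (H_l - H + 1, W_l - W + 1, D).
    """
    H_l, W_l, D_l = x.shape
    H, W, D_in, D = f.shape
    assert D_in == D_l, "each kernel must span all D_l input channels"
    H_out = (H_l - H) // stride + 1
    W_out = (W_l - W) // stride + 1
    y = np.zeros((H_out, W_out, D))
    for i in range(H_out):
        for j in range(W_out):
            # H x W x D_l window of the input under the kernel at this location
            patch = x[i * stride:i * stride + H, j * stride:j * stride + W, :]
            for d in range(D):
                # Sum of the H * W * D_l element-wise products
                y[i, j, d] = np.sum(patch * f[:, :, :, d])
    return y


# Single-channel 3 x 4 image and one 2 x 2 all-ones kernel, as in Figure 3.23.
# Only the top-left 2 x 2 entries (1, 2; 4, 5) come from the figure caption;
# the remaining entries are illustrative.
img = np.array([[1.0, 2.0, 3.0, 1.0],
                [4.0, 5.0, 6.0, 1.0],
                [7.0, 8.0, 9.0, 1.0]])
x = img[:, :, np.newaxis]   # shape (3, 4, 1): H^l = 3, W^l = 4, D^l = 1
f = np.ones((2, 2, 1, 1))   # shape (2, 2, 1, 1): H = W = 2, D = 1
y = conv_forward(x, f)
print(y.shape)              # (2, 3, 1), matching Eq. (3.80)
print(y[0, 0, 0])           # 12.0, the top-left result in Figure 3.23
```

The triple loop mirrors the textual description rather than aiming for speed; practical implementations typically rewrite the same computation as a single matrix product.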
Convolution as matrix product: There is a way to expand $\mathbf{x}^l$ and simplify the convolution as a matrix product. Let us consider a special case with $D^l = D = 1$, $H = W = 2$, and $H^l = 3$, $W^l = 4$. That is, we consider convolving a small single‐channel 3 × 4 matrix (or image) with one