Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic
and yout, but this is confusing because y generally stands for output in the machine learning literature. In the interests of clarity, we break with this convention and use i, f, and o to refer to the input, forget, and output gates, respectively.
Computation in the LSTM model proceeds according to the following equations, performed at each time step. Together they give the full algorithm for a modern LSTM with forget gates:
g(t) = ϕ(W^gx x(t) + W^gh h(t − 1) + b_g)
i(t) = σ(W^ix x(t) + W^ih h(t − 1) + b_i)
f(t) = σ(W^fx x(t) + W^fh h(t − 1) + b_f)
o(t) = σ(W^ox x(t) + W^oh h(t − 1) + b_o)
s(t) = g(t) ⊙ i(t) + s(t − 1) ⊙ f(t)
h(t) = ϕ(s(t)) ⊙ o(t)   (3.72)
The value of the hidden layer of the LSTM at time t is the vector h(t), while h(t − 1) contains the values output by each memory cell in the hidden layer at the previous time step. Note that these equations include the forget gate. The calculations for the simpler LSTM without forget gates are obtained by setting f(t) = 1 for all t. We use the tanh function ϕ for the input node g; in the original LSTM paper, however, the activation function for g is the sigmoid σ.
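The per-step update described above can be sketched in code. This is a minimal illustration, not the book's implementation; the weight and bias names (W["gx"], b["g"], etc.) are assumptions introduced here for readability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, s_prev, W, b):
    # One time step of the LSTM with forget gates, Eq. (3.72).
    g = np.tanh(W["gx"] @ x + W["gh"] @ h_prev + b["g"])   # input node g, phi = tanh
    i = sigmoid(W["ix"] @ x + W["ih"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["fx"] @ x + W["fh"] @ h_prev + b["f"])   # forget gate
    o = sigmoid(W["ox"] @ x + W["oh"] @ h_prev + b["o"])   # output gate
    s = g * i + s_prev * f                                 # internal memory-cell state
    h = np.tanh(s) * o                                     # hidden-layer value at time t
    return h, s
```

Replacing the forget-gate line with an all-ones vector recovers the simpler LSTM without forget gates, matching the remark above that f(t) = 1 for all t.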
BRNNs: Besides the LSTM, one of the most used RNN architectures is the BRNN (Figure 3.20).
In this architecture, there are two layers of hidden nodes. Both hidden layers are connected to input and output. Only the first layer has recurrent connections from the past time steps, while in the second layer, the direction of recurrent connections is flipped, passing activation backward along the sequence. Given an input sequence and a target sequence, the BRNN can be trained by ordinary backpropagation after unfolding across time. The following three equations describe a BRNN:
h(t) = σ(W^hx x(t) + W^hh h(t − 1) + b_h)
z(t) = σ(W^zx x(t) + W^zz z(t + 1) + b_z)
ŷ(t) = softmax(W^yh h(t) + W^yz z(t) + b_y)   (3.73)
where h(t) and z(t) are the values of the hidden layers in the forward and backward directions, respectively.
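The two-pass structure of the BRNN can be sketched as follows: one sweep forward to compute h(t), one sweep backward to compute z(t), and an output at each step combining both. The weight and bias names are illustrative assumptions, not taken from the text.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def brnn_forward(xs, W, b):
    # Sketch of the BRNN forward pass, Eq. (3.73).
    T, n = len(xs), b["h"].shape[0]
    h, z = [None] * T, [None] * T
    h_prev = np.zeros(n)
    for t in range(T):                        # forward hidden layer
        h_prev = sigmoid(W["hx"] @ xs[t] + W["hh"] @ h_prev + b["h"])
        h[t] = h_prev
    z_next = np.zeros(n)
    for t in reversed(range(T)):              # backward hidden layer
        z_next = sigmoid(W["zx"] @ xs[t] + W["zz"] @ z_next + b["z"])
        z[t] = z_next
    # each output sees past context (via h) and future context (via z)
    return [softmax(W["yh"] @ h[t] + W["yz"] @ z[t] + b["y"]) for t in range(T)]
```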
NTMs: The NTM extends RNNs with an addressable external memory [12]. This enables RNNs to perform complex algorithmic tasks such as sorting. This is inspired by the theories in cognitive science that suggest humans possess a “central executive” that interacts with a memory buffer [13]. By analogy with a Turing machine, in which a program directs read heads and write heads to interact with external memory in the form of a tape, the model is called an NTM.
The two primary components of an NTM are a controller and a memory matrix. The controller, which may be a recurrent or feedforward neural network, takes input and returns output to the outside world, as well as passing instructions to and reading from the memory. The memory is represented by a large matrix of N memory locations, each of which is a vector of dimension M. Additionally, a number of read and write heads facilitate the interaction between the controller and the memory matrix. Despite these additional capabilities, the NTM is differentiable end‐to‐end and can be trained by variants of SGD using Backpropagation through Time (BPTT).
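A key reason the NTM is differentiable end-to-end is that read heads address the N × M memory matrix with soft weights rather than discrete indices. The full NTM addressing pipeline also involves interpolation, shifting, and sharpening; the sketch below shows only the content-based read step, with assumed names, not the complete mechanism from [12].

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def content_read(memory, key, beta=1.0):
    # memory: N x M matrix (N locations, each a vector of dimension M).
    # Cosine similarity of the key against each row, sharpened by beta
    # and softmax-normalized, gives differentiable read weights; the
    # read vector is the weighted sum of memory rows.
    sims = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
    w = softmax(beta * sims)
    return w @ memory, w
```

Because every step is a smooth function of the controller's outputs, gradients flow through the memory access, which is what allows training with SGD and BPTT.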
Figure 3.20 A bidirectional recurrent neural network (BRNN).
In [12], five algorithmic tasks are used to test the performance of the NTM model. By algorithmic we mean that for each task, the target output for a given input can be calculated by following a simple program, as might be easily implemented in any universal programming language. One example is the copy task, where the input is a sequence of fixed-length binary vectors followed by a delimiter symbol. The target output is a copy of the input sequence. In another task, priority sort, an input consists of a sequence of binary vectors together with a distinct scalar priority value for each vector. The target output is the sequence of vectors sorted by priority. The experiments test whether an NTM can be trained via supervised learning to implement these common algorithms correctly and efficiently. Interestingly, solutions found in this way generalize reasonably well to inputs longer than those presented in the training set. By contrast, the LSTM without external memory does not generalize well to longer inputs. The authors compare three different architectures: an LSTM RNN, an NTM with a feedforward controller, and an NTM with an LSTM controller. On each task, both NTM architectures significantly outperform the LSTM RNN, both in training set performance and in generalization to test data.
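To make the copy task concrete, a single training example can be generated as below. The exact encoding (an extra channel for the delimiter, zero padding elsewhere) is our assumption for illustration; [12] may encode the delimiter differently.

```python
import numpy as np

def make_copy_example(seq_len, width, rng):
    # seq_len random binary vectors of dimension `width`, followed by a
    # delimiter step flagged on an extra channel; the target output is
    # simply the input sequence itself.
    seq = rng.integers(0, 2, size=(seq_len, width)).astype(float)
    inp = np.zeros((seq_len + 1, width + 1))
    inp[:seq_len, :width] = seq
    inp[seq_len, width] = 1.0        # delimiter symbol
    return inp, seq
```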
3.5 Cellular Neural Networks (CeNN)
A spatially invariant CeNN architecture [14, 15] is an M × N array of identical cells (Figure 3.21 [top]). Each cell, Cij, (i, j) ∈ {1, …, M} × {1, …, N}, has identical connections with adjacent cells in a predefined neighborhood, Nr(i, j), of radius r. The size of the neighborhood is m = (2r + 1)², where r is a positive integer.
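The neighborhood indexing just described can be sketched as a short helper; boundary clipping at the array edge is our assumption, since boundary conditions are not specified in this passage.

```python
def neighborhood(i, j, r, M, N):
    # Cells C_kl with |k - i| <= r and |l - j| <= r on an M x N grid,
    # clipped at the array boundary; indices are 1-based as in the text.
    return [(k, l)
            for k in range(max(1, i - r), min(M, i + r) + 1)
            for l in range(max(1, j - r), min(N, j + r) + 1)]
```

For an interior cell this yields the m = (2r + 1)² cells stated above; cells on the border see fewer neighbors.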
A conventional analog CeNN cell consists of one resistor, one capacitor, 2m linear voltage-controlled current sources (VCCSs), one fixed current source, and one specific type of nonlinear voltage-controlled voltage source (Figure 3.21 [bottom]). The input, state, and output of a given cell Cij correspond to the nodal voltages uij, xij, and yij, respectively. VCCSs controlled by the input and output