Artificial Intelligence and Quantum Computing for Advanced Wireless Networks. Savo G. Glisic

Artificial Intelligence and Quantum Computing for Advanced Wireless Networks

target="_blank" rel="nofollow" href="#fb3_img_img_362d6807-893d-5b96-bc04-3a5f8824aa42.png" alt="equation"/>

where Θ ∈ ℝ^{C × F} is a matrix of filter parameters, and Z ∈ ℝ^{N × F} is the convolved signal matrix. The first convolutional network for DGCN is the same as Eq. (5.15). The second network replaces the adjacency matrix with a positive pointwise mutual information (PPMI) matrix:

(5.16) equation

where X_P is the PPMI matrix, and D_P is the diagonal degree matrix of X_P.

When dealing with large‐scale networks, low‐dimensional vector embeddings of nodes in large graphs have proved extremely useful as feature inputs for a wide variety of prediction and graph analysis tasks [18–22]. The basic idea behind node‐embedding approaches is to use dimensionality reduction techniques to distill the high‐dimensional information about a node’s graph neighborhood into a dense vector embedding. These node embeddings can then be fed to downstream machine learning systems and aid in tasks such as node classification, clustering, and link prediction [19–21].

GraphSAGE (SAmple and aggreGatE) [23] is a general inductive framework. The framework generates embeddings by sampling and aggregating features from a node’s local neighborhood.

(5.17) equation

However, the original work utilizes in Eq. (5.17) only a fixed‐size set of neighbors by uniformly sampling. Three aggregator functions are used.

The mean aggregator could be viewed as an approximation of the convolutional operation from the transductive GCN framework [14], so that the inductive version of the GCN variant could be derived by

(5.18) equation

The mean aggregator is different from other aggregators because it does not perform the concatenation operation that concatenates images and images in Eq. (5.17). It could be viewed as a form of “skip connection” [24] and could achieve better performance.

The long short‐term memory (LSTM) aggregator, which has a larger expressive capability, is also used. However, LSTMs process inputs in a sequential manner so that they are not permutation invariant. Reference [23] adapts LSTMs to operate on an unordered set by permutating the node’s neighbors.

Pooling aggregator: In the pooling aggregator, each neighbor’s hidden state is fed through a fully connected layer, after which a max ‐pooling operation is applied to the set of the node’s neighbors:

(5.19) equation

Note that any symmetric function could be used in place of the max ‐pooling operation here. The operation of GraphSAGE is illustrated in Figure 5.1 [23].

Gate: Several works have attempted to use a gate mechanism such as gate recurrent units (GRUs) [25] or LSTM [26] in the propagation step to mitigate the restrictions in the former GNN models and improve the long‐term propagation of information across the graph structure.

Gated graph neural network (GGNN) [27] uses GRUs in the propagation step, unrolls the recurrence for a fixed number of steps T, and uses backpropagation through time in order to compute gradients. So, the propagation model can be presented as

(5.20) equation

Schematic illustration of operation of GraphSAGE: (a) sample neighborhood, (b) aggregate feature information from neighbors, (c) predict graph context and label using aggregated information.

Figure 5.1 Operation of GraphSAGE: (a) sample neighborhood, (b) aggregate feature information from neighbors, (c) predict graph context and label using aggregated information.

Source: Hamilton et al. [23].

The node v first aggregates message from its neighbors, where A_v is the submatrix of the graph adjacency matrix A and denotes the connection of node v with its neighbors. The GRU‐like update functions incorporate information from the other nodes and from the previous time step to update each node’s hidden state. Matrix a gathers the neighborhood information of node v, and z and r are the update and reset gates.

LSTM architecture extensions, referred to as the Child‐Sum Tree‐LSTM and the N‐ary Tree‐LSTM, are presented in [28]. As in standard LSTM units, each Tree‐LSTM unit (indexed by v) contains input and output gates i_v and o_v, a memory cell c_v , and a hidden state h_v . Instead of a single forget gate, the Tree‐LSTM unit contains one forget gate f_vk for each child k, allowing the unit to selectively incorporate information from each child. The Child‐Sum Tree‐LSTM transition equations are given as

(5.21) equation

images is the input vector at time t in the standard LSTM setting. If the branching factor of a tree is at most K and all children of a node are ordered, – that is, they can be indexed from 1 to K – then the N‐ary Tree‐LSTM can be used. For node v, images and images denote the hidden state and memory cell of its k‐th child at time t, respectively. The transition equations are now

(5.22) equation

The introduction of separate parameter matrices for each child k allows the model to learn more fine‐grained representations conditioning on the states of a unit’s children than the Child‐Sum Tree‐LSTM.

The two types of Tree‐LSTMs can be easily adapted to the graph. The graph‐structured LSTM in [29] is an example of the N‐ary Tree‐LSTM applied to the graph. However, it is a simplified version since each node in the graph has at most two incoming edges (from its parent and sibling predecessor). Reference [30] proposed another variant of the Graph LSTM based on the relation extraction task. The main difference between graphs and trees is that edges of graphs have labels. Work in [30] utilizes different weight matrices to represent different labels:

(5.23) equation

where m(v, k) denotes the edge label between node v and k.

The attention mechanism has been successfully used in many sequence‐based tasks such as machine translation [31–33] and machine reading [34]. Work in [35]

Скачать книгу