Introduction to Graph Neural Networks. Zhiyuan Liu
In this section, we describe the vanilla GNN proposed in Scarselli et al. [2009]. We also list the limitations of the vanilla GNN in representation capability and training efficiency. In the following chapters, we will talk about several variants of the vanilla GNN model.

      The concept of GNN was first proposed in Gori et al. [2005], Scarselli et al. [2004, 2009]. For simplicity, we will talk about the model proposed in Scarselli et al. [2009], which aims to extend existing neural networks for processing graph-structured data.

A node is naturally defined by its features and related nodes in the graph. The target of GNN is to learn a state embedding h_v ∈ ℝ^s, which encodes the information of the neighborhood, for each node. The state embedding h_v is used to produce an output o_v, such as the distribution of the predicted node label.

In Scarselli et al. [2009], a typical graph is illustrated in Figure 4.1. The vanilla GNN model deals with undirected homogeneous graphs where each node has its input features x_v and each edge may also have its features. The paper uses co[v] and ne[v] to denote the set of edges incident to node v and the set of neighbors of node v, respectively. For processing more complicated graphs such as heterogeneous graphs, the corresponding variants of GNNs can be found in later chapters.

Given the input features of nodes and edges, we next describe how the model obtains the node embedding h_v and the output embedding o_v.

In order to update the node state according to the input neighborhood, there is a parametric function f, called the local transition function, shared among all nodes. In order to produce the output of the node, there is a parametric function g, called the local output function. Then, h_v and o_v are defined as follows:

$$h_v = f(x_v, x_{co[v]}, h_{ne[v]}, x_{ne[v]}) \tag{4.1}$$

$$o_v = g(h_v, x_v) \tag{4.2}$$

where x denotes the input feature and h denotes the hidden state. co[v] is the set of edges connected to node v and ne[v] is the set of neighbors of node v, so that x_v, x_co[v], h_ne[v], and x_ne[v] are the features of v, the features of its edges, and the states and features of the nodes in the neighborhood of v, respectively. In the example of node l1 in Figure 4.1, x_l1 is the input feature of l1. co[l1] contains edges l(1,4), l(1,6), l(1,2), and l(3,1). ne[l1] contains nodes l2, l3, l4, and l6.

      Figure 4.1: An example of the graph based on Scarselli et al. [2009].
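      To make Eqs. (4.1) and (4.2) concrete, here is a minimal sketch in which f and g are small feedforward networks. The layer sizes, the Tanh nonlinearity, and the summation over neighbors are illustrative assumptions, not the exact parameterization used in Scarselli et al. [2009].

```python
import torch
import torch.nn as nn

class LocalTransition(nn.Module):
    """Sketch of the local transition function f in Eq. (4.1)."""
    def __init__(self, feat_dim, edge_dim, state_dim):
        super().__init__()
        self.state_dim = state_dim
        # Per-neighbor input: [x_v, edge feature, neighbor state, neighbor feature]
        self.net = nn.Sequential(
            nn.Linear(feat_dim + edge_dim + state_dim + feat_dim, state_dim),
            nn.Tanh(),
        )

    def forward(self, x_v, x_edges, h_neighbors, x_neighbors):
        # One term per neighbor, summed so the result does not depend on
        # neighbor ordering (the non-positional form of f).
        terms = torch.cat(
            [x_v.expand(h_neighbors.size(0), -1), x_edges, h_neighbors, x_neighbors],
            dim=-1,
        )
        return self.net(terms).sum(dim=0)

class LocalOutput(nn.Module):
    """Sketch of the local output function g in Eq. (4.2)."""
    def __init__(self, feat_dim, state_dim, out_dim):
        super().__init__()
        self.net = nn.Linear(state_dim + feat_dim, out_dim)

    def forward(self, h_v, x_v):
        return self.net(torch.cat([h_v, x_v], dim=-1))
```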

Let H, O, X, and X_N be the matrices constructed by stacking all the states, all the outputs, all the features, and all the node features, respectively. Then we have the compact form:

$$H = F(H, X) \tag{4.3}$$

$$O = G(H, X_N) \tag{4.4}$$

where F is the global transition function and G is the global output function; they are the stacked versions of the local transition function f and the local output function g for all nodes in the graph, respectively. The value of H is the fixed point of Eq. (4.3) and is uniquely defined under the assumption that F is a contraction map.
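      As an illustration of how the global transition function F stacks the local f, the sketch below applies f to every node. The COO-style edge_index / edge_attr representation of the graph is an assumption made for illustration, and f refers to the LocalTransition sketch above.

```python
def global_transition(f, H, X, edge_index, edge_attr):
    """Sketch of F in Eq. (4.3): apply the local f to every node.

    edge_index has shape (2, E), with edge_index[0] the source and
    edge_index[1] the target node of each edge; edge_attr holds edge features.
    For an undirected graph, each edge is assumed to appear in both directions.
    """
    H_next = []
    for v in range(X.size(0)):
        mask = edge_index[1] == v   # edges pointing at node v, i.e. co[v]
        nbrs = edge_index[0][mask]  # neighbor indices, i.e. ne[v]
        H_next.append(f(X[v], edge_attr[mask], H[nbrs], X[nbrs]))
    return torch.stack(H_next)
```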

Following Banach's fixed point theorem [Khamsi and Kirk, 2011], GNN uses the following classic iterative scheme to compute the state:

$$H^{t+1} = F(H^t, X) \tag{4.5}$$

where H^t denotes the t-th iteration of H. The dynamical system of Eq. (4.5) converges exponentially fast to the solution of Eq. (4.3) for any initial value H^0. Note that the computations described by f and g can be implemented as feedforward neural networks (FNNs).
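      A minimal sketch of this iterative scheme is shown below, assuming F is any callable implementing the global transition (for example, the global_transition sketch above applied to a fixed graph); num_steps and tol are illustrative stopping criteria.

```python
def fixed_point_states(F, H0, X, num_steps=50, tol=1e-5):
    """Iterate H^{t+1} = F(H^t, X) as in Eq. (4.5).

    If F is a contraction map, Banach's fixed point theorem guarantees
    convergence to the unique solution of Eq. (4.3) from any H0.
    """
    H = H0
    for _ in range(num_steps):
        H_next = F(H, X)
        if torch.linalg.norm(H_next - H) < tol:  # stop once the update is tiny
            return H_next
        H = H_next
    return H
```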

After the introduction of the framework of GNN, the next question is how to learn the parameters of the local transition function f and the local output function g. With the target information (t_v for a specific node) as the supervision, the loss can be written as:

$$loss = \sum_{i=1}^{p}(t_i - o_i) \tag{4.6}$$

where p is the number of supervised nodes. The learning algorithm is based on a gradient-descent strategy and is composed of the following steps (a code sketch of one training step follows the list).

• The states h_v^t are iteratively updated by Eq. (4.1) until a time step T. Then we obtain an approximate fixed point solution of Eq. (4.3): H(T) ≈ H.

      • The gradient of weights W is computed from the loss.

      • The weights W are updated according to the gradient computed in the last step.
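      Putting the three steps together, the following is a hypothetical single training step built on the sketches above. The squared-error loss, the optimizer, and the data layout (X, edge_index, edge_attr, targets, train_mask) are assumptions, and gradients are simply backpropagated through the unrolled T iterations, which simplifies the gradient computation of the original paper.

```python
def train_step(f_local, g_local, X, edge_index, edge_attr,
               targets, train_mask, optimizer, T=20):
    """One gradient-descent step following the three steps listed above."""
    optimizer.zero_grad()

    # Step 1: iterate the states by Eq. (4.1)/(4.5) for T steps, so H(T) ≈ H.
    H0 = torch.zeros(X.size(0), f_local.state_dim)
    F = lambda H, X_in: global_transition(f_local, H, X_in, edge_index, edge_attr)
    H_T = fixed_point_states(F, H0, X, num_steps=T)

    # Step 2: produce outputs with g and measure the error on the p supervised
    # nodes (squared error is used here as one concrete choice of target error).
    O = g_local(H_T, X)
    loss = ((targets[train_mask] - O[train_mask]) ** 2).sum()

    # Step 3: compute the gradient of the weights W and update them.
    loss.backward()
    optimizer.step()
    return loss.item()
```

      In use, the optimizer would be constructed over the parameters of both sketched modules, e.g. torch.optim.SGD(list(f_local.parameters()) + list(g_local.parameters()), lr=0.01).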

After running the algorithm, we obtain a model trained for a specific supervised/semi-supervised task as well as the hidden states of the nodes in the graph. The vanilla GNN model provides an effective way to model graph data and is the first step toward incorporating neural networks into the graph domain.

Though experimental results showed that GNN is a powerful architecture for modeling structured data, there are still several limitations of the vanilla GNN.

• First, it is computationally inefficient to update the hidden states of nodes iteratively to get the fixed point. The model needs T steps of computation to approximate the fixed point. If we relax the assumption of the fixed point, we can design a multi-layer GNN to obtain a stable representation of the node and its neighborhood.

• Second, the vanilla GNN uses the same parameters in every iteration, while most popular neural networks use different parameters in different layers, which serves as a hierarchical feature extraction method. Moreover, the update of node hidden states is a sequential process which can benefit from RNN kernels like GRU and LSTM.
