Introduction to Graph Neural Networks. Zhiyuan Liu

Introduction to Graph Neural Networks

Скачать книгу

rule, we get another useful formula:

which is the famous Bayes formula. Note that it also holds for more than two variables:

Using product rule, we can deduce the chain rule:

where X₁, X₂, …, Xn are n random variables.

The average value of some function f(x) (where x is the value of a certain random variable) under a probability distribution P(x) is called the expectation of f(x). For a discrete distribution, it can be written as

Usually, when f(x) = x, 𝔼[x] stands for the expectation of x.

To measure the dispersion level of f(x) around its mean value E[f(x)], we introduce the variance of f(x):

The standard deviation is the square root of variance. In some level, covariance expresses the degree to which two variables vary together:

Greater covariance indicates higher relevance between f(x) and g(y).

2.2.2 PROBABILITY DISTRIBUTIONS

Probability distributions describe the probability of a random variable or several random variables on every state. Several distributions that are useful in the area of machine learning are listed as follows.

• Gaussian distribution: it is also known as normal distribution and can be expressed as:

where μ is the mean of variable x and σ² is the variance.

• Bernoulli distribution: random variable X can either be 0 or 1, with a probability P(X = 1) = p. Then the distribution function is

It is quite obvious that E(X = p and Var(X) = p(1 − p).

• Binomial distribution: repeat the Bernoulli experiment for N times and the times that X equals to 1 is denote by Y, then

is the Binomial distribution satisfying that E(Y = np) and Var(Y) = np(1 − p).

• Laplace distribution: the Laplace distribution is described as

2.3 GRAPH THEORY

Graphs are the basic subjects in the study of GNNs. Therefore, to get a comprehensive understanding of GNN, the fundamental graph theory is required.

2.3.1 BASIC CONCEPTS

A graph is often denoted by G = (V, E), where V is the set of vertices and E is the set of edges. An edge e = u, v has two endpoints u and v, which are said to be joined by e. In this case, u is called a neighbor of v, or in other words, these two vertices are adjacent. Note that an edge can either be directed or undirected. A graph is called a directed graph if all edges are directed or undirected graph if all edges are undirected. The degree of vertice v, denoted by d(v), is the number of edges connected with v.

2.3.2 ALGEBRA REPRESENTATIONS OF GRAPHS

There are a few useful algebra representations for graphs, which are listed as follows.

• Adjacency matrix: for a simple graph G = (V, E) with n-vertices, it can be described by an adjacency matrix A ∈ ℝ^{n × n}, where

It is obvious that such matrix is a symmetric matrix when G is an undirected graph.

• Degree matrix: for a graph G = (V, E) with n-vertices, its degree matrix D ∈ ℝ^{n × n} is a diagonal matrix, where

• Laplacian matrix: for a simple graph G = (V, E) with n-vertices, if we consider all edges in G to be undirected, then its Laplacian matrix L = ℝ^{n × n} can be defined as

Thus, we have the elements:

Note that the graph is considered to be an undirected graph for the adjacency matrix.

• Symmetric normalized Laplacian: the symmetric normalized Laplacian is defined as:

The elements are given by:

• Random walk normalized Laplacian: it is defined as:

The elements can be computed by:

• Incidence matrix: another common used matrix in representing a graph is incidence matrix. For a directed graph G = (V, E) with n-vertices and m-edges, the corresponding incidence matrix is M ∈ ℝ^{n × m}, where

For a undirected graph, the corresponding incidence matrix satisfies that

CHAPTER 3

Basics of Neural Networks

Neural networks are one of the most important models in machine learning. The structure of artificial neural networks, which consists of numerous neurons with connections to each other, bears great resemblance to that of biological neural networks.

Скачать книгу