Introduction to Graph Neural Networks. Zhiyuan Liu
rule, we get another useful formula:
which is the famous Bayes formula. Note that it also holds for more than two variables:
Using product rule, we can deduce the chain rule:
where X1, X2, …, Xn are n random variables.
The average value of some function f(x) (where x is the value of a certain random variable) under a probability distribution P(x) is called the expectation of f(x). For a discrete distribution, it can be written as
Usually, when f(x) = x, 𝔼[x] stands for the expectation of x.
To measure the dispersion level of f(x) around its mean value E[f(x)], we introduce the variance of f(x):
The standard deviation is the square root of variance. In some level, covariance expresses the degree to which two variables vary together:
Greater covariance indicates higher relevance between f(x) and g(y).
2.2.2 PROBABILITY DISTRIBUTIONS
Probability distributions describe the probability of a random variable or several random variables on every state. Several distributions that are useful in the area of machine learning are listed as follows.
• Gaussian distribution: it is also known as normal distribution and can be expressed as:
where μ is the mean of variable x and σ2 is the variance.
• Bernoulli distribution: random variable X can either be 0 or 1, with a probability P(X = 1) = p. Then the distribution function is
It is quite obvious that E(X = p and Var(X) = p(1 − p).
• Binomial distribution: repeat the Bernoulli experiment for N times and the times that X equals to 1 is denote by Y, then
is the Binomial distribution satisfying that E(Y = np) and Var(Y) = np(1 − p).
• Laplace distribution: the Laplace distribution is described as
2.3 GRAPH THEORY
Graphs are the basic subjects in the study of GNNs. Therefore, to get a comprehensive understanding of GNN, the fundamental graph theory is required.
2.3.1 BASIC CONCEPTS
A graph is often denoted by G = (V, E), where V is the set of vertices and E is the set of edges. An edge e = u, v has two endpoints u and v, which are said to be joined by e. In this case, u is called a neighbor of v, or in other words, these two vertices are adjacent. Note that an edge can either be directed or undirected. A graph is called a directed graph if all edges are directed or undirected graph if all edges are undirected. The degree of vertice v, denoted by d(v), is the number of edges connected with v.
2.3.2 ALGEBRA REPRESENTATIONS OF GRAPHS
There are a few useful algebra representations for graphs, which are listed as follows.
• Adjacency matrix: for a simple graph G = (V, E) with n-vertices, it can be described by an adjacency matrix A ∈ ℝn × n, where
It is obvious that such matrix is a symmetric matrix when G is an undirected graph.
• Degree matrix: for a graph G = (V, E) with n-vertices, its degree matrix D ∈ ℝn × n is a diagonal matrix, where
• Laplacian matrix: for a simple graph G = (V, E) with n-vertices, if we consider all edges in G to be undirected, then its Laplacian matrix L = ℝn × n can be defined as
Thus, we have the elements:
Note that the graph is considered to be an undirected graph for the adjacency matrix.
• Symmetric normalized Laplacian: the symmetric normalized Laplacian is defined as:
The elements are given by:
• Random walk normalized Laplacian: it is defined as:
The elements can be computed by:
• Incidence matrix: another common used matrix in representing a graph is incidence matrix. For a directed graph G = (V, E) with n-vertices and m-edges, the corresponding incidence matrix is M ∈ ℝn × m, where
For a undirected graph, the corresponding incidence matrix satisfies that
CHAPTER 3
Basics of Neural Networks
Neural networks are one of the most important models in machine learning. The structure of artificial neural networks, which consists of numerous neurons with connections to each other, bears great resemblance to that of biological neural networks.