Graph Spectral Image Processing. Gene Cheung
image coding standard on the Internet, was published in 1992. MPEG1, the first digital video compression standard by ISO, was standardized in 1993. The IEEE International Conference on Image Processing (ICIP), the flagship image processing conference held annually for the IEEE Signal Processing Society (SPS), was also started in 1993 and has been in existence for 27 years, making it older than many image processing researchers now studying in graduate schools! Given the topic’s maturity, it is a legitimate question to ask if yet another book on image processing is warranted. As co-editors of this book, we emphatically answer this question with a resounding “Yes”. We will first discuss the following recent technological trends, which also serve as motivations for the creation of this publication.
1) Sensing and Display Technologies: The advent of image sensing technologies, such as active depth sensors and display technologies like head-mounted displays (HMD), in the last decade alone, means that the nature of a digital image has drastically changed. Beyond higher spatial resolution and bit-depth per pixel, a modern imaging sensor can also acquire scene depth, hyper-spectral properties, etc. Further, often acquired image data is not represented as a traditional 2D array of pixel information, but in an alternative form, such as light fields and 3D point clouds. This means that the processing tools must flexibly adapt to richer and evolving imaging contents and formats.
2) Graph Signal Processing: In the last eight years, we have also witnessed the birth of a new signal processing topic – called graph signal processing (GSP) – that generalizes traditional mathematical tools like transforms and wavelets, to process signals residing on irregular data kernels described by graphs (Shuman et al. 2013). Central to GSP is the notion of graph frequencies: orthogonal components, computed from a graph variation operator like the graph Laplacian matrix, that generalize the notion of Fourier modes to the graph domain, spanning a graph signal space. Because of its inherent powerful generality, one can easily adopt or design GSP tools for different imaging applications, where a node in a graph represents a pixel, and the graph connectivity is chosen to reflect inter-pixel similarities or correlations. For an example of the GSP tool being used for image restoration, see Figure I.1 for an illustration of a graph spectral method called left eigenvectors of the random walk graph Laplacian (LeRAG) for JPEG image dequantization (Liu et al. 2017). GSP tools can also easily adapt to the aforementioned modern imaging modalities, such as light field images and 3D point clouds, that do not reside on regular 2D grids.
Figure I.1. Visual comparison of JPEG dequantization methods for a butterfly at QF = 5. The corresponding PSNR values are also given. For a color version of this figure, see www.iste.co.uk/cheung/graph.zip
3) Deep Neural Networks: Without a doubt, the singular seismic paradigm shift in data science in the last decade is deep learning. Using layers of convolutional filters, pointwise nonlinearities and pooling functions, deep neural network (DNN) architectures like convolutional neural networks (CNN) have demonstrated superior performance in a wide range of imaging tasks from denoising to classification, when a large volume of labeled data is available for training (Vemulapalli et al. 2016; Zhang et al. 2017). When labeled training data is scarce, or when the underlying data kernel is irregular (thus complicating the training of convolutional filters and the selection of pooling operators), how to best design and construct DNN for a targeted image application is a challenging problem. Moreover, a CNN purely trained from labeled data often remains a “black box”, i.e. the learned operators like filtering remain unexplainable.
Motivated by these technological trends, we have focused this book on the theory and applications of GSP tools for image processing, covering conventional images and videos, new modalities like light fields and 3D point clouds, and hybrid GSP/deep learning approaches. Different from other graph-based image processing books (Lezoray and Grady 2012), we concentrate on spectral processing techniques with frequency interpretations such as graph Fourier transforms (GFT) and graph wavelets, drawing inspiration from the long history of frequency analysis tools in traditional signal processing. Graph frequency analysis enables the definition of familiar signal processing notions, such as graph Fourier modes, bandlimitedness, and signal smoothness, using graph spectral methods that can be designed.
Specifically, the content of this book is structured into two parts:
1) The first part of the book discusses the fundamental GSP theories. Chapter 1, titled “Graph Spectral Filtering” by Y. Tanaka, reviews the basics of graph filtering such as graph transforms and wavelets. Chapter 2, titled “Graph Learning” by X. Dong, D. Thanou, M. Rabbat and P. Frossard, reviews recent techniques to learn an underlying graph structure given a set of observable data. Chapter 3, titled “Graph Neural Networks” by G. Fracastoro and D. Valsesia, overviews recent works generalizing DNN architectures to the graph data domain, where input signals reside on irregular graph structures.
2) The second part of the book reviews different imaging applications of GSP. Chapters 4 and 5, titled “Graph Specral Image and Video Compression” by H.E. Egilmez, Y.-H. Chao and A. Ortega and “Graph Spectral 3D Image Compression” by T. Maugey, M. Rizkallah, N. M. Bidgoli, A. Roumy and C. Guillemot, focus on the design and applications of GSP tools for the compression of traditional images/videos and 3D images, respectively. Chapter 6, titled “Graph Spectral Image Restoration” by J. Pang and J. Zeng, focuses on the general recovery of corrupted images, e.g. image denoising and deblurring. As a new imaging modality, Chapter 7, titled “Graph Spectral Point Cloud Processing” by W. Hu, S. Chen and D. Tian, focuses on the processing of 3D point clouds for applications, such as low-level restoration and high-level unsupervised feature learning. Chapters 8 and 9, titled “Graph Spectral Image Segmentation” by M. Ng and “Graph Spectral Image Classification” by M. Ye, V. Stankovic, L. Stankovic and G. Cheung, narrow the discussion specifically to segmentation and classification, respectively, two popular research topics in the computer vision community. Finally, Chapter 10, titled “Graph Neural Networks for Image Processing” by G. Fracastoro and D. Valsesia, reviews the growing efforts to employ recent GNN architectures for conventional imaging tasks such as denoising.
Before we jump into the various chapters, we begin with the basic definitions in GSP that will be used throughout the book. Specifically, we formally define a graph, graph spectrum, variation operators and graph signal smoothness priors in the following sections.
I.2. Graph definition
A graph G(V, E, W) contains a set V of N nodes and a set E of M edges. While directed graphs are also possible, in this book we more commonly assume an undirected graph, where each existing edge (i, j) ∈ E is undirected and contains an edge weight wi,j ∈ R, which is typically positive. A large positive edge weight wi,j would mean that samples at nodes i and j are expected to be similar/correlated.
There are many ways to compute appropriate edge weights. Especially common for images, edge weight wi,j can be computed using a Gaussian kernel, as done in the bilateral filter (Tomasi and Manduchi 1998):
[I.1]
where li ∈ R2 is the location of pixel i on the 2D image grid, xi ∈ R is the intensity of pixel i, and
and are two parameters. Hence, 0 ≤ wi,j ≤ 1. Larger