Spatial Regression Models for the Social Sciences. Jun Zhu
environmental, behavioral, socioeconomic, genetic, and infectious risk factors (Elliott & Wartenberg, 2004; Waller & Gotway, 2004).
Spatial point patterns consist of event locations in a spatial domain of interest.
Areal data refer to spatial data observed over regular grid cells (or pixels), as in remotely sensed data, or aggregated to irregular areal regions such as counties and census tracts; such data are often referred to as lattice data and sometimes as regional data (see, e.g., Schabenberger & Gotway, 2005; Waller & Gotway, 2004). Areal data analysis aims to quantify the spatial pattern of an attribute on a spatial lattice or region (regular or irregular) through a specific neighborhood structure and to examine the relations between the attribute and potential explanatory variables while accounting for spatial effects. Spatial regression modeling is a common approach in areal data analysis. For the purposes of this book, the term areal data analysis is used.
Geostatistical data refer to spatial data sampled at point locations that are continuous in space. The objectives of geostatistics are similar to those of areal data analysis, but geostatistics also aims to predict attribute values at locations that are not sampled (see, e.g., Cressie, 1993; Goodchild, 1992; Stein, 1999). Geostatistics is common in geology, soil science, and forest resource management research. For example, petroleum geologists use geostatistical methods to estimate hydrocarbon fuel distribution from a small number of hydrocarbon samples taken at known locations. Two key differences distinguish geostatistics from areal data analysis:
1. geostatistical data are geographically referenced to specific point locations while areal data are geographically referenced to areal regions, and
2. geostatistics generally measures spatial dependence by distance-based functions while areal data analysis often uses neighborhood structures.
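The second difference can be illustrated with a minimal Python sketch. It builds a neighborhood structure for areal units on a regular grid and, separately, a table of pairwise distances for point locations. The 3 x 3 grid, the point coordinates, and the function names `rook_neighbors` and `pairwise_distances` are illustrative assumptions and are not part of the book's data example. Rook contiguity (cells sharing an edge) is only one of several possible neighborhood definitions.

```python
import math


def rook_neighbors(n_rows, n_cols):
    """Map each grid cell (row, col) to the cells that share an edge with it."""
    neighbors = {}
    for r in range(n_rows):
        for c in range(n_cols):
            candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
            # Keep only candidates that fall inside the grid.
            neighbors[(r, c)] = [
                (i, j)
                for i, j in candidates
                if 0 <= i < n_rows and 0 <= j < n_cols
            ]
    return neighbors


def pairwise_distances(points):
    """Euclidean distance between every pair of point locations."""
    return [[math.dist(p, q) for q in points] for p in points]


# Areal view: which cells count as neighbors of which (hypothetical 3 x 3 grid).
grid = rook_neighbors(3, 3)

# Geostatistical view: how far apart the sampled points are (hypothetical coordinates).
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
distances = pairwise_distances(points)
```

In the areal view, dependence is expressed through who neighbors whom; in the geostatistical view, it would be modeled as a function of the entries of `distances`.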
Spatial interaction data refer to the “flows” between origins and destinations (see, e.g., Bailey & Gatrell, 1995). Spatial interaction data analysis attempts to quantify the arrangement of flows and build models for origin and destination interactions in terms of the geographical accessibility of destinations versus origins as well as the “push factors” of origins and “pull factors” of destinations. Spatial interaction data analysis is often used in transportation planning, migration studies, and other research that has flow information.
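One common way to formalize push factors, pull factors, and accessibility is an unconstrained gravity model, in which the flow from origin i to destination j grows with the origin's push and the destination's pull and shrinks with the distance between them. The sketch below assumes that form; the text above does not commit to a specific model, and the function name `gravity_flows`, the parameters `beta` and `k`, and all numbers are hypothetical.

```python
def gravity_flows(push, pull, distance, beta=2.0, k=1.0):
    """Predicted flows under a simple gravity model (an assumed functional form).

    flow[i][j] = k * push[i] * pull[j] / distance[i][j] ** beta

    push:     one push factor per origin
    pull:     one pull factor per destination
    distance: positive origin-to-destination distances, indexed [i][j]
    beta:     distance-decay exponent (illustrative default)
    k:        scaling constant (illustrative default)
    """
    flows = []
    for i, push_i in enumerate(push):
        row = []
        for j, pull_j in enumerate(pull):
            row.append(k * push_i * pull_j / distance[i][j] ** beta)
        flows.append(row)
    return flows


# Two hypothetical origins and two hypothetical destinations.
push = [100.0, 50.0]
pull = [10.0, 20.0]
distance = [[1.0, 2.0], [2.0, 1.0]]

flows = gravity_flows(push, pull, distance)
```

With these inputs, doubling the distance between an origin and a destination cuts the predicted flow to one quarter, because `beta` is 2.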
In this book, we restrict our attention to areal data analysis, as it is currently the spatial data analysis approach most widely used in the social sciences. The other approaches discussed (point data analysis, geostatistics, and spatial interaction data analysis) are nonetheless useful for social science studies. For example, geographers conduct demographic studies using geostatistics (e.g., Cowen & Jensen, 1998; Jensen et al., 1994; Langford, Maguire, & Unwin, 1991; Langford & Unwin, 1994; Mennis, 2003), and epidemiological and social network researchers use point data analysis and spatial interaction data analysis, respectively.
1.3 Introduction to the Data Example
As we noted in the Preface, the goal of this book is to help social scientists learn practical and useful statistical methods for spatial regression with relative ease. Our approach is to use concrete social data examples and in-depth analyses to illustrate the statistical concepts, models, and methods while keeping the use of statistical formulas and proofs to a minimum. No background in mathematical statistics is assumed of readers.
For ease of presentation, most methods discussed in this book are illustrated with one case study and one primary data example that address specific research questions, rather than with different studies that use different data sets, variables, and/or research questions. Readers are encouraged to think about how their own data could be analyzed to address their research questions while reading our data analyses.
In the primary data example for this book, the study area is the state of Wisconsin in the United States. We illustrate the use of spatial regression models and methods by studying population change as the response variable in relation to a variety of factors, spatially and temporally, at the minor civil division (MCD) level. In this book, population change refers specifically to a change in population size; that is, the outcome could be either population growth or population decline. Population change is a familiar subject to most social scientists and is considered an essential component of many social science disciplines, which makes this data example accessible to a wide range of readers.
In the following subsections, we first review why and how population change is seen as a spatial phenomenon in several social science disciplines. We do this for two reasons. First, population change is the primary data example used throughout this book to demonstrate spatial regression models, so a thorough understanding of its spatial dimension is needed to build the theoretical foundation for analyzing it spatially. Second, many social phenomena are studied in multiple social science disciplines with different approaches, and researchers often review and adopt approaches from other disciplines. Our review of population change as a spatial phenomenon (or spatial process) can therefore serve as a template for studying the spatial effects of other social phenomena. We then briefly introduce the state of Wisconsin to readers who are not familiar with it and follow with a description of the MCD, which is the spatial unit of analysis. Finally, we present descriptive statistics of population change in Wisconsin.
1.3.1 Population Change as a Spatial Process
Population change is considered spatially both explicitly and implicitly in existing social science literature. Population change is theorized and modeled spatially and explicitly in human geography (including population geography, geographic information sciences, transportation geography, and health geography); regional science; and environmental planning. These fields have well-established theories and methodologies for spatial data analysis of population change.
Researchers in population geography are interested in the spatial variation of population distribution, growth, composition, and migration; they seek to explain the population patterns that can be attributed to spatial regularities and processes (Bailey, 2005; Trewartha, 1953). Tobler’s (1970) first law of geography states that everything is related to everything else, but nearer things more so. Population geography’s spatial diffusion theory argues that the forces of population growth spread (spill over) into surrounding areas (Hudson, 1972), which implies that population growth is spatially dependent.
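The idea that nearby areas tend to have similar growth can be made concrete with an index of spatial autocorrelation. The sketch below uses Moran's I, one widely used index, chosen here for illustration; it is not drawn from the passage above. The four areas arranged in a row, their binary adjacency weights, and the growth values are all hypothetical, and the function name `morans_i` is an assumption.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square spatial weights matrix.

    I = (n / S0) * sum_ij w_ij * (x_i - mean) * (x_j - mean) / sum_i (x_i - mean)^2
    where S0 is the sum of all weights.
    """
    n = len(values)
    mean = sum(values) / n
    deviations = [x - mean for x in values]
    s0 = sum(sum(row) for row in weights)
    cross_products = sum(
        weights[i][j] * deviations[i] * deviations[j]
        for i in range(n)
        for j in range(n)
    )
    sum_of_squares = sum(d * d for d in deviations)
    return (n / s0) * cross_products / sum_of_squares


# Four hypothetical areas in a row; each area neighbors the one next to it.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

# Growth that changes smoothly across neighbors (spillover-like pattern).
smooth_growth = [1.0, 2.0, 3.0, 4.0]
# Growth that alternates between neighbors (checkerboard-like pattern).
alternating_growth = [1.0, 4.0, 1.0, 4.0]

i_smooth = morans_i(smooth_growth, weights)
i_alternating = morans_i(alternating_growth, weights)
```

A positive value, as for `smooth_growth`, indicates that neighboring areas have similar values, which is the pattern spatial diffusion would produce; a negative value, as for `alternating_growth`, indicates that neighbors tend to differ.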
Researchers in regional economics explain and model changes in land use patterns, which are nearly always associated with population change (Boarnet, 1998; Cervero, 2003). For example, the growth pole theory explains, through the concepts of spread and backwash, the mutual geographic dependence of economic growth and development, which in turn leads to population change (Perroux, 1955). The central place theory places population in a hierarchy of urban places, where the movement of populations, firms, and goods is determined by the associated costs and city sizes (Christaller, 1966). In the “new” economic geography theory, Krugman (1991) adds space to endogenous growth and studies the process of city network formation over time.
In environmental planning, researchers study how land use changes are encouraged or discouraged by the physical environment and the socioeconomic conditions and how this, in turn, leads to population change. The approach is generally empirical, usually using GIS overlay methods to answer what-if questions. There