1. Introduction
Definition of Spatial Data
Spatial data refers to information that has a geographic or location component: each observation is tied to a place on the Earth's surface, identified either by coordinates (latitude/longitude) or by the areal unit it belongs to (neighborhood, region, etc.).
Importance of Spatial Relationships
In many real-world phenomena, nearby locations tend to influence each other more than distant ones. This spatial dependence violates the independence assumption of classical statistical models, potentially leading to biased estimates and incorrect inferences.
Motivation: Why Classical Models Fall Short
Traditional regression models assume observations are independent. However, with spatial data:
- Nearby locations often share similar characteristics (positive autocorrelation)
- Competing locations may show opposite patterns (negative autocorrelation)
- Ignoring these relationships leads to model misspecification
Key Insight: Spatial dependence is not just a nuisance—it often contains valuable information about the underlying processes generating the data.
2. What is a Spatial Matrix?
Definition and Types
A spatial weights matrix (W) is a square matrix that formally represents the spatial relationships between observations in a dataset. The elements $w_{ij}$ quantify the potential interaction between location i and location j.
Contiguity-Based
Binary indicators (0/1) based on shared borders: the rook's case counts only regions that share an edge, while the queen's case also counts regions that touch at a single vertex (corner). Common for administrative units.
Distance-Based
Weights decay with distance (e.g., inverse distance, Gaussian kernel). Used for point data or continuous space.
Construction of Spatial Weights Matrix W
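For n locations, W is an n×n matrix whose diagonal elements are $w_{ii} = 0$ (no self-influence). The off-diagonal weights are typically:
- Binary: 1 if neighbors, 0 otherwise
- Distance-based: e.g., $w_{ij} = 1/d_{ij}^{\alpha}$, where $d_{ij}$ is the distance between locations i and j and α > 0 controls how quickly influence decays with distance
- K-nearest neighbors: fixed number of neighbors per location (W is generally asymmetric, because j can be among i's nearest neighbors without i being among j's)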
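As a minimal sketch in plain Python (no spatial library assumed), the weighting schemes above can be built directly from point coordinates. The coordinates, the decay exponent `alpha`, and the Gaussian kernel form exp(-0.5 (d/h)²) with bandwidth `h` are illustrative assumptions; the code also assumes all points are distinct, since a zero distance makes inverse-distance weights undefined.

```python
import math

def pairwise_distances(coords):
    """Euclidean distance matrix for a list of (x, y) points."""
    return [[math.dist(p, q) for q in coords] for p in coords]

def inverse_distance_weights(coords, alpha=1.0):
    """w_ij = 1 / d_ij**alpha for i != j; the diagonal stays 0."""
    d = pairwise_distances(coords)
    n = len(coords)
    return [[0.0 if i == j else 1.0 / d[i][j] ** alpha for j in range(n)]
            for i in range(n)]

def gaussian_kernel_weights(coords, bandwidth):
    """One common kernel form (assumed): w_ij = exp(-0.5 * (d_ij / h)**2), w_ii = 0."""
    d = pairwise_distances(coords)
    n = len(coords)
    return [[0.0 if i == j else math.exp(-0.5 * (d[i][j] / bandwidth) ** 2)
             for j in range(n)] for i in range(n)]

def knn_weights(coords, k):
    """Binary W: w_ij = 1 if j is one of the k nearest neighbours of i."""
    d = pairwise_distances(coords)
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank the other locations by distance from i (ties broken by index).
        ranked = sorted((d[i][j], j) for j in range(n) if j != i)
        for _, j in ranked[:k]:
            W[i][j] = 1.0
    return W

# Hypothetical point locations along a line.
coords = [(0, 0), (1, 0), (3, 0), (6, 0)]
W_inv = inverse_distance_weights(coords, alpha=2.0)
W_gauss = gaussian_kernel_weights(coords, bandwidth=2.0)
W_knn = knn_weights(coords, k=1)

for row in W_knn:
    print(row)
```

Location 2's nearest neighbour is location 1, but location 1's nearest neighbour is location 0, so `W_knn` is asymmetric. The inverse-distance and kernel matrices stay symmetric because $d_{ij} = d_{ji}$.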
Row-Standardization and Symmetry
Row-standardization converts weights to relative influence by dividing each element by its row sum:
$$w^{*}_{ij} = \frac{w_{ij}}{\sum_{k} w_{ik}}$$
This ensures each row sums to 1, making the spatial lag operation a weighted average of neighbors. Note that row-standardization makes W asymmetric, even if the original matrix was symmetric, whenever locations have different row sums (e.g., different numbers of neighbors).
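A minimal sketch of row-standardization and the spatial lag, assuming W is stored as a list of lists. The three-region chain and the values `y` are hypothetical; the chain is chosen so that the middle region has two neighbours and the ends have one, which is exactly the case where a symmetric binary W becomes asymmetric. Rows that sum to zero (regions with no neighbours, often called islands) are left as zeros, which is an assumption of this sketch.

```python
def row_standardize(W):
    """Divide each row by its sum; all-zero rows (islands) stay zero."""
    out = []
    for row in W:
        s = sum(row)
        out.append([w / s if s != 0 else 0.0 for w in row])
    return out

def spatial_lag(W, y):
    """(Wy)_i = sum_j w_ij * y_j: a weighted average of neighbours when W is row-standardized."""
    return [sum(w * v for w, v in zip(row, y)) for row in W]

def is_symmetric(W):
    n = len(W)
    return all(W[i][j] == W[j][i] for i in range(n) for j in range(n))

# Hypothetical chain of three regions: 0 - 1 - 2.
W = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
W_std = row_standardize(W)
y = [10.0, 20.0, 40.0]
lag = spatial_lag(W_std, y)

print(W_std)                               # [[0.0, 1.0, 0.0], [0.5, 0.0, 0.5], [0.0, 1.0, 0.0]]
print(lag)                                 # [20.0, 25.0, 20.0]
print(is_symmetric(W), is_symmetric(W_std))  # True False
```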
Example with a Small Region Map
Consider 4 regions with the following contiguity:
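A neighbors B and C; B neighbors A and D; C neighbors A and D; D neighbors B and C.
Binary contiguity matrix (rows and columns ordered A, B, C, D):
$$W = \begin{pmatrix} 0 & 1 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 1 & 0 \end{pmatrix}$$
Row-standardized version (each region has two neighbors, so every row sum is 2):
$$W^{*} = \begin{pmatrix} 0 & 0.5 & 0.5 & 0 \\ 0.5 & 0 & 0 & 0.5 \\ 0.5 & 0 & 0 & 0.5 \\ 0 & 0.5 & 0.5 & 0 \end{pmatrix}$$
Because all row sums are equal in this example, $W^{*}$ remains symmetric.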
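The four-region example can be reproduced in a few lines of plain Python. This is a sketch that builds the binary matrix from a neighbour list (the dictionary layout is an illustrative choice, not a library format) and then row-standardizes it.

```python
regions = ["A", "B", "C", "D"]
neighbours = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}

# Binary contiguity matrix, rows and columns in the order A, B, C, D.
index = {name: k for k, name in enumerate(regions)}
n = len(regions)
W = [[0] * n for _ in range(n)]
for region, nbrs in neighbours.items():
    for other in nbrs:
        W[index[region]][index[other]] = 1

# Row-standardize: every region has two neighbours, so each nonzero weight becomes 0.5.
W_std = [[w / sum(row) for w in row] for row in W]

for name, row in zip(regions, W_std):
    print(name, row)   # first line: A [0.0, 0.5, 0.5, 0.0]
```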
3. Spatial Autocorrelation
3.1 Concept
What is spatial autocorrelation?
Spatial autocorrelation measures the degree to which similar values cluster together in space. It quantifies the correlation of a variable with itself across space.
Positive
Similar values cluster together (high-high or low-low). Indicates spatial dependence.
Negative
Dissimilar values cluster (high-low). May indicate competition or repulsion.
No Autocorrelation
Random spatial pattern. The value at a location is unrelated to the values at neighboring locations.
3.2 Moran's I
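The most common measure of spatial autocorrelation. It ranges approximately from -1 to +1, although the exact bounds depend on W:
$$I = \frac{n}{\sum_{i}\sum_{j} w_{ij}} \cdot \frac{\sum_{i}\sum_{j} w_{ij}(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i} (x_i - \bar{x})^2}$$
Where:
- n = number of spatial units
- $w_{ij}$ = spatial weight between units i and j
- $x_i$ = value at location i
- $\bar{x}$ = mean of x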
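A direct transcription of the formula into plain Python, as a sketch. The weights are a hypothetical binary chain (location i neighbours i+1), and the two value vectors are made up to show one smooth gradient and one perfectly alternating pattern. The function assumes x is not constant, since the denominator would otherwise be zero.

```python
def morans_i(x, W):
    """Global Moran's I for values x and an n-by-n weights matrix W (lists of lists)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]                  # deviations from the mean
    s0 = sum(sum(row) for row in W)            # sum of all weights
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    den = sum(v * v for v in z)
    return (n / s0) * (num / den)

def chain_weights(n):
    """Hypothetical binary contiguity: n locations on a line, i ~ i+1."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]

W = chain_weights(6)
trend = [1, 2, 3, 4, 5, 6]          # smooth gradient: neighbours are similar
alternating = [1, 0, 1, 0, 1, 0]    # neighbours always differ
print(morans_i(trend, W))           # about 0.6
print(morans_i(alternating, W))     # about -1.0
print(-1 / (len(trend) - 1))        # E(I) = -0.2
```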
Interpretation
Under the null hypothesis of no spatial autocorrelation, E(I) = -1/(n-1). Values significantly greater than E(I) indicate positive autocorrelation (clustering), while values significantly less indicate negative autocorrelation (dispersion). Significance is typically assessed with a permutation test (or a normal approximation).
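A minimal sketch of the permutation test for positive autocorrelation: the observed values are reshuffled across locations many times, Moran's I is recomputed for each shuffle, and the pseudo p-value is the share of arrangements at least as extreme as the observed one. The 999 permutations, the fixed seed, the one-sided alternative, and the 20-location chain with a linear trend are all illustrative assumptions.

```python
import random

def morans_i(x, W):
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    s0 = sum(sum(row) for row in W)
    num = sum(W[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * num / sum(v * v for v in z)

def moran_permutation_test(x, W, permutations=999, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation.

    Under the null of no spatial autocorrelation, every assignment of the
    observed values to locations is equally likely.
    """
    rng = random.Random(seed)
    observed = morans_i(x, W)
    shuffled = list(x)
    extreme = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if morans_i(shuffled, W) >= observed:
            extreme += 1
    # Count the observed arrangement itself, so p is never exactly 0.
    p_value = (extreme + 1) / (permutations + 1)
    return observed, p_value

# Hypothetical data: 20 locations on a line with a smooth trend.
n = 20
W = [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]
x = list(range(n))
I_obs, p = moran_permutation_test(x, W)
print(I_obs, p)
```

The smallest attainable p-value is 1/(permutations + 1), so the number of permutations limits how small a reported p-value can be.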
Use Cases
- Identifying clusters of disease incidence
- Detecting spatial patterns in economic indicators
- Testing residuals from regression models for remaining spatial structure
3.3 Geary's C
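An alternative measure more sensitive to local differences:
$$C = \frac{(n-1)\sum_{i}\sum_{j} w_{ij}(x_i - x_j)^2}{2\left(\sum_{i}\sum_{j} w_{ij}\right)\sum_{i}(x_i - \bar{x})^2}$$
with n, $w_{ij}$, $x_i$ and $\bar{x}$ defined as for Moran's I.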
Comparison with Moran's I
Feature | Moran's I | Geary's C |
---|---|---|
Range | ≈ -1 to +1 | ≥ 0 (typically 0 to 2) |
Expected Value | -1/(n-1) | 1 |
Positive Autocorrelation | I > E(I) | C < 1 |
Sensitivity | Global patterns | Local differences |
Interpretation and Sensitivity
Geary's C is more sensitive to differences between neighboring locations, making it better at detecting local instability. Values less than 1 indicate positive autocorrelation, while values greater than 1 indicate negative autocorrelation.
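A sketch of Geary's C in plain Python, applied to a hypothetical six-location binary chain with a smooth gradient and with an alternating pattern. Because the numerator sums squared differences between neighbours, similar neighbours push C below 1 and dissimilar neighbours push it above 1, matching the interpretation above.

```python
def gearys_c(x, W):
    """Geary's C for values x and weights W (lists of lists); x must not be constant."""
    n = len(x)
    mean = sum(x) / n
    s0 = sum(sum(row) for row in W)
    # Weighted squared differences between each pair of locations.
    num = (n - 1) * sum(W[i][j] * (x[i] - x[j]) ** 2
                        for i in range(n) for j in range(n))
    den = 2 * s0 * sum((v - mean) ** 2 for v in x)
    return num / den

# Hypothetical binary chain: location i neighbours i+1.
W = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
print(gearys_c([1, 2, 3, 4, 5, 6], W))  # 1/7, about 0.143 (< 1: positive autocorrelation)
print(gearys_c([1, 0, 1, 0, 1, 0], W))  # 5/3, about 1.667 (> 1: negative autocorrelation)
```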
4. Spatial Regression Models
Why OLS Fails with Spatial Dependence
Ordinary Least Squares (OLS) regression assumes independent errors. When spatial dependence exists:
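- Parameter estimates are biased and inconsistent if the dependence is in the dependent variable (an omitted spatial lag), and unbiased but inefficient if it is only in the errors
- Standard errors are typically underestimated under positive spatial autocorrelation, leading to inflated significance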
- Model predictions may be poor due to ignored spatial patterns
Introduction to Spatial Regression
Spatial regression models explicitly incorporate spatial dependence through the weights matrix W. The two main approaches are:
4.1 Spatial Lag Model (SLM)
Also called the spatial autoregressive (SAR) model, it includes a spatially lagged dependent variable:
$$y = \rho W y + X\beta + \varepsilon$$
Where:
- Wy = spatial lag (weighted average of neighbors' y values)
- ρ (rho) = spatial autoregressive coefficient
- X = matrix of explanatory variables
- β = vector of coefficients
- ε = error term
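Because y appears on both sides of the spatial lag equation, simulating from the model means solving for y. A minimal sketch, assuming a row-standardized W and |ρ| < 1: repeatedly applying y ← ρWy + Xβ + ε converges to the reduced form y = (I − ρW)⁻¹(Xβ + ε). The chain-shaped W, the single covariate, the coefficients and the random draws are hypothetical; real estimation (e.g. maximum likelihood in spatialreg or PySAL) is a separate step not shown here.

```python
import random

def row_standardized_chain(n):
    """Hypothetical W: n locations on a line, neighbours i ~ i+1, rows summing to 1."""
    W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
    return [[w / sum(row) for w in row] for row in W]

def matvec(W, v):
    return [sum(w * vj for w, vj in zip(row, v)) for row in W]

def simulate_slm(W, X, beta, rho, eps, tol=1e-12, max_iter=10_000):
    """Return y solving y = rho*W*y + X*beta + eps by fixed-point iteration.

    The map y -> rho*W*y + c is a contraction in the max-norm when |rho| < 1
    and W is row-standardized, so the iteration converges to the reduced form.
    """
    c = [sum(xk * bk for xk, bk in zip(row, beta)) + e for row, e in zip(X, eps)]
    y = list(c)
    for _ in range(max_iter):
        y_new = [rho * wy + ci for wy, ci in zip(matvec(W, y), c)]
        if max(abs(a - b) for a, b in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("no convergence; check that |rho| < 1")

rng = random.Random(0)
n = 5
W = row_standardized_chain(n)
X = [[1.0, rng.gauss(0, 1)] for _ in range(n)]   # intercept + one covariate
beta = [2.0, 1.5]
eps = [rng.gauss(0, 0.5) for _ in range(n)]
y = simulate_slm(W, X, beta, rho=0.6, eps=eps)
print(y)
```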
Interpretation of ρ
ρ measures the strength of spatial dependence in the dependent variable. A significant ρ suggests spillover effects or diffusion processes.
Use Cases
- Housing prices (neighboring values affect local prices)
- Disease spread (infection in nearby areas increases local risk)
- Technology adoption (neighborhood effects)
4.2 Spatial Error Model (SEM)
Accounts for spatial dependence in the error term:
$$y = X\beta + u, \qquad u = \lambda W u + \varepsilon$$
Where:
- λ (lambda) = spatial autocorrelation coefficient for errors
- u = spatially autocorrelated error term
- ε = i.i.d. error term
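The SEM can be simulated by solving the linear system (I − λW)u = ε for the correlated errors and then adding Xβ. This sketch uses a small hand-written Gaussian elimination so it needs no external library; the row-standardized chain W, λ = 0.7, the covariate and the coefficients are hypothetical choices for illustration.

```python
import random

def solve(A, b):
    """Solve A u = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("I - lambda*W is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    # Back substitution on the upper-triangular system.
    u = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * u[j] for j in range(i + 1, n))
        u[i] = (M[i][n] - s) / M[i][i]
    return u

def simulate_sem(W, X, beta, lam, eps):
    """y = X*beta + u with u = lam*W*u + eps, i.e. u = (I - lam*W)^{-1} eps."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - lam * W[i][j] for j in range(n)] for i in range(n)]
    u = solve(A, eps)
    y = [sum(xk * bk for xk, bk in zip(row, beta)) + ui for row, ui in zip(X, u)]
    return y, u

rng = random.Random(1)
n = 6
W_bin = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
W = [[w / sum(row) for w in row] for row in W_bin]   # row-standardized chain
X = [[1.0, rng.gauss(0, 1)] for _ in range(n)]       # intercept + one covariate
eps = [rng.gauss(0, 1) for _ in range(n)]
y, u = simulate_sem(W, X, beta=[2.0, 1.5], lam=0.7, eps=eps)
print(u)
```

Unlike the lag model, the covariates enter y only through Xβ; the spatial structure lives entirely in u.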
Interpretation of λ
λ indicates the strength of spatial dependence in omitted variables or measurement errors. Significant λ suggests model misspecification regarding spatial effects.
Use Cases
- When unobserved local factors affect the outcome
- When spatial dependence is a nuisance rather than of substantive interest
- When measurement errors are spatially correlated
4.3 SLM vs. SEM: Comparison Table
Aspect | Spatial Lag Model (SLM) | Spatial Error Model (SEM) |
---|---|---|
Dependence Structure | In dependent variable | In error term |
Interpretation | Direct spatial interaction | Omitted spatial variables |
Key Coefficient | ρ (spatial lag) | λ (error autocorrelation) |
Impact of X | Direct + indirect (spillover) effects | Only direct effects |
When to Use | Theoretical expectation of spatial interactions | Spatial dependence as nuisance |
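The "direct + indirect" entry for the SLM can be made concrete with the scalar impact summaries described by LeSage & Pace (2009): for covariate k, form $S_k = (I - \rho W)^{-1}\beta_k$; the average direct impact is the mean of its diagonal, the average total impact is the mean row sum, and the indirect (spillover) impact is their difference. The sketch below assumes a hypothetical row-standardized chain with ρ = 0.5 and β_k = 2, and inverts the matrix with plain Gauss-Jordan elimination.

```python
def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    # The right half now holds A^{-1}.
    return [row[n:] for row in M]

def chain_std(n):
    """Hypothetical row-standardized chain W."""
    W = [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]
    return [[w / sum(row) for w in row] for row in W]

def slm_impacts(W, rho, beta_k):
    """Average direct, indirect and total impacts of one covariate in the SLM."""
    n = len(W)
    A = [[(1.0 if i == j else 0.0) - rho * W[i][j] for j in range(n)] for i in range(n)]
    S = [[beta_k * s for s in row] for row in invert(A)]
    direct = sum(S[i][i] for i in range(n)) / n
    total = sum(sum(row) for row in S) / n
    return direct, total - direct, total

direct, indirect, total = slm_impacts(chain_std(5), rho=0.5, beta_k=2.0)
print(direct, indirect, total)
```

With a row-standardized W the total impact equals β_k/(1 − ρ), here 4.0, and the direct impact exceeds β_k because of feedback through neighbours. In the SEM, by contrast, the marginal effect of a covariate is simply β_k.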
5. Conclusion
Recap of Spatial Matrix Utility
Spatial weights matrices provide the formal structure to represent spatial relationships in statistical models. They enable:
- Measurement of spatial autocorrelation (Moran's I, Geary's C)
- Specification of spatial regression models (SLM, SEM)
- Quantification of spatial spillover effects
Importance of Testing for Spatial Dependence
Before applying spatial models, diagnostic tests should be performed:
- Visual exploration (maps, variograms)
- Moran's I test on OLS residuals
- Lagrange Multiplier (LM) tests (LM-lag, LM-error and their robust versions) to choose between SLM and SEM
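As a sketch of one of these diagnostics, the LM-error statistic can be computed from OLS residuals e as $(e'We/s^2)^2 / T$, with $s^2 = e'e/n$ and $T = \mathrm{tr}(W'W + WW)$; under the null of no spatial error dependence it is asymptotically χ²(1). The data, the simple one-regressor OLS, and the binary chain W are hypothetical, and the LM-lag and robust statistics (which need the full OLS design matrix) are not shown.

```python
import math

def ols_residuals(x, y):
    """Residuals from a simple regression y = a + b*x + e (closed-form OLS)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return [yi - a - b * xi for xi, yi in zip(x, y)]

def lm_error(e, W):
    """LM test for spatial error dependence from OLS residuals e.

    Returns (LM, p-value); the p-value uses the chi-square(1) upper tail.
    """
    n = len(e)
    s2 = sum(v * v for v in e) / n
    eWe = sum(e[i] * W[i][j] * e[j] for i in range(n) for j in range(n))
    # tr(W'W) = sum of squared weights; tr(WW) = sum_ij w_ij * w_ji.
    T = sum(W[i][j] ** 2 + W[i][j] * W[j][i] for i in range(n) for j in range(n))
    lm = (eWe / s2) ** 2 / T
    p_value = math.erfc(math.sqrt(lm / 2))   # P(chi2_1 > lm)
    return lm, p_value

# Hypothetical data on 8 locations along a line.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [3.1, 3.4, 3.9, 3.8, 5.2, 5.6, 5.4, 5.9]
W = [[1 if abs(i - j) == 1 else 0 for j in range(8)] for i in range(8)]
e = ols_residuals(x, y)
lm, p = lm_error(e, W)
print(lm, p)
```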
Suggestions for Further Reading
Books
- Anselin, L. (1988) "Spatial Econometrics"
- LeSage, J. & Pace, R.K. (2009) "Introduction to Spatial Econometrics"
Software
- GeoDa (GUI for exploratory analysis)
- R: spdep, spatialreg, sf packages
- Python: PySAL, GeoPandas
Final Thought: Spatial matrices transform geographic intuition into quantitative analysis. By properly specifying W and choosing appropriate models, we can uncover meaningful spatial patterns that would otherwise remain hidden in traditional analyses.