Introduction to Spatial Models
Spatial models are statistical models that incorporate spatial dependence between observations. They are essential when dealing with data that has a geographic component, as observations close to each other tend to be more similar than those farther apart.
The general form of a spatial model can be represented as:
Y(s) = μ(s) + ε(s)
Where:
- Y(s) is the observed value at location s
- μ(s) is the mean or trend component (may depend on covariates)
- ε(s) is the spatially correlated error term
In R, spatial modeling is primarily handled through packages like sp, sf, gstat, and spdep.
Types of Spatial Models
Spatial Autoregressive Models
These models account for spatial dependence in the dependent variable. The most common are:
- SAR (Spatial Autoregressive)
- SEM (Spatial Error Model)
- SDM (Spatial Durbin Model)
SAR model formulation:
Y = ρWY + Xβ + ε
Where:
- W is the spatial weights matrix
- ρ is the spatial autoregressive coefficient
# Implementation in R using spdep
library(spdep)
sar_model <- lagsarlm(y ~ x1 + x2, data=df, listw=weights)
Geostatistical Models
Used for continuous spatial data, modeling spatial covariance structure through variograms.
The semivariogram γ(h) is defined as:
γ(h) = ½ Var[Z(s+h) - Z(s)]
Common models:
- Exponential: γ(h) = c₀ + c[1 - exp(-h/a)]
- Spherical
- Gaussian
# Implementation using gstat
library(gstat)
vgram <- variogram(z ~ 1, locations=~x+y, data=df)
fit <- fit.variogram(vgram, model=vgm("Exp"))
krige_pred <- krige(z ~ 1, locations=~x+y, data=df, newdata=grid, model=fit)
Point Process Models
Model the distribution of points in space, often using Poisson processes.
Inhomogeneous Poisson process with intensity λ(s):
P[N(B) = n] = [∫Bλ(s)ds]nexp(-∫Bλ(s)ds)/n!
Where:
- N(B) is the number of points in region B
- λ(s) is the intensity function
# Implementation using spatstat
library(spatstat)
ppp_obj <- ppp(x, y, window=owin)
fit <- ppm(ppp_obj, ~x+y)
predict(fit, type="trend")
Mathematical Foundations
Spatial Weights Matrix
The spatial weights matrix W is fundamental to spatial modeling. It quantifies the spatial relationships between observations.
Common specifications:
- Contiguity-based: wij = 1 if regions i and j share a boundary, 0 otherwise
- Distance-based: wij = f(dij) where dij is distance between i and j
- k-nearest neighbors
# Creating spatial weights in R
library(spdep)
coords <- cbind(df$x, df$y)
nb <- knn2nb(knearneigh(coords, k=5))
weights <- nb2listw(nb, style="W") # Row-standardized
Spatial Autocorrelation
Spatial autocorrelation measures the degree of dependency among observations in space.
Moran's I statistic:
I = (n/S₀) × (ΣiΣjwij(yi-ȳ)(yj-ȳ)) / Σi(yi-ȳ)²
Where:
- n is the number of spatial units
- S₀ = ΣiΣjwij (sum of all weights)
- wij are the spatial weights
# Testing for spatial autocorrelation
moran.test(df$y, listw=weights)
geary.test(df$y, listw=weights)
Practical Implementation in R
1. Data Preparation
# Load necessary packages
library(sf) # For spatial data
library(spdep) # Spatial dependence
library(gstat) # Geostatistics
library(tmap) # Thematic maps
# Read spatial data (e.g., shapefile)
spatial_data <- st_read("path/to/shapefile.shp")
# Convert to spatial object if needed
sp_obj <- as(spatial_data, "Spatial")
2. Model Selection Workflow
- Test for spatial autocorrelation (Moran's I, Geary's C)
- Examine spatial patterns (variogram for geostatistical data)
- Choose appropriate model based on data characteristics
- Estimate model parameters
- Validate model (cross-validation, residuals analysis)
- Make predictions if needed
Complete Example: Spatial Regression
# Example: Spatial lag model
# Create neighbor list
coords <- coordinates(sp_obj)
nb <- knn2nb(knearneigh(coords, k=5))
weights <- nb2listw(nb, style="W")
# Fit spatial lag model
model <- lagsarlm(y ~ x1 + x2, data=sp_obj, listw=weights)
summary(model)
# Check residuals for remaining spatial autocorrelation
moran.test(residuals(model), listw=weights)
Key Considerations
Model Selection
- SAR when dependent variable shows spatial dependence
- SEM when residuals show spatial dependence
- SDM when both dependent and independent variables show spatial patterns
- Geostatistical models for continuous spatial processes
Common Challenges
- Modifiable areal unit problem (MAUP)
- Edge effects in spatial analysis
- Computational intensity with large datasets
- Specification of spatial weights matrix
Important Note
Always visualize your spatial data before modeling. Spatial patterns that are obvious visually may not always be captured by statistical tests, and vice versa.