Spatial Models in R with Mathematical Explanation

Introduction to Spatial Models

Spatial models are statistical models that incorporate spatial dependence between observations. They are essential when dealing with data that has a geographic component, as observations close to each other tend to be more similar than those farther apart.

The general form of a spatial model can be represented as:

Y(s) = μ(s) + ε(s)

Where:

Y(s) is the observed value at location s
μ(s) is the mean or trend component (may depend on covariates)
ε(s) is the spatially correlated error term

In R, spatial modeling is primarily handled through packages like sp, sf, gstat, and spdep.

Types of Spatial Models

Spatial Autoregressive Models

These models account for spatial dependence in the dependent variable. The most common are:

SAR (Spatial Autoregressive)
SEM (Spatial Error Model)
SDM (Spatial Durbin Model)

SAR model formulation:

Y = ρWY + Xβ + ε

Where:

W is the spatial weights matrix
ρ is the spatial autoregressive coefficient

# Implementation in R using spdep

library(spdep)

sar_model <- lagsarlm(y ~ x1 + x2, data=df, listw=weights)

Geostatistical Models

Used for continuous spatial data, modeling spatial covariance structure through variograms.

The semivariogram γ(h) is defined as:

γ(h) = ½ Var[Z(s+h) - Z(s)]

Common models:

Exponential: γ(h) = c₀ + c[1 - exp(-h/a)]
Spherical
Gaussian

# Implementation using gstat

library(gstat)

vgram <- variogram(z ~ 1, locations=~x+y, data=df)

fit <- fit.variogram(vgram, model=vgm("Exp"))

krige_pred <- krige(z ~ 1, locations=~x+y, data=df, newdata=grid, model=fit)

Point Process Models

Model the distribution of points in space, often using Poisson processes.

Inhomogeneous Poisson process with intensity λ(s):

P[N(B) = n] = [∫_Bλ(s)ds]ⁿexp(-∫_Bλ(s)ds)/n!

Where:

N(B) is the number of points in region B
λ(s) is the intensity function

# Implementation using spatstat

library(spatstat)

ppp_obj <- ppp(x, y, window=owin)

fit <- ppm(ppp_obj, ~x+y)

predict(fit, type="trend")

Mathematical Foundations

Spatial Weights Matrix

The spatial weights matrix W is fundamental to spatial modeling. It quantifies the spatial relationships between observations.

Common specifications:

Contiguity-based: w_ij = 1 if regions i and j share a boundary, 0 otherwise
Distance-based: w_ij = f(d_ij) where d_ij is distance between i and j
k-nearest neighbors

# Creating spatial weights in R

library(spdep)

coords <- cbind(df$x, df$y)

nb <- knn2nb(knearneigh(coords, k=5))

weights <- nb2listw(nb, style="W") # Row-standardized

Spatial Autocorrelation

Spatial autocorrelation measures the degree of dependency among observations in space.

Moran's I statistic:

I = (n/S₀) × (Σ_iΣ_jw_ij(y_i-ȳ)(y_j-ȳ)) / Σ_i(y_i-ȳ)²

Where:

n is the number of spatial units
S₀ = Σ_iΣ_jw_ij (sum of all weights)
w_ij are the spatial weights

# Testing for spatial autocorrelation

moran.test(df$y, listw=weights)

geary.test(df$y, listw=weights)

Practical Implementation in R

1. Data Preparation

# Load necessary packages

library(sf) # For spatial data

library(spdep) # Spatial dependence

library(gstat) # Geostatistics

library(tmap) # Thematic maps

# Read spatial data (e.g., shapefile)

spatial_data <- st_read("path/to/shapefile.shp")

# Convert to spatial object if needed

sp_obj <- as(spatial_data, "Spatial")

2. Model Selection Workflow

Test for spatial autocorrelation (Moran's I, Geary's C)
Examine spatial patterns (variogram for geostatistical data)
Choose appropriate model based on data characteristics
Estimate model parameters
Validate model (cross-validation, residuals analysis)
Make predictions if needed

Complete Example: Spatial Regression

# Example: Spatial lag model

# Create neighbor list

coords <- coordinates(sp_obj)

nb <- knn2nb(knearneigh(coords, k=5))

weights <- nb2listw(nb, style="W")

# Fit spatial lag model

model <- lagsarlm(y ~ x1 + x2, data=sp_obj, listw=weights)

summary(model)

# Check residuals for remaining spatial autocorrelation

moran.test(residuals(model), listw=weights)

Key Considerations

Model Selection

SAR when dependent variable shows spatial dependence
SEM when residuals show spatial dependence
SDM when both dependent and independent variables show spatial patterns
Geostatistical models for continuous spatial processes

Common Challenges

Modifiable areal unit problem (MAUP)
Edge effects in spatial analysis
Computational intensity with large datasets
Specification of spatial weights matrix

Important Note

Always visualize your spatial data before modeling. Spatial patterns that are obvious visually may not always be captured by statistical tests, and vice versa.