Spatial Models in R

Mathematical foundations and practical implementation

Introduction to Spatial Models

Spatial models are statistical models that incorporate spatial dependence between observations. They are essential when dealing with data that has a geographic component, as observations close to each other tend to be more similar than those farther apart.

The general form of a spatial model can be represented as:

Y(s) = μ(s) + ε(s)

Where:

  • Y(s) is the observed value at location s
  • μ(s) is the mean or trend component (may depend on covariates)
  • ε(s) is the spatially correlated error term

In R, spatial modeling is primarily handled through packages like sp, sf, gstat, and spdep.

Types of Spatial Models

Spatial Autoregressive Models

These models account for spatial dependence in the dependent variable. The most common are:

  • SAR (Spatial Autoregressive)
  • SEM (Spatial Error Model)
  • SDM (Spatial Durbin Model)

SAR model formulation:

Y = ρWY + Xβ + ε

Where:

  • W is the spatial weights matrix
  • ρ is the spatial autoregressive coefficient

# Implementation in R using spdep

library(spdep)

sar_model <- lagsarlm(y ~ x1 + x2, data=df, listw=weights)

Geostatistical Models

Used for continuous spatial data, modeling spatial covariance structure through variograms.

The semivariogram γ(h) is defined as:

γ(h) = ½ Var[Z(s+h) - Z(s)]

Common models:

  • Exponential: γ(h) = c₀ + c[1 - exp(-h/a)]
  • Spherical
  • Gaussian

# Implementation using gstat

library(gstat)

vgram <- variogram(z ~ 1, locations=~x+y, data=df)

fit <- fit.variogram(vgram, model=vgm("Exp"))

krige_pred <- krige(z ~ 1, locations=~x+y, data=df, newdata=grid, model=fit)

Point Process Models

Model the distribution of points in space, often using Poisson processes.

Inhomogeneous Poisson process with intensity λ(s):

P[N(B) = n] = [∫Bλ(s)ds]nexp(-∫Bλ(s)ds)/n!

Where:

  • N(B) is the number of points in region B
  • λ(s) is the intensity function

# Implementation using spatstat

library(spatstat)

ppp_obj <- ppp(x, y, window=owin)

fit <- ppm(ppp_obj, ~x+y)

predict(fit, type="trend")

Mathematical Foundations

Spatial Weights Matrix

The spatial weights matrix W is fundamental to spatial modeling. It quantifies the spatial relationships between observations.

Common specifications:

  1. Contiguity-based: wij = 1 if regions i and j share a boundary, 0 otherwise
  2. Distance-based: wij = f(dij) where dij is distance between i and j
  3. k-nearest neighbors

# Creating spatial weights in R

library(spdep)

coords <- cbind(df$x, df$y)

nb <- knn2nb(knearneigh(coords, k=5))

weights <- nb2listw(nb, style="W") # Row-standardized

Spatial Autocorrelation

Spatial autocorrelation measures the degree of dependency among observations in space.

Moran's I statistic:

I = (n/S₀) × (ΣiΣjwij(yi-ȳ)(yj-ȳ)) / Σi(yi-ȳ)²

Where:

  • n is the number of spatial units
  • S₀ = ΣiΣjwij (sum of all weights)
  • wij are the spatial weights

# Testing for spatial autocorrelation

moran.test(df$y, listw=weights)

geary.test(df$y, listw=weights)

Practical Implementation in R

1. Data Preparation

# Load necessary packages

library(sf) # For spatial data

library(spdep) # Spatial dependence

library(gstat) # Geostatistics

library(tmap) # Thematic maps

# Read spatial data (e.g., shapefile)

spatial_data <- st_read("path/to/shapefile.shp")

# Convert to spatial object if needed

sp_obj <- as(spatial_data, "Spatial")

2. Model Selection Workflow

  1. Test for spatial autocorrelation (Moran's I, Geary's C)
  2. Examine spatial patterns (variogram for geostatistical data)
  3. Choose appropriate model based on data characteristics
  4. Estimate model parameters
  5. Validate model (cross-validation, residuals analysis)
  6. Make predictions if needed

Complete Example: Spatial Regression

# Example: Spatial lag model

# Create neighbor list

coords <- coordinates(sp_obj)

nb <- knn2nb(knearneigh(coords, k=5))

weights <- nb2listw(nb, style="W")

# Fit spatial lag model

model <- lagsarlm(y ~ x1 + x2, data=sp_obj, listw=weights)

summary(model)

# Check residuals for remaining spatial autocorrelation

moran.test(residuals(model), listw=weights)

Key Considerations

Model Selection

  • SAR when dependent variable shows spatial dependence
  • SEM when residuals show spatial dependence
  • SDM when both dependent and independent variables show spatial patterns
  • Geostatistical models for continuous spatial processes

Common Challenges

  • Modifiable areal unit problem (MAUP)
  • Edge effects in spatial analysis
  • Computational intensity with large datasets
  • Specification of spatial weights matrix

Important Note

Always visualize your spatial data before modeling. Spatial patterns that are obvious visually may not always be captured by statistical tests, and vice versa.