Quantile Regression in R

Understanding conditional quantiles with mathematical foundations

Introduction to Quantile Regression

Quantile regression is a type of regression analysis used in statistics and econometrics. While ordinary least squares (OLS) regression estimates the conditional mean of the dependent variable, quantile regression estimates the conditional median or other quantiles.

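Key advantage: Quantile regression is less sensitive than OLS to outliers in the response variable, and fitting several quantiles gives a fuller picture of how the predictors relate to the whole conditional distribution rather than only its center.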

This technique is particularly useful when:

  • The relationship between variables changes across the distribution
  • Outliers are present in the data
  • Heteroscedasticity exists
  • You need to understand the entire conditional distribution
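
The outlier point in the list above can be illustrated with a small simulation. This is a minimal sketch on made-up data (the variable names, the seed, and the size of the contamination are all assumptions for illustration): one response value is shifted far upward, and the change in the OLS slope is compared with the change in the median-regression slope.

```r
library(quantreg)

set.seed(1)                       # illustrative simulated data, not from the article
x <- 1:50
y <- 2 + 0.5 * x + rnorm(50)      # true slope is 0.5
y_out <- y
y_out[50] <- y_out[50] + 200      # contaminate one response value

# How far does the slope move once the outlier is added?
ols_shift    <- abs(coef(lm(y_out ~ x))[["x"]] - coef(lm(y ~ x))[["x"]])
median_shift <- abs(coef(rq(y_out ~ x, tau = 0.5))[["x"]] -
                    coef(rq(y ~ x, tau = 0.5))[["x"]])
```

Comparing `ols_shift` with `median_shift` shows how much each estimator reacts to the single contaminated observation. Note that this concerns outliers in the response; unusual values of the predictors (leverage points) are a separate issue.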

Mathematical Foundation

1. Quantile Definition

For a random variable Y with cumulative distribution function F(y) = P(Y ≤ y), the τ-th quantile Q(τ) is defined as:

Q(τ) = inf{y : F(y) ≥ τ}, where τ ∈ (0,1)
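
As a minimal sketch of this definition, the snippet below applies the infimum rule to the empirical CDF of a small sample. The helper name `quantile_inf` and the sample values are illustrative assumptions; base R's `quantile()` with `type = 1` uses the same inverse-ECDF rule and can serve as a cross-check.

```r
# Hypothetical helper: tau-th quantile via Q(tau) = inf{y : F(y) >= tau},
# with F replaced by the empirical CDF of the sample.
quantile_inf <- function(y, tau) {
  stopifnot(tau > 0, tau < 1)
  y_sorted <- sort(y)
  F_hat <- seq_along(y_sorted) / length(y_sorted)  # ECDF at each order statistic
  y_sorted[which(F_hat >= tau)[1]]                 # smallest y with F(y) >= tau
}

y <- c(2, 7, 1, 9, 4, 6, 3, 8, 5, 10)  # illustrative sample
q25 <- quantile_inf(y, 0.25)
q50 <- quantile_inf(y, 0.50)

# Cross-check against base R's inverse-ECDF quantile
q25_base <- unname(quantile(y, 0.25, type = 1))
```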

2. Quantile Regression Model

The quantile regression model for the τ-th quantile can be expressed as:

$$Q_\tau(Y \mid X = x) = x^\top \beta(\tau)$$

Where:

  • $Q_\tau(Y \mid X = x)$ is the conditional τ-th quantile of Y given X = x
  • x is the vector of explanatory variables
  • β(τ) is the vector of coefficients for the τ-th quantile
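
To see why the coefficient vector depends on τ, consider a simulated location-scale model in which the noise grows with x. This is only a sketch under assumed parameters (the data-generating process, sample size, and seed are invented for illustration). For y = 1 + 2x + (0.5 + 0.5x)ε with standard normal ε, the conditional τ-th quantile is linear in x with slope 2 + 0.5·qnorm(τ), so the fitted slopes should rise with τ.

```r
library(quantreg)

set.seed(42)                                  # illustrative simulation
n <- 5000
x <- runif(n, 0, 4)
y <- 1 + 2 * x + (0.5 + 0.5 * x) * rnorm(n)   # noise scale increases with x

taus <- c(0.1, 0.5, 0.9)
fit <- rq(y ~ x, tau = taus)

est_slopes  <- coef(fit)["x", ]               # one estimated slope per quantile
true_slopes <- 2 + 0.5 * qnorm(taus)          # slopes implied by the simulated model
```

Printing `rbind(est_slopes, true_slopes)` puts the estimates next to the values implied by the assumed model.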

3. Estimation

The coefficients β(τ) are estimated by minimizing the following loss function:

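$$\min_{\beta} \sum_{i=1}^{n} \rho_\tau\left(y_i - x_i^\top \beta\right)$$

Where $\rho_\tau$ is the check function (also called pinball loss):

$$\rho_\tau(u) = u\,\bigl(\tau - I(u < 0)\bigr) = \begin{cases} \tau u & \text{if } u \ge 0 \\ (\tau - 1)\,u & \text{if } u < 0 \end{cases}$$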
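
The check function is short enough to write out directly. The sketch below (helper names and sample values are illustrative assumptions) minimizes the summed check loss over a constant, which is the intercept-only case of the regression problem, and recovers the sample τ-th quantile.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - I(u < 0))
check_loss <- function(u, tau) u * (tau - (u < 0))

y <- c(2, 7, 1, 9, 4, 6, 3, 8, 5, 10, 12)  # illustrative sample, n = 11
tau <- 0.25

# Total loss of a constant candidate b. Only the data points are searched,
# because this piecewise-linear objective attains its minimum at a data point.
total_loss <- function(b) sum(check_loss(y - b, tau))
losses <- vapply(y, total_loss, numeric(1))
b_hat <- y[which.min(losses)]

# Cross-check against base R's inverse-ECDF quantile
b_base <- unname(quantile(y, tau, type = 1))
```

Positive residuals are weighted by τ and negative residuals by 1 - τ, which is what pulls the minimizer toward the τ-th quantile instead of the mean.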

Implementation in R

1. Using the quantreg Package

The most common package for quantile regression in R is quantreg by Roger Koenker.

# Install and load the package
install.packages("quantreg")
library(quantreg)

# Load example data
data(engel)
head(engel)

# Fit median regression (τ = 0.5)
fit_median <- rq(foodexp ~ income, data = engel, tau = 0.5)
summary(fit_median)

# Fit multiple quantiles
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
fit_multi <- rq(foodexp ~ income, data = engel, tau = taus)
summary(fit_multi)

# Plot the results: one fitted line per quantile
plot(engel$income, engel$foodexp,
     xlab = "Income", ylab = "Food Expenditure",
     main = "Quantile Regression")
for (i in seq_along(taus)) {
    # Column i of coef(fit_multi) holds the intercept and slope for taus[i]
    abline(coef = coef(fit_multi)[, i], col = i + 1)
}
legend("topleft", legend = taus,
       col = seq_along(taus) + 1, lty = 1, title = "Quantiles")

2. Visualizing Results

Visualizing multiple quantile regressions helps understand how relationships change across the distribution:

# Plot all quantile regression lines
library(ggplot2)

ggplot(engel, aes(income, foodexp)) +
  geom_point(alpha = 0.5) +
  # after_stat() replaces the deprecated ..quantile.. notation (ggplot2 >= 3.3.0)
  geom_quantile(quantiles = c(0.1, 0.25, 0.5, 0.75, 0.9),
                aes(color = factor(after_stat(quantile)))) +
  scale_color_viridis_d(name = "Quantile") +
  labs(title = "Quantile Regression of Food Expenditure vs Income",
       x = "Income", y = "Food Expenditure") +
  theme_minimal()

3. Hypothesis Testing

The quantreg package supports hypothesis testing through anova() and through the standard errors reported by summary():

# Test for equality of slopes across quantiles
anova(fit_multi)

# Bootstrap standard errors, t-statistics, and p-values for each coefficient at each quantile
summary(fit_multi, se = "boot")
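
The bootstrap summary can also be used to build a rough confidence interval for a single coefficient. The following is a sketch, not a prescribed workflow: the object names and the seed are assumptions, and the interval uses a simple normal approximation around the bootstrap standard error.

```r
library(quantreg)
data(engel)

set.seed(123)  # bootstrap resampling is random, so fix the seed for repeatability
fit <- rq(foodexp ~ income, data = engel, tau = 0.5)

# Coefficient table with columns Value, Std. Error, t value, Pr(>|t|)
boot_tab <- summary(fit, se = "boot")$coefficients

slope_est <- boot_tab["income", "Value"]
slope_se  <- boot_tab["income", "Std. Error"]

# Approximate 95% interval for the median-regression slope
ci <- slope_est + c(-1, 1) * qnorm(0.975) * slope_se
```

Because the standard error comes from resampling, the interval changes slightly with the seed and with the number of bootstrap replications.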

Applications

Economics

Studying wage determinants across the income distribution and analyzing Engel curves at different expenditure quantiles.

Medicine

Constructing growth charts, where each fitted quantile curve traces a reference percentile of child growth.

Environmental Science

Modeling extreme weather events by focusing on upper quantiles of temperature or precipitation distributions.

Finance

Value at Risk (VaR) calculations and other risk management applications that focus on tail behavior.