Understanding conditional quantiles with mathematical foundations
Quantile regression is a type of regression analysis used in statistics and econometrics. While ordinary least squares (OLS) regression estimates the conditional mean of the dependent variable, quantile regression estimates the conditional median or other quantiles.
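Key advantage: Because it minimizes asymmetrically weighted absolute deviations rather than squared deviations, quantile regression is more robust than OLS to outliers in the response variable. Fitting several quantiles also shows how the effect of a covariate differs across the conditional distribution, not only at its mean.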
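The robustness claim can be checked with a small simulation. The following is a minimal sketch, assuming the quantreg package is installed; the data, the true slope of 3, and the size of the outliers are all made up for illustration.

```r
library(quantreg)  # assumed to be installed

set.seed(1)
x <- (1:100) / 10                              # 100 covariate values
y <- 2 + 3 * x + rnorm(length(x), sd = 0.5)    # simulated data, true slope is 3

# Contaminate the responses of the 5 largest-x points with gross outliers
y_out <- y
idx <- tail(order(x), 5)
y_out[idx] <- y_out[idx] + 200

# Compare the OLS slope with the median regression slope
ols_slope <- unname(coef(lm(y_out ~ x))["x"])
med_slope <- unname(coef(rq(y_out ~ x, tau = 0.5))["x"])

c(ols = ols_slope, median = med_slope)
```

In this setup the OLS slope should be pulled well away from 3, while the median regression slope should stay close to it, because the check loss depends on the sign of a large residual rather than on its squared size.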
This technique is particularly useful when:
For a random variable Y with cumulative distribution function F(y) = P(Y ≤ y), the τ-th quantile Q(τ), for 0 < τ < 1, is defined as:

$$Q(\tau) = F^{-1}(\tau) = \inf\{\, y : F(y) \ge \tau \,\}$$
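To make the definition concrete, here is a minimal base-R sketch that applies the usual generalized-inverse form, Q(τ) = inf{y : F(y) ≥ τ}, to the empirical distribution of a sample. The helper name `empirical_quantile` and the sample values are hypothetical.

```r
# Empirical version of Q(tau) = inf{ y : F(y) >= tau }
empirical_quantile <- function(y, tau) {
  stopifnot(tau > 0, tau < 1)
  y_sorted <- sort(y)
  # i / n is the empirical CDF level reached at the i-th order statistic
  F_hat <- seq_along(y_sorted) / length(y_sorted)
  # Smallest sample value whose empirical CDF level is at least tau
  y_sorted[which(F_hat >= tau)[1]]
}

y <- c(7, 1, 4, 10, 3)
q50 <- empirical_quantile(y, 0.5)
q25 <- empirical_quantile(y, 0.25)
c(q25 = q25, q50 = q50)
```

For these two values of τ the result should agree with `quantile(y, tau, type = 1)`, which is R's inverse-empirical-CDF quantile. The default `type = 7` interpolates between order statistics and can give a different number.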
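The quantile regression model for the τ-th quantile can be expressed as:

$$Q_Y(\tau \mid X) = X^\top \beta(\tau)$$

Where Q_Y(τ | X) is the τ-th conditional quantile of Y given the covariates X, and β(τ) is the coefficient vector, which is allowed to differ from one quantile to another.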
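Given observations (y_i, x_i), i = 1, …, n, the coefficients β(τ) are estimated by minimizing the following loss function:

$$\hat{\beta}(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^\top \beta\right)$$

Where ρτ is the check function (also called pinball loss):

$$\rho_\tau(u) = u \,\bigl(\tau - \mathbb{1}\{u < 0\}\bigr)$$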
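A short base-R sketch can show why minimizing the check function yields a quantile. It assumes the standard form ρτ(u) = u · (τ − 1{u < 0}) and looks only at the simplest case, where every observation is predicted by the same constant q (an intercept-only model). The function names and the sample are hypothetical.

```r
# Check (pinball) loss: rho_tau(u) = u * (tau - 1{u < 0})
check_loss <- function(u, tau) {
  u * (tau - (u < 0))
}

# Total loss when every observation is predicted by the same constant q
total_loss <- function(q, y, tau) {
  sum(check_loss(y - q, tau))
}

y <- c(2, 5, 1, 9, 4, 7, 3)
tau <- 0.25

# Evaluate the loss at each observed value; a minimizer is always among them
losses <- sapply(y, total_loss, y = y, tau = tau)
best_q <- y[which.min(losses)]
best_q
```

Positive residuals are weighted by τ and negative residuals by 1 − τ, so for τ = 0.25 overpredicting costs three times as much as underpredicting by the same amount. This pushes the minimizer down to the sample's lower quartile.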
The most common package for quantile regression in R is quantreg by Roger Koenker.
```r
# Install and load the package
install.packages("quantreg")
library(quantreg)

# Load example data
data(engel)
head(engel)

# Fit median regression (τ = 0.5)
fit_median <- rq(foodexp ~ income, data = engel, tau = 0.5)
summary(fit_median)

# Fit multiple quantiles
taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
fit_multi <- rq(foodexp ~ income, data = engel, tau = taus)
summary(fit_multi)

# Plot the results: one fitted line per quantile
plot(engel$income, engel$foodexp,
     xlab = "Income", ylab = "Food Expenditure",
     main = "Quantile Regression")
for (i in seq_along(taus)) {
  abline(coef(fit_multi)[, i], col = i + 1)
}
legend("topleft", legend = taus, col = seq_along(taus) + 1,
       lty = 1, title = "Quantiles")
```
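Once a model is fitted, a natural next step is to predict a conditional quantile for new observations. The sketch below assumes quantreg is installed and uses its `predict` method for `rq` fits. The income levels in `new_households` are hypothetical values chosen only for illustration.

```r
library(quantreg)
data(engel)

fit_median <- rq(foodexp ~ income, data = engel, tau = 0.5)

# Hypothetical income levels chosen only for illustration
new_households <- data.frame(income = c(500, 1000, 2000))

# Predicted conditional median of food expenditure at each income level
pred <- predict(fit_median, newdata = new_households)
cbind(new_households, predicted_median = pred)
```

Each prediction is the fitted intercept plus the fitted slope times income, read as the median food expenditure among households at that income, not the mean.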
Visualizing multiple quantile regressions helps understand how relationships change across the distribution:
```r
# Plot all quantile regression lines
# (geom_quantile() relies on quantreg; engel comes from data(engel))
library(ggplot2)

ggplot(engel, aes(income, foodexp)) +
  geom_point(alpha = 0.5) +
  # after_stat() replaces the deprecated ..quantile.. notation
  geom_quantile(quantiles = c(0.1, 0.25, 0.5, 0.75, 0.9),
                aes(color = factor(after_stat(quantile)))) +
  scale_color_viridis_d(name = "Quantile") +
  labs(title = "Quantile Regression of Food Expenditure vs Income",
       x = "Income", y = "Food Expenditure") +
  theme_minimal()
```
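The same comparison can be read off numerically by tabulating the income slope at each quantile. This is a minimal sketch, assuming quantreg is installed. The names `fit` and `slopes` are introduced here only for illustration.

```r
library(quantreg)
data(engel)

taus <- c(0.1, 0.25, 0.5, 0.75, 0.9)
fit <- rq(foodexp ~ income, data = engel, tau = taus)

# coef() on a multi-quantile fit is a matrix: rows are terms, columns are quantiles
slopes <- coef(fit)["income", ]
data.frame(tau = taus, income_slope = unname(slopes))
```

If the slopes were roughly constant across τ, the fitted lines would be parallel and a single mean regression would lose little. For the Engel data one would expect the slope to be larger at the upper quantiles, which is the fanning-out visible in the plot.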
The quantreg package provides methods for hypothesis testing:
```r
# Test for equality of slopes across quantiles
anova(fit_multi)

# Bootstrapped standard errors and p-values for each coefficient,
# reported separately for every fitted quantile
summary(fit_multi, se = "boot")
```
Studying wage determinants across the income distribution and analyzing Engel curves at different expenditure quantiles.
Constructing growth charts, where each fitted quantile curve corresponds to a percentile of child growth.
Modeling extreme weather events by focusing on upper quantiles of temperature or precipitation distributions.
Value at Risk (VaR) calculations and other risk management applications that focus on tail behavior.
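As a rough illustration of the tail focus in risk management, the base-R sketch below computes a historical (unconditional) 95% Value at Risk as the negated 5% quantile of returns. The returns are simulated, and the mean and standard deviation are arbitrary assumptions. A quantile regression version would instead model this quantile conditional on covariates.

```r
set.seed(42)

# Simulated daily returns; the distribution and its parameters are assumptions
returns <- rnorm(1000, mean = 0.0005, sd = 0.01)

alpha <- 0.05

# 95% VaR: the loss exceeded on roughly 5% of days
var_95 <- -unname(quantile(returns, probs = alpha))
var_95

# Share of days with a loss worse than the VaR; should be close to alpha
exceed_rate <- mean(returns < -var_95)
exceed_rate
```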