5 Quantile Regression in R

Author: Lan Wang and Ruosha Li

5.1 Goals

  • Fit a quantile regression model in R.
  • Test the significance of regression coefficients
  • Creat and plot pointwise confidence intervals

5.2 R Package quantreg

install.packages("quantreg")
  • Then, load the package by typing:
library(quantreg)
## Loading required package: SparseM
## 
## Attaching package: 'SparseM'
## The following object is masked from 'package:base':
## 
##     backsolve

5.3 Fitting linear quantile regression in R

rq(formula,tau,data,method,...)
  • method=“br” (default): simplex method. Efficient for problems with modest sample size (up to several thousands).
  • method=“fn”: Frisch-Newton interior point method; more efficient than simplex for larger sample sizes;
  • method=“pfn”: Frisch-Newton with pre-processing (suitable when n is very large).

5.4 Example 1: Fit quantile regression at one quantile level

-Engel food expenditure data used in Koenker and Bassett(1982).

-This is a regression data set consisting of 235 observations on income and expenditure on food for Belgian working class households.

  • income: annual household income in Belgian francs
  • foodexp: annual household food expenditure
data(engel) 
plot(engel, log = "xy", main = "'engel' data (log - log scale)") 

f1<-rq(log10(foodexp) ~ log10(income), tau=0.5, data=engel)
f1
## Call:
## rq(formula = log10(foodexp) ~ log10(income), tau = 0.5, data = engel)
## 
## Coefficients:
##   (Intercept) log10(income) 
##        0.1817        0.8766 
## 
## Degrees of freedom: 235 total; 233 residual

5.5 Example 2: Fit quantile regression at multiple quantile levels simultaneously

plot(log10(foodexp) ~ log10(income), data = engel, main = "'engel' data (log10 - transformed)") 
taus <- c(.15, .25, .50, .75, .95, .99) 
rqs <- as.list(taus) 
for(i in seq(along = taus)) 
  { rqs[[i]] <- rq(log10(foodexp) ~ log10(income), tau = taus[i], data = engel) 
  lines(log10(engel$income), fitted(rqs[[i]]), col = i+1) } 
legend("bottomright", paste("tau = ", taus), inset = .04, col = 2:(length(taus)+1), lty=1)

5.6 Inference on linear quantile regression in R

summary.rq(fit.obj,{\tblue{se=" "}},...)
  • se=“rank”: provides CI by inverting a rank score test.
  • se=“nid”: estimation of the asymptotic variance assuming non i.i.d. errors (through sparsity estimation).
  • se=“iid”: assumes i.i.d. errors in variance estimation.
  • se=“ker”: variance estimated using a kernel estimate (Powell Sandwich).

5.7 Resampling methods for inference on quantile regression

summary.rq(fit.obj,{\tblue{se="boot"}},bsmethod="")
  • bsmethod=“xy”: xy-paired bootstrap.
  • bsmethod=“pwy”: perturbing the estimating equation (Parzen, 1994)
  • bsmethod=“mcmb”: Markov chain marginal boostrap (He and Hu, 2002), computationally efficient for large n and p.
  • bsmethod=“wxy”: generalized bootstrap with unit exponential weights.
  • bsmethod=“wild”: wild bootstrap (Feng et al., 2011).
f1<-rq(log10(foodexp) ~ log10(income), tau=0.5, data=engel)
summary.rq(f1, se="boot", bsmethod="xy", R=1000)
## 
## Call: rq(formula = log10(foodexp) ~ log10(income), tau = 0.5, data = engel)
## 
## tau: [1] 0.5
## 
## Coefficients:
##               Value    Std. Error t value  Pr(>|t|)
## (Intercept)    0.18168  0.10234    1.77520  0.07717
## log10(income)  0.87659  0.03506   25.00502  0.00000
summary.rq(f1, se="boot", bsmethod="pwy", R=1000)
## 
## Call: rq(formula = log10(foodexp) ~ log10(income), tau = 0.5, data = engel)
## 
## tau: [1] 0.5
## 
## Coefficients:
##               Value    Std. Error t value  Pr(>|t|)
## (Intercept)    0.18168  0.10261    1.77048  0.07796
## log10(income)  0.87659  0.03524   24.87675  0.00000

5.8 Pointwise confidence intervals

tau.seq=seq(0.1,0.9,0.01)
f2 = rq(log10(foodexp) ~ log10(income),tau=tau.seq,data=engel)
sfm = summary(f2, se="rank", alpha=0.05)
plot(sfm, mfrow=c(1,2))