Distribution-Weighted Least Squares Regression Estimator. An application in Mendelian Randomization

A. García-Pérez (1) and J. Lorenzo Bermejo (2, 3, 4)

  • (1) Departamento de Estadística, IO, CN, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain [agar-per@ccia.uned.es]

  • (2) Statistical Genetics Research Group, Institute of Medical Biometry, Heidelberg University, Heidelberg, Germany [lorenzo@imbi.uni.heidelberg.de]

  • (3) Programa de Doctorado en Ciencias, Universidad Nacional de Educación a Distancia (UNED), Madrid, Spain [jlorenzo12@alumno.uned.es]

  • (4) This paper is part of the PhD thesis of the second author under the supervision of the first one

Keywords: Robust Regression – Saddlepoint Approximation – Mendelian Randomization

1 Introduction

The least trimmed squares (LTS) regression estimator, Rousseeuw [1984], is probably one of the most widely used estimators in robust regression. However, it has no closed-form expression, and it can be difficult to compute because its objective function is non-differentiable. LTS assigns weight 0 to observations with large standardized residuals and weight 1 to those with small ones. In this contributed paper we consider a weighted least squares regression estimator, DWLS, whose weights are less extreme: each weight is the tail probability of the corresponding studentized residual, i.e. a measure of how unlikely that observation is, under a scale or location contaminated normal model. We obtain a closed-form expression for this new estimator, which has good robustness properties. Based on DWLS, we define a novel robust Inverse Variance Weighted (IVW) estimator for Mendelian Randomization studies. We study its main properties and include some examples.

2 Main results

In the classical Linear Regression Model with p covariates,

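$$\mathbf{y} = \mathbf{x}\boldsymbol{\beta} + \mathbf{e}$$

where the errors follow a normal distribution, $e_i \mid \mathbf{x}_i \sim N(0,\sigma^2)$, $i=1,\dots,n$, the residuals are defined as

$$r_i = r_i(\boldsymbol{\beta}) = y_i - \mathbf{x}_i\boldsymbol{\beta}$$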

and the classical least squares estimator is obtained as

$$\hat{\beta}_{LS} = \arg\min_{\boldsymbol{\beta}\in\mathbb{R}^p} \sum_{i=1}^{n} r_i^2(\boldsymbol{\beta})$$

Outliers are usually detected with the studentized residuals used in regression diagnostics,

$$SR_i = \frac{r_i}{\hat{\sigma}\sqrt{1-h_{ii}}}$$

where $h_{ii}$ are the diagonal elements of the hat matrix $\mathbf{H} = \mathbf{x}(\mathbf{x}^t\mathbf{x})^{-1}\mathbf{x}^t$.

If homoscedasticity and normality hold, the $SR_i$ are iid variables that follow a Student t distribution with $n-p-1$ degrees of freedom.
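The quantities above can be illustrated with a minimal Python sketch, restricted (as an assumption made only for this example) to a straight line with intercept, so that p = 2 and the leverages have the closed form $h_{ii} = 1/n + (x_i-\bar{x})^2/S_{xx}$. The function name and the data are hypothetical, and $\hat{\sigma}$ is taken as the usual residual standard error; a leave-one-out estimate is another common choice and is not computed here.

```python
import math

def studentized_residuals(x, y):
    """Return (residuals, leverages, studentized residuals) for y = a + b*x."""
    n = len(x)
    p = 2  # intercept and slope (assumption of this sketch)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    # Ordinary least squares slope and intercept.
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    a = y_bar - b * x_bar
    r = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Diagonal of the hat matrix for a straight-line fit.
    h = [1.0 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    # Residual standard error with n - p degrees of freedom.
    sigma_hat = math.sqrt(sum(ri ** 2 for ri in r) / (n - p))
    sr = [ri / (sigma_hat * math.sqrt(1.0 - hi)) for ri, hi in zip(r, h)]
    return r, h, sr

# Made-up data whose last point lies well above the trend of the others.
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
r, h, sr = studentized_residuals(x, y)
most_extreme = max(range(len(sr)), key=lambda i: abs(sr[i]))
```

In this toy data set the leverages sum to p = 2 (the trace of the hat matrix), and the last observation has the largest absolute studentized residual.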

Within the class of weighted least squares estimators

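$$\arg\min_{\boldsymbol{\beta}\in\mathbb{R}^p} \sum_{i=1}^{n} w_i\, r_i^2(\boldsymbol{\beta})$$

and assuming that $e_i \mid \mathbf{x}_i \sim (1-\epsilon)N(0,\sigma^2) + \epsilon N(\mu, k^2\sigma^2)$, $i=1,\dots,n$, we propose the weights

$$w_i(t) = P\{|SR_i| > t\}$$

for the distribution-weighted least squares regression line, DWLS, defined as

$$\hat{\beta}_{DWLS} = \arg\min_{\boldsymbol{\beta}\in\mathbb{R}^p} \sum_{i=1}^{n} w_i\, r_i^2(\boldsymbol{\beta})$$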

so that the less probable the ith residual is, the less weight it carries in the regression line.
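A minimal Python sketch of how such weights could enter a fit, under several assumptions that are not stated in the text: the weights are evaluated at the observed $|SR_i|$ using only the uncontaminated Student t tail (that is, ε = 0, ignoring any contamination correction), the studentized residuals are supplied as made-up inputs, and the model is a straight line with intercept. The helper names `t_two_sided_tail` and `weighted_line` are hypothetical.

```python
import math

def t_two_sided_tail(t, df, steps=2000):
    """P{|T| > t} for Student's t with df degrees of freedom.

    The density is integrated on [0, |t|] with Simpson's rule
    (steps must be even), then the two-sided tail is 1 - 2 * integral.
    """
    t = abs(t)
    if t == 0.0:
        return 1.0
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    step = t / steps
    total = pdf(0.0) + pdf(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * step)
    return max(0.0, 1.0 - 2.0 * total * step / 3.0)

def weighted_line(x, y, w):
    """Closed-form weighted least squares fit of y = a + b*x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar)
               for wi, xi, yi in zip(w, x, y))
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Made-up data and made-up studentized residuals (illustration only).
x = [1, 2, 3, 4, 5, 6]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 9.0]
sr = [0.9, 0.0, -0.1, -0.9, -1.3, 2.0]
df = len(x) - 2 - 1  # n - p - 1 with p = 2
w = [t_two_sided_tail(s, df) for s in sr]
a, b = weighted_line(x, y, w)
```

Because the weights decrease smoothly with $|SR_i|$, the last observation is downweighted rather than discarded, and the fitted slope moves toward the trend of the remaining points.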

Assuming the contaminated normal distribution for the errors given above, and using a von Mises expansion of the tail probability functional

$$P_F\{T_n > t\} \simeq P_G\{T_n > t\} + \int \mathrm{TAIF}(x; t; T_n, G)\, dF(x)$$

together with a Lugannani and Rice approximation for the Tail Area Influence Function (TAIF), Field and Ronchetti [1985], we obtain, for a scale contaminated normal model, the weights

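$$
\begin{aligned}
w_i \simeq P\{t_{n-p-1} > t\} + \epsilon A_1 \Big\{ & \Big(A_2 - \frac{3t^2+1}{4(t^2-1)}\Big)\big[t^2 - k^2(t^2-1)\big]^{-1/2} \\
& + \frac{3t^2-1}{t^2-1}\,\frac{k^2}{2}\,\big[t^2 - k^2(t^2-1)\big]^{-3/2} \\
& - \frac{3k^4}{4}\,\big[t^2 - k^2(t^2-1)\big]^{-5/2} \\
& + \frac{t^2(1-k^2)}{t^2-1} - A_2 \Big\}
\end{aligned}
$$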

and, for a location contaminated normal model, the weights

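$$
\begin{aligned}
w_i \simeq P\{t_{n-p-1} > t\} + \epsilon A_1 \Big\{ & \big(e^{\mu^2(t^2-1)/2} - 1\big)\Big(A_2 + \frac{t^2\mu^2}{t^2-1}\Big) \\
& - e^{\mu^2(t^2-1)/2}\,\frac{\mu^4 t^4}{4} \Big\}
\end{aligned}
$$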
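As a consistency check on the two expansions, and reading the flattened fractions in the usual way (an assumption about the original typesetting), the term multiplied by $\epsilon A_1$ vanishes when the contaminating component coincides with the central normal, so each weight reduces to the Student t tail probability. For the scale model this is the case k = 1, and for the location model the case μ = 0:

```latex
% Scale model at k = 1: the bracket t^2 - k^2 (t^2 - 1) equals 1, so
\Big(A_2 - \frac{3t^2+1}{4(t^2-1)}\Big) + \frac{3t^2-1}{2(t^2-1)} - \frac{3}{4} + 0 - A_2
  = \frac{-(3t^2+1) + 2(3t^2-1) - 3(t^2-1)}{4(t^2-1)} = 0 .

% Location model at \mu = 0: both exponentials equal 1, so
\big(e^{0} - 1\big)\big(A_2 + 0\big) - e^{0}\cdot 0 = 0 .
```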

The DWLS estimator is regression, scale, and affine equivariant, and it has the highest possible finite-sample breakdown point:

$$\frac{[(n+1)/2]}{n} \quad \text{if } p = 1$$

$$\frac{[(n-p)/2]+1}{n} \quad \text{if } p > 1$$
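For concreteness, the two expressions can be evaluated numerically. The short Python sketch below assumes that the square brackets denote the integer part, which is the usual convention but is not stated in the text; the function name is hypothetical.

```python
def max_breakdown_point(n, p):
    """Finite-sample breakdown point quoted for DWLS.

    Assumption: [.] is read as the integer part (floor).
    """
    if p == 1:
        return ((n + 1) // 2) / n
    return ((n - p) // 2 + 1) / n

# Sample sizes chosen only for illustration.
bp_location = max_breakdown_point(20, 1)    # [(20 + 1) / 2] / 20
bp_regression = max_breakdown_point(20, 3)  # ([(20 - 3) / 2] + 1) / 20
```

With n = 20 this gives 10/20 for p = 1 and 9/20 for p = 3, so the attainable breakdown point decreases slightly as the number of covariates grows.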

3 Mendelian Randomization

Mendelian randomization is an inference method for assessing the causal effect of an exposure $X$ on an outcome $Y$. Testing the significance of the slope in the classical regression line of $Y$ on $X$ is too simple an approach, because many covariates other than $X$ help explain $Y$. To address this problem, $L$ instrumental variables $Z$ (DNA markers related to $X$) are used, yielding a ratio estimator $\hat{\beta}_{Rj}$ for each DNA marker, $j=1,\dots,L$. Combining the ratio estimators of all DNA markers gives the classical IVW estimator, which has a breakdown point equal to 0. Since the weights in the IVW estimator are the inverses of the variances of the $\hat{\beta}_{Rj}$, and IVW is the slope of the classical regression line through the origin of $U$ on $V$, where $U_j = \hat{\beta}_{Yj}/\sigma_{Yj}$ and $V_j = \hat{\beta}_{Xj}/\sigma_{Yj}$, we propose a robust IVW estimator in which the classical regression line is replaced by the robust DWLS regression line. We will conclude the talk with an application of this new estimator to the real data set investigated in.
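The equivalence between the two descriptions of the classical IVW estimator can be checked with a short Python sketch. It assumes the ratio estimator is the marker-outcome association divided by the marker-exposure association, with first-order variance $\sigma_{Yj}^2/\hat{\beta}_{Xj}^2$; these definitions, the function names, and the summary statistics are assumptions made for the illustration and are not taken from the text.

```python
def ivw_from_ratios(beta_x, beta_y, se_y):
    """Inverse-variance weighted mean of the per-marker ratio estimates."""
    ratios = [by / bx for bx, by in zip(beta_x, beta_y)]
    # Assumed first-order variance of each ratio: se_y^2 / beta_x^2.
    weights = [(bx / s) ** 2 for bx, s in zip(beta_x, se_y)]
    return sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

def ivw_as_regression(beta_x, beta_y, se_y):
    """Slope of the least squares line through the origin of U on V."""
    u = [by / s for by, s in zip(beta_y, se_y)]
    v = [bx / s for bx, s in zip(beta_x, se_y)]
    return sum(ui * vi for ui, vi in zip(u, v)) / sum(vi * vi for vi in v)

# Made-up summary statistics for four markers (illustration only).
beta_x = [0.10, 0.15, 0.20, 0.12]
beta_y = [0.05, 0.08, 0.09, 0.06]
se_y = [0.02, 0.02, 0.03, 0.025]

clean = ivw_from_ratios(beta_x, beta_y, se_y)
same = ivw_as_regression(beta_x, beta_y, se_y)

# A single corrupted marker-outcome association drags the estimate away.
corrupted_y = [0.05, 50.0, 0.09, 0.06]
contaminated = ivw_as_regression(beta_x, corrupted_y, se_y)
```

The two functions return the same value up to rounding, and a single corrupted marker is enough to move the estimate arbitrarily far, which is the sense in which the classical IVW estimator has breakdown point 0.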

References

  • Field and Ronchetti [1985] C. Field and E. Ronchetti. A tail area influence function and its application to testing. Sequential Analysis, 4:19–41, 1985.
  • Rousseeuw [1984] P. J. Rousseeuw. Least median of squares regression. Journal of the American Statistical Association, 79:871–880, 1984.