Robust Genomic Prediction and Heritability Estimation using Density Power Divergence

Upama Paul Chowdhury¹, Ronit Bhattacharjee², Susmita Das³, Abhik Ghosh⁴
Abstract

This manuscript addresses genomic prediction of phenotypes, focusing on the statistical methods needed to handle noisy covariates and confounders. We develop robust statistical models for genomic prediction from single nucleotide polymorphism (SNP) data in plant and animal breeding and in multi-field trials. The approach incorporates the estimated effects of all marker loci into the statistical framework while reducing the high dimensionality of the data and preserving critical information. Specifically, we introduce a new robust framework for genomic prediction that uses one-stage and two-stage linear mixed model analyses together with the popular robust minimum density power divergence estimator (MDPDE) to estimate genetic effects on phenotypic traits. Extensive empirical experiments on artificial datasets and an application to a real-life maize breeding dataset illustrate the superior performance of the proposed MDPDE-based genomic prediction and the associated heritability estimation procedures over existing competitors. The results showcase the robustness and accuracy of the proposed MDPDE-based approaches, especially in the presence of data contamination, and point to their potential for improving breeding programs and advancing genomic prediction of phenotypic traits.

  • 1 University of Maryland Baltimore County, USA [upamap1@umbc.edu]
  • 2 Indian Statistical Institute, Kolkata, India
  • 3 Indian Statistical Institute, Kolkata, India [https://orcid.org/0000-0001-8034-6877]
  • 4 Indian Statistical Institute, Kolkata, India [abhikghosh@isical.ac.in]

Keywords: Genomic Prediction, Field Trials, Robust Estimation, Minimum Density Power Divergence Estimator, Linear Mixed Models.

1 Tables and Equations

Table 1: Prediction accuracy measures obtained from different existing methods and from the proposed method with different tuning parameters under the one-stage approach with artificial datasets (test data).
Methods      Pure Data          Random Contamination   Block Contamination
             ρ^       MAD       ρ^       MAD           ρ^       MAD
RMLA         0.532    23.84     0.270    18.38         0.322    11.12
RMLV         0.587    16.77     0.387    16.69         0.470    18.38
Rob-RMLA     0.540    25.10     0.343    17.50         0.520    13.29
Rob-RMLV     0.712    7.23      0.430    8.18          0.550    6.48
Proposed MDPDE-based approach
α=0.1        0.667    6.02      0.620    7.34          0.667    6.34
α=0.3        0.745    6.30      0.650    7.37          0.645    6.37
α=0.5        0.737    6.67      0.537    6.67          0.537    5.67
α=0.7        0.710    5.37      0.630    6.37          0.210    5.37
α=1          0.823    4.37      0.670    5.30          0.440    5.13

Table 2: Prediction accuracy measures obtained from different methods under the two-stage approach with artificial datasets (test data).
Methods      Pure Data          Random Contamination   Block Contamination
             ρ^       MAD       ρ^       MAD           ρ^       MAD
Rob1         0.754    4.10      0.740    5.5           0.710    5.29
Rob2         0.793    4.23      0.730    4.18          0.740    4.48
Proposed MDPDE-based approach
α=0.1        0.820    4.02      0.760    4.14          0.747    4.34
α=0.3        0.845    4.20      0.750    3.67          0.742    4.37
α=0.5        0.837    3.67      0.737    3.39          0.733    4.67
α=0.7        0.810    2.37      0.820    3.08          0.758    3.37
α=1          0.860    2.37      0.832    2.93          0.800    3.34

Here, ρ^ is the Pearson correlation coefficient and MAD is the median absolute deviation, which we used as a precision measure.
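As a rough illustration of how these two measures could be computed on a test set, the following sketch may help. It rests on assumptions that the text does not confirm: that ρ^ is the correlation between observed and predicted phenotypes, and that MAD is the median absolute deviation of the prediction errors about their median. The function names and the toy numbers are hypothetical and are not taken from the experiments reported in the tables.

```python
import numpy as np


def pearson_correlation(observed, predicted):
    """Pearson correlation between observed and predicted phenotypes."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.corrcoef(observed, predicted)[0, 1])


def median_absolute_deviation(observed, predicted):
    """MAD of the prediction errors about their median.

    Assumption: the errors are observed minus predicted values; the paper
    does not spell out which quantity the MAD is taken over.
    """
    errors = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.median(np.abs(errors - np.median(errors))))


# Toy values (hypothetical); the last prediction is a gross outlier.
observed = [10.0, 12.0, 14.0, 16.0, 18.0]
predicted = [11.0, 12.0, 13.0, 17.0, 30.0]

rho_hat = pearson_correlation(observed, predicted)
mad = median_absolute_deviation(observed, predicted)
print(f"rho_hat = {rho_hat:.3f}, MAD = {mad:.2f}")
```

Because the MAD is built from medians, the single wild prediction in the toy data barely moves it, which is one reason a median-based precision measure suits contaminated test data.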

2 Data analysis

In addition, we analyze a real-life maize breeding data set collected by the International Center for the Improvement of Maize and Wheat (CIMMYT). The data comprise 300 tropical maize lines in 10 blocks. Genotyping was conducted using 1135 SNP markers, and the target phenotypic trait was grain yield, assessed under severe drought stress and under well-watered conditions.
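For readers unfamiliar with the estimator behind these analyses, the sketch below illustrates the idea of the MDPDE on a deliberately simple problem. It fits a univariate normal model by minimizing the empirical density power divergence objective, and it is not the linear mixed model procedure of this paper. The function names, the simulated data, the 10% contamination level, and the choice α = 0.5 are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def dpd_objective(params, x, alpha):
    """Empirical density power divergence objective for N(mu, sigma^2), alpha > 0.

    H(mu, sigma) = integral of f^(1+alpha) - (1 + 1/alpha) * mean(f(x_i)^alpha),
    where the integral has a closed form for the normal density.
    """
    mu, log_sigma = params
    sigma = np.exp(log_sigma)  # optimize log(sigma) so that sigma stays positive
    integral_term = (2.0 * np.pi) ** (-alpha / 2.0) * sigma ** (-alpha) / np.sqrt(1.0 + alpha)
    density = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    return integral_term - (1.0 + 1.0 / alpha) * np.mean(density ** alpha)


def mdpde_normal(x, alpha):
    """Return (mu_hat, sigma_hat) minimizing the DPD objective (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # Robust starting values: the median and the scaled median absolute deviation.
    med = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - med))
    start = np.array([med, np.log(scale)])
    result = minimize(dpd_objective, start, args=(x, alpha), method="Nelder-Mead")
    return float(result.x[0]), float(np.exp(result.x[1]))


# Simulated data (illustrative only): 90% from N(0, 1), 10% outliers near 10.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, size=180), rng.normal(10.0, 1.0, size=20)])

mu_hat, sigma_hat = mdpde_normal(x, alpha=0.5)
print(f"sample mean = {x.mean():.2f}, sample sd = {x.std(ddof=1):.2f}")
print(f"MDPDE (alpha=0.5): mu = {mu_hat:.2f}, sigma = {sigma_hat:.2f}")
```

In this toy setting the sample mean and standard deviation are pulled toward the outliers, while the MDPDE downweights observations with low model density and stays close to the parameters of the uncontaminated component. Larger α gives more robustness at some cost in efficiency under pure data.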
