Robustness of a Proportion Estimator of the Extremal Index

M. Souto de Miranda1, C. Amado2, M.C. Miranda3,4 and M.I. Gomes4
  • 1 Center for Research and Development in Mathematics and Applications (CIDMA), University of Aveiro, Aveiro, Portugal [manuela.souto@ua.pt]
  • 2 Center for Computational and Stochastic Mathematics (CEMAT) and Department of Mathematics, IST, University of Lisbon, Lisbon, Portugal [conceicao.amado@tecnico.ulisboa.pt]
  • 3 Institute of Accounting and Administration and CIDMA, University of Aveiro, Aveiro, Portugal [cristina.miranda@ua.pt]
  • 4 Centre of Statistics and its Applications (CEAUL), University of Lisbon, Lisbon, Portugal [migomes@ciencias.ulisboa.pt]

Keywords: Compound Poisson Process – Clusters of Extreme Values – Extremal Index – Proportion Estimator – Robustness

Abstract

The extremal index (EI) plays a special role in characterizing sample data that show signs of clusters of extreme values. Assume that a process is such that its cumulative distribution function belongs to the family of extreme value distributions and that there exists a limit distribution that belongs to the same family. Under specific stationarity conditions, a random process that contains clusters of exceedances above high fixed thresholds defines a compound Poisson process, taking the occurrence times of the exceedances as the singularities of the process. The EI can be interpreted in different ways. Most commonly, it is the reciprocal of the expected cluster size in the limit distribution (Leadbetter et al. [1983]), even though the distribution of the cluster size is unknown. The EI can also be interpreted as the proportion of non-null inter-exceedance times between clusters (Ferro and Segers [2003]). This last interpretation suggested the EI Proportion estimator defined in Souto de Miranda et al. [2025]: under regularity and stationarity conditions the EI is the parameter of a Bernoulli model, and there may exist samples for which the EI is the parameter of a binomial model. The EI Proportion estimator is the relative frequency of non-null inter-exceedance times between clusters in samples of size n with N observed exceedances. Thus, under those conditions and for samples with N observed exceedances, the Proportion estimator is the maximum likelihood estimator of the EI, and maximum likelihood estimators are generally not robust. Robust estimation methods suited to such discrete models have not been deeply investigated, still less for observations of extreme values. This study investigates the robustness of the EI Proportion estimator. Because the EI belongs to a continuous bounded parameter space and the sample space is finite and discrete, B-robustness is already known (see Souto de Miranda et al. [2025] or Ruckstuhl and Welsh [2001]). 
The present work looks for low sensitivity to the dependence structure of the original process and studies the properties of the breakdown point (BP). It is proven that, for a subset of EI values, the estimator has a strictly positive breakdown point.
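
As a minimal illustration of the relative-frequency idea, the following Python sketch (not the authors' R code) computes the share of non-null inter-exceedance times for a fixed threshold. It assumes, as a simplification that the abstract does not state, that an inter-exceedance time is "null" when two successive exceedances occur at consecutive time indices, and it divides by the number of gaps between the N exceedances. The function name `proportion_estimate` and the toy series are hypothetical.

```python
from typing import List, Sequence


def proportion_estimate(x: Sequence[float], u: float) -> float:
    """Relative frequency of non-null inter-exceedance times above threshold u.

    Assumption (for illustration only): a gap between two successive
    exceedances is "null" when they occur at consecutive time indices,
    i.e. when they fall in the same cluster.
    """
    # Time indices at which the series exceeds the threshold.
    times: List[int] = [i for i, xi in enumerate(x) if xi > u]
    if len(times) < 2:
        raise ValueError("need at least two exceedances")
    # Inter-exceedance times between successive exceedances.
    gaps = [b - a for a, b in zip(times, times[1:])]
    # Count the gaps that are longer than one time step.
    non_null = sum(1 for g in gaps if g > 1)
    return non_null / len(gaps)


# Toy series with clusters of sizes 2, 1 and 3 above the threshold u = 5.
series = [1, 7, 8, 2, 1, 6, 0, 3, 9, 9, 7, 1]
estimate = proportion_estimate(series, u=5)
print(estimate)
```

In the toy series the exceedances occur at indices 1, 2, 5, 8, 9 and 10, so two of the five gaps are longer than one step and the sketch returns 0.4. The estimator defined in Souto de Miranda et al. [2025] may use a different convention for the denominator or for what counts as a null time.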

Point estimation with the Proportion estimator on real data is illustrated in Miranda et al. [2024], using weekly pharmacy sales of antihistamines for systemic use (R06, according to the Anatomical Therapeutic Chemical (ATC) Classification System) from 2014 to 2019. The data set is freely available from Kaggle (Zdravković [2020]).

For the binomial model, results and comparisons are based on simulations. Some technical details relate to the estimation method, such as using only samples that contain a constant number of exceedances. The number of such samples should be as large as possible, so that they reflect a significant part of the dependence structure. Possibly contaminated samples are generated from specific dependence models for which the EI is known.
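
The simulation step can be sketched as follows, in Python rather than the authors' R. The abstract does not name the dependence models used, so the max-autoregressive (ARMAX) process below is an assumption chosen because its extremal index is known to be 1 - alpha when the innovations are unit Fréchet. The function names, the sample size, the threshold and the seed are all hypothetical.

```python
import math
import random


def simulate_armax(n, alpha, rng):
    """Path of X_t = max(alpha * X_{t-1}, (1 - alpha) * Z_t), Z_t unit Frechet.

    The stationary marginal is unit Frechet and the extremal index is 1 - alpha.
    """
    def unit_frechet():
        # Inverse-CDF draw from F(z) = exp(-1/z); redraw if the uniform is 0.
        v = rng.random()
        while v == 0.0:
            v = rng.random()
        return -1.0 / math.log(v)

    x = [unit_frechet()]  # start from the stationary marginal
    for _ in range(n - 1):
        x.append(max(alpha * x[-1], (1.0 - alpha) * unit_frechet()))
    return x


def samples_with_fixed_exceedances(n, alpha, u, n_exc, n_samples, rng,
                                   max_tries=100_000):
    """Keep only simulated samples with exactly n_exc exceedances of u."""
    kept = []
    for _ in range(max_tries):
        x = simulate_armax(n, alpha, rng)
        if sum(xi > u for xi in x) == n_exc:
            kept.append(x)
            if len(kept) == n_samples:
                break
    return kept


rng = random.Random(2024)          # fixed seed, for reproducibility only
u95 = -1.0 / math.log(0.95)        # 0.95 quantile of the unit Frechet law
samples = samples_with_fixed_exceedances(
    n=200, alpha=0.5, u=u95, n_exc=10, n_samples=20, rng=rng
)
print(len(samples))
```

Rejection sampling is the simplest way to condition on a constant number of exceedances. It becomes slow when the requested count is far from the expected one, which is why the sketch caps the number of attempts with `max_tries`.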

In the framework of the binomial model, the study includes a section devoted to confidence intervals (CIs) for the EI. Wald and Wilson CIs were computed with the EI Proportion estimate as the point estimate. The CIs were evaluated on samples from the assumed models and on contaminated samples.
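
For reference, the two interval types can be written down with the standard textbook formulas for a binomial proportion. The Python sketch below is not the authors' R code. The function names are hypothetical, `k` stands for the number of non-null inter-exceedance times and `n` for the number of Bernoulli trials, and truncating the limits to [0, 1] is a choice made here.

```python
from math import sqrt
from statistics import NormalDist


def wald_ci(k, n, level=0.95):
    """Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = k / n
    half = z * sqrt(p * (1.0 - p) / n)
    # Truncate to the parameter space [0, 1].
    return max(0.0, p - half), min(1.0, p + half)


def wilson_ci(k, n, level=0.95):
    """Wilson score interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p = k / n
    denom = 1.0 + z * z / n
    # The centre is shrunk from p_hat towards 1/2.
    centre = (p + z * z / (2.0 * n)) / denom
    half = (z / denom) * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n))
    return max(0.0, centre - half), min(1.0, centre + half)


# Example: 4 non-null inter-exceedance times out of 10.
print(wald_ci(4, 10))
print(wilson_ci(4, 10))
```

The Wald interval collapses to a single point when the estimate is 0 or 1, whereas the Wilson interval keeps a positive width. This difference matters because the EI lies in a bounded interval.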

All computations used the R software and specific packages such as extRemes and robustbase.

Acknowledgments

We thank FCT - Fundação para a Ciência e Tecnologia, Portugal, for support through CIDMA, within project UID/MAT/04106/2019 (DOI: 10.54499/UIDB/04106/2020 and DOI: 10.54499/UIDP/04106/2020), through CEMAT, within projects UIDB/04621/2020 and UIDP/04621/2020 (DOI: 10.54499/UIDB/04621/2020), and through CEAUL, within project UIDB/00006/2020 (DOI: 10.54499/UIDB/00006/2020).

References

  • Ferro and Segers [2003] C. A. T. Ferro and J. Segers. Inference for Clusters of Extreme Values. J. R. Statist. Soc. B, 65(2):545–556, 2003. doi: 10.1111/1467-9868.00401.
  • Leadbetter et al. [1983] M. R. Leadbetter, G. Lindgren, and H. Rootzén. Extremes and Related Properties of Random Sequences and Processes. Springer-Verlag, New York, 1983. doi: 10.1007/BF00532484.
  • Miranda et al. [2024] M. C. Miranda, M. Souto de Miranda, and M. Ivette Gomes. New approaches to extremal index estimation. WSEAS Transactions on Systems, 23:223–231, 2024. ISSN 22242678. doi: 10.37394/23202.2024.23.25.
  • Ruckstuhl and Welsh [2001] A. F. Ruckstuhl and A. H. Welsh. Robust fitting of the binomial model. The Annals of Statistics, 29(4):1117–1136, 2001. doi: 10.1214/aos/1013699996.
  • Souto de Miranda et al. [2025] Manuela Souto de Miranda, M. Cristina Miranda, and M. Ivette Gomes. A direct approach in extremal index estimation. In New Frontiers in Statistics and Data Science (SPE 2023, Guimarães, Portugal), in press. Springer, 2025.
  • Zdravković [2020] Milan Zdravković. Pharma sales data, 2020. URL https://www.kaggle.com/ds/466126.