Anomaly detection under Benford’s law
Department of Economics and Management, University of Parma, Parma, Italy [andrea.cerioli@unipr.it]
Joint Research Centre, European Commission, Ispra, Italy [domenico.perrotta@ec.europa.eu; andrea.cerasa@ec.europa.eu]
Department of Economics and Statistics, University of Siena, Siena, Italy [lucio.barabesi@unisi.it]
1 Introduction
In this presentation we address the task of identifying fabricated declarations in customs data by analyzing transaction digits instead of transaction values. Anomaly detection in the space of transaction digits can be a powerful companion to more classical outlier detection methods, as shown by Barabesi et al. [2018, 2022, 2023]. Our approach relies on the availability of a suitable model for the digits of genuine transactions, i.e. transactions that originate from regular trade flows. Cerioli et al. [2019] have shown that Benford’s law [Berger and Hill, 2015] can potentially provide such a model under fairly general and easily verifiable conditions of trade, related to the ratio between the number of transactions and the number of goods traded by the subject under investigation. An additional advantage of the digit approach is its ability to pinpoint serial misconduct, by focusing on all the transactions made by a single operator instead of assessing the potential anomaly of each transaction within the reference market, as is done by methods that work in the space of transaction values [Perrotta et al., 2020]. The spirit of anomaly detection in the space of transaction digits is thus similar to the approach of Magnani et al. [2024], who focus on detecting the collective presence of outliers even when their precise identification may be challenging because individual signals are weak.
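For readers new to the digit-based approach, the classical first-digit check that this line of work builds on can be sketched in a few lines: Benford's law assigns probability log10(1 + 1/d) to the first digit d, and all the declarations of one operator are pooled into a single goodness-of-fit statistic. This is a minimal illustration assuming a plain Pearson chi-square statistic with its asymptotic chi-square reference distribution on 8 degrees of freedom (5% critical value about 15.51); the function names and the example values are hypothetical, and this is not the testing procedure of Barabesi et al. [2018, 2022, 2023].

```python
import math
from collections import Counter

# Benford first-digit probabilities: P(D1 = d) = log10(1 + 1/d), d = 1, ..., 9
BENFORD_FIRST_DIGIT = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Approximate 95% quantile of the chi-square distribution with 8 degrees of freedom
CHI2_8_CRIT_05 = 15.507


def first_digit(x: float) -> int:
    """Leading nonzero digit of a nonzero number, e.g. 0.0345 -> 3."""
    if x == 0:
        raise ValueError("the first digit of 0 is undefined")
    s = abs(x) / 10 ** math.floor(math.log10(abs(x)))  # significand in [1, 10)
    if s >= 10:  # guard against floating-point round-off at the boundaries
        s /= 10
    elif s < 1:
        s *= 10
    return int(s)


def benford_chi2(values: list[float]) -> float:
    """Pearson chi-square statistic comparing an operator's first digits with Benford."""
    n = len(values)
    counts = Counter(first_digit(x) for x in values)  # missing digits count as 0
    return sum(
        (counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD_FIRST_DIGIT.items()
    )


if __name__ == "__main__":
    # Hypothetical declared values of a single operator (illustrative numbers only)
    operator_values = [1234.5, 187.0, 29.9, 1.75e4, 310.2, 14.0, 96.5, 1120.0]
    stat = benford_chi2(operator_values)
    print(f"chi2 = {stat:.2f}, flagged = {stat > CHI2_8_CRIT_05}")
```

With only a handful of values the asymptotic chi-square approximation is unreliable, so in practice exact or Monte Carlo p-values are preferable; the point of the sketch is only that the unit of analysis is the whole set of an operator's declarations rather than each single transaction.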
The main goal of our work is to answer a major and still open question: how can we detect economic operators who are aware that the digits of regular transactions typically follow Benford’s pattern and who manipulate their data so that the same pattern also holds after fabrication? Such a sophisticated manipulation scheme undermines the available methods for fraud detection based on Benford’s law. Lacasa [2019] emphasizes this potential threat in the specific context of international trade data, and similar concerns exist in other financial settings. More generally, the possibility that fraudsters adapt their strategies to state-of-the-art analytical methods has been known as “concept drift” in the machine learning literature for at least a decade, and it is now recognized as a crucial challenge in most disciplines [Bockel-Rickermann et al., 2023]. This appears to be especially true in our field: the idea that Benford’s law has almost limitless potential for detecting many types of digit manipulation is becoming part of the collective imagination, fueled by a rapidly growing scientific literature [Berger et al., 2009] and by recent popular and documentary productions inspired by it [Nasser, 2020, Murtagh, 2023].
2 Summary of our results
We achieve our stated goal by means of four main methodological developments.
1. We formalize the Benford-savvy behavior that we aim to counter through a new contamination model for digits, named the “manipulated-Benford” scheme, which extends the one presented by Cerioli et al. [2019] to the Benford-savvy context.
2. We study the distributional properties of the fractional part of the significand, which is the most informative random quantity under the manipulated-Benford model. This study opens the door to new nonparametric tests of the Benford hypothesis that are able to unmask the postulated Benford-savvy behavior.
3. We obtain a general result that unveils the theoretical relationship between two well-known statistics for testing conformance to Benford’s law. This relationship turns out to be quite peculiar and amenable to a surprising simplification, which leads us to propose a new test statistic that is again effective in detecting data fabrication under the manipulated-Benford model.
4. Since none of our new tests can be expected to dominate the others under all possible configurations of the manipulated-Benford model, we combine them into an exact test of conformance to Benford’s law, which proves to be powerful under various specifications of the manipulated-Benford model.
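To see why the fractional part of the significand carries information that the first digit alone does not, it helps to write out its distribution under Benford’s law. The derivation below is a short sketch that assumes the usual definition of the significand S(x) in [1, 10), with ⟨·⟩ denoting the fractional part, and reads “fractional part of the significand” as F = S − ⌊S⌋ (an assumption on our reading of the terminology). It follows directly from Pr(S ≤ s) = log10 s and is not taken from Barabesi et al. [2025], whose exact definitions may differ.

```latex
\begin{align*}
S(x) &= 10^{\langle \log_{10}|x| \rangle} \in [1,10), \qquad
  \Pr(S \le s) = \log_{10} s, \quad 1 \le s < 10, \\
D_1 &= \lfloor S \rfloor, \qquad
  \Pr(D_1 = d) = \log_{10}\!\left(1 + \tfrac{1}{d}\right), \quad d = 1,\dots,9, \\
F &= S - \lfloor S \rfloor \in [0,1), \qquad
  \Pr(F \le t) = \sum_{d=1}^{9} \Pr(d \le S \le d + t)
  = \sum_{d=1}^{9} \log_{10}\!\left(1 + \tfrac{t}{d}\right), \quad 0 \le t \le 1, \\
\Pr(F \le t \mid D_1 = d) &= \frac{\log_{10}(1 + t/d)}{\log_{10}(1 + 1/d)}.
\end{align*}
```

Setting t = 1 in the third line gives the sum of log10((d + 1)/d) over d = 1, ..., 9, which telescopes to log10 10 = 1, as it should. The last line makes the blind spot of a first-digit test explicit: a fabrication that reproduces Pr(D1 = d) exactly but draws F from a different conditional distribution (for instance uniformly on [0, 1)) leaves every first-digit statistic unchanged while altering the distribution of F.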
We then investigate the empirical properties of the proposed tests through a simulation exercise and an application to real customs data concerning suspicious economic operators analyzed by investigators in a member state of the European Union. The values declared by these traders do conform to Benford’s law when only their first digit is considered, but they are flagged as highly suspicious by our tests designed for the manipulated-Benford scheme. Although we cannot claim that our results anticipate data manipulation with certainty, they clearly point to situations where more thorough controls are needed in view of possible serial and mathematically informed illicit behavior.
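The contrast between first-digit conformance and a fabricated remainder can be reproduced on synthetic data. The sketch below is a toy illustration under explicit assumptions: the “Benford-savvy” generator (first digit drawn from Benford, remaining digits uniform) is a hypothetical stand-in and not the manipulated-Benford model of Barabesi et al. [2025], and the two statistics (a Pearson chi-square on first digits, and a Kolmogorov-Smirnov distance between the empirical distribution of F = S − ⌊S⌋ and its Benford CDF, the sum over d of log10(1 + t/d)) are generic choices, not the tests proposed in that work. All function names are hypothetical.

```python
import math
import random

# Benford first-digit probabilities P(D1 = d) = log10(1 + 1/d), d = 1, ..., 9
P_FIRST = [math.log10(1 + 1 / d) for d in range(1, 10)]


def significand(x: float) -> float:
    """S(x) in [1, 10): |x| divided by the largest power of 10 not exceeding it."""
    if x == 0:
        raise ValueError("the significand of 0 is undefined")
    s = abs(x) / 10 ** math.floor(math.log10(abs(x)))
    if s >= 10:  # guard against floating-point round-off
        s /= 10
    elif s < 1:
        s *= 10
    return s


def first_digit_chi2(values: list[float]) -> float:
    """Pearson chi-square distance between observed first digits and Benford."""
    n = len(values)
    counts = [0] * 9
    for x in values:
        counts[int(significand(x)) - 1] += 1
    return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, P_FIRST))


def frac_cdf_benford(t: float) -> float:
    """Benford CDF of F = S - floor(S): sum_d log10(1 + t/d), 0 <= t <= 1."""
    return sum(math.log10(1 + t / d) for d in range(1, 10))


def ks_fractional_part(values: list[float]) -> float:
    """Kolmogorov-Smirnov distance between the empirical CDF of F and its Benford CDF."""
    fs = sorted(significand(x) % 1.0 for x in values)
    n = len(fs)
    dist = 0.0
    for i, f in enumerate(fs):
        g = frac_cdf_benford(f)
        # the empirical CDF jumps from i/n to (i + 1)/n at the i-th order statistic
        dist = max(dist, g - i / n, (i + 1) / n - g)
    return dist


def genuine_sample(n: int, rng: random.Random) -> list[float]:
    """log10 of each value uniform on [0, 4): significands follow Benford exactly."""
    return [10 ** rng.uniform(0, 4) for _ in range(n)]


def savvy_sample(n: int, rng: random.Random) -> list[float]:
    """HYPOTHETICAL Benford-savvy fabrication: Benford first digit, uniform remainder."""
    digits = rng.choices(range(1, 10), weights=P_FIRST, k=n)
    return [(d + rng.random()) * 10 ** rng.randrange(4) for d in digits]


if __name__ == "__main__":
    rng = random.Random(2025)  # fixed seed for reproducibility
    n = 20000
    for label, sample in (("genuine", genuine_sample(n, rng)),
                          ("savvy", savvy_sample(n, rng))):
        print(f"{label:8s} first-digit chi2 = {first_digit_chi2(sample):6.2f}   "
              f"KS on F = {ks_fractional_part(sample):.4f}")
```

With samples of this size, the first-digit statistic of both samples typically stays below the 5% chi-square critical value of about 15.51, because the fabricated first digits are drawn from Benford by construction. The Kolmogorov-Smirnov distance of the fabricated sample, instead, is usually well above the asymptotic 5% critical value 1.358/√n (about 0.0096 for n = 20000), while that of the genuine sample usually stays below it; exact numbers depend on the seed.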
The details of our work can be found in Barabesi et al. [2025], while our code and the anonymized significands of our empirical analysis are available at the GitHub repository:
https://github.com/AndreaCerioliUNIPR/Benford-savvy
References
- Barabesi et al. (2018). Goodness-of-fit testing for the Newcomb-Benford law with application to the detection of customs fraud. Journal of Business and Economic Statistics 36, pp. 346–358.
- Barabesi et al. (2022). On characterizations and tests of Benford’s law. Journal of the American Statistical Association 117, pp. 1887–1903.
- Barabesi et al. (2025). Robust inference under Benford’s law. Submitted.
- Barabesi et al. (2023). Statistical models and the Benford hypothesis: a unifying framework. TEST 32, pp. 1479–1507.
- Berger et al. (2009). Benford Online Bibliography. http://www.benfordonline.net (last accessed on January 30, 2025).
- Berger and Hill (2015). An introduction to Benford’s law. Princeton Univ. Press, Princeton.
- Bockel-Rickermann et al. (2023). Fraud analytics: a decade of research: organizing challenges and solutions in the field. Expert Systems with Applications 232, 120605.
- Cerioli et al. (2019). Newcomb-Benford law and the detection of frauds in international trade. PNAS 116, pp. 106–115.
- Lacasa (2019). Newcomb-Benford law helps customs officers to detect fraud in international trade. PNAS 116, pp. 11–13.
- Magnani et al. (2024). Collective outlier detection and enumeration with conformalized closed testing. Technical Report 2308.05534, arXiv.
- Murtagh (2023). This unexpected pattern of numbers is everywhere. Scientific American Magazine 329, p. 82.
- Nasser (2020). Digits. Netflix series Connected: The Hidden Science of Everything, Episode 4. https://www.netflix.com/title/81031737
- Perrotta et al. (2020). The robust estimation of monthly prices of goods traded by the European Union. Technical Report JRC120407, EUR 30188 EN, Publications Office of the European Union, Luxembourg. DOI: 10.2760/635844.