Our recent exploration of Benford's Law and its established use within the financial regulatory and services community has been highly insightful. Benford's Law is a powerful statistical tool employed to identify potential fraud or anomalies in numerical datasets. It stems from the observation that, in many naturally occurring datasets, the first digit of the numbers adheres to a specific distribution, with the digit 1 appearing most frequently, followed by 2, 3, and so forth, up to 9.
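Formally, Benford's Law assigns a leading digit d (1 through 9) the probability P(d) = log10(1 + 1/d). Under the law, 1 leads about 30.1% of the time and 9 only about 4.6%. The short Python sketch below simply tabulates that formula. The function name `benford_expected` is our own illustrative choice and does not come from any particular library.

```python
import math


def benford_expected() -> dict[int, float]:
    """Return Benford first-digit probabilities P(d) = log10(1 + 1/d) for d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    # Print the expected share of each leading digit as a percentage.
    for digit, p in benford_expected().items():
        print(f"{digit}: {p:.1%}")
```

Running it prints the familiar descending sequence, from 30.1% for 1 down to 4.6% for 9. The nine probabilities telescope to log10(10) = 1, so together they form a complete distribution.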
As we continue to identify sources and expand our data library, we are always looking for ways to validate our data, both to improve decisioning accuracy and to detect fraud.
For risk professionals leveraging technology, the goal is efficiency and accuracy, while further enhancing the reliability of the data being consumed.
Our investigations, model testing and recent integration of Benford's law into our own work have proven beneficial, and its wide acceptance and use among the financial regulatory and services communities are encouraging.
It was astronomer Simon Newcomb who first recognized the pattern, in 1881 (see footnote 1 for his paper). However, knowledge and use of the law only began to take hold in 1938, when physicist Frank Benford tested Newcomb's hypothesis against 20 sets of data and published a scholarly paper verifying it (see footnote 2 for a great abstract). As sometimes happens, the name went to the person who popularized the discovery rather than the one who first made it, which is why we now commonly refer to it as Benford's Law.
As we began building models to test our own datasets, we were encouraged to see this pattern recur. At the borrower level, we found these models helpful in analyzing bank data, company financials and tax returns. At the macroeconomic level, running them on industry, economic-health and population data proved valuable as a check on the accuracy of the data that feeds our models.
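As a rough illustration of the kind of first-digit test described above, the Python sketch below does three things. It extracts each value's leading digit, compares the observed digit frequencies with Benford's expected ones, and summarizes the gap as a mean absolute deviation (MAD). The function names, the use of MAD as the summary statistic, and the choice to skip zeros and ignore signs are all assumptions for illustration. This is not a description of our production models.

```python
import math
from collections import Counter


def benford_expected() -> dict[int, float]:
    """Benford first-digit probabilities P(d) = log10(1 + 1/d), d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def first_digit(x: float) -> int:
    """Leading nonzero digit of x, ignoring sign. Assumes x is finite and nonzero."""
    # Scientific notation puts the leading significant digit first,
    # e.g. 0.0456 -> '4.56000000000000e-02'. This avoids log10 rounding traps
    # such as 0.3 / 0.1 evaluating to 2.9999999999999996.
    return int(format(abs(x), ".14e")[0])


def first_digit_test(values) -> tuple[dict[int, float], float]:
    """Compare observed first-digit proportions with Benford's Law.

    Zeros and non-finite values are skipped (an assumption; other treatments exist).
    Returns (observed proportion for each digit, MAD from the Benford proportions).
    """
    digits = [first_digit(v) for v in values if v != 0 and math.isfinite(v)]
    if not digits:
        raise ValueError("no usable nonzero values")
    counts = Counter(digits)
    n = len(digits)
    observed = {d: counts[d] / n for d in range(1, 10)}
    expected = benford_expected()
    mad = sum(abs(observed[d] - expected[d]) for d in range(1, 10)) / 9
    return observed, mad


if __name__ == "__main__":
    # Hypothetical inputs: powers of two are known to follow Benford closely,
    # while the integers 1..999 have a flat first-digit distribution.
    _, mad_benford_like = first_digit_test([2 ** k for k in range(1, 1001)])
    _, mad_uniform = first_digit_test(list(range(1, 1000)))
    print(f"powers of two MAD: {mad_benford_like:.4f}")
    print(f"1..999 MAD: {mad_uniform:.4f}")
```

The Benford-like series should produce a MAD several times smaller than the uniform one. A chi-square or Kolmogorov-Smirnov statistic could be swapped in for MAD. MAD is used here because it does not grow mechanically with sample size, which makes datasets of different sizes easier to compare.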
In this magnified example, the point is to identify datasets that require extra attention.
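To make "datasets that require extra attention" concrete, one option is to grade each dataset's first-digit MAD against the conformity ranges commonly attributed to Mark Nigrini (cut-offs of roughly 0.006, 0.012 and 0.015). The Python sketch below applies those ranges and returns only the datasets whose MAD exceeds a chosen ceiling. The dataset names, the sample data, the helper names and the default ceiling of 0.012 are illustrative assumptions. A flag is a prompt for review, not a finding of fraud.

```python
import math
from collections import Counter

# Commonly cited first-digit MAD bands (after Nigrini); treat them as rules of thumb.
FIRST_DIGIT_BANDS = [
    (0.006, "close conformity"),
    (0.012, "acceptable conformity"),
    (0.015, "marginally acceptable conformity"),
]


def first_digit_mad(values) -> float:
    """MAD between observed first-digit proportions and Benford's Law (zeros skipped)."""
    digits = [int(format(abs(v), ".14e")[0]) for v in values if v != 0 and math.isfinite(v)]
    if not digits:
        raise ValueError("no usable nonzero values")
    counts = Counter(digits)
    n = len(digits)
    return sum(abs(counts[d] / n - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9


def conformity(mad: float) -> str:
    """Map a first-digit MAD onto the bands above."""
    for upper, label in FIRST_DIGIT_BANDS:
        if mad <= upper:
            return label
    return "nonconformity"


def flag_for_review(datasets: dict[str, list[float]], max_mad: float = 0.012) -> dict[str, str]:
    """Return {name: conformity label} for datasets whose MAD exceeds max_mad (hypothetical default)."""
    flagged = {}
    for name, values in datasets.items():
        mad = first_digit_mad(values)
        if mad > max_mad:
            flagged[name] = conformity(mad)
    return flagged


if __name__ == "__main__":
    # Hypothetical datasets: one Benford-like series and one uniform series.
    datasets = {
        "powers_of_two": [2 ** k for k in range(1, 1001)],
        "uniform_1_to_999": list(range(1, 1000)),
    }
    print(flag_for_review(datasets))
```

With this sample data, only `uniform_1_to_999` is flagged, graded as nonconformity. Real datasets would need the ceiling tuned to their size and to how much review capacity is available.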
Take care, as there are some limitations to its applicability, which include:
Overall, Benford's law is a useful tool for detecting anomalies in datasets, but it should be used with caution. Pair it with other safeguards, such as chain-of-custody controls, and with other analytical methods, such as checking for overlap and drawing inferences across sources, to confirm that what it flags is genuine.
If you’re interested in diving deeper into the sources used to write this, please see footnote 3 for an explanation and derivation of the law, and footnote 4 for a practical example using Microsoft Excel.
Footnotes
Sherif Hassan is the principal of Syh Strategies, a financial and technology services advisory firm based in New York City. In lending, the firm provides business strategy, portfolio analysis, credit modeling, product and pricing optimization, and operations architecture for lenders and brokers of all sizes and at all stages of development. He can be contacted at sherif@syhstrategies.com.