When WikiLeaks released more than 90,000 classified documents relating to military operations in Afghanistan this past July, journalists, politicians and members of the public were eager to see the data. Some of them were looking for a true tally of IED attacks, others for the number of civilian casualties, and so on. One group, led by Drew Conway, a PhD candidate in political science at New York University, had a more unusual goal. They wanted to use an arcane statistical law to determine if the data in the reports was truly raw, or if anyone had tampered with it.
The law is Benford's Law, stated in 1938 by U.S. physicist Frank Benford. It posits that in lists of multidigit numbers drawn from a wide range of natural and man-made phenomena, the leading digits aren't distributed in a uniform way. You might expect the digits 1 to 9 to appear with roughly equal frequency in the first slot. In fact, lower digits are much more common in that position than higher ones. The digit 1 appears first about 30 percent of the time, 2 the next most often, and each subsequent digit up to 9 appears less frequently; 9 is first in only about 1 in 20 cases. Data sets that adhere to the law are varied and surprising, and they include war casualties, lengths of rivers, the size of craters on the moon, even the byte sizes of files stored on your hard drive.
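The article gives only rounded frequencies. In its standard form (not quoted in the text above), Benford's Law puts the expected share of numbers with leading digit d at log10(1 + 1/d). A minimal Python sketch of that formula, with a function name chosen purely for illustration:

```python
import math

def benford_probability(d: int) -> float:
    """Expected share of numbers whose leading digit is d (1 through 9)."""
    return math.log10(1 + 1 / d)

# Build the full expected distribution of first digits.
expected = {d: benford_probability(d) for d in range(1, 10)}

for d, p in expected.items():
    print(f"{d}: {p:.1%}")
# Prints 30.1% for the digit 1, falling steadily to 4.6% for the digit 9.
```

The nine shares sum to exactly 1, which is why the formula works as a complete distribution for the first slot.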
Benford's Law also appears to be pervasive in many areas of business and finance. Stock prices, index levels, and data from tax returns and expense claims all conform to the law if the numbers have not been fudged. That makes Benford's Law a powerful tool in forensic data analysis. If someone cooks the books, the falsified numbers will likely reveal themselves. As a result, Benford experts such as Mark Nigrini, a business professor at the College of New Jersey, are in demand as sleuths. He's worked with law firms, corporations and government agencies, including the district attorney's office in Brooklyn and the Canada Revenue Agency. Nigrini's new book, Forensic Analytics and Forensic Investigation, due to be published in March, includes a detailed section on Benford.
In one recent case, Nigrini advised a major multinational consumer goods manufacturer on how to detect irregularities in the redemption of coupons by retailers and consumers. The area is ripe for corruption, and has been targeted by organized crime. Scam artists submit real or counterfeit coupons to the manufacturer for cash, or rely on corrupt store employees or associates within the manufacturer's offices. To complicate matters, manufacturers usually shred coupons within weeks of receiving them: "There aren't warehouses big enough to store them all," says Nigrini. Faced with that reality, manufacturers have to detect fraud quickly. Analyzing incoming coupon claims using a system that includes Benford can provide a critical early warning.
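A first-digit screen of the general kind described here might look something like the sketch below. It is not Nigrini's system, whose details the article does not give. The function names, the choice of a chi-square statistic, the 5 percent threshold and the sample data are all assumptions made for illustration.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's Law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# 5% critical value of the chi-square distribution with 8 degrees of freedom.
CHI2_CRITICAL_5PCT = 15.507

def leading_digit(value):
    """Return the first significant digit (1-9) of a nonzero number."""
    # In scientific notation the first character is the first significant digit.
    return int(f"{abs(value):.15e}"[0])

def benford_chi_square(values):
    """Chi-square distance between observed first digits and Benford's Law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

# Hypothetical stand-ins for claim amounts (not real coupon data).
conforming = [2 ** k for k in range(100)]  # powers of 2 track Benford closely
suspicious = list(range(100, 1000))        # every first digit equally common

for name, data in [("conforming", conforming), ("suspicious", suspicious)]:
    stat = benford_chi_square(data)
    flag = "review" if stat > CHI2_CRITICAL_5PCT else "ok"
    print(f"{name}: chi-square = {stat:.1f} -> {flag}")
```

A flag from a screen like this is only a prompt to look closer. Plenty of honest data sets, such as prices clustered in a narrow range, do not follow Benford at all.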
Other experts are trying to apply the law to more elaborate frauds. One of the great financial thrillers of our era was independent investigator Harry Markopolos's cracking of the Madoff Investment Securities scam. In a now-famous memo that Markopolos submitted to the SEC in 2005, "The World's Largest Hedge Fund Is a Fraud," he argued that the consistently profitable investment results that Madoff had reported for almost two decades were impossible to achieve. Markopolos used so-called Mosaic Theory to arrive at his conclusions. But Paul Kedrosky, a respected U.S. entrepreneur, academic and commentator who has analyzed Madoff's reported monthly fund returns, says they came so close to matching the predicted distribution of first digits under Benford's Law that they should have been suspect.
Kedrosky's analysis points out one possible shortcoming of Benford's Law: Fraudsters can avoid detection by adhering to it. The distribution of first digits can be found on Wikipedia and other online sources.
Another weakness is that the law works best on large data sets with lots of fudged numbers, but many frauds are based on just one or two. Benford is also generally useful only at the transaction level, not the portfolio level. As one money manager explained to me, when many individual transaction prices are fake, as they were in Enron's notorious "special purpose vehicles," Benford can shine new light. But only Enron's auditors had access to transaction data. Totals that appeared in its financial reports didn't contravene Benford's predicted distribution of first digits.
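The point about frauds built on just one or two numbers can be illustrated with a rough sketch. It assumes a chi-square measure of distance from Benford's distribution, and it uses powers of 2 as a stand-in for an honest ledger. Neither the measure nor the data comes from the cases discussed here.

```python
import math
from collections import Counter

# Expected first-digit shares under Benford's Law: log10(1 + 1/d).
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def benford_chi_square(values):
    """Chi-square distance between observed first digits and Benford's Law."""
    digits = [int(f"{abs(v):.15e}"[0]) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    return sum((counts[d] - n * p) ** 2 / (n * p) for d, p in BENFORD.items())

# Hypothetical "honest" ledger of 1,000 entries that follows Benford closely.
honest = [2 ** k for k in range(1000)]

# The same ledger with just two entries replaced by made-up amounts.
doctored = list(honest)
doctored[10] = 9500
doctored[20] = 9800

print(f"honest:   {benford_chi_square(honest):.2f}")
print(f"doctored: {benford_chi_square(doctored):.2f}")
```

Both statistics stay far below 15.507, the usual 5 percent cutoff for eight degrees of freedom, so a first-digit test on its own would not notice the two altered entries.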
Nabbing a specific culprit may be difficult as well. In the case of the Afghanistan data released by WikiLeaks, Conway's team didn't find strong evidence of tampering. What if they had? I asked Conway. Would that have implied wrongdoing by military authorities, or by WikiLeaks in gathering and publishing the data? Conway wrote back: "There's no way of knowing."
Timothy Taylor is a Vancouver-based author.