Biden has now been called the winner of the 2020 election. In response, Trump is claiming fraud, while obscure corners of the internet are discussing one angle for investigating potential fraud: violations of Benford's Law. Benford's Law is a statistical law concerning the leading digits in many real-life numerical data sets. The law posits a simple mathematical relationship between a leading digit and its frequency, and it implies that in many naturally occurring collections of numbers the leading digit is likely to be small. This law has been used in the past to detect scientific fraud (although note that the second and third digits might provide a more powerful test of fraud).
I decided to independently verify some of the claims made about violations of Benford's Law in vote counts. This Saidit post introduced the problem to me and led me to vote count data sets for four locales:
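For reference, the "simple mathematical relationship" mentioned above is usually written as follows for base 10, where d is the leading digit and P(d) is its expected relative frequency. This is the standard textbook statement of the law, not something specific to this analysis:

```latex
P(d) = \log_{10}\!\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \dots, 9
```

Under this formula a leading 1 is expected about 30.1% of the time and a leading 9 only about 4.6% of the time.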
*Allegheny County, PA
*Fulton County, GA
*Chicago, IL
*Milwaukee, WI
A comparison of Trump vs. Biden counts, by ward/precinct over all four locales, is shown in this plot. From this plot it is clear that vote counts for Trump and Biden are of about the same order of magnitude.
Leading digit histograms are shown here (Allegheny), here (Fulton), here (Chicago) and here (Milwaukee). Except for Fulton County, GA, these histograms show substantial violation of Benford's Law. For example, this plot shows the leading digit distribution for Biden votes in Chicago (histogram), compared with the counts expected under Benford's Law (line). Here is the corresponding plot for Trump's votes in Chicago.
I used a Poisson regression model to compare the observed leading-digit counts against the counts expected under Benford's Law. A summary of p-values for Biden, Trump, and Jorgensen appears here. A table of all p-values appears here. Except for Fulton County, GA, the p-values are exceedingly small.
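The original code is not posted, so here is only a minimal sketch of how a leading-digit histogram and the matching Benford expectation could be tabulated in Python. The function names and the sample numbers are my own illustration (the numbers are made up, not real precinct data), and dropping zero-vote precincts is an assumption, since zero has no leading digit.

```python
import math
from collections import Counter


def leading_digit(n):
    """Return the leading base-10 digit of a positive integer."""
    return int(str(n)[0])


def benford_expected(n_obs):
    """Expected number of observations per leading digit under Benford's Law."""
    return {d: n_obs * math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit_table(vote_counts):
    """Return (digit, observed, expected) rows for a list of vote counts.

    Assumption: precincts with zero votes are dropped before counting.
    """
    positive = [v for v in vote_counts if v > 0]
    observed = Counter(leading_digit(v) for v in positive)
    expected = benford_expected(len(positive))
    return [(d, observed.get(d, 0), expected[d]) for d in range(1, 10)]


# Made-up numbers purely for illustration, NOT real precinct data.
example_counts = [112, 95, 1340, 27, 0, 318]
for digit, obs, exp in leading_digit_table(example_counts):
    print(f"{digit}: observed={obs}, expected={exp:.2f}")
```

Plotting the observed column as bars and the expected column as a line would give a figure of the kind described above.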
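The exact Poisson regression specification is not given here, so the following is not that model. It is a simpler stand-in, Pearson's chi-square goodness-of-fit test, which asks the same basic question: are the observed leading-digit counts compatible with the Benford expectation? All function names are hypothetical. To stay within the standard library, the p-value uses the closed-form chi-square tail that holds only for an even number of degrees of freedom.

```python
import math


def benford_expected(n_obs, base=10):
    """Expected count for each leading digit 1..base-1 under Benford's Law."""
    return [n_obs * math.log(1 + 1 / d, base) for d in range(1, base)]


def pearson_chi2(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over the digits."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi2_sf_even_df(x, df):
    """Upper tail P(X > x) of a chi-square distribution with EVEN df.

    For df = 2m the tail equals exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def benford_chi2_test(observed, base=10):
    """Return (statistic, p_value) for observed counts of digits 1..base-1."""
    expected = benford_expected(sum(observed), base)
    stat = pearson_chi2(observed, expected)
    return stat, chi2_sf_even_df(stat, len(observed) - 1)
```

The even-df restriction happens to cover base 10 (9 digits, 8 degrees of freedom). For other cases a library routine such as `scipy.stats.chisquare` would be the more general choice.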
It is intriguing to note that the largest distortions appear in counts for Biden. However, while it is tempting to conclude that a small p-value for a single candidate's vote count implies fraudulent behavior on behalf of that candidate, this is not necessarily the case: if fraudulent behavior is undertaken on behalf of one candidate, distortion is possible in other candidates' vote counts, depending upon the method used to modify counts. Consequently, it is impossible from this analysis to identify the intended beneficiary of any potential fraud.
One potential limitation of this analysis is that any conclusion of election fraud crucially depends on the assumption that vote counts follow Benford's Law. For this to be the case, the data-generating mechanism must satisfy one or more of several other conditions (e.g. scale invariance). More details appear here. However, at the very least, the orders of magnitude of the vote counts are similar for Biden and Trump, as seen here.
Code available by PM request.
EDIT: I remain concerned about whether vote counts should really follow Benford's law. However, I re-analyzed the data in base 4 instead of base 10. Similar results, with weaker p-values.
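For anyone wanting to try the base 4 version: Benford's Law generalizes to base b as P(d) = log_b(1 + 1/d) for d = 1, ..., b-1. Below is a minimal sketch of the leading digit and expected proportions in an arbitrary base. The function names are mine, and this is not the code used for the re-analysis.

```python
import math


def leading_digit(n, base):
    """Leading digit of a positive integer n when written in the given base."""
    if n <= 0 or base < 2:
        raise ValueError("n must be positive and base at least 2")
    while n >= base:
        n //= base  # drop the lowest-order digit
    return n


def benford_probs(base):
    """Benford proportions for leading digits 1..base-1 in the given base."""
    return [math.log(1 + 1 / d, base) for d in range(1, base)]


# Expected proportions for leading digits 1, 2, 3 in base 4.
print([round(p, 4) for p in benford_probs(4)])
```

In base 4 there are only three possible leading digits, so a goodness-of-fit test has 2 degrees of freedom instead of 8. That coarser binning may be part of why the p-values come out weaker, though I have not checked this.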