"Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than 5% of the time."
The probabilities follow P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9: about 30.1% for 1, 17.6% for 2, and so on down to about 4.6% for 9.
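As a quick check of those percentages, here is a minimal sketch that tabulates the expected share for each leading digit from the standard Benford formula log10(1 + 1/d). The variable name `$expected` is my own and is only used for illustration.

```powershell
# Expected Benford share for each leading digit d: log10(1 + 1/d)
$expected = foreach ($d in 1..9) {
    [pscustomobject]@{
        Digit       = $d
        Probability = [math]::Log10(1 + 1 / $d)
    }
}

# Show the shares as percentages with one decimal place
$expected |
    Format-Table Digit, @{label='Expected'; expression={'{0:P1}' -f $_.Probability}} -AutoSize
```

The nine probabilities add up to 1, which is a handy sanity check that the formula really describes a distribution over the digits 1 to 9.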
This seemed counter-intuitive, so I wanted to validate it myself. Let's look at the leading digit of the size (in bytes) of every .txt file in one of my directories. Enter PowerShell...
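# Explore Benford's Law: leading digit of .txt file sizes
$array = @()
# -File keeps directories that happen to match *.txt out of the sample
foreach ($item in (Get-ChildItem -Path p:\ -Filter *.txt -Recurse -File))
{
    # Skip empty files: a length of 0 has no leading digit from 1 to 9
    if ($item.Length -gt 0) { $array += $item.Length.ToString()[0] }
}
$array `
| Group-Object -NoElement `
| Sort-Object Count -Descending `
| Format-Table @{label="#";expression={$_.Name}},
    @{label="Percent";expression={"{0:P0}" -f ($_.Count/$array.Count)}},
    @{label="Histogram";expression={"▄" * $_.Count}} -AutoSize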
I consider this a validation, but let's try one more example, this time looking at the leading digit of the working set of each process on my desktop:
$array = @()
foreach ($a in (Get-Process))
{
    # WorkingSet64 is the 64-bit counterpart of WorkingSet; skip processes reporting 0 bytes
    if ($a.WorkingSet64 -gt 0) { $array += $a.WorkingSet64.ToString()[0] }
}
$array `
| Group-Object -NoElement `
| Sort-Object Count -Descending `
| Format-Table @{label="#";expression={$_.Name}},
    @{label="Percent";expression={"{0:P0}" -f ($_.Count/$array.Count)}},
    @{label="Histogram";expression={"▄" * $_.Count}} -AutoSize
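Eyeballing a histogram only goes so far, so here is a hedged sketch of a helper that puts the observed share of each leading digit next to the share Benford's law predicts. `Compare-Benford` is a hypothetical function name of my own, not a built-in cmdlet, and the powers of 2 are just a convenient sample that is known to follow the law closely.

```powershell
# Hypothetical helper (my own name, not a built-in cmdlet):
# observed vs. expected share of each leading digit
function Compare-Benford {
    param([Parameter(Mandatory)][long[]]$Values)

    # Leading digit of every positive value; zero and negatives are ignored
    $digits = @(foreach ($v in $Values) {
        if ($v -gt 0) { [int]($v.ToString().Substring(0, 1)) }
    })
    $total = $digits.Count

    foreach ($d in 1..9) {
        $observed = @($digits | Where-Object { $_ -eq $d }).Count
        [pscustomobject]@{
            Digit    = $d
            Observed = if ($total) { $observed / $total } else { 0 }
            Expected = [math]::Log10(1 + 1 / $d)
        }
    }
}

# Sample data: the powers of 2 from 2^0 to 2^62
$powers = foreach ($n in 0..62) { [long][math]::Pow(2, $n) }

Compare-Benford -Values $powers |
    Format-Table Digit,
        @{label='Observed'; expression={'{0:P1}' -f $_.Observed}},
        @{label='Expected'; expression={'{0:P1}' -f $_.Expected}} -AutoSize
```

The same helper should work on the data above, for example `Compare-Benford -Values (Get-Process).WorkingSet64`, assuming PowerShell 3 or later for the member enumeration.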
Again, this seems to hold true. Now that I have examples of Benford's law, I feel compelled to try and understand it. Wish me luck!