Monday, August 29, 2011

PowerShell and Benford's Law

I was reading through a statistics blog (R) the other day when I came across a post on Benford's law. The definition according to the blog is:

"Benford's law, also called the first-digit law, states that in lists of numbers from many (but not all) real-life sources of data, the leading digit is distributed in a specific, non-uniform way. According to this law, the first digit is 1 about 30% of the time, and larger digits occur as the leading digit with lower and lower frequency, to the point where 9 as a first digit occurs less than 5% of the time."

The probabilities are distributed as demonstrated here.
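For reference, the expected frequencies follow the formula P(d) = log10(1 + 1/d) for a leading digit d from 1 to 9, which works out to about 30.1% for 1 and about 4.6% for 9. Here is a minimal sketch that prints the full table; the variable name $expected is my own choice for illustration, not something from the blog quoted above.

```powershell
# Expected leading-digit frequencies under Benford's law: P(d) = log10(1 + 1/d)
# ($expected is an illustrative name, not part of any library)
$expected = foreach ($d in 1..9) {
    New-Object PSObject -Property @{
        Digit    = $d
        Expected = [math]::Log10(1 + 1 / $d)
    }
}

# Show each digit with its expected share formatted as a percentage
$expected | Format-Table Digit,
    @{label="Expected";expression={"{0:P1}" -f $_.Expected}} -AutoSize
```

The nine probabilities sum to 1, since the product of (1 + 1/d) for d = 1..9 is exactly 10.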

This seemed counter-intuitive and I wanted to validate it myself. Let's look at the leading digit of the size of every txt file in one of my directories. Enter PowerShell.....
# Explore Benford's Law

# Collect the first character of each file size (in bytes)
$array = @()
foreach ($item in (Get-ChildItem -Path p:\ -Filter *.txt -Recurse)) {
    $array += $item.Length.ToString()[0]
}

# Count each leading digit and show its share of the total plus a histogram bar
$array `
| Group-Object -NoElement `
| Sort-Object Count -Descending `
| Format-Table @{label="#";expression={$_.Name}},
@{label="Count";expression={"{0:%##}" -f $($_.Count/$array.Count)}},
@{label="Histogram";expression={"▄" * $_.Count}} -AutoSize

I consider this a validation, but let's try another example, this time looking at the leading digits of the working set of the processes on my desktop:


# Start from an empty array so the file-size digits above are not counted again
$array = @()
foreach ($a in (Get-Process)) {
    $array += $a.WorkingSet.ToString()[0]
}

$array `
| Group-Object -NoElement `
| Sort-Object Count -Descending `
| Format-Table @{label="#";expression={$_.Name}},
@{label="Count";expression={"{0:%##}" -f $($_.Count/$array.Count)}},
@{label="Histogram";expression={"▄" * $_.Count}} -AutoSize

Again, this seems to hold true. Now that I have examples of Benford's law, I feel compelled to try and understand it. Wish me luck!
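One way to go a step beyond eyeballing the histograms is to put the observed share of each leading digit next to the share Benford's law predicts, log10(1 + 1/d). The sketch below does that for the working-set example; the variable name $digits and the column names are my own choices for illustration, and it assumes a leading 0 (a zero-sized value) should be left out, since the law only covers digits 1 through 9.

```powershell
# Leading digits of each process's working set, keeping only 1-9
# ($digits is an illustrative name; a leading 0 is dropped by assumption)
$digits = @(Get-Process |
    ForEach-Object { $_.WorkingSet.ToString()[0] } |
    Where-Object { $_ -match '[1-9]' })

# For each digit, pair the observed share with the Benford expectation
$digits | Group-Object | Sort-Object Name | ForEach-Object {
    $d = [int]$_.Name
    New-Object PSObject -Property @{
        Digit    = $d
        Observed = $_.Count / $digits.Count
        Expected = [math]::Log10(1 + 1 / $d)
    }
} | Format-Table Digit,
    @{label="Observed";expression={"{0:P1}" -f $_.Observed}},
    @{label="Expected";expression={"{0:P1}" -f $_.Expected}} -AutoSize
```

With only a few dozen processes the observed column will be noisy, so a rough match is about all a sample this small can show.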


jsnover said...

That's freaky.

Ashley McGlone said...

Too cool!

Lee Holmes said...

Interesting. Here's your data sorted by number (Benford's Law) as opposed to frequency. Still looks pretty reasonable.

PS E:\lee> gc C:\temp\stahler_1.txt | Convert-TextObject | sort P1 | Out-Graph P1 P2
1 ██████████████████████████████████████████████████████████████ 25
2 ███████████████████████████████████ 14
3 █████████████████████████████████████████████ 18
4 ██████████████████████████████ 12
5 ████████████████████ 8
6 █████████████████████████ 10
7 ███████████████ 6
8 ████████ 3
9 ███████████████ 6
PS E:\lee> gc C:\temp\stahler_2.txt | Convert-TextObject | sort P1 | Out-Graph P1 P2
1 ████████████████████████████████████████████████████████████████ 29
2 ████████████████████████████████████ 16
3 ████████████████████ 9
4 ██████████████████████ 10
5 ███████████████████████████ 12
6 ████████████████ 7
7 ████████████████████ 9
8 ███████ 3
9 ███████ 3