The odds of a disk failing in any given month are roughly one in 36. The odds of two different drives failing in the same month are roughly one in 36 squared, or 1 in about 1,300. The odds of three drives failing in the same month are one in 36 cubed, or 1 in 46,656. The odds of seven different drives failing in the same month are one in 36 to the 7th power, or 1 in 78,364,164,096.
Of course this is very simplified because disk failure modes are more at end-of-service-life rather than linearly spread over median life. So what if I am off by a factor of 4X? This crude calculation gets us into the same astronomical ballpark. You could insure against this event happening by buying lottery tickets.
--theBuckWheat, comment at Doug Ross @ Journal: GEORGE WILL ON MIRACULOUS IRS COINCIDENCE OF CRASHED HARD DRIVES: "Religions Have Been Founded on Less"
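The commenter's arithmetic is easy to reproduce. Here is a minimal Python sketch that assumes, as the comment does, that drives fail independently with a flat 1-in-36 chance per month; the names `p_month` and `odds_all_fail` are mine, not the commenter's.

```python
from fractions import Fraction

# Assumption from the comment: a flat 1-in-36 chance of failure per drive per month.
p_month = Fraction(1, 36)

def odds_all_fail(n_drives, p=p_month):
    """Chance that n specific drives all fail in the same month, if independent."""
    return p ** n_drives

for n in (1, 2, 3, 7):
    # Exact fractions keep the "1 in N" form without rounding.
    print(f"{n} drive(s): 1 in {odds_all_fail(n).denominator:,}")
```

Using exact fractions rather than floats keeps every result in the "1 in N" form the comment uses.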
Let the record reflect that I am not a mathematician and certainly not a statistician. But ISTM that the data center operator's math and methodology are incorrect. I think he has made a statistical error in treating the HD failures as related events when they are independent events.
If the expected life span of an HD is 36 months, and for simplicity ignoring that failures occur nearer the end than the beginning, then each HD has a 1/36 chance of failing in any given month - regardless of what the HD one office away does. So the 1/36 odds per hard drive never change.
The incredulity therefore is not over failing HDs per se, but that the exact same people whose emails the committee wants to read are the ones whose HDs failed, and at the same time.
As Yogi Berra said in a different context, "It's too coincidental to be a coincidence."
To calculate the odds of that we have to place those HDs into the universe of possibles, which would be the total number of like workstations in the entire IRS.
Googling tells me that the IRS has 89,500 employees. Not all have email, of course, but let's be very generous and say only 50,000 do. That's 50K HDs, each with a 1/36 chance of failure in any given month.
That means that in any given month, 1/36 of the 50K drives will fail, or 1,388 drives each month.
But - and feel free to check my math - that means that the chance of specifically Lerner's HD failing in that particular month is 1/1,388. And the same odds apply to each of the other six drives.
That's where you start multiplying the odds together. Excel tells me that 1/1388 is 0.00072, or 0.072 percent chance. Now we calculate the odds of all seven specific HDs failing, which is .072 pc X .072 ... seven times.
And that makes the final odds about 0.00000000000000000001 percent.
Expressed in notation, it is 1.01E-22. (I'm letting Excel calculate all this.)
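Since I asked for math checking, here is a minimal sketch of the same chain of arithmetic in Python. It takes the inputs above at face value (50,000 drives, a 36-month life span, seven drives multiplied together as independent); the variable names are illustrative only.

```python
# Inputs taken from the post; variable names are illustrative.
drives = 50_000          # assumed number of IRS workstations with email
lifespan_months = 36     # assumed HD life span in months

expected_failures = drives // lifespan_months   # whole drives failing per month
p_specific = 1 / expected_failures              # the post's per-drive figure
p_all_seven = p_specific ** 7                   # seven specific drives multiplied together

print(expected_failures)       # 1388
print(f"{p_specific:.5f}")     # 0.00072
print(f"{p_all_seven:.2E}")    # 1.01E-22
```

The integer division mirrors the whole-drive count of 1,388 used above; with ordinary division the monthly figure would be closer to 1,389 and the final result would shift only slightly.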
To what may we compare this? Well, how about the number of stars in the entire universe? According to space.com,
Kornreich used a very rough estimate of 10 trillion galaxies in the universe. Multiplying that by the Milky Way's estimated 100 billion stars results in a large number indeed: 100 octillion stars, or 100,000,000,000,000,000,000,000,000,000 stars, or a "1" with 29 zeros after it. Kornreich emphasized that number is likely a gross underestimation, as more detailed looks at the universe will show even more galaxies.
I have never seen an estimate of 10 trillion galaxies before; the top number I have ever seen is "only" 500 billion. But let's leave it at 10 T:
Number of stars in 10 trillion galaxies: 1.00E+29
Odds of those seven particular IRS HDs failing the same month:
Please note that according to Universe Today, there are about 1.0E+80 atoms in the entire universe.
So the odds against those seven identified HDs failing at the same time are astronomically long, on a scale better compared with counts of stars and atoms than with anything in everyday experience.
Again, I would welcome math checking!
Update: I got an email from a long-time reader who signed his name but asked me to protect it. He is someone with three and a half decades of experience in this sort of thing. Here it is, unedited:
While I don't believe for a second the IRS's excuses, these putative spontaneous disk drive failures wouldn't necessarily be independent events. The phrase "common mode failure" strikes fear in the hearts of engineers, and it's been observed many times that a batch of disks fail at about the same time. Perhaps their shipping container got banged a little too much in transit from the Far East (best guess we had at one point for a common problem with Seagate drives). Or a common part or design flaw in the same or more than one lot of disks. IF these computers were all deployed at the same time from the same source it *could* happen with not quite so astronomical odds....
Also, a few years ago a couple of studies of massive installations of hard drives were done, one by Google (takeaways were that disk drive engineers seem to have a very good handle on heat, and 1/2 of those that fail will do so without any warning), and one of a number of huge supercomputers, which had thousands of drives. The relevant detail from that study is that disk drives don't follow the bathtub curve of failure. They almost always work out of the box, and start wearing down sometime in their 2nd year of service.
I focus more on the timing, Lerner's drive supposedly failed 10 days after the letter from the Congressman got things rolling, they canceled their backup service 2 months after the letter, the other convenient failures, with the clearest sign of ill will being the IRS's discarding of her drive, instead of sending it to a recovery facility. You're simply not allowed to do the latter once you're on notice, unless, of course, you're above the law. Which this crowd currently is.
Anyway, the above nits aside, I've found your blog to be very worthwhile over the years, I'm glad you're back from your pause, and am looking forward to your essay on why the Republicans will never gain the presidency again.
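The reader's common-mode point can be made concrete with a toy model. This is only a sketch: `p_bad_batch` and `p_fail_in_bad_batch` are made-up placeholders, not measurements of any real drives, and the model assumes all seven drives came from a single batch.

```python
# Toy model of common-mode failure. All numbers marked "hypothetical" are
# placeholders chosen for illustration, not data about any real drives.
p_independent = 1 / 36        # flat monthly failure chance used earlier in the post
p_bad_batch = 0.01            # hypothetical: chance the shared batch is defective
p_fail_in_bad_batch = 0.5     # hypothetical: monthly failure chance in a defective batch
n = 7                         # number of drives in question

# If drives fail independently, all n fail together with probability p**n.
independent = p_independent ** n

# If they share a batch, average over "batch is defective" and "batch is fine".
common_mode = (p_bad_batch * p_fail_in_bad_batch ** n
               + (1 - p_bad_batch) * p_independent ** n)

print(f"independent: {independent:.2E}")
print(f"common mode: {common_mode:.2E}")
print(f"ratio:       {common_mode / independent:,.0f}x")
```

Whatever placeholder values are chosen, the shape of the result is the same: even a small chance of a shared defect dominates the independent-failure term, which is why the odds could be "not quite so astronomical" if the machines were deployed together from one source.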