In the great 1971 movie Willy Wonka and the Chocolate Factory, the reclusive owner of the Wonka Chocolate Factory decides to place golden tickets in five of his famous chocolate bars, and to allow each winner to visit his factory with a guest. Since the factory restarted production after three years of silence, no one has come in or gone out. Needless to say, there is enormous interest in finding a golden ticket!
Through a series of news reports we get an understanding that all over the world, kids are desperately purchasing and opening Wonka bars in an attempt to win. But just what were the odds? Unfortunately young Charlie Bucket's teacher is not particularly good at percentages and doesn't offer much help:
I hope I can be at least a little more useful. While the movie only vaguely suggests how many bars were actually being opened, we are provided with two data points. First, the spoiled, bratty, unlikable Veruca Salt's factory-owning father states that he's had his workers open 760,000 Wonka bars just before one of them finds a golden ticket:
Meanwhile the polite, likable Charlie Bucket—who is very poor—has received one Wonka Bar for his birthday and another from his Grandpa Joe. Neither bar was a winner, but Charlie finds some money on the street to buy a third:
In the movie, you can't help but feel that Charlie's odds must have been much, much higher than the nasty Veruca Salt's (or any of the other winners). But is there statistical evidence of that?
In Minitab Statistical Software, I set up a basic 2x2 table like this:
Often when practitioners have a 2x2 table, the Chi-Square test immediately comes to mind. But the Chi-Square test is not accurate when any of the cell counts or expected cell counts are small, which is clearly the case here. Fisher's exact test has no such restriction, and it is available in the "Other Stats" subdialog of Stat > Tables > Cross Tabulation and Chi-Square. The output looks like this:
For the Chi-Square portion of the output, Minitab not only refuses to provide a p-value but also gives two warnings and a note. Fisher's exact test can be performed, however, and it tests whether the likelihood of a winning ticket was the same for both Charlie and Veruca. The p-value of 0.0000079 confirms what we all knew: karma was working for Charlie and against Veruca!
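For readers without Minitab, here is a minimal Python sketch of the same analysis using only the standard library. It assumes the 2x2 table is Charlie with 1 winner in 3 bars and Veruca with 1 winner in 760,000 bars, as described in the movie. The function names are my own, and the two-sided p-value is computed in the usual way, by summing the hypergeometric probabilities of all tables no more likely than the observed one. It is an illustration, not a reproduction of Minitab's internals.

```python
from fractions import Fraction
from math import comb

# 2x2 table from the movie: each row is (winners, non-winners).
charlie = (1, 2)          # 1 golden ticket in 3 bars
veruca = (1, 759_999)     # 1 golden ticket in 760,000 bars


def expected_counts(row1, row2):
    """Expected cell counts under independence: row total * column total / N."""
    n = sum(row1) + sum(row2)
    col_totals = (row1[0] + row2[0], row1[1] + row2[1])
    return [[Fraction(sum(row) * c, n) for c in col_totals] for row in (row1, row2)]


def fisher_exact_two_sided(row1, row2):
    """Two-sided Fisher's exact p-value for a 2x2 table.

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table.
    """
    a, b = row1
    c, d = row2
    n1, n2 = a + b, c + d          # row totals (bars opened per child)
    k = a + c                      # total winners across both rows
    total = comb(n1 + n2, k)

    def prob(x):
        # P(row 1 holds exactly x of the k winners), margins fixed
        return Fraction(comb(n1, x) * comb(n2, k - x), total)

    lo, hi = max(0, k - n2), min(k, n1)
    observed = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed)


expected = expected_counts(charlie, veruca)
p_value = fisher_exact_two_sided(charlie, veruca)

print(f"Expected winners for Charlie: {float(expected[0][0]):.2e}")
print(f"Fisher's exact p-value: {float(p_value):.7f}")
```

Charlie's expected number of winners is far below the usual rule-of-thumb minimum of 5 for the Chi-Square test, which is why that test is not appropriate here. The exact p-value rounds to 0.0000079, in line with the figure quoted above.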
For fun, let's ignore this evidence that the odds were not equal for each child. Let's pretend that the odds are the same, and a really unlikely thing happened anyway because that's what makes the movie great. Aside from our two data points, we have reports from two children in the classroom that they have opened 100 and 150 bars, respectively, and neither won. So we have two golden tickets among 3 + 760,000 + 100 + 150 = 760,253 Wonka bars. This would be a proportion of 2/760,253 = 0.00000263 or 0.000263%. Think those odds are low? That represents an inflated estimate! That is because rather than randomly sampling many children, our sample includes two known winners. Selecting four children at random would almost certainly produce four non-winners and the estimate would be 0%.
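The pooled estimate is simple arithmetic, but it is easy to slip a decimal place with this many zeros, so here is a short Python sketch. It uses the bar counts reported in the movie and takes the two golden tickets found by Charlie and Veruca as the numerator. The labels for the two classmates are placeholders of mine.

```python
# (tickets found, bars opened) for each child mentioned in the movie.
samples = {
    "Charlie": (1, 3),
    "Veruca": (1, 760_000),
    "Classmate A": (0, 100),   # placeholder label, not a name from the movie
    "Classmate B": (0, 150),   # placeholder label, not a name from the movie
}

tickets = sum(found for found, _ in samples.values())
bars = sum(opened for _, opened in samples.values())
proportion = tickets / bars

print(f"{tickets} tickets in {bars:,} bars")
print(f"proportion = {proportion:.8f} ({proportion:.6%})")
```

Keep in mind that this is a convenience sample that deliberately includes two winners, so the result is an upper-end guess rather than an unbiased estimate.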
There is one additional data point that doesn't really make logical sense, but let's use it to come up with a low-end estimate while accepting that it is likely not a real number. At one point, a news reporter indicates that five tickets are hidden among the "countless billions of Wonka bars." Were there actually "countless billions" of unopened Wonka bars in the world? Consider that the most popular chocolate bar in the world, the famous Hershey bar, sells about 250 million units. And that's per year! It is very, very unlikely that there were countless billions of unopened Wonka bars from that single factory at any one time. Further, that news report is about the contest being announced, so the Wonka factory had not yet delivered the bars with the golden tickets inside. Suffice it to say, this is not an accurate number.
But let's suppose that even 1 billion Wonka bars were produced in the run that contained the golden tickets. Then the odds of a single bar containing one would be 5/1,000,000,000 = 0.000000005 or 0.0000005%.
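To put that low-end figure in perspective, here is a small Python sketch. The run size of 1 billion bars is the supposition made above, not a number from the movie. As an extra illustration of my own, it also estimates the chance that a haul the size of Veruca's 760,000 bars would contain at least one ticket, treating each bar as an independent draw. That is an approximation, since real tickets are drawn without replacement.

```python
import math

tickets = 5
bars_produced = 1_000_000_000   # assumed run size ("even 1 billion"), not a movie figure

p_per_bar = tickets / bars_produced
bars_per_ticket = bars_produced // tickets

# Chance that 760,000 bars hold at least one ticket, assuming independent
# draws (an approximation): 1 - (1 - p)^n, computed stably for tiny p.
veruca_bars = 760_000
p_at_least_one = -math.expm1(veruca_bars * math.log1p(-p_per_bar))

print(f"per-bar probability: {p_per_bar:.9f} ({p_per_bar:.7%})")
print(f"one ticket per {bars_per_ticket:,} bars")
print(f"chance of a ticket in {veruca_bars:,} bars: {p_at_least_one:.4%}")
```

Under this assumption there is one ticket per 200 million bars, and even a haul as large as Veruca's would have had well under a 1% chance of turning up a ticket.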
Either way, the chances of finding one were incredibly low...confirming again what Grandpa Joe told Charlie:
CHARLIE: "I've got the same chance as anybody else, haven't I?"
GRANDPA JOE: "You've got more, Charlie, because you want it more! Go on, open it!"