Probability and Surprising Events

Observations or occurrences which are surprising or unexpected are typically thought to give high degrees of confirmation to the theories from which they are derived. Paul Horwich puts it as follows:

"Particularly powerful support for a theory is conveyed by the verification of its relatively surprising predictions. In other words, a theory gets a lot of credit for predicting something quite unexpected, or for explaining a bizarre and anomalous phenomenon; and it derives relatively little support from the prediction of something that we expected to occur anyway" [1].

The rest of this post restates Horwich's probabilistic account of surprise.

Are surprising events improbable?

It might be thought that for some event E to be surprising it is sufficient that it is highly improbable. This is not the full story, though, since improbable events happen all the time without surprising us. Imagine I flip a coin 100 times and get 100 tails in a row (call this event A). This would be jaw-dropping. However, absent other assumptions, A is no more improbable than flipping a coin 100 times and getting one particular, random-looking sequence of 50 heads and 50 tails (call this event B). This can be shown as follows.

P(A) = P(tails₁) × P(tails₂) × ... × P(tails₁₀₀)

= P(tails)^100

Since P(tails) = .5, P(tails)^100 = (.5)^100. 

∴ P(A) = (.5)^100

Since P(heads) = P(tails) = .5, every factor in the corresponding product for B is also .5, so P(B) = (.5)^100 as well.

∴ P(A) = P(B)
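
The equality can be checked numerically. The following is a minimal sketch, assuming a fair coin and independent tosses; the alternating sequence used for B is only one illustrative choice among the many sequences with 50 heads and 50 tails, and the helper name `sequence_probability` is hypothetical.

```python
from fractions import Fraction

# Assumption from the text: a fair coin, so each outcome has probability 1/2.
P_TAILS = Fraction(1, 2)
P_HEADS = 1 - P_TAILS


def sequence_probability(sequence: str) -> Fraction:
    """Probability of one exact ordered sequence of 'H'/'T' outcomes."""
    prob = Fraction(1)
    for outcome in sequence:
        prob *= P_HEADS if outcome == "H" else P_TAILS
    return prob


seq_a = "T" * 100   # event A: 100 tails in a row
seq_b = "HT" * 50   # event B: one particular sequence with 50 heads, 50 tails

p_a = sequence_probability(seq_a)
p_b = sequence_probability(seq_b)

print(p_a == p_b)   # the two exact sequences are equally probable
print(float(p_a))   # (.5)^100, roughly 7.9e-31
```

Exact fractions are used rather than floats so that the comparison P(A) = P(B) is an exact equality and not a rounding coincidence.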

Taken as individual sequences, A and B are equally improbable. Thus, low probability may be necessary for surprise, Horwich notes, but it is not sufficient. Horwich maintains that a surprising event E must satisfy two conditions:

(1) Given our assessment of the circumstances C (background assumptions) surrounding E's occurrence, P(E|C) ≈ 0.

and

(2) P(C|E) << P(C).

Back to the coin toss. If our background assumption C is that the coin is fair and (I would add, although Horwich does not mention this) that we are interested in the overall split of heads and tails rather than in one exact sequence, then it is very improbable that we will get all tails in 100 tosses. It is far more probable that 100 tosses of a fair coin will yield a split at or near 50/50. What about my proof above that P(A) = P(B)? That proof holds only when we consider the probability of a single sequence of 100 tosses, isolated from the other possible sequences. Once all 2^100 possible outcomes are considered, only one of them consists of 100 tails, whereas a very large number of them contain 50 heads and 50 tails, so an all-tails result is immensely less probable than a roughly even split [2].

Thus, P(A|C) ≈ 0. Also, P(C|A) << P(C), since the coin is much less likely to be fair given the outcome described by A.
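
Both conditions can be illustrated with a short calculation. This is a rough sketch, not Horwich's own example: the binomial counts follow from the fair-coin assumption C, but the rival hypothesis (a two-tailed coin) and its one-in-a-million prior are assumptions introduced here purely for illustration, as is the simplification that these are the only two possibilities.

```python
from math import comb

N = 100  # number of tosses


def prob_heads_count(k: int, n: int = N) -> float:
    """P(exactly k heads in n tosses of a fair coin) = C(n, k) / 2^n."""
    return comb(n, k) / 2**n


# Condition (1): under C (fair coin), all tails is wildly improbable,
# while a split at or near 50/50 is not.
p_all_tails = prob_heads_count(0)     # P(A|C) = (.5)^100
p_exactly_50 = prob_heads_count(50)   # roughly 0.08
p_40_to_60 = sum(prob_heads_count(k) for k in range(40, 61))

# Condition (2): P(C|A) << P(C).
# ASSUMPTION: the only rival to C is a two-tailed coin, with a
# hypothetical prior of one in a million.
prior_rigged = 1e-6
prior_fair = 1 - prior_rigged
likelihood_fair = p_all_tails   # P(A | fair)
likelihood_rigged = 1.0         # P(A | two-tailed coin)

evidence = likelihood_fair * prior_fair + likelihood_rigged * prior_rigged
posterior_fair = likelihood_fair * prior_fair / evidence   # P(C|A)

print(f"P(A|C)               = {p_all_tails:.3e}")
print(f"P(exactly 50 heads)  = {p_exactly_50:.4f}")
print(f"P(40 to 60 heads)    = {p_40_to_60:.4f}")
print(f"P(C) = {prior_fair:.6f}, P(C|A) = {posterior_fair:.3e}")
```

Even with a prior that overwhelmingly favors fairness, observing A drives the probability of C to nearly zero under these assumed numbers, which is what condition (2) requires.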

The evidential value of surprising predictions

As noted above, it is generally thought that the discovery of a surprising event E strongly confirms the hypothesis from which it is deduced, or to which it is probabilistically related (assuming P(E|H) is high). That is, P(H|E) > P(H), where H is the hypothesis from which E is derived. In support of this, Bayes' theorem gives us the following:

P(H|E)/P(H) = P(E|H)/P(E)

This can be used as a measure of the confirmation that E gives to H. Notice that if P(E|H) = 1 (because H predicts E) [3], we have

P(H|E)/P(H) = 1/P(E)

The lower P(E) is, the greater the confirmation E gives to H. In sum, Horwich writes the following: 

"[I]n order to be surprising, it is not sufficient that E's probability be small. In addition, we must believe that there obtain circumstances C, by virtue of which P(E) is so low, whose probability is substantially diminished by the discovery of E, that is, P(C|E) << P(C)" [4].
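
To see how the confirmation ratio P(H|E)/P(H) = P(E|H)/P(E) behaves as E becomes less expected, here is a small sketch. The function name `confirmation_ratio` and the sample values of P(E) are hypothetical and chosen only for illustration.

```python
def confirmation_ratio(p_e_given_h: float, p_e: float) -> float:
    """Return P(H|E)/P(H), which Bayes' theorem equates with P(E|H)/P(E)."""
    if not 0 < p_e <= 1:
        raise ValueError("P(E) must lie in (0, 1]")
    if not 0 <= p_e_given_h <= 1:
        raise ValueError("P(E|H) must lie in [0, 1]")
    return p_e_given_h / p_e


# With P(E|H) = 1 (H predicts E), the ratio reduces to 1/P(E).
# The P(E) values below are illustrative, not taken from Horwich.
for p_e in (0.9, 0.5, 0.1, 0.01):
    ratio = confirmation_ratio(1.0, p_e)
    print(f"P(E) = {p_e:<4}  ->  P(H|E)/P(H) = {ratio:.2f}")
```

An expected event (P(E) near 1) barely raises the probability of H, while an unexpected one multiplies it many times over. Note that the ratio is bounded by 1/P(H), since P(H|E) cannot exceed 1.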
________________________________________________________________________
Footnotes:

[1] Horwich, Paul. Probability and Evidence. Cambridge: Cambridge University Press, 1982. Print.
[2] With 100 flips of a fair coin, we can be about 95% sure that the number of heads will fall within 50 ± 10, that is, between 40 and 60. Thus, there is only about a 5% chance of a split more lopsided than 60/40 in either direction.
[3] P(E|H) need not be exactly 1; it could be some other high value.
[4] Horwich, Ibid. 
