Notes on the Bonferroni threshold#

import numpy as np
np.set_printoptions(precision=4)  # print to 4 decimal places

The Bonferroni threshold is a family-wise error threshold. That is, it treats a set of tests as one family, and the threshold is designed to control the probability of detecting any positive tests in the family (set) of tests, if the null hypothesis is true.

Family-wise error#

The Bonferroni correction uses a result from probability theory to estimate the probability of finding any p value below a threshold \(\theta\), given a set (family) of \(n\) p values.

When we have found a threshold \(\theta\) that gives a probability \(\le \alpha\) that any p value will be \(\lt \theta\), then the threshold \(\theta\) can be said to control the family-wise error rate at level \(\alpha\).

Not the Bonferroni correction#

The inequality used for the Bonferroni is harder to explain than a simpler but related correction, called the Šidák correction.

We will start with that, and then move on to the Bonferroni correction.

The probability that all \(n\) tests are above p value threshold \(\theta\), assuming tests are independent:

\[ (1 - \theta)^n \]

Chance that one or more p values are \(\le \theta\):

\[ 1 - (1 - \theta)^n \]

We want a uncorrected p value threshold \(\theta\) such that the expression above equals some desired family-wise error (FWE) rate \(\alpha_{fwe}\). For example we might want a p value threshold \(\theta\) such that there is probability (\(\alpha_{fwe}\)) of 0.05 that there is one or more test with \(p \le \theta\) in a family of \(n\) tests, on the null hypothesis:

\[ \alpha_{fwe} = 1 - (1 - \theta)^n \]

Solve for \(\theta\):

\[ \theta = 1 - (1 - \alpha_{fwe})^{1 / n} \]

So, if we have 10 tests, and we want the threshold \(\theta\) to control \(\alpha_{fwe}\) at \(0.05\):

def sidak_thresh(alpha_fwe, n):
    return 1 - (1 - alpha_fwe)**(1./n)

print(sidak_thresh(0.05, 10))
0.005116196891823743

The Bonferroni correction#

\(\newcommand{\P}{\mathbb P}\) The Bonferroni correction uses a result from probability theory, called Boole’s inequality. The result is by George Boole, of Boolean fame. Boole’s inequality applies to the situation where we have a set of events \(A_1, A_2, A_3, \ldots \), each with some probability of occurring \({P}(A_1), {P}(A_2), {P}(A_3) \ldots \). The inequality states that the probability of one or more of these events occurring is no greater than the sum of the probabilities of the individual events:

\[ \P\biggl(\bigcup_{i} A_i\biggr) \le \sum_i {\mathbb P}(A_i). \]

You can read the \(\cup\) symbol here as “or” or “union”. \(\P\biggl(\bigcup_{i} A_i\biggr)\) is the probability of the union of all events, and therefore the probability of one or more event occurring.

Boole’s inequality is true because:

\[ \P(A \cup B) = P(A) + P(B) - P(A \cap B) \]

where you can read \(\cap\) as “and” or “intersection”. Because \(P(A \cap B) \ge 0\):

\[ \P(A \cup B) \le P(A) + P(B) \]

In our case we have \(n\) tests (the family of tests). Each test that we label as significant is an event. Therefore the sum of the probabilities of all possible events is \(n\theta\). \({\mathbb P}\biggl(\bigcup_{i} A_i\biggr)\) is our probability of family-wise error \(\alpha_{fwe}\). To get a threshold \(\theta\) that controls family-wise error at \(\alpha\), we need:

\[ \frac{\alpha_{fwe}}{n} \le \theta \]

For \(n=10\) tests and an \(\alpha_{fwe}\) of 0.05:

def bonferroni_thresh(alpha_fwe, n):
    return alpha_fwe / n
bonferroni_thresh(0.05, 10)
0.005

The Bonferroni correction does not assume the tests are independent.

As we have seen, Boole’s inequality relies on:

\[\begin{split} \P(A \cup B) = P(A) + P(B) - P(A \cap B) \implies \\ \P(A \cup B) \le P(A) + P(B) \end{split}\]

This means that the Bonferroni correction will be conservative (the threshold will be too low) when the tests are positively dependent (\(P(A \cap B) \gg 0\)).

The Bonferroni \(\theta_{Bonferroni} = \alpha_{fwe} \space / \space n\) is always smaller (more conservative) than the Šidák correction \(\theta_{Šidák}\) for \(n \ge 1\), but it is close:

n_tests = np.arange(1, 11)  # n = 1 through 10
# The exact threshold for independent p values
sidak_thresh(0.05, n_tests)
array([0.05  , 0.0253, 0.017 , 0.0127, 0.0102, 0.0085, 0.0073, 0.0064,
       0.0057, 0.0051])
# The Bonferroni threshold for the same alpha, n
bonferroni_thresh(0.05, n_tests)
array([0.05  , 0.025 , 0.0167, 0.0125, 0.01  , 0.0083, 0.0071, 0.0063,
       0.0056, 0.005 ])