
Calculate confidence intervals in Python

Category
Statistics
Published on
May 18, 2020

Introduction

Confidence intervals (CI) are a way of estimating a population parameter (e.g., mean, variance) from a sample of data. They provide a range of plausible values for the true population parameter at a chosen confidence level. For example, a 95% confidence level means that, if we repeated the sampling many times, about 95% of the intervals built this way would contain the true population parameter.

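The confidence interval formula is:

$$\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$$

where:

  • $\bar{x}$ is the sample mean, or the sample proportion $\hat{p}$ for probabilities
  • $t$ is the t-value from the t-distribution (with $n - 1$ degrees of freedom) corresponding to the desired confidence level; for large samples it approaches the normal value (e.g., 1.96 for a 95% confidence interval)
  • $s$ is the sample standard deviation, calculated as:
    • for continuous metrics: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
    • for probabilities: $s = \sqrt{\hat{p}(1 - \hat{p})}$
  • $n$ is the sample size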

Now let’s see how we can simply implement this in Python.

Generate random data

We begin by generating synthetic data, drawing a random sample of size 100 from a normal distribution with mean 40 and standard deviation 10.

         value
count   100.00
mean     40.26
std       9.15
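The sampling step described above can be sketched as follows. This is a minimal sketch, assuming NumPy and pandas; the seed and the names `rng`, `sample` and `summary` are illustrative assumptions, so the printed statistics will be close to, but not identical to, the summary above.

```python
import numpy as np
import pandas as pd

# Assumption: the seed is arbitrary and only makes this sketch reproducible.
rng = np.random.default_rng(42)

# Draw 100 observations from a normal distribution with mean 40 and std 10.
sample = pd.DataFrame({"value": rng.normal(loc=40, scale=10, size=100)})

# Keep count, mean and std (pandas uses the n - 1 denominator for std).
summary = sample.describe().loc[["count", "mean", "std"]].round(2)
print(summary)
```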

Calculate confidence interval

Since the formula is straightforward, we can easily compute the confidence interval without any additional library:
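A minimal sketch of this manual calculation, assuming NumPy only for drawing the sample. The seed and variable names are illustrative assumptions, so the resulting interval will differ slightly from the one quoted in this article. The critical value is approximated by 1.96.

```python
import math
import numpy as np

# Assumption: arbitrary seed; the sample is regenerated so the block is self-contained.
rng = np.random.default_rng(42)
x = rng.normal(loc=40, scale=10, size=100)

n = len(x)
mean = x.mean()
s = x.std(ddof=1)  # sample standard deviation (n - 1 denominator)

# 1.96 approximates the critical value for a 95% confidence level.
margin = 1.96 * s / math.sqrt(n)
ci_manual = (mean - margin, mean + margin)
print(ci_manual)
```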

But more conveniently, we can compute it as a one-liner with the scipy package:
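One way to write that one-liner is with `scipy.stats.t.interval`, sketched below. The confidence level is passed positionally because its keyword name changed between SciPy versions (`alpha` in older releases, `confidence` in newer ones). The seed and variable names are assumptions, so the output will not match the article's numbers exactly.

```python
import numpy as np
from scipy import stats

# Assumption: arbitrary seed; the sample is regenerated so the block is self-contained.
rng = np.random.default_rng(42)
x = rng.normal(loc=40, scale=10, size=100)

# 95% interval from the t-distribution with n - 1 degrees of freedom,
# centred on the sample mean and scaled by the standard error of the mean.
ci_scipy = stats.t.interval(0.95, df=len(x) - 1, loc=x.mean(), scale=stats.sem(x))
print(ci_scipy)
```

`stats.sem` computes s / sqrt(n) with the n - 1 denominator by default, so it matches the standard error term in the formula above.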

The two intervals are almost identical. The difference in the second decimal arises because the first, “manual” method approximates the t-value as 1.96, whereas the exact t-value for 99 degrees of freedom is about 1.98.
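The same formula also covers the probability case from the introduction, where the standard deviation of a 0/1 metric is the square root of p(1 - p), with p the sample proportion. The sketch below uses hypothetical data (200 binary outcomes with a true success probability of 0.3, an assumption made purely for illustration) and the 1.96 approximation.

```python
import math
import numpy as np

# Hypothetical data: 200 binary outcomes, true success probability 0.3 (assumption).
rng = np.random.default_rng(0)
outcomes = rng.binomial(n=1, p=0.3, size=200)

n = len(outcomes)
p_hat = outcomes.mean()               # sample proportion
s = math.sqrt(p_hat * (1 - p_hat))    # standard deviation of a 0/1 metric

# 1.96 approximates the critical value for a 95% confidence level.
margin = 1.96 * s / math.sqrt(n)
ci_proportion = (p_hat - margin, p_hat + margin)
print(ci_proportion)
```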

Plot distribution and confidence interval

Finally, let’s plot a histogram of the sample distribution with the population mean, sample mean, and confidence interval of the sample mean.

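[Figure: histogram of the sample distribution showing the population mean, the sample mean, and the 95% confidence interval of the sample mean]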

The plot above shows the sample distribution with a population mean of 40, as well as the sample mean 40.26 and a 95% confidence interval of [38.46, 42.05].
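A plot along these lines could be produced with Matplotlib, as sketched below. The styling, the number of bins, the seed and the output file name `confidence_interval.png` are all assumptions, and the values in the legend will differ slightly from those quoted above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Assumption: arbitrary seed; the sample is regenerated so the block is self-contained.
rng = np.random.default_rng(42)
x = rng.normal(loc=40, scale=10, size=100)
mean = x.mean()
lower, upper = stats.t.interval(0.95, df=len(x) - 1, loc=mean, scale=stats.sem(x))

fig, ax = plt.subplots(figsize=(8, 4))
ax.hist(x, bins=20, color="lightgray", edgecolor="white")

# Vertical lines for the population and sample means, shaded band for the interval.
ax.axvline(40, color="black", linestyle="--", label="Population mean (40)")
ax.axvline(mean, color="tab:blue", label=f"Sample mean ({mean:.2f})")
ax.axvspan(lower, upper, color="tab:blue", alpha=0.2,
           label=f"95% CI [{lower:.2f}, {upper:.2f}]")

ax.set_xlabel("value")
ax.set_ylabel("count")
ax.legend()

# Hypothetical file name; use plt.show() instead in an interactive session.
fig.savefig("confidence_interval.png")
```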