
Category: Statistics
Published on: May 18, 2020

# Introduction

A confidence interval (CI) estimates a population parameter (e.g., mean, variance) from a sample of data. It gives a range of values that is likely to contain the true population parameter at a chosen confidence level. For example, a 95% confidence interval is built by a procedure that, over repeated sampling, captures the true population parameter about 95% of the time.

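The confidence interval formula is:

$$
\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}
$$

where:

• $\bar{x}$ is the sample mean, or the sample proportion $\hat{p}$ for probabilities
• $t$ is the t-value from the t-distribution corresponding to the desired confidence level (e.g., approximately 1.96 for a 95% confidence interval when the sample is large)
• $s$ is the sample standard deviation, calculated as:
  • for continuous metrics: $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$
  • for probabilities: $s = \sqrt{\hat{p}(1 - \hat{p})}$
• $n$ is the sample size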
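As a quick worked illustration of the probability case, assume a hypothetical sample of $n = 100$ observations with a sample proportion $\hat{p} = 0.4$ (these numbers are made up for the example), and take $t \approx 1.96$ for a 95% interval:

$$
s = \sqrt{\hat{p}(1 - \hat{p})} = \sqrt{0.4 \times 0.6} \approx 0.490,
\qquad
t \cdot \frac{s}{\sqrt{n}} \approx 1.96 \times \frac{0.490}{10} \approx 0.096
$$

$$
\hat{p} \pm t \cdot \frac{s}{\sqrt{n}} \approx 0.4 \pm 0.096 = [0.304,\ 0.496]
$$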

Now let’s see how we can simply implement this in Python.

# Generate random data

We begin by generating synthetic data, drawing a random sample of size 100 from a normal distribution with mean 40 and standard deviation 10.
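A minimal sketch of this step, assuming NumPy is used for the sampling. The seed is an arbitrary choice and not the one behind the figures reported in this post, so the summary statistics it prints will differ slightly from them.

```python
import numpy as np

# Assumption: the seed is arbitrary; the seed used for this post is not known
rng = np.random.default_rng(42)

# Draw 100 values from a normal distribution with mean 40 and standard deviation 10
sample = rng.normal(loc=40, scale=10, size=100)

# Summary statistics (ddof=1 gives the sample standard deviation)
print(f"count {sample.size}")
print(f"mean  {sample.mean():.2f}")
print(f"std   {sample.std(ddof=1):.2f}")
```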

|       | value  |
|-------|--------|
| count | 100.00 |
| mean  | 40.26  |
| std   | 9.15   |

# Calculate confidence interval

Since the formula is straightforward, we can easily compute the confidence interval without any additional library:
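One way the manual computation might look is sketched below. The helper name `manual_ci`, the seed, and the default `t_value=1.96` are assumptions made for this illustration; NumPy is used only to regenerate a sample so the block runs on its own.

```python
import math

import numpy as np


def manual_ci(sample, t_value=1.96):
    """Return (lower, upper) for the mean, using mean ± t * s / sqrt(n)."""
    n = len(sample)
    mean = sum(sample) / n
    # Sample standard deviation with the n - 1 denominator
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    margin = t_value * s / math.sqrt(n)
    return mean - margin, mean + margin


# Assumption: arbitrary seed, so the numbers differ from those reported in this post
rng = np.random.default_rng(42)
sample = rng.normal(loc=40, scale=10, size=100)

lower, upper = manual_ci(sample)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```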

But more conveniently, we can compute it as a one-liner with the scipy package:
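A sketch of what that call could look like with `scipy.stats.t.interval`, again on a sample regenerated with an assumed seed. The confidence level is passed positionally because its keyword name has changed between SciPy versions.

```python
import numpy as np
from scipy import stats

# Assumption: arbitrary seed, so the numbers differ from those reported in this post
rng = np.random.default_rng(42)
sample = rng.normal(loc=40, scale=10, size=100)

# 95% interval from the t-distribution with n - 1 degrees of freedom
lower, upper = stats.t.interval(
    0.95,
    len(sample) - 1,          # degrees of freedom
    loc=np.mean(sample),      # center of the interval: the sample mean
    scale=stats.sem(sample),  # standard error of the mean, s / sqrt(n)
)
print(f"95% CI: [{lower:.2f}, {upper:.2f}]")
```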

The intervals are almost identical. The difference in the second decimal comes from the “manual” method using 1.96, the large-sample (normal) approximation, whereas the exact t-value for 99 degrees of freedom is about 1.984.
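To see where the gap comes from, the two critical values can be compared directly. This is a small check using `scipy.stats`, with 99 degrees of freedom assumed from the sample size of 100:

```python
from scipy import stats

# Two-sided 95% interval, so we need the 97.5th percentile
t_crit = stats.t.ppf(0.975, df=99)  # exact t-value for n = 100
z_crit = stats.norm.ppf(0.975)      # normal approximation

print(f"t-value: {t_crit:.3f}")  # 1.984
print(f"z-value: {z_crit:.3f}")  # 1.960
```

With a standard error near 0.9, this difference of roughly 0.024 in the critical value moves each bound by about 0.02.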

# Plot distribution and confidence interval

Finally, let’s plot a histogram of the sample distribution with the population mean, sample mean, and confidence interval of the sample mean. The plot shows the sample distribution with a population mean of 40, as well as the sample mean of 40.26 and a 95% confidence interval of [38.46, 42.05].
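A plot along these lines could be produced with matplotlib, as sketched below. The styling (colors, bin count, figure size) and the seed are assumptions for illustration, so the figure and the values in its legend will not match the ones quoted above exactly.

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

# Assumption: arbitrary seed, so the numbers differ from those reported in this post
rng = np.random.default_rng(42)
sample = rng.normal(loc=40, scale=10, size=100)

mean = sample.mean()
lower, upper = stats.t.interval(
    0.95, len(sample) - 1, loc=mean, scale=stats.sem(sample)
)

fig, ax = plt.subplots(figsize=(8, 4))
# Histogram of the sample
ax.hist(sample, bins=20, color="lightgray", edgecolor="white")
# Population mean (known here because the data is synthetic)
ax.axvline(40, color="black", linestyle="--", label="population mean (40)")
# Sample mean and its 95% confidence interval
ax.axvline(mean, color="tab:blue", label=f"sample mean ({mean:.2f})")
ax.axvspan(lower, upper, color="tab:blue", alpha=0.2,
           label=f"95% CI [{lower:.2f}, {upper:.2f}]")
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.legend()
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file.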