Calculating cumulative sums or running totals is a common task in data analysis, especially when dealing with time series data or analyzing sequential patterns. In SQL, this can be achieved using window functions, providing a powerful way to perform complex aggregations. Here is how to calculate cumulative sums, percentages, and group-wise cumulative sums in SQL.
Create a sample table
Let’s start by creating a small example table from a BigQuery public dataset, namely an extract of Google Analytics data from July 2017.
CREATE OR REPLACE TABLE `training.sql_cumsum` AS (
SELECT
PARSE_DATE("%Y%m%d", date) AS day,
EXTRACT(WEEK FROM PARSE_DATE("%Y%m%d", date)) AS week,
SUM(totals.visits) AS sessions
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*`
GROUP BY 1,2
ORDER BY 1
LIMIT 10
);
SELECT * FROM `training.sql_cumsum`;
day | week | sessions | |
0 | 2017-07-01 | 26 | 2048 |
1 | 2017-07-02 | 27 | 1895 |
2 | 2017-07-03 | 27 | 2046 |
3 | 2017-07-04 | 27 | 1938 |
4 | 2017-07-05 | 27 | 2885 |
5 | 2017-07-06 | 27 | 2658 |
6 | 2017-07-07 | 27 | 2450 |
7 | 2017-07-08 | 27 | 1859 |
8 | 2017-07-09 | 28 | 1921 |
9 | 2017-07-10 | 28 | 2769 |
Running total
A running total, or cumulative sum, is the summation of a sequence of numbers that is updated each time a new number is added to the sequence. It's a common operation in time series analysis to understand trends over time.
To compute the running sum in SQL, use the window clause SUM(metric) OVER (ORDER BY dimension)
:
SELECT
day,
sessions,
-- Running total of daily sessions
SUM(sessions) OVER (ORDER BY day) AS sessions_cumtd
FROM `training.sql_cumsum`
ORDER BY day
LIMIT 10
day | sessions | sessions_cumtd | |
0 | 2017-07-01 | 2048 | 2048 |
1 | 2017-07-02 | 1895 | 3943 |
2 | 2017-07-03 | 2046 | 5989 |
3 | 2017-07-04 | 1938 | 7927 |
4 | 2017-07-05 | 2885 | 10812 |
5 | 2017-07-06 | 2658 | 13470 |
6 | 2017-07-07 | 2450 | 15920 |
7 | 2017-07-08 | 1859 | 17779 |
8 | 2017-07-09 | 1921 | 19700 |
9 | 2017-07-10 | 2769 | 22469 |
This query will return the daily sessions along with the cumulative sum, ordered by the date.
Running percentage of total
Sometimes, it's useful to understand the cumulative percentage of a total, especially when analyzing proportions or contributions of parts to a whole. This can be achieved by dividing the cumulative sum by the total with SUM(metric) OVER()
:
SELECT
day,
sessions,
-- Cumulated percentage of daily sessions
SUM(sessions) OVER (ORDER BY day) / SUM(sessions) OVER() AS sessions_pct_cumtd,
FROM `training.sql_cumsum`
ORDER BY day
day | sessions | sessions_pct_cumtd | |
0 | 2017-07-01 | 2048 | 0.091148 |
1 | 2017-07-02 | 1895 | 0.175486 |
2 | 2017-07-03 | 2046 | 0.266545 |
3 | 2017-07-04 | 1938 | 0.352797 |
4 | 2017-07-05 | 2885 | 0.481196 |
5 | 2017-07-06 | 2658 | 0.599493 |
6 | 2017-07-07 | 2450 | 0.708532 |
7 | 2017-07-08 | 1859 | 0.791268 |
8 | 2017-07-09 | 1921 | 0.876764 |
9 | 2017-07-10 | 2769 | 1.000000 |
Cumulated sum, by group
In many scenarios, you may want to calculate the cumulative sum within specific groups or categories. This can be done using the PARTITION BY
clause, allowing you to segment the data and calculate the cumulative sum within each segment.
SELECT
day,
sessions,
week,
-- Cumulated sum of daily sessions
SUM(sessions) OVER (ORDER BY day) AS sessions_cumtd,
-- Cumulated sum of sessions, partitioned by week
SUM(sessions) OVER (PARTITION BY week ORDER BY day) AS sessions_cumtd_week
FROM `training.sql_cumsum`
ORDER BY day
day | sessions | week | sessions_cumtd | sessions_cumtd_week | |
0 | 2017-07-01 | 2048 | 26 | 2048 | 2048 |
1 | 2017-07-02 | 1895 | 27 | 3943 | 1895 |
2 | 2017-07-03 | 2046 | 27 | 5989 | 3941 |
3 | 2017-07-04 | 1938 | 27 | 7927 | 5879 |
4 | 2017-07-05 | 2885 | 27 | 10812 | 8764 |
5 | 2017-07-06 | 2658 | 27 | 13470 | 11422 |
6 | 2017-07-07 | 2450 | 27 | 15920 | 13872 |
7 | 2017-07-08 | 1859 | 27 | 17779 | 15731 |
8 | 2017-07-09 | 1921 | 28 | 19700 | 1921 |
9 | 2017-07-10 | 2769 | 28 | 22469 | 4690 |
Calculating cumulative sums in SQL is a powerful technique that allows you to analyze trends, patterns, and sequences in your data. By leveraging window functions and the PARTITION BY
clause, you can create complex aggregations tailored to your specific needs.