Calculate cumulative sum in SQL

Calculating cumulative sums or running totals is a common task in data analysis, especially when dealing with time series data or analyzing sequential patterns. In SQL, this can be achieved using window functions, providing a powerful way to perform complex aggregations. Here is how to calculate cumulative sums, percentages, and group-wise cumulative sums in SQL.

Create a sample table

Let’s start by creating a small example table from a BigQuery public dataset, namely an extract of Google Analytics data from July 2017.

CREATE OR REPLACE TABLE `training.sql_cumsum`  AS (
    SELECT 
        PARSE_DATE("%Y%m%d", date) AS day,
        EXTRACT(WEEK FROM PARSE_DATE("%Y%m%d", date)) AS week,
        SUM(totals.visits) AS sessions
    FROM `bigquery-public-data.google_analytics_sample.ga_sessions_201707*` 
    GROUP BY 1,2
    ORDER BY 1
    LIMIT 10
);

SELECT * FROM `training.sql_cumsum`;
day
week
sessions
0
2017-07-01
26
2048
1
2017-07-02
27
1895
2
2017-07-03
27
2046
3
2017-07-04
27
1938
4
2017-07-05
27
2885
5
2017-07-06
27
2658
6
2017-07-07
27
2450
7
2017-07-08
27
1859
8
2017-07-09
28
1921
9
2017-07-10
28
2769

Running total

A running total, or cumulative sum, is the summation of a sequence of numbers that is updated each time a new number is added to the sequence. It's a common operation in time series analysis to understand trends over time.

To compute the running sum in SQL, use the window clause SUM(metric) OVER (ORDER BY dimension):

SELECT 
    day,
    sessions,
    -- Running total of daily sessions
    SUM(sessions) OVER (ORDER BY day) AS sessions_cumtd
FROM `training.sql_cumsum`
ORDER BY day
LIMIT 10
day
sessions
sessions_cumtd
0
2017-07-01
2048
2048
1
2017-07-02
1895
3943
2
2017-07-03
2046
5989
3
2017-07-04
1938
7927
4
2017-07-05
2885
10812
5
2017-07-06
2658
13470
6
2017-07-07
2450
15920
7
2017-07-08
1859
17779
8
2017-07-09
1921
19700
9
2017-07-10
2769
22469

This query will return the daily sessions along with the cumulative sum, ordered by the date.

Running percentage of total

Sometimes, it's useful to understand the cumulative percentage of a total, especially when analyzing proportions or contributions of parts to a whole. This can be achieved by dividing the cumulative sum by the total with SUM(metric) OVER():

SELECT 
    day,
    sessions,
    -- Cumulated percentage of daily sessions
    SUM(sessions) OVER (ORDER BY day) / SUM(sessions) OVER() AS sessions_pct_cumtd, 
FROM `training.sql_cumsum`
ORDER BY day
day
sessions
sessions_pct_cumtd
0
2017-07-01
2048
0.091148
1
2017-07-02
1895
0.175486
2
2017-07-03
2046
0.266545
3
2017-07-04
1938
0.352797
4
2017-07-05
2885
0.481196
5
2017-07-06
2658
0.599493
6
2017-07-07
2450
0.708532
7
2017-07-08
1859
0.791268
8
2017-07-09
1921
0.876764
9
2017-07-10
2769
1.000000

Cumulated sum, by group

In many scenarios, you may want to calculate the cumulative sum within specific groups or categories. This can be done using the PARTITION BY clause, allowing you to segment the data and calculate the cumulative sum within each segment.

SELECT 
    day,
    sessions,
    week,
    -- Cumulated sum of daily sessions
    SUM(sessions) OVER (ORDER BY day) AS sessions_cumtd,
    -- Cumulated sum of sessions, partitioned by week
    SUM(sessions) OVER (PARTITION BY week ORDER BY day) AS sessions_cumtd_week
FROM `training.sql_cumsum`
ORDER BY day
day
sessions
week
sessions_cumtd
sessions_cumtd_week
0
2017-07-01
2048
26
2048
2048
1
2017-07-02
1895
27
3943
1895
2
2017-07-03
2046
27
5989
3941
3
2017-07-04
1938
27
7927
5879
4
2017-07-05
2885
27
10812
8764
5
2017-07-06
2658
27
13470
11422
6
2017-07-07
2450
27
15920
13872
7
2017-07-08
1859
27
17779
15731
8
2017-07-09
1921
28
19700
1921
9
2017-07-10
2769
28
22469
4690

Calculating cumulative sums in SQL is a powerful technique that allows you to analyze trends, patterns, and sequences in your data. By leveraging window functions and the PARTITION BY clause, you can create complex aggregations tailored to your specific needs.