To get the percentiles of a numeric column in Redshift, use PERCENTILE_CONT()
for continuous values and PERCENTILE_DISC()
for discrete values, along with a window clause WITHIN GROUP
. Grouping by a dimension requires to add a window clause OVER (PARTITION BY)
.
Here is an example for getting the 90th percentile of column ‘price’ grouped by ‘country’:
WITH country_revenue AS (
SELECT
country,
revenue,
PERCENTILE_DISC(0.9) WITHIN GROUP (ORDER BY revenue)
OVER (PARTITION BY country) AS p90_revenue
FROM sales
)
SELECT DISTINCT
dimension,
p90_revenue
FROM country_revenue
ORDER BY dimension
Note that to get the median, you can either use MEDIAN()
or PERCENTILE_DISC(0.5)