BigQuery’s killer functions

Google Cloud Platform
Published on
April 5, 2023

Select all columns except some

SELECT * EXCEPT(): this syntax allows you to select all columns from a table except for specified one(s). This is very handy when you have a ton of columns in your source table, but just want to exclude some of them from your query results.


SELECT * EXCEPT(email, name)
FROM employees;

Replace values in a selected column

SELECT * REPLACE syntax allows you to select all columns from a table while replacing the value of one or more columns with a new expression or value, without listing all the columns explicitly.

For example, imagine you have a table called products with the following columns: product_id, product_name, price, and currency.

If you want to select all columns but convert the price from USD to EUR using an exchange rate (let's say 0.85), you would use the following query:

SELECT * REPLACE(price * 0.85 AS price)
FROM products;

Generate arrays of dates and timestamps

GENERATE_DATE_ARRAY(): this function generates an array of dates between a start_date and end_date.

For example:

SELECT GENERATE_DATE_ARRAY('2023-01-01', '2023-01-15') AS date;

You can even specify a custom interval between two dates, e.g.:

SELECT GENERATE_DATE_ARRAY('2023-01-01', '2023-01-15', INTERVAL 1 WEEK);

A similar function to generate more precise timestamps is GENERATE_TIMESTAMP_ARRAY()

Use time ranges with window functions

Finally a SQL dialect where you can easily aggregate on time windows! In BigQuery, you can use the RANGE window frame in conjunction with window functions to compute metrics based on a specific time range, rather than a number of rows, which is all you can do on PostgreSQL or Redshift for example.

To do this, you'll need to use a time-based window frame with the RANGE BETWEEN clause and express the range in time units such as seconds, minutes, or hours.

Here's an example. Imagine you have a table called website_visits with the following columns: visit_id, user_id, visit_timestamp, and page_views.

If you want to calculate the sum of page_views for each user within a 1-hour time range before each visit, you would use the following query:

    SUM(page_views) OVER (
				PARTITION BY user_id
				ORDER BY UNIX_SECONDS(visit_timestamp)
		) AS sum_page_views_in_last_hour
FROM website_visits;

Count values only if…

COUNTIF() works like the Excel and Google Sheet’s identical formula: it only counts values that satisfy the specified conditions.


SELECT COUNTIF(page_views >= 5) FROM website_visits;

“If” only it were always that simple

Why don’t all SQL dialects have a simple IF() function, instead of forcing us to use a convoluted CASE WHEN THEN ELSE END syntax for basic cases ?

It works just as you would expect:

  IF(A < B, 'true', 'false') AS result
FROM numbers;

Safe cast from a data type to another

Isn’t it annoying when you get an error because just one occurence of your 50M rows dataset is an improperly formatted string, that cannot be cast to a number? To avoid errors that breaks your query, use the BigQuery SAFE_CAST() function that tries to convert to a data type, but simply returns NULL when the cast is unsuccessful.


SELECT SUM(SAFE_CAST(value AS FLOAT64)) as sum_values
FROM mixed_data;