Select all columns except some
SELECT * EXCEPT(): this syntax allows you to select all columns from a table except for specified one(s). This is very handy when you have a ton of columns in your source table, but just want to exclude some of them from your query results.
SELECT * EXCEPT(email, name) FROM employees;
Replace values in a selected column
SELECT * REPLACE syntax allows you to select all columns from a table while replacing the value of one or more columns with a new expression or value, without listing all the columns explicitly.
For example, imagine you have a table called
products with the following columns:
If you want to select all columns but convert the
price from USD to EUR using an exchange rate (let's say 0.85), you would use the following query:
SELECT * REPLACE(price * 0.85 AS price) FROM products;
Generate arrays of dates and timestamps
GENERATE_DATE_ARRAY(): this function generates an array of dates between a
SELECT GENERATE_DATE_ARRAY('2023-01-01', '2023-01-15') AS date;
You can even specify a custom interval between two dates, e.g.:
SELECT GENERATE_DATE_ARRAY('2023-01-01', '2023-01-15', INTERVAL 1 WEEK);
A similar function to generate more precise timestamps is
Use time ranges with window functions
Finally a SQL dialect where you can easily aggregate on time windows! In BigQuery, you can use the
RANGE window frame in conjunction with window functions to compute metrics based on a specific time range, rather than a number of rows, which is all you can do on PostgreSQL or Redshift for example.
To do this, you'll need to use a time-based window frame with the
RANGE BETWEEN clause and express the range in time units such as seconds, minutes, or hours.
Here's an example. Imagine you have a table called
website_visits with the following columns:
If you want to calculate the sum of
page_views for each user within a 1-hour time range before each visit, you would use the following query:
SELECT visit_id, user_id, visit_timestamp, page_views, SUM(page_views) OVER ( PARTITION BY user_id ORDER BY UNIX_SECONDS(visit_timestamp) RANGE BETWEEN 3600 PRECEDING AND CURRENT ROW ) AS sum_page_views_in_last_hour FROM website_visits;
Count values only if…
COUNTIF() works like the Excel and Google Sheet’s identical formula: it only counts values that satisfy the specified conditions.
SELECT COUNTIF(page_views >= 5) FROM website_visits;
“If” only it were always that simple
Why don’t all SQL dialects have a simple
IF() function, instead of forcing us to use a convoluted
CASE WHEN THEN ELSE END syntax for basic cases ?
It works just as you would expect:
SELECT A, B, IF(A < B, 'true', 'false') AS result FROM numbers;
Safe cast from a data type to another
Isn’t it annoying when you get an error because just one occurence of your 50M rows dataset is an improperly formatted string, that cannot be cast to a number? To avoid errors that breaks your query, use the BigQuery
SAFE_CAST() function that tries to convert to a data type, but simply returns NULL when the cast is unsuccessful.
SELECT SUM(SAFE_CAST(value AS FLOAT64)) as sum_values FROM mixed_data;