Select all columns except some
SELECT * EXCEPT()
: this syntax allows you to select all columns from a table except for specified one(s). This is very handy when you have a ton of columns in your source table, but just want to exclude some of them from your query results.
Example:
SELECT * EXCEPT(email, name)
FROM employees;
Replace values in a selected column
SELECT * REPLACE
syntax allows you to select all columns from a table while replacing the value of one or more columns with a new expression or value, without listing all the columns explicitly.
For example, imagine you have a table called products
with the following columns: product_id
, product_name
, price
, and currency
.
If you want to select all columns but convert the price
from USD to EUR using an exchange rate (let's say 0.85), you would use the following query:
SELECT * REPLACE(price * 0.85 AS price)
FROM products;
Generate arrays of dates and timestamps
GENERATE_DATE_ARRAY()
: this function generates an array of dates between a start_date
and end_date
.
For example:
SELECT GENERATE_DATE_ARRAY('2023-01-01', '2023-01-15') AS date;
You can even specify a custom interval between two dates, e.g.:
SELECT GENERATE_DATE_ARRAY('2023-01-01', '2023-01-15', INTERVAL 1 WEEK);
A similar function to generate more precise timestamps is GENERATE_TIMESTAMP_ARRAY()
Use time ranges with window functions
Finally a SQL dialect where you can easily aggregate on time windows! In BigQuery, you can use the RANGE
window frame in conjunction with window functions to compute metrics based on a specific time range, rather than a number of rows, which is all you can do on PostgreSQL or Redshift for example.
To do this, you'll need to use a time-based window frame with the RANGE BETWEEN
clause and express the range in time units such as seconds, minutes, or hours.
Here's an example. Imagine you have a table called website_visits
with the following columns: visit_id
, user_id
, visit_timestamp
, and page_views
.
If you want to calculate the sum of page_views
for each user within a 1-hour time range before each visit, you would use the following query:
SELECT
visit_id,
user_id,
visit_timestamp,
page_views,
SUM(page_views) OVER (
PARTITION BY user_id
ORDER BY UNIX_SECONDS(visit_timestamp)
RANGE BETWEEN 3600 PRECEDING AND CURRENT ROW
) AS sum_page_views_in_last_hour
FROM website_visits;
Count values only if…
COUNTIF()
works like the Excel and Google Sheet’s identical formula: it only counts values that satisfy the specified conditions.
Example:
SELECT COUNTIF(page_views >= 5) FROM website_visits;
“If” only it were always that simple
Why don’t all SQL dialects have a simple IF()
function, instead of forcing us to use a convoluted CASE WHEN THEN ELSE END
syntax for basic cases ?
It works just as you would expect:
SELECT
A,
B,
IF(A < B, 'true', 'false') AS result
FROM numbers;
Safe cast from a data type to another
Isn’t it annoying when you get an error because just one occurence of your 50M rows dataset is an improperly formatted string, that cannot be cast to a number? To avoid errors that breaks your query, use the BigQuery SAFE_CAST()
function that tries to convert to a data type, but simply returns NULL when the cast is unsuccessful.
Example:
SELECT SUM(SAFE_CAST(value AS FLOAT64)) as sum_values
FROM mixed_data;