🪄

Use Jupyter magic commands for BigQuery

Category
Google Cloud Platform
Published on
May 8, 2020

Introduction

To avoid boilerplate code in Jupyter notebooks, it is possible to use magic commands with the BigQuery library.

This simplifies the code from this:

sql = """SELECT
    source_year AS year,
    COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
"""

(
    bqclient
    .query(sql)
    .result()
    .to_dataframe()
)

to this:

%%bigquery
SELECT
    source_year AS year,
    COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5

Load module

If you run a cloud-based JupyterLab notebook with AI Platform, you won’t need this step, as the BigQuery module is already installed and loaded.

Otherwise, open a Jupyter notebook. Make sure you have installed the google-cloud-bigquery Python library. Load the magic commands module with the following:

# Load magic commands from the library
%load_ext google.cloud.bigquery

Run query in a magic cell

Create a magic cell with %%bigquery on the first line (don’t add anything above, not even commented lines) and directly enter your query below:

%%bigquery
SELECT
    source_year AS year,
    COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
year
birth_count
0
2008
4255156
1
2007
4324008
2
2006
4273225
3
2005
4145619
4
2004
4118907

Magic command options

To save the results to a pandas DataFrame instead of displaying them directly, add a variable name after %%bigquery

%%bigquery df
SELECT
    source_year AS year,
    COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
# Print the resulting DataFrame
df
year
birth_count
0
2008
4255156
1
2007
4324008
2
2006
4273225
3
2005
4145619
4
2004
4118907

It is possible to set parameters with --params, and use them in the query with @param:

%%bigquery --params {"lim": 10}
# Use a parameter for query limit
SELECT
    source_year AS year,
    COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT @lim
year
birth_count
0
2008
4255156
1
2007
4324008
2
2006
4273225
3
2005
4145619
4
2004
4118907
5
2003
4096092
6
2002
4027376
7
2001
4031531
8
2000
4063823
9
1999
3963465