Introduction
To avoid boilerplate code in Jupyter notebooks, it is possible to use magic commands with the BigQuery library.
This simplifies the code from this:
sql = """SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
"""
(
bqclient
.query(sql)
.result()
.to_dataframe()
)
to this:
%%bigquery
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
Load module
If you run a cloud-based JupyterLab notebook with AI Platform, you won’t need this step, as the BigQuery module is already installed and loaded.
Otherwise, open a Jupyter notebook. Make sure you have installed the google-cloud-bigquery Python library. Load the magic commands module with the following:
# Load magic commands from the library
%load_ext google.cloud.bigquery
Run query in a magic cell
Create a magic cell with %%bigquery
on the first line (don’t add anything above, not even commented lines) and directly enter your query below:
%%bigquery
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
year | birth_count | |
0 | 2008 | 4255156 |
1 | 2007 | 4324008 |
2 | 2006 | 4273225 |
3 | 2005 | 4145619 |
4 | 2004 | 4118907 |
Magic command options
To save the results to a pandas DataFrame instead of displaying them directly, add a variable name after %%bigquery
%%bigquery df
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT 5
# Print the resulting DataFrame
df
year | birth_count | |
0 | 2008 | 4255156 |
1 | 2007 | 4324008 |
2 | 2006 | 4273225 |
3 | 2005 | 4145619 |
4 | 2004 | 4118907 |
It is possible to set parameters with --params
, and use them in the query with @param
:
%%bigquery --params {"lim": 10}
# Use a parameter for query limit
SELECT
source_year AS year,
COUNT(is_male) AS birth_count
FROM `bigquery-public-data.samples.natality`
GROUP BY year
ORDER BY year DESC
LIMIT @lim
year | birth_count | |
0 | 2008 | 4255156 |
1 | 2007 | 4324008 |
2 | 2006 | 4273225 |
3 | 2005 | 4145619 |
4 | 2004 | 4118907 |
5 | 2003 | 4096092 |
6 | 2002 | 4027376 |
7 | 2001 | 4031531 |
8 | 2000 | 4063823 |
9 | 1999 | 3963465 |