👨‍🎓

Training: introduction to SQL

Category
Data Science
Published on
August 5, 2023
👉
I have delivered SQL beginners training to groups of students, in sessions of 2 hours. Here is the notebook that I used (on Google Colab), along with BigQuery public dataset of Google Analytics stats, and a list of countries populations. At each step, the question for students is the line starting with ### ✅, and a solution is provided just below.

Setup

Google Analytics data

### Check first rows

SELECT *
FROM analytics
LIMIT 5
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
0
2164638929322683039
1493626407
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Germany
1
9074761762186671912
1493675816
2017-05-01
1
1
0
0
0.0
Referral
(direct)
(none)
False
Belgium
2
5878935167128316072
1493661606
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
Germany
3
544195034374079989
1493707065
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Indonesia
4
3886456664395401293
1493652207
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
United Kingdom

Countries data

### Check first rows

SELECT *
FROM countries
LIMIT 5
country
population
0
China
1407794600
1
India
1376519831
2
United States
331611270
3
Indonesia
271350000
4
Pakistan
225200000

Tutorial

1. Select - From - Limit

### ✅ 1.1. Select all rows from table

SELECT *
FROM analytics
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
0
2164638929322683039
1493626407
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Germany
1
9074761762186671912
1493675816
2017-05-01
1
1
0
0
0.0
Referral
(direct)
(none)
False
Belgium
2
5878935167128316072
1493661606
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
Germany
3
544195034374079989
1493707065
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Indonesia
4
3886456664395401293
1493652207
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
United Kingdom
...
...
...
...
...
...
...
...
...
...
...
...
...
...
65366
35306732662012160
1496273148
2017-05-31
1
5
191
0
0.0
Organic Search
google
organic
False
New Zealand
65367
12257497755898671
1496237032
2017-05-31
1
5
167
0
0.0
Organic Search
google
organic
False
United States
65368
8586928431096348181
1496238116
2017-05-31
1
5
87
0
0.0
Organic Search
google
organic
False
Sri Lanka
65369
1686155039274733791
1496226311
2017-05-31
1
5
454
0
0.0
Organic Search
google
organic
False
Spain
65370
1656427356117709372
1496232410
2017-05-31
1
5
1440
0
0.0
Affiliates
Partners
affiliate
True
United Kingdom
### ✅ 1.2. Show first 5 rows

SELECT *
FROM analytics
LIMIT 5
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
0
2164638929322683039
1493626407
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Germany
1
9074761762186671912
1493675816
2017-05-01
1
1
0
0
0.0
Referral
(direct)
(none)
False
Belgium
2
5878935167128316072
1493661606
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
Germany
3
544195034374079989
1493707065
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Indonesia
4
3886456664395401293
1493652207
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
United Kingdom
### ✅ 1.3. Show only visitor_id, date, channel for first 10 rows

SELECT
    visitor_id,
    date,
    channel
FROM analytics
LIMIT 10
visitor_id
date
channel
0
2164638929322683039
2017-05-01
Organic Search
1
9074761762186671912
2017-05-01
Referral
2
5878935167128316072
2017-05-01
Organic Search
3
544195034374079989
2017-05-01
Organic Search
4
3886456664395401293
2017-05-01
Organic Search
5
3239258634987357945
2017-05-01
Organic Search
6
2753462331953041591
2017-05-01
Organic Search
7
7290710066638006462
2017-05-01
Referral
8
9270977386349035863
2017-05-01
Organic Search
9
455733085502866030
2017-05-01
Direct

2. Conditions

### ✅ 2.1. Select rows with more than 100 pageviews

SELECT *
FROM analytics
WHERE pageviews > 100
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
0
7498695963354635199
1493771413
2017-05-02
1
132
2107
0
0.00
Social
plus.google.com
referral
False
United States
1
485196789418021564
1495359379
2017-05-21
1
118
5287
1
199.18
Referral
(direct)
(none)
False
United States
2
9703644597836409304
1495568120
2017-05-23
1
136
7007
0
0.00
Direct
(direct)
(none)
True
India
3
567111192024013209
1495989701
2017-05-28
1
102
4022
0
0.00
Referral
(direct)
(none)
False
United States
4
3855473804472998317
1496085413
2017-05-29
1
118
4389
1
129.37
Referral
(direct)
(none)
False
United States
5
8637158932301658318
1496250783
2017-05-31
1
139
3595
15
25.00
Referral
(direct)
(none)
False
United States
### ✅ 2.2. Select 10 visits with exactly 2 transactions

SELECT *
FROM analytics
WHERE transactions = 2
LIMIT 10
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
0
965001845233995898
1493735547
2017-05-02
1
24
2464
2
67.18
Direct
(direct)
(none)
False
United States
1
8532593488057645523
1493742119
2017-05-02
1
31
2017
2
33.59
Direct
(direct)
(none)
False
United States
2
4750855305420445022
1493779878
2017-05-02
1
32
2499
2
17.58
Referral
(direct)
(none)
False
United States
3
7528605281471957603
1493741153
2017-05-02
1
40
1929
2
92.91
Referral
(direct)
(none)
False
United States
4
9503612993189278427
1493916464
2017-05-04
1
21
843
2
33.59
Direct
(direct)
(none)
False
United States
5
3704081881488905199
1494031524
2017-05-05
1
55
1168
2
1521.51
Direct
(direct)
(none)
False
Japan
6
5985368233376980363
1494272415
2017-05-08
1
29
926
2
55.80
Referral
(direct)
(none)
False
United States
7
3835534637736706073
1494284184
2017-05-08
1
31
663
2
43.98
Referral
(direct)
(none)
False
United States
8
7033370926590252579
1494370353
2017-05-09
1
54
1422
2
85.58
Referral
(direct)
(none)
False
United States
9
7867882247088163383
1494539436
2017-05-11
1
50
1553
2
62.60
Direct
(direct)
(none)
False
Sweden
### ✅ 2.3. Select visits with 3 to 5 transactions

SELECT *
FROM analytics
WHERE transactions BETWEEN 3 AND 5
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
0
2402527199731150932
1493739587
2017-05-02
1
75
2893
4
62.27
Direct
(direct)
(none)
False
United Kingdom
1
4921413806940784676
1494285229
2017-05-08
1
64
1232
4
52.07
Organic Search
google
organic
False
United States
2
4988517937139937145
1495039631
2017-05-17
1
26
557
3
18.99
Organic Search
google
organic
False
United States
3
9026840718082010040
1495199725
2017-05-19
1
51
2222
3
272.79
Direct
(direct)
(none)
False
United States

3. Functions and columns renaming

### ✅ 3.1. Show number of rows
### Hint: use COUNT()

SELECT COUNT(*)
FROM analytics
count
0
65371
### ✅ 3.2. Show sum of visits, transactions, and revenue
### Hint: use SUM()

SELECT
    SUM(visits),
    SUM(transactions),
    SUM(revenue)
FROM analytics
sum
sum
sum
0
65371.0
1160.0
121711.48
### ✅ 3.3. Show sum of visits, transactions, and revenue, and rename the columns properly
### Hint: use AS

SELECT
    SUM(visits) AS visits,
    SUM(transactions) AS transactions,
    SUM(revenue) AS revenue
FROM analytics
visits
transactions
revenue
0
65371.0
1160.0
121711.48
### ✅ 3.4. Show first and last date, and rename them

SELECT
    MIN(date) AS min_date,
    MAX(date) AS max_date
FROM analytics
min_date
max_date
0
2017-05-01
2017-05-31
### ✅ 3.5. Show conversion rate overall (transactions / visits)

SELECT
    SUM(transactions) / SUM(visits) AS conv_rate
FROM analytics
conv_rate
0
0.017745
### ✅ 3.6. Show conversion rate between 1st and 10 May

SELECT
    SUM(transactions) / SUM(visits) AS conv_rate
FROM analytics
WHERE date BETWEEN '2017-05-01' AND '2017-05-10'
conv_rate
0
0.021916
### ✅ 3.7. Show conversion rate on week-ends
### Hint: use DATE_PART('ISODOW', date) to get the weekday

SELECT
    SUM(transactions) / SUM(visits) AS conv_rate
FROM analytics
WHERE DATE_PART('ISODOW', date) > 5
conv_rate
0
0.011282
### ✅ 3.8. Show number of transactions, total revenue and avg basket
### for 'Organic Search' between 10/05 and 15/05

SELECT
    SUM(transactions) AS transactions,
    SUM(revenue) AS revenue,
    SUM(revenue) / SUM(transactions) AS avg_basket
FROM analytics
WHERE
    channel = 'Organic Search'
    AND date BETWEEN '2017-05-10' AND '2017-05-15'
transactions
revenue
avg_basket
0
66.0
5198.67
78.767727

4. Grouping and ordering

### ✅ 4.1. Show number of visits per day

SELECT
    date,
    SUM(visits)
FROM analytics
GROUP BY 1
date
sum
0
2017-05-31
2292.0
1
2017-05-30
2199.0
2
2017-05-26
2009.0
3
2017-05-18
2715.0
4
2017-05-14
1290.0
5
2017-05-19
2223.0
6
2017-05-29
1745.0
7
2017-05-08
2025.0
8
2017-05-28
1463.0
9
2017-05-25
2228.0
10
2017-05-20
1855.0
11
2017-05-17
3117.0
12
2017-05-24
2406.0
13
2017-05-12
1932.0
14
2017-05-22
2248.0
15
2017-05-11
2218.0
16
2017-05-13
1251.0
17
2017-05-23
2129.0
18
2017-05-21
1645.0
19
2017-05-01
2588.0
20
2017-05-10
2024.0
21
2017-05-16
3098.0
22
2017-05-07
1400.0
23
2017-05-27
1502.0
24
2017-05-04
2806.0
25
2017-05-05
1960.0
26
2017-05-03
2588.0
27
2017-05-06
1383.0
28
2017-05-02
2564.0
29
2017-05-15
2360.0
30
2017-05-09
2108.0
### ✅ 4.2. Show number of visits per day, ordered by day, for May 1 to 10

SELECT
    date,
    SUM(visits) AS visits
FROM analytics
WHERE date BETWEEN '2017-05-01' AND '2017-05-10'
GROUP BY 1
ORDER BY 1
date
visits
0
2017-05-01
2588.0
1
2017-05-02
2564.0
2
2017-05-03
2588.0
3
2017-05-04
2806.0
4
2017-05-05
1960.0
5
2017-05-06
1383.0
6
2017-05-07
1400.0
7
2017-05-08
2025.0
8
2017-05-09
2108.0
9
2017-05-10
2024.0
### ✅ 4.3. Show revenue per day, ordered by revenue descending, for top 5 days

SELECT
    date,
    SUM(revenue) AS revenue
FROM analytics
GROUP BY 1
ORDER BY 2 DESC
LIMIT 5
date
revenue
0
2017-05-31
10423.59
1
2017-05-09
7329.01
2
2017-05-01
7126.08
3
2017-05-05
7099.86
4
2017-05-12
6677.05
### ✅ 4.4. Show distribution of number of transactions
### i.e. how many visits with 1, 2, 3... etc. transactions

SELECT
    transactions,
    COUNT(*)
FROM analytics
GROUP BY 1
ORDER BY 1
transactions
count
0
0
64256
1
1
1089
2
2
21
3
3
2
4
4
2
5
15
1
### ✅ 4.5. Show visits and pageviews per visit by channel, ordered by pages per visit descending

SELECT
    channel,
    SUM(visits) AS visits,
    SUM(pageviews) / SUM(visits) AS pageviews_per_visit
FROM analytics
GROUP BY 1
ORDER BY 3 DESC
channel
visits
pageviews_per_visit
0
Referral
9251.0
5.514863
1
Paid Search
1887.0
5.246953
2
Display
427.0
4.224824
3
Direct
15599.0
3.998077
4
(Other)
5.0
3.600000
5
Organic Search
33199.0
3.535378
6
Affiliates
1317.0
2.658314
7
Social
3686.0
2.468258

5. Joins

### ✅ 5.1. Show first 5 rows for `country` table

SELECT *
FROM countries
LIMIT 5
country
population
0
China
1407794600
1
India
1376519831
2
United States
331611270
3
Indonesia
271350000
4
Pakistan
225200000
### ✅ 5.2. Join `analytics` table with `countries` table for 10 rows

SELECT *
FROM analytics
LEFT JOIN countries
ON analytics.country = countries.country
LIMIT 10
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
country
country
population
0
2164638929322683039
1493626407
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Germany
Germany
83190556
1
9074761762186671912
1493675816
2017-05-01
1
1
0
0
0.0
Referral
(direct)
(none)
False
Belgium
Belgium
11560220
2
5878935167128316072
1493661606
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
Germany
Germany
83190556
3
544195034374079989
1493707065
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Indonesia
Indonesia
271350000
4
3886456664395401293
1493652207
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
United Kingdom
United Kingdom
66796807
5
3239258634987357945
1493669447
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
Canada
Canada
38443464
6
2753462331953041591
1493657802
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
Sri Lanka
Sri Lanka
21919000
7
7290710066638006462
1493635945
2017-05-01
1
1
0
0
0.0
Referral
docs.google.com
referral
False
India
India
1376519831
8
9270977386349035863
1493633549
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
India
India
1376519831
9
455733085502866030
1493643390
2017-05-01
1
1
0
0
0.0
Direct
(direct)
(none)
True
United States
United States
331611270
### ✅ 5.3. Join `analytics` with `countries` using `USING`

SELECT *
FROM analytics
LEFT JOIN countries
USING (country)
LIMIT 10
country
visitor_id
visit_start_time
date
visits
pageviews
time_on_site
transactions
revenue
channel
source
medium
is_mobile
population
0
Germany
2164638929322683039
1493626407
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
83190556
1
Belgium
9074761762186671912
1493675816
2017-05-01
1
1
0
0
0.0
Referral
(direct)
(none)
False
11560220
2
Germany
5878935167128316072
1493661606
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
83190556
3
Indonesia
544195034374079989
1493707065
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
271350000
4
United Kingdom
3886456664395401293
1493652207
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
66796807
5
Canada
3239258634987357945
1493669447
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
38443464
6
Sri Lanka
2753462331953041591
1493657802
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
True
21919000
7
India
7290710066638006462
1493635945
2017-05-01
1
1
0
0
0.0
Referral
docs.google.com
referral
False
1376519831
8
India
9270977386349035863
1493633549
2017-05-01
1
1
0
0
0.0
Organic Search
google
organic
False
1376519831
9
United States
455733085502866030
1493643390
2017-05-01
1
1
0
0
0.0
Direct
(direct)
(none)
True
331611270

6. Window functions

### ✅ 6.1. Visits by day vs visits on previous day

SELECT
    date,
    visits,
    LAG(visits) OVER (ORDER BY date) AS prev_visits
FROM (
    SELECT
        date,
        SUM(visits) AS visits
    FROM analytics
    GROUP BY 1
) AS sub1
ORDER BY date
date
visits
prev_visits
0
2017-05-01
2588.0
NaN
1
2017-05-02
2564.0
2588.0
2
2017-05-03
2588.0
2564.0
3
2017-05-04
2806.0
2588.0
4
2017-05-05
1960.0
2806.0
5
2017-05-06
1383.0
1960.0
6
2017-05-07
1400.0
1383.0
7
2017-05-08
2025.0
1400.0
8
2017-05-09
2108.0
2025.0
9
2017-05-10
2024.0
2108.0
10
2017-05-11
2218.0
2024.0
11
2017-05-12
1932.0
2218.0
12
2017-05-13
1251.0
1932.0
13
2017-05-14
1290.0
1251.0
14
2017-05-15
2360.0
1290.0
15
2017-05-16
3098.0
2360.0
16
2017-05-17
3117.0
3098.0
17
2017-05-18
2715.0
3117.0
18
2017-05-19
2223.0
2715.0
19
2017-05-20
1855.0
2223.0
20
2017-05-21
1645.0
1855.0
21
2017-05-22
2248.0
1645.0
22
2017-05-23
2129.0
2248.0
23
2017-05-24
2406.0
2129.0
24
2017-05-25
2228.0
2406.0
25
2017-05-26
2009.0
2228.0
26
2017-05-27
1502.0
2009.0
27
2017-05-28
1463.0
1502.0
28
2017-05-29
1745.0
1463.0
29
2017-05-30
2199.0
1745.0
30
2017-05-31
2292.0
2199.0

Exercises

Exercise 1

👉 Question: Show pageviews per visit by channel, rounded at 2 decimals, in descending order.

💡 Solution:

SELECT
    channel,
    ROUND(SUM(pageviews) / SUM(visits), 2) AS pageviews_per_visit
FROM analytics
GROUP BY 1
ORDER BY 1 DESC
channel
pageviews_per_visit
0
Social
2.47
1
Referral
5.51
2
Paid Search
5.25
3
(Other)
3.60
4
Organic Search
3.54
5
Display
4.22
6
Direct
4.00
7
Affiliates
2.66

Exercise 2

👉 Question: Show average number of daily visits.

💡 Solution:

SELECT
    AVG(visits) AS avg_visits_1,
    SUM(visits) / COUNT(date) AS avg_visits_2
FROM (
    SELECT
        date,
        SUM(visits) AS visits
    FROM analytics
    GROUP BY 1
) AS sub1
avg_visits_1
avg_visits_2
0
2108.741935
2108.741935

Exercise 3

👉 Question: Show visits, revenue and conversion rate by country, with their population, ordered by revenue decreasing, with non-zero revenue.

💡 Solution:

SELECT *
FROM (
    SELECT
        country,
        SUM(visits) AS visits,
        SUM(revenue) AS revenue,
        SUM(transactions) / SUM(visits) AS conv_rate
    FROM analytics
    GROUP BY 1
) AS sub1
LEFT JOIN countries
USING (country)
WHERE revenue > 0
ORDER BY revenue DESC
country
visits
revenue
conv_rate
population
0
United States
30401.0
116916.97
0.036545
331611270
1
Canada
1992.0
1787.29
0.008534
38443464
2
Japan
1860.0
1723.41
0.003226
125410000
3
Mexico
610.0
301.45
0.004918
126014024
4
Poland
615.0
136.04
0.001626
38230000
5
United Kingdom
3160.0
122.24
0.001582
66796807
6
Argentina
343.0
108.63
0.005831
45808747
7
Greece
245.0
100.77
0.004082
10718565
8
Taiwan
1135.0
95.14
0.001762
23525623
9
Saudi Arabia
116.0
85.15
0.008621
34218169
10
Ireland
577.0
67.18
0.003466
4977400
11
Sweden
413.0
62.60
0.004843
10385347
12
Brazil
793.0
49.98
0.001261
213088674
13
Thailand
378.0
42.93
0.002646
66642442
14
Colombia
302.0
34.99
0.003311
51049498
15
Guatemala
55.0
25.18
0.018182
17109746
16
Germany
1556.0
23.95
0.000643
83190556
17
Switzerland
482.0
13.99
0.002075
8667100
18
South Africa
136.0
13.59
0.007353
59622350

Exercise 4

👉 Question: Show daily variation of visits, in absolute and relative value.

💡 Solution:

SELECT
    date,
    visits,
    LEAD(visits) OVER (ORDER BY date) AS prev_visits,
    visits - LAG(visits) OVER (ORDER BY date) AS daily_var,
    (visits - LAG(visits) OVER (ORDER BY date)) / (LAG(visits) OVER (ORDER BY date)) AS daily_var_pct
FROM (
    SELECT
        date,
        SUM(visits) AS visits
    FROM analytics
    GROUP BY 1
) AS sub1
ORDER BY date
date
visits
prev_visits
daily_var
daily_var_pct
0
2017-05-01
2588.0
2564.0
NaN
NaN
1
2017-05-02
2564.0
2588.0
-24.0
-0.009274
2
2017-05-03
2588.0
2806.0
24.0
0.009360
3
2017-05-04
2806.0
1960.0
218.0
0.084235
4
2017-05-05
1960.0
1383.0
-846.0
-0.301497
5
2017-05-06
1383.0
1400.0
-577.0
-0.294388
6
2017-05-07
1400.0
2025.0
17.0
0.012292
7
2017-05-08
2025.0
2108.0
625.0
0.446429
8
2017-05-09
2108.0
2024.0
83.0
0.040988
9
2017-05-10
2024.0
2218.0
-84.0
-0.039848
10
2017-05-11
2218.0
1932.0
194.0
0.095850
11
2017-05-12
1932.0
1251.0
-286.0
-0.128945
12
2017-05-13
1251.0
1290.0
-681.0
-0.352484
13
2017-05-14
1290.0
2360.0
39.0
0.031175
14
2017-05-15
2360.0
3098.0
1070.0
0.829457
15
2017-05-16
3098.0
3117.0
738.0
0.312712
16
2017-05-17
3117.0
2715.0
19.0
0.006133
17
2017-05-18
2715.0
2223.0
-402.0
-0.128970
18
2017-05-19
2223.0
1855.0
-492.0
-0.181215
19
2017-05-20
1855.0
1645.0
-368.0
-0.165542
20
2017-05-21
1645.0
2248.0
-210.0
-0.113208
21
2017-05-22
2248.0
2129.0
603.0
0.366565
22
2017-05-23
2129.0
2406.0
-119.0
-0.052936
23
2017-05-24
2406.0
2228.0
277.0
0.130108
24
2017-05-25
2228.0
2009.0
-178.0
-0.073982
25
2017-05-26
2009.0
1502.0
-219.0
-0.098294
26
2017-05-27
1502.0
1463.0
-507.0
-0.252364
27
2017-05-28
1463.0
1745.0
-39.0
-0.025965
28
2017-05-29
1745.0
2199.0
282.0
0.192755
29
2017-05-30
2199.0
2292.0
454.0
0.260172
30
2017-05-31
2292.0
NaN
93.0
0.042292