Codebook
💾

Codebook

icon
Code chunks that may be useful in the daily work of data scientists. Most of them are related to Python for data analysis and SQL. New notes are added regularly, every time I discover useful bits of code.

Pandas 🐼
Convert categorical variables to numeric with pandas
Forward- or back-fill NA values in DataFrames
Format dates in pandas bar plots
Cross-join two pandas DataFrames
Create a DataFrame of zeros or nulls
Use date indexes with pandas DataFrames
Handle dates in pandas
Use all options of pandas DataFrame info()
Get differences between two pandas DataFrames
Explode and regroup lists into rows in a pandas DataFrame
Flatten JSON files with pandas
Flatten multi-index columns in pandas DataFrames
Aggregate a pandas DataFrame on multiple functions
Display more DataFrame cells in Jupyter notebooks
Read multiple CSV files into a pandas DataFrame
Use a column value if not Null, or another column from a pandas DataFrame
Pivot pandas Dataframes with non-numeric values
Rename pandas DataFrame columns with method chaining
Calculate moving averages and rolling windows with pandas
Select rows with NA values in any pandas DataFrame column
Split values into columns of a pandas DataFrame
Get top rows or random rows within groups in a DataFrame
Apply computation within groups in a pandas DataFrame
Go up a hierarchical tree with pandas
Make full use of pandas value_counts()
Get first row(s) of each group
Hash values in Python
Floor datetimes and dates
Python 🐍
Convert nested string field to dict in Python
Get today and yesterday date in Python
List files in a folder with Python
Format numbers the right way in Python
Read and write to Google Sheets with Python
Handle infinity values in Python
Get common elements between lists in Python
Open and read/write files in Python
Rearrange layers with Matplotlib and Seaborn plots in Python
Plot beautiful time series with Python seaborn
Train Machine Learning algorithms with Python scikit-learn
Plot great categorical distribution graphs with Python seaborn
Get the perfect color palette with Seaborn
Manage markers for lineplots with Python seaborn
Split ML data into training and test sets in Python
Manually create a legend on a Matplotlib plot
Simply loop through multi-axes figures in Matplotlib
SQL 🧮
Query on time window in SQL
Convert rows to columns with PIVOT in BigQuery
Use CASE in PostgreSQL
Calculate period-to-period change in SQL
Count words and characters in a string with PostgreSQL
Calculate cumulative sum in SQL
Substract days to a date in SQL
Compute time intervals in PostgreSQL
Import foreign PostgreSQL database
Filter with HAVING clause in SQL
Join on multiple SQL tables with fallback on first non-Null value
Join tables in SQL with multiple ON conditions
Avoid division by zero errors in SQL
Rank results in SQL with RANK, DENSE_RANK and ROW_NUMBER
Convert between epoch and datetime in Redshift
Query new vs returning users in SQL
Get rolling average in SQL
Round timestamp to the nearest minute, hour… in PostgreSQL
Compute sessions and streaks in SQL
Count the next occurences within a time window
Get percentiles in Redshift
Compute percent of column
Get first or last value in a group
Other 🧑‍💻
Combine ‘cd’ & ‘ls’ into a single command
Compress and decompress files on Linux
Batch convert files encoding in Terminal
Delete all .DS_Store files on MacOS
Find files matching pattern in Terminal
Resize picture zone in MacOS Preview