Hashing values, for example to anonymise personal information, is straightforward with hashlib, a module in Python's standard library:
# Import library
import hashlib
# Apply SHA-1 hash (hashlib works on bytes, so the string is encoded first; UTF-8 by default)
email = "lara@croft.org"
hashlib.sha1(email.encode()).hexdigest()
8104be5f19b5be4cec9185173a26aa121935d656
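Email addresses are easy to guess, so a plain hash can be matched by hashing candidate addresses and comparing the results. One common way to make this harder is a keyed hash. The following is a minimal sketch using the standard library's hmac module, with SHA-256 swapped in for SHA-1. The function name `keyed_hash` and the key value are hypothetical and not part of the original example.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration only; in practice it would be
# loaded from a secure location and never stored next to the hashed data.
SECRET_KEY = b"replace-with-a-long-random-secret"


def keyed_hash(value: str, key: bytes = SECRET_KEY) -> str:
    """Return the hex HMAC-SHA256 digest of a string."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


digest = keyed_hash("lara@croft.org")
print(digest)
```

The same input and key always give the same digest, so the hashed values can still be used to join or deduplicate records, but reproducing them requires knowing the key.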
If you work with a pandas dataframe, you can apply the function to a full column:
# Import pandas
import pandas as pd
# Create dataframe with plain text emails
df = pd.DataFrame({
    'email': [
        'harry@potter.com',
        'sherlock@holmes.co.uk',
        'lara@croft.org',
        'frodo@baggins.net',
        'arsene@lupin.fr'
    ]
})
# Hash email column
df['hash'] = df['email'].apply(lambda x: hashlib.sha1(x.encode()).hexdigest())
  | email | hash |
0 | harry@potter.com | 964721fca55c89f80dc59a86f077bffb6e14ffc4 |
1 | sherlock@holmes.co.uk | 76cb17778847d74fb4558d453e67aedbac573c88 |
2 | lara@croft.org | 8104be5f19b5be4cec9185173a26aa121935d656 |
3 | frodo@baggins.net | 6589497c3ad91249aa51ddf9f5165fb1becf95d0 |
4 | arsene@lupin.fr | 84edb980256dad67a693ce96ba8caef8eadb9a13 |
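The lambda used to hash the column calls `.encode()` on every cell, so it raises an AttributeError as soon as the column contains a missing value. It also treats `Lara@Croft.org` and `lara@croft.org` as different inputs. The sketch below shows one possible way to handle both cases. The helper name `hash_value`, the normalisation rules (trimming whitespace and lowercasing) and the default of SHA-256 are assumptions made for illustration, not part of the original example.

```python
import hashlib

import pandas as pd


def hash_value(value, algorithm="sha256"):
    """Hash a single value; missing values are passed through unchanged."""
    if pd.isna(value):
        return value
    # Assumed normalisation: trim whitespace and lowercase before hashing
    normalised = str(value).strip().lower()
    return hashlib.new(algorithm, normalised.encode("utf-8")).hexdigest()


# Small illustrative dataframe with inconsistent casing and a missing value
df = pd.DataFrame({"email": ["Lara@Croft.org ", "lara@croft.org", None]})
df["hash"] = df["email"].apply(hash_value)
print(df)
```

Passing `algorithm="sha1"` gives the same digests as the `hashlib.sha1` call for inputs that are already lowercase and trimmed.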