I always like to understand concepts well before I use them (which is good because it's the right thing to do, but bad because it slows me down a lot!), so it was with great excitement that I came across Matt Brems' article A One-Stop Shop for Principal Component Analysis recently. If you read this... Continue Reading →
If you buy one book in 2020…
...make it Grokking Deep Learning by Andrew Trask! This gem of a book breaks deep learning down to its smallest component parts and then builds up your understanding from there. It's the equivalent of stripping your car down to nuts and bolts and then re-building it: at the end, you will know to a certainty... Continue Reading →
Tutorial: BigQuery arrays and structs
The first time I encountered the BigQuery export schema this year my heart sank: arrays and structs were not something covered in my SQL intro course! But having spent a few months extracting data like this I've come to appreciate the logic. These are all the 'notes to self' I wish I'd had at the... Continue Reading →
Finding relationships between words
This week I did something a bit different and rather fun! My colleague Carel phoned to say he was bringing his 11-year-old daughter, Lisa-Marie, to work the next day and did I have anything interesting to share with her about the world of data science? As it happens I've spent the past couple of... Continue Reading →
Avoiding for loops in Pandas
There will be times when you are tempted to loop through rows or columns in Pandas to achieve your results - and the lesson I keep learning is Don't do it! Every time I'm tempted to write a for loop with Pandas data I find myself clock watching and cursing... 9 times out of 10 there... Continue Reading →
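To make the point concrete, here is a minimal sketch comparing a row-by-row loop with the vectorized equivalent. The column names and values are made up purely for illustration; the only assumption is that pandas is installed.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# Tempting but slow: visit every row in a Python-level loop.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["quantity"])

# Vectorized: one expression over whole columns, no explicit loop.
df["total"] = df["price"] * df["quantity"]

print(df)
```

Both approaches give the same numbers on this tiny frame; the difference only starts to hurt as the row count grows.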
Emerging from Data Science Intensive
In September & October I was fortunate enough to attend the Data Science Intensive Program (DSI) in Cape Town. In a word: WOW! The program brought together 16 students from 7 African countries for 8 very intensive weeks with an ambitious goal: To ensure that anyone who completes the DSI is able to contribute significant... Continue Reading →
Data Science for Good Challenge – Kaggle
I picked the Center for Policing Equity challenge on Kaggle for three reasons: I love maps and I love the idea that data scientists can significantly improve our world, in addition to improving the bottom lines of big corporates. And this is exactly the type of messy data one would get in the real world so... Continue Reading →
Pandas dataframe styling – cool!
I always like to visualize data and see the detail if possible so it was with great joy that I stumbled across DataFrame.style this morning. Here is an example of how it helps us to visualize some Titanic survival rates by sex and passenger class: The Pandas documentation itself is pretty comprehensive, but if you're looking... Continue Reading →
Regex can do amazing things for data cleanups, so it's basically a must-use tool. But it's also tricky to keep in your head if you don't use it frequently... Here are 3 great reference and test resources that can help: https://docs.python.org/3/howto/regex.html https://regexr.com/ https://regexone.com/
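As a small taste of the kind of cleanup regex makes easy, here is a minimal sketch using Python's built-in `re` module. The messy strings and the `clean` helper are hypothetical, invented just for this example.

```python
import re

# Hypothetical messy values, invented for this illustration.
raw = ["  Cape Town ", "cape-town", "CAPE   TOWN"]

def clean(text):
    """Normalize case and separators so variants of a name match."""
    text = text.strip().lower()
    # Collapse any run of whitespace or hyphens into a single space.
    return re.sub(r"[\s-]+", " ", text)

cleaned = [clean(t) for t in raw]
print(cleaned)
```

A pattern like `[\s-]+` is easy to test interactively on a site such as regexr.com before dropping it into code.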
I've just discovered the awesome Brandon Rohrer and his blog while trying to find an intelligible article on Bayesian inference. What a goldmine - this guy is a born educator! Thank you for sharing your knowledge - it is much appreciated!