How to – KMeans clustering

Clustering is a type of unsupervised learning. We humans might think of it as 'categorization'. For example, if I gave you a bag of red, blue and white balls and asked you to sort them (without telling you how), you would probably gravitate towards sorting them by colour as this would be the... Continue Reading →
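
To make the ball-sorting idea concrete, here is a minimal sketch of the k-means loop in plain NumPy. It is an illustration only, not necessarily the approach the full post takes: the RGB values, the `kmeans` helper and the hand-picked starting centroids are all assumptions made up for this example.

```python
import numpy as np

def kmeans(points, centroids, n_iter=10):
    """Hypothetical helper: a bare-bones k-means loop (Lloyd's algorithm)."""
    points = np.asarray(points, dtype=float)
    centroids = np.asarray(centroids, dtype=float).copy()
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid.
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        for k in range(len(centroids)):
            members = points[labels == k]
            if len(members):
                centroids[k] = members.mean(axis=0)
    return labels, centroids

# Made-up RGB colours for six balls: two reddish, two bluish, two whitish.
balls = [
    (250, 10, 10), (240, 20, 15),
    (10, 10, 250), (20, 15, 240),
    (250, 250, 250), (240, 245, 250),
]

# Assumption: start from one ball of each colour so the run is deterministic.
start = [balls[0], balls[2], balls[4]]
labels, centroids = kmeans(balls, start)
print(labels)
```

Nobody tells the algorithm what "red" or "blue" means; the groups fall out of the distances between colours, which is the sense in which clustering is unsupervised.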

If you buy one book in 2020…

...make it Grokking Deep Learning by Andrew Trask! This gem of a book breaks deep learning down to its smallest component parts and then builds up your understanding from there. It's the equivalent of stripping your car down to nuts and bolts and then re-building it: at the end, you will know to a certainty... Continue Reading →

Avoiding for loops in Pandas

There will be times when you are tempted to loop through rows or columns in Pandas to achieve your results, and the lesson I keep learning is: don't do it! Every time I'm tempted to write a for loop with Pandas data I find myself clock-watching and cursing... 9 times out of 10 there... Continue Reading →
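
As a rough illustration of the point, the sketch below computes the same column twice: once with a row-by-row loop and once with a single vectorized expression. The DataFrame and its column names are invented for this example and are not taken from the full post.

```python
import pandas as pd

# Hypothetical data: "price" and "quantity" are made-up column names.
df = pd.DataFrame({"price": [10.0, 20.0, 30.0], "quantity": [1, 2, 3]})

# The tempting version: loop over rows one at a time.
totals_loop = []
for _, row in df.iterrows():
    totals_loop.append(row["price"] * row["quantity"])

# The vectorized version: one operation over whole columns.
df["total"] = df["price"] * df["quantity"]

print(df)
```

Both give the same numbers on this tiny frame. The difference shows up on large frames, where the loop builds a Series for every row while the vectorized expression works on the whole column at once.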

Emerging from Data Science Intensive

In September & October I was fortunate enough to attend the Data Science Intensive Program (DSI) in Cape Town. In a word: WOW! The program brought together 16 students from 7 African countries for 8 very intensive weeks with an ambitious goal: To ensure that anyone who completes the DSI is able to contribute significant... Continue Reading →

Pandas dataframe styling – cool!

I always like to visualize data and see the detail if possible, so it was with great joy that I stumbled across DataFrame.style this morning. Here is an example of how it helps us to visualize some Titanic survival rates by sex and passenger class: The Pandas documentation itself is pretty comprehensive, but if you're looking... Continue Reading →
