I always like to understand concepts well before I use them (good because it’s the right thing to do, bad because it slows me down a lot!), so it was with great excitement that I recently came across Matt Brems’ article A One-Stop Shop for Principal Component Analysis.
If you read that article, I promise you will get it :). Once I’d read it, I was inspired to create a notebook for future reference, so that I could see the theory in action, understand some use cases, and learn how to implement them. The main purpose of this notebook is to:
- look at some examples to illustrate the difference between feature selection and feature extraction, using a simple scenario with the iris dataset taken from the sklearn documentation on PCA
- and then to look at a practical application of PCA from the world of NLP (visualizing high-dimensional vectors in 3D space) using GloVe word embeddings
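To preview the selection-vs-extraction distinction before the full examples, here is a minimal sketch on the iris dataset: feature selection keeps a subset of the original columns, while feature extraction (PCA) builds new columns as combinations of all of them. The use of `SelectKBest` with an ANOVA F-test is my own choice of selector for illustration, not something prescribed by the sklearn PCA docs:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)  # 150 samples, 4 features

# Feature selection: keep 2 of the original 4 columns,
# scored by an ANOVA F-test against the class labels
selected = SelectKBest(f_classif, k=2).fit_transform(X, y)

# Feature extraction: construct 2 new columns, each a
# linear combination of all 4 original features
extracted = PCA(n_components=2).fit_transform(X)

print(selected.shape, extracted.shape)  # both (150, 2)
```

Both outputs have the same shape, but the selected columns are still interpretable measurements (e.g. petal length), whereas the extracted components are abstract directions of maximal variance.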
Remember, whether we’re talking feature selection or feature extraction, our goal is to take a dataset that has many variables and reduce it to a dataset that has fewer variables but remains strongly representative of our data.
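That "remains strongly representative" claim is something PCA lets us quantify directly: `explained_variance_ratio_` reports the fraction of the original variance each new component retains. A quick sketch on iris (two components keep well over 95% of the variance):

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)

# Reduce 4 features to 2 components and check how much
# of the original variance those 2 components retain
pca = PCA(n_components=2).fit(X)
retained = pca.explained_variance_ratio_.sum()

print(f"variance retained: {retained:.3f}")
```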