Typically we have two sets of values and we want to find out whether they are related, and if so, how and by how much. Could height be indicative of weight? Could hours of practice be related to how many errors are made on a mathematics test paper?
Covariance is a start – if this number is positive then we know that as one variable increases so does the other (e.g. heights and weights); if it’s negative then as one variable increases the other decreases (e.g. practice hours and math test errors – hopefully!). The problem with covariance, though, is that it isn’t normalized – its units are the product of the two variables’ units, and how often will we find two sets of values with the same unit of measure? So if we’re comparing heights and weights and we get a covariance of, say, 7, how should we evaluate it? It’s an enormous number in terms of height, but tiny in terms of weight.
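To make this concrete, here is a minimal sketch of the sample covariance calculation – the average product of each pair’s deviations from the means. The height and weight values are made-up illustrative numbers, not real data.

```python
def covariance(xs, ys):
    """Sample covariance: average product of deviations from the means
    (dividing by n - 1 for the unbiased sample estimate)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

heights = [160, 165, 170, 175, 180]  # cm (made-up values)
weights = [55, 60, 63, 70, 74]       # kg (made-up values)
print(covariance(heights, weights))  # positive: taller tends to go with heavier
```

Note the units problem described above: the result here is in cm·kg, which is why the raw number is hard to interpret on its own.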
The correlation coefficient is a much more reliable indicator, as it normalizes the data and gives you a number between -1 and 1, with -1 being a perfect negative correlation (our math test example), 1 being a perfect positive correlation (our height and weight example) and 0 being no correlation whatsoever.
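The normalization is simply the covariance divided by the product of the two standard deviations. A minimal sketch of the Pearson correlation coefficient, using the same made-up height and weight values as before:

```python
def correlation(xs, ys):
    """Pearson correlation coefficient: covariance normalized by the
    product of the standard deviations, giving a value in [-1, 1]."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

heights = [160, 165, 170, 175, 180]  # cm (made-up values)
weights = [55, 60, 63, 70, 74]       # kg (made-up values)
print(correlation(heights, weights))  # close to 1: strong positive correlation
```

Because the units cancel in the division, this number is directly comparable across any pair of variables, unlike raw covariance.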
Having hopefully first established, by finding the correlation coefficient, that there is a definite relationship between two variables, you’d want to find the equation for the line that best fits and describes that relationship. Why? Because then, given the height of any random future person, we could predict their probable weight, and vice versa. The process of finding the equation for this line is what we call linear regression!
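A minimal sketch of this line-fitting step, using ordinary least squares: the slope is the covariance of the two variables divided by the variance of x, and the intercept then makes the line pass through the point of means. The data are the same made-up height and weight values used above.

```python
def linear_regression(xs, ys):
    """Ordinary least squares fit of the line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the (n - 1) factors cancel
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

heights = [160, 165, 170, 175, 180]  # cm (made-up values)
weights = [55, 60, 63, 70, 74]       # kg (made-up values)
slope, intercept = linear_regression(heights, weights)

# Predict the probable weight of a new person from their height.
print(slope * 172 + intercept)
```

The fitted line always passes through the point (mean of x, mean of y), which is a handy sanity check on any implementation.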
A picture speaks a thousand words, so take a look at the sample Python code, with a more in-depth explanation of key concepts and calculations, in How it works – Covariance, Correlation & Linear Regression on GitHub.