It’s all very well downloading complex datasets from Kaggle and similar sources to play with – they’re amazing for learners because the data is always less clean than you would have hoped, more complex than you anticipated, and every bit as interesting as promised.
BUT if you’re learning a new concept, it’s easier to have a simple, small, contained dataset so that you can really see what is going on in the results of your code, and then relate them back to the raw data you started with.
So I felt like I had discovered a small seam of gold this morning with these Linear Regression Datasets. Most are fewer than 100 rows long, and each has a well-described preamble about the columns of data included.
Herewith a couple of practice samples, using these datasets:
- Practice Run – Linear Regression Age vs Blood Pressure (simple linear regression)
- Practice Run – Linear Regression Fish Sizes (multiple linear regression)
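
To show why a tiny dataset helps, here is a minimal sketch of a simple linear regression fitted by hand with ordinary least squares. The five age and blood pressure values are invented for illustration and are not taken from the linked datasets, and the helper name `fit_line` is hypothetical. With so few rows you can check every prediction against the raw data by eye.

```python
# Toy values made up for illustration only; NOT from the linked datasets.
ages = [20, 30, 40, 50, 60]
pressures = [112, 118, 124, 130, 136]


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(ages, pressures)
print(f"slope={slope:.2f} intercept={intercept:.2f}")

# Relate each prediction back to the raw row it came from.
for age, observed in zip(ages, pressures):
    predicted = intercept + slope * age
    print(f"age={age} observed={observed} predicted={predicted:.1f}")
```

In a real practice run you would more likely reach for a library, but doing the arithmetic once on a handful of rows makes it much easier to trust what the library reports later.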
Have fun!