Getting started with lasso regression in R

Do you have a large set of potentially valuable predictors, and you’d like to narrow it down to just the most important ones for your outcome of interest? Lasso regression may be the right tool for you.

Lasso regression

Lasso (which stands for “least absolute shrinkage and selection operator”) regression is a statistical technique built on standard linear regression models. Unlike regular regression, it penalizes the size of the coefficients, which shrinks the coefficients of weak predictors all the way to zero. That lets you start with a large set of possible predictors and narrow it down to just those with the strongest relationships to your outcome.
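For readers who like to see the formula, here is one common way to write the lasso objective for a continuous outcome. The notation is an assumption of this sketch and does not come from any particular source: n observations, p predictors, outcome values y_i, predictor values x_ij, intercept beta_0, coefficients beta_j, and a tuning parameter lambda that you choose (usually by cross-validation).

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}
  \left\{
    \frac{1}{2n} \sum_{i=1}^{n} \Bigl( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\,\beta_j \Bigr)^{2}
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
  \right\}
```

The first term is the usual least squares fit. The second term is the penalty: with lambda set to zero you get ordinary linear regression back, and as lambda grows, more coefficients are set to exactly zero and drop out of the model.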

Statistician Robert Tibshirani, who first described lasso regression as it is implemented today, has also put a tremendous amount of work into high-quality educational materials for anyone who wants to learn lasso regression and related techniques and implement them in R. He is one of the authors of the popular glmnet R package, and he co-teaches the free online course Statistical Learning with fellow influential statistician Trevor Hastie.

Resources for learning lasso regression in R

If you are already comfortable with lasso regression as a statistical technique and simply want to learn how to implement it in R, then I recommend you start with the help materials for the glmnet package, especially the quickstart guide. If you need a little more background on how to use R first, check out our R 101 guide.
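To give a sense of what that looks like in practice, here is a minimal sketch of a lasso fit with glmnet. It is not taken from the package's own guide. The data are simulated, and the variable names (x, y, cv_fit, selected) are made up for illustration. It assumes glmnet is already installed, and it uses cross-validation to choose the penalty strength.

```r
# Minimal lasso sketch with glmnet on simulated data.
# All object names here are illustrative assumptions, not part of any guide.
library(glmnet)

set.seed(42)
n <- 200   # number of observations
p <- 20    # number of candidate predictors

# Simulated predictors; only the first three truly affect the outcome
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
colnames(x) <- paste0("x", seq_len(p))
y <- 3 * x[, 1] - 2 * x[, 2] + 1.5 * x[, 3] + rnorm(n)

# alpha = 1 requests the lasso penalty; cv.glmnet chooses lambda
# by cross-validation
cv_fit <- cv.glmnet(x, y, alpha = 1)

# Extract coefficients at lambda.1se, the more conservative choice
coef_mat <- as.matrix(coef(cv_fit, s = "lambda.1se"))

# Keep the names of predictors whose coefficients are not zero
selected <- rownames(coef_mat)[coef_mat[, 1] != 0]
selected <- setdiff(selected, "(Intercept)")
print(selected)
```

Note that glmnet expects the predictors as a numeric matrix rather than a data frame, so with real data you would typically build x with something like model.matrix() first. Swapping "lambda.1se" for "lambda.min" usually keeps a few more predictors.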

If you want more of the statistical background behind lasso regression, not just the R code to run it, then I recommend you start with either the free online Statistical Learning course offered by Stanford, or the textbook (also freely available online) the course is based on: An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. In either case, you’ll find they cover a lot more than just lasso regression. They start with the basics of ordinary least squares linear regression and build up to a wide range of important machine learning techniques. You can work through as much or as little of the material as you like (if you want to jump to lasso regression, you’ll find it’s in Chapter 6). Both the course and textbook use R for all of their practical examples and exercises.