Multicollinearity simply occurs when independent variables in the dataset are correlated. Well, that’s putting it rather simply. The truth is that it’s a nuanced concept, and can have a serious impact on your dataset if not dealt with at the right moment.
As a bad analogy, It always reminds me of that one terrible party crasher that claims to know someone important at the party, but just ruins the party’s outcome, making everyone wish they hadn’t come.
So how do we get rid of him/it?
Not so fast. You see, many data science newbies like myself have succumbed to the…
What happens under the hood of linear regression models? And why are they so powerful?
The terminology and definitions can get a bit fuzzy and confusing for beginners in data science like myself. And when that happens, it’s always useful to look at some ugly mathematical notation to remind ourselves of the beauty of regression models.
So in the first component of this three part series, let’s focus on the correlation coefficient and it’s place in the LR model process. Many of us will know our df.corr() call method in Pandas that spits out a full range of exquisite continuous…