
Looks Right, Feels Wrong

How Multicollinearity Destroys Trust in Your Regression Model — and What PCA Can Do About It

📘 This is Part 2 in our regression (PCA) series. If you're unfamiliar with multicollinearity, start with Part 1: The Silent Killer of Your Regression Model


🚨 The Setup: Retail Marketing Spend

Imagine you're a retail analyst building a regression model to understand what drives monthly sales. You include:

  • Email Campaign Budget

  • Social Media Ads Budget

  • Search Engine Ads Budget

  • Store Footfall


Seems like a solid list, right? Let’s run a regression.
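To make this concrete, here is a minimal sketch in Python of how such a model might be fit. The data below is synthetic and purely illustrative (the column names, sample size, and coefficients are assumptions, not the data behind the results shown next); statsmodels handles the OLS fit.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic, illustrative data: three digital channels that move together, plus footfall
rng = np.random.default_rng(42)
n = 120
base = rng.normal(50, 10, n)                       # shared "campaign intensity" driver
df = pd.DataFrame({
    "Email_Spend":  base + rng.normal(0, 1, n),
    "Social_Spend": base + rng.normal(0, 1, n),
    "Search_Spend": base + rng.normal(0, 1.5, n),
    "Footfall":     rng.normal(1000, 200, n),
})
df["Sales"] = 3 * base + 0.5 * df["Footfall"] + rng.normal(0, 60, n)

# Ordinary least squares with all four predictors
X = sm.add_constant(df[["Email_Spend", "Social_Spend", "Search_Spend", "Footfall"]])
model = sm.OLS(df["Sales"], X).fit()
print(model.summary())                             # R², adjusted R², per-variable p-values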



🔢 The Problem: Your Model Doesn’t Know Who to Credit

Here’s the raw regression result:


Regression Before PCA


R²: 0.440
Adjusted R²: 0.417

P-values:
Email_Spend: 0.234 ❌
Social_Spend: 0.186 ❌
Search_Spend: 0.240 ❌
Footfall: 0.011 ✅

Despite a decent R², none of the marketing variables are significant. Why?


Let’s take a look under the hood.



🔍 Multicollinearity in Action


🔄 Correlation Heatmap

Email, Social, and Search spends are highly correlated — above 0.9. This means they are essentially repeating the same information.
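Checking this yourself is one line of pandas. A sketch continuing the illustrative df from above (your exact numbers will differ, but the pattern of high correlations among the digital channels is the same):

# Pairwise correlations among the predictors
cols = ["Email_Spend", "Social_Spend", "Search_Spend", "Footfall"]
print(df[cols].corr().round(2))

# Optional heatmap, if seaborn is installed:
# import seaborn as sns
# sns.heatmap(df[cols].corr(), annot=True, vmin=-1, vmax=1, cmap="coolwarm")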



🔢 VIF Scores

Variable          VIF
Email_Spend      95.58
Social_Spend     61.34
Search_Spend     35.45
Footfall          1.06

❗️ When VIF > 10, multicollinearity is serious. Here, the scores are screaming high multicollinearity, and the model is confused about who deserves the credit.
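The VIFs come straight from statsmodels. A short sketch, again continuing the illustrative data above:

# Variance Inflation Factor for each predictor
from statsmodels.stats.outliers_influence import variance_inflation_factor
import statsmodels.api as sm

X_vif = sm.add_constant(df[["Email_Spend", "Social_Spend", "Search_Spend", "Footfall"]])
for i, name in enumerate(X_vif.columns):
    if name != "const":                            # skip the intercept column
        print(f"{name:<13} VIF = {variance_inflation_factor(X_vif.values, i):.2f}")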


🧹 The Fix: Principal Component Analysis (PCA)

PCA creates new, uncorrelated variables (called principal components) by combining the original predictors.

Think of it like reorganizing your messy closet into neat drawers:


🧵 PC1: Overall Marketing Activity

Combines Email, Social, and Search into a single, powerful signal of digital spend intensity.


🛍️ PC2: Channel Mix — Offline vs Online

Differentiates between heavy store footfall and the online channels, helping us understand the balance in our strategy.


PCA Loadings

Component     Email    Social    Search    Footfall
PC1           -0.58    -0.58     -0.57        0.09
PC2            0.04     0.05      0.07        1.00
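In code, this is a standardize-then-fit step with scikit-learn. A sketch continuing the illustrative data above; note that the sign of a principal component is arbitrary, so your loadings may come out flipped relative to the table.

# Standardize the predictors, then fit a 2-component PCA
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

cols = ["Email_Spend", "Social_Spend", "Search_Spend", "Footfall"]
X_scaled = StandardScaler().fit_transform(df[cols])    # PCA is sensitive to scale

pca = PCA(n_components=2)
scores = pca.fit_transform(X_scaled)                   # PC1 and PC2 value for every month

loadings = pd.DataFrame(pca.components_, columns=cols, index=["PC1", "PC2"])
print(loadings.round(2))                               # how each predictor feeds each component
print(pca.explained_variance_ratio_.round(2))          # share of variance each PC captures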



📊 Regression After PCA

Now we run regression again, this time using PC1 and PC2.


R²: 0.421
Adjusted R²: 0.409

P-values:
PC1: < 0.001 ✅
PC2:  0.003 ✅

🚀 Both components are statistically significant. We now have a model that is cleaner, clearer, and no longer confused by overlapping variables.
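The after-PCA regression is the same OLS call, just on the component scores. A sketch continuing the code above (the exact R² and p-values will differ on the illustrative data):

# Regress sales on the two principal components instead of the raw spends
import statsmodels.api as sm
import pandas as pd

pcs = pd.DataFrame(scores, columns=["PC1", "PC2"], index=df.index)
model_pca = sm.OLS(df["Sales"], sm.add_constant(pcs)).fit()
print(model_pca.summary())            # each component now gets its own clean p-value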



💪 Takeaway: Fixing the Story Behind the Numbers

This time, your model isn’t just technically right — it feels right too.

  • PC1 gives credit to combined digital effort

  • PC2 adds insight into strategic channel balance


No variable was dropped. No signal was lost. Just smarter math.



 
 
 
