The Impact of Outliers on Linear Regression Models: Detection and Correction Strategies
DOI:
https://doi.org/10.58840/fzbcv732
Keywords:
Outliers, Linear Regression, Robust Regression, Huber Regression, RANSAC, Model Accuracy, Mean Squared Error (MSE)
Abstract
Outliers can significantly distort the results of linear regression models, leading to misleading conclusions and reduced predictive performance. This study investigates the impact of outliers on three regression techniques, Ordinary Least Squares (OLS), Huber Regression, and RANSAC, by comparing their behavior on both clean and contaminated datasets. Using simulated data with a 5% contamination rate, we evaluated each model on key metrics, including slope, intercept, Mean Squared Error (MSE), and the Coefficient of Determination ($R^2$). Our findings reveal that OLS is highly sensitive to outliers, showing a dramatic increase in MSE and a substantial drop in $R^2$ despite minimal changes in its coefficients. Huber Regression offers slightly improved resilience by down-weighting the influence of extreme values, but its performance still declines. RANSAC is the most robust of the three, with the smallest drop in $R^2$; it actively re-estimates the regression parameters to exclude outlier influence. The results underscore the necessity of incorporating outlier detection and correction strategies in regression modeling, especially for real-world datasets prone to noise. The study provides practical insights for researchers and practitioners seeking to improve the reliability and interpretability of regression models under non-ideal data conditions.
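The comparison described above can be sketched in standard-library Python. This is an illustrative sketch under stated assumptions, not the study's code: the true line (slope 2, intercept 1), the noise level, the outlier magnitudes, the seed, the Huber tuning constant, the RANSAC residual threshold, and all helper names are hypothetical. Metrics are computed on the same data each model was fitted to, which is also an assumption.

```python
import random
import statistics

def ols(xs, ys, ws=None):
    """(Weighted) least-squares line fit; returns (slope, intercept)."""
    ws = ws or [1.0] * len(xs)
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return slope, my - slope * mx

def huber(xs, ys, k=1.345, iters=50):
    """Huber M-estimate via iteratively reweighted least squares."""
    slope, intercept = ols(xs, ys)
    for _ in range(iters):
        res = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
        # Robust scale from the median absolute residual (assumed choice).
        scale = statistics.median(abs(r) for r in res) / 0.6745 or 1e-12
        # Full weight for small residuals, reduced weight for large ones.
        ws = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in res]
        slope, intercept = ols(xs, ys, ws)
    return slope, intercept

def ransac(xs, ys, threshold=2.5, trials=200, seed=0):
    """Fit lines through random point pairs, keep the largest consensus
    set, then refit OLS on those inliers only."""
    rng = random.Random(seed)
    n, best = len(xs), []
    for _ in range(trials):
        i, j = rng.sample(range(n), 2)
        if xs[i] == xs[j]:
            continue
        slope = (ys[j] - ys[i]) / (xs[j] - xs[i])
        intercept = ys[i] - slope * xs[i]
        inliers = [t for t in range(n)
                   if abs(ys[t] - (slope * xs[t] + intercept)) <= threshold]
        if len(inliers) > len(best):
            best = inliers
    return ols([xs[t] for t in best], [ys[t] for t in best])

def mse_r2(xs, ys, slope, intercept):
    """Mean squared error and R^2 of the line on (xs, ys)."""
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    mean_y = sum(ys) / len(ys)
    sst = sum((y - mean_y) ** 2 for y in ys)
    return sse / len(ys), 1.0 - sse / sst

# Simulated data: assumed true line y = 2x + 1 with unit Gaussian noise.
rng = random.Random(42)
n = 200
xs = [rng.uniform(0, 10) for _ in range(n)]
clean = [2.0 * x + 1.0 + rng.gauss(0, 1) for x in xs]
dirty = list(clean)
for i in rng.sample(range(n), n // 20):  # 5% contamination
    dirty[i] += rng.uniform(20, 40)      # assumed outlier magnitude

results = {}
for name, fit in (("OLS", ols), ("Huber", huber), ("RANSAC", ransac)):
    for label, ys in (("clean", clean), ("contaminated", dirty)):
        slope, intercept = fit(xs, ys)
        mse, r2 = mse_r2(xs, ys, slope, intercept)
        results[name, label] = (slope, intercept, mse, r2)
        print(f"{name:6s} {label:12s} slope={slope:.3f} "
              f"intercept={intercept:.3f} MSE={mse:.3f} R2={r2:.3f}")
```

Because the outliers remain in the evaluation set, MSE and $R^2$ worsen for every model on the contaminated data; the fitted slope and intercept are what show how far each estimator was pulled away from the underlying line.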