data:image/s3,"s3://crabby-images/ab03b/ab03b86061fc950414fe553f366b32df15ca196d" alt="Hands-On Automated Machine Learning"
What is linear regression?
It is the traditional and most-used regression analysis. It is studied rigorously and used widely for practical purposes. Linear regression is a method for determining the relationship between a dependent variable (y) and one or more independent variables (x). This derived relationship can be used to predict an unexplained y from observed x's. Mathematically, if x is an independent variable (commonly known as the predictor) and y is a dependent variable (also known as the target), the relationship is expressed as follows:
Where m is the slope of line, b is the intercept of the best-fit regression line, and ε is the error term that is a deviation of the actual and predicted values.
This is the equation for simple linear regression, as it involves only one predictor (x) and one target (y). When there are multiple predictors involved to predict a target, it is known as multiple linear regression. The term linear suggests there is a fundamental assumption that the underlying data exhibits a linear relationship.
Let's create a scatter plot between two variables: Quantity Sold and Revenue of a product. We can infer from the plot that there is some positive relationship between these two variables, that is when the quantity of the products sold surged, the revenue went up. However, we can't establish a relationship between them to predict revenue from the quantity sold:
data:image/s3,"s3://crabby-images/f67a5/f67a5c707274840f985a52db7875c6627c908124" alt=""
If we extend our previous scatter plot and add a trend line to it, we see the line of best fit. Any data points that lie on this line are flawlessly predicted values. As we move away from this line, the reliability of the prediction decreases:
data:image/s3,"s3://crabby-images/24a17/24a17a439b1037a3ca606edc31d91c8519499b42" alt=""
So, how do we find the best fit line? The most common and widely used technique is the ordinary least square (OLS) estimate.