Prediction with Linear Regression

Let's pick up where we left off in the previous lesson. We have the coefficients of the best-fit line, slope = 1.8 and intercept = -7.1, and we can use these values to make predictions about the data. In the following equation we add a hat above the estimated coefficients to indicate that they are estimates, and a hat above y to indicate that it is a prediction. Since we are making predictions, we also drop the error term \varepsilon from the prediction equation: the error term represents the unobservable, random deviation from the line of best fit in the actual data, and when predicting new values we use only the deterministic part of the model, the estimated slope and intercept, to estimate the expected value of y for a given x.

\hat{y} = \hat{m} * x + \hat{b}

Let's see what the predicted value of y is when x=6 and add this point to the plot.

\begin{aligned} \hat{y} &= \hat{m} * 6 + \hat{b} \\ &= 1.8 * 6 - 7.1 \\ &= 3.7 \end{aligned}
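
Below is a minimal sketch of how this prediction could be computed and plotted, assuming the slope and intercept values from the previous lesson; the x range used to draw the line is an arbitrary choice for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

# Coefficients estimated in the previous lesson
slope = 1.8
intercept = -7.1

# Predict y at x = 6 using the deterministic part of the model
x_new = 6
y_hat = slope * x_new + intercept  # 1.8 * 6 - 7.1 = 3.7

# Draw the best-fit line and highlight the new prediction
x_vals = np.linspace(0, 10, 100)
y_vals = intercept + slope * x_vals
plt.plot(x_vals, y_vals, label="best-fit line")
plt.scatter([x_new], [y_hat], color="red", zorder=3, label=f"prediction (6, {y_hat:.1f})")
plt.xlabel("x")
plt.ylabel("y")
plt.legend()
plt.show()
```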
Plot: the best-fit line with the predicted point (6, 3.7) added.

Predicting for every value of x

When we have a simple prediction equation for y, we can get the predicted value for every x with a single line, y_vals = intercept + slope * x_vals, as we did previously. However, sometimes the prediction equation is more complicated, or comes from a library, and we may want to make these predictions with a function instead. We will show two ways to do this. First, we will use a for loop, which is the more intuitive approach. Then we will use a list comprehension, which also loops over the values but has some optimizations under the hood that make it not only more concise but also faster in terms of computation time.
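
The sketch below shows the two function-based approaches alongside the vectorized line, assuming the coefficients from above; the x_vals array and the predict helper are placeholders for illustration, not the lesson's data.

```python
import numpy as np

slope, intercept = 1.8, -7.1
x_vals = np.array([1, 2, 3, 4, 5, 6, 7, 8])  # placeholder x values, not the lesson's data

def predict(x):
    # Hypothetical helper wrapping the prediction equation; a library's
    # predict method could be substituted here instead.
    return slope * x + intercept

# The vectorized one-liner from above
y_vectorized = intercept + slope * x_vals

# For loop: the more intuitive way
y_loop = []
for x in x_vals:
    y_loop.append(predict(x))

# List comprehension: more concise and usually a bit faster than the loop
y_comprehension = [predict(x) for x in x_vals]
```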

We collect all the predicted \hat{y} values into a dataframe to make them easier to compare. Since the two new methods give predictions identical to the vectorized line, it's up to you which one to use. In terms of computation time, the vectorized line and the list comprehension are the most efficient, but sometimes the explicit for loop is needed when the prediction logic is more complex.
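
One way to collect the predictions for comparison, continuing from the previous sketch; the column names here are arbitrary.

```python
import pandas as pd

# Side-by-side comparison of the three sets of predictions
predictions = pd.DataFrame({
    "x": x_vals,
    "vectorized": y_vectorized,
    "for_loop": y_loop,
    "list_comprehension": y_comprehension,
})
print(predictions)
```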

Evaluating the model

Now that we have a model, we can evaluate how well it fits the data. We can do this by calculating the mean squared error (MSE), which is the sum of the squared differences between the actual y values and the predicted \hat{y} values, divided by the number of observations n:

\begin{aligned} MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \end{aligned}
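
A minimal sketch of this calculation, assuming the actual and predicted values are available as arrays; the numbers in the example call are made up for illustration, not the lesson's data.

```python
import numpy as np

def mean_squared_error(y, y_hat):
    # Average of the squared differences between actual and predicted values
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

# Illustrative call with made-up numbers
print(mean_squared_error([2.0, 4.5, 6.1], [1.8, 4.9, 5.6]))
```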

The MSE is a measure of how close the data points are to the fitted regression line; the smaller it is, the better the fit. The MSE we calculate for this linear regression model is 0.86, so we can say the model fits the data well. Keep this value in mind for the next section, where we will perform the same estimation using a regression tree and make a proper comparison of how the two models perform.