
Generalization/true error
This is the second and more important type of error in data science. The whole purpose of building learning systems is the ability to get a smaller generalization error on the test set; in other words, to get the model to work well on a set of observation/samples that haven't been used in the training phase. If you still consider the class scenario from the previous section, you can think of generalization error as the ability to solve exam problems that weren’t necessarily similar to the problems you solved in the classroom to learn and get familiar with the subject. So, generalization performance is the model's ability to use the skills (parameters) that it learned in the training phase in order to correctly predict the outcome/output of unseen data.
In Figure 13, the light blue line represents the generalization error. You can see that as you increase the model complexity, the generalization error will be reduced, until some point when the model will start to lose its increasing power and the generalization error will decrease. This part of the curve where you get the generalization error to lose its increasing generalization power, is called overfitting.
The takeaway message from this section is to minimize the generalization error as much as you can.