Deep Learning By Example
上QQ阅读APP看书,第一时间看更新

Data cleaning

Most datasets require this step, in which you get rid of errors, noise, and redundancies. We need our data to be accurate, complete, reliable, and unbiased, as there are lots of problems that may arise from using bad knowledge base, such as:

  • Inaccurate and biased conclusions
  • Increased error
  • Reduced generalizability, which is the model's ability to perform well over the unseen data that it didn't train on previously