Practical Predictive Analytics

Inputting and Exploring Data

"On two occasions I have been asked, 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out?' ... I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."
-Charles Babbage

This chapter covers inputting and exploring data. In the first two chapters, we worked only with datasets that already reside within R packages, and we purposely avoided reading any external data sources. Now we will read them. The inputting data section covers various mechanisms for reading your own data into R.

The exploring data section covers techniques that help you complete the second and third steps of the CRISP-DM process we covered in the last chapter: data understanding and data preparation.

The topics we will cover include the following:

  • Getting data into R
  • Generating your own data
  • Munging and joining data
  • Data cleaning techniques
  • Data transformations
  • Analyzing missing values and outliers
  • Variable reduction techniques