After reading about the Anscombe’s quartet, I felt like playing around with R.
Using the datasets package, it actually takes just a few seconds to check that the traditional statistical properties of the four datasets are quite similar:
The datasets, however, don’t look similar when we graph them and we can clearly identify the outliers.
What is the takeaway message from that? We should appreciate the value of data visualization and deep analytics.