I am not, nor have I ever been a member of a data-mining discipline

Clinton A. Greene

Abstract

This paper argues that classical statistics and standard econometrics are based on a desire to meet scientific standards for accumulating reliable knowledge. Science requires two inputs: mining of existing data for inspiration, and new or 'out-of-sample' data for predictive testing. Avoidance of data-mining is neither possible nor desirable. In economics out-of-sample data is relatively scarce, so the production process should intensively exploit the existing data. But the two inputs should be thought of as complements rather than substitutes, and we neglect the importance of out-of-sample testing in the production of reliable knowledge. Avoidance of data-mining is not a substitute for tests conducted in new samples. The problem is not that data-mining corrupts the process; the problem is our collective neglect of out-of-sample encompassing, stability and forecast tests. So the data-mining issue diverts us from the crucial margin.