Using remote access to big datasets efficiently with Stata
In this talk, I discuss problems encountered and solutions developed when using Stata via remote access to a large dataset (around 10 GB) held by the Institute for Employment Research (IAB). I focus on two topics. The first problem is the lack of direct control over the data. The solution is to build thorough pre-documentation into the do-files, which structures and improves communication with the staff hosting the remote access. The second problem is the memory and running-time constraints that arise with such a large dataset; I discuss it in relation to the first. The solution is the extensive use of sampling techniques. I present routines for building such sampling procedures into remote-access do-files.
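As a rough illustration of how the two ideas might be combined in one do-file, the sketch below puts a pre-documentation header in front of a switchable sampling step. It is not the routine presented in the talk: the header layout, the file name `bigdata.dta`, the local macros `testrun` and `pct`, and the seed value are all assumptions made for this example.

```stata
* ------------------------------------------------------------
* Pre-documentation header (hypothetical layout)
* Purpose : test run on a small random sample before the full run
* Input   : bigdata.dta   (hypothetical file name)
* Output  : log with a short description of the data in memory
* Note    : set testrun to 0 to request the full run
* ------------------------------------------------------------

clear all
set more off

local testrun 1      // 1 = work on a sample, 0 = use all observations
local pct     1      // sample size in percent (assumed value)

use "bigdata.dta", clear    // hypothetical dataset name

if `testrun' {
    set seed 12345          // fixed seed so the draw is reproducible
    sample `pct'            // keep a random `pct' percent of observations
}

describe, short             // report observations and variables kept
```

In this sketch the whole file is still read before `sample` is applied, so it mainly shortens the running time of the later steps. Where memory is the binding constraint, one option is to load only the needed variables or an observation range with `use varlist in range using filename`, at the cost of a non-random subset when `in` is used.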