Process-generated and administrative datasets have become increasingly important for labor market research over the past ten years. Their major advantages are large sample sizes and the absence of retrospective gaps and unit non-response. Nevertheless, the quality and validity of these types of data remain unclear, and a great deal of preparation and data cleansing is necessary before the data can be analyzed. Unfortunately, few researchers explicitly describe the cleansing procedures or coding decisions they use, leaving the impact of those choices on the results unclear. The present paper examines how research results vary with different cleansing and coding procedures. It uses the data preparation framework proposed by Wunsch/Lechner (2008) as a benchmark and induces variation by developing different cleansing procedures and coding decisions for overlapping and parallel observations. The descriptive results show that the datasets resulting from the different procedures differ to varying degrees in some attributes related to time and personal characteristics. Similar results emerge from the subsequent analysis of treatment effects, which vary not in overall shape but in magnitude, especially during the period of the lock-in effect. In sum, the results indicate that the empirical findings of evaluation studies based on matching algorithms are fairly robust to variations in the underlying method of data preparation.