Modeling Human Learning of Data Quality Rules
Cleaning data is laborious and is further complicated by inconsistencies and other issues in the data, leading many data scientists to use interactive data cleaning software. However, many current state-of-the-art cleaning solutions fail to account for the fact that the human user must explore the data to learn and, as a result, may periodically provide noisy feedback. We therefore need to understand how humans learn data quality rules and how that learning translates into identifying violations of those rules in the data they work with. In this thesis, we perform an empirical analysis of how humans understand and iteratively learn data quality rules, working to both quantify and model this learning process. We find that a Bayesian learning model replicates user behavior well when the model success definition is strict, while a hypothesis testing model also performs well when the success definition is loosened.