Big data just seems to get bigger all the time, but that doesn’t mean it gets any less messy. Even large, carefully cultivated government datasets suffer from irregularities like acronyms, open-response items, and misused categories. Steadfast librarians have the patience for such inaccuracies, but undergraduate students are often unprepared for the realities of the big data they crave. Teaching data cleaning and collaboration can not only help students better understand and use large datasets but also illustrate the importance of library-cultivated data, which often has fewer of these problems than datasets found on the open web. At a high level, library data and open datasets may seem comparable, but when we give students the tools to slog through the data on their own, the small things start to add up.