DFS: A Dataset File System for Data Discovering Users
Many research questions can be answered quickly and efficiently using data
already collected for previous research. This practice is called secondary data
analysis (SDA), and has gained popularity due to lower costs and improved
research efficiency. In this paper we propose DFS, a file system to standardize
the metadata representation of datasets, and DDU, a scalable architecture based
on DFS for semi-automated metadata generation and data recommendation on the
cloud. We discuss how DFS and DDU lay the groundwork for automatic dataset
aggregation and how they integrate with existing data wrangling and machine
learning tools, and we explore their implications for datasets stored in digital
libraries.