732 research outputs found
Curriculum Guidelines for Undergraduate Programs in Data Science
The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program
met for the purpose of composing guidelines for undergraduate programs in Data
Science. The group consisted of 25 undergraduate faculty from a variety of
institutions in the U.S., primarily from the disciplines of mathematics,
statistics and computer science. These guidelines are meant to provide some
structure for institutions planning for or revising a major in Data Science
Recommended from our members
Scholarly insight Spring 2018: a Data wrangler perspective
In the movie classic Back to the Future a young Michael J. Fox is able to explore the past by a time machine developed by the slightly bizarre but exquisite Dr Brown. Unexpectedly by some small intervention the course of history was changed a bit along Fox’s adventures. In this fourth Scholarly Insight Report we have explored two innovative approaches to learn from OU data of the past, which hopefully in the future will make a large difference in how we support our students and design and implement our teaching and learning practices. In Chapter 1, we provide an in-depth analysis of 50 thousands comments expressed by students through the Student Experience on a Module (SEAM) questionnaire. By analysing over 2.5 million words using big data approaches, our Scholarly insights indicate that not all student voices are heard. Furthermore, our big data analysis indicate useful potential insights to explore how student voices change over time, and for which particular modules emergent themes might arise.
In Chapter 2 we provide our second innovative approach of a proof-of-concept of qualification path way using graph approaches. By exploring existing data of one qualification (i.e., Psychology), we show that students make a range of pathway choices during their qualification, some of which are more successful than others. As highlighted in our previous Scholarly Insight Reports, getting data from a qualification perspective within the OU is a difficult and challenging process, and the proof-of-concept provided in Chapter 2 might provide a way forward to better understand and support the complex choices our students make.
In Chapter 3, we provide a slightly more practically-oriented and perhaps down to earth approach focussing on the lessons-learned with Analytics4Action. Over the last four years nearly a hundred modules have worked with more active use of data and insights into module presentation to support their students. In Chapter 3 several good-practices are described by the LTI/TEL learning design team, as well as three innovative case-studies which we hope will inspire you to try something new as well.
Working organically in various Faculty sub-group meetings and LTI Units and in a google doc with various key stakeholders in the Faculties, we hope that our Scholarly insights can help to inform our staff, but also spark some ideas how to further improve our module designs and qualification pathways. Of course we are keen to hear what other topics require Scholarly insight. We hope that you see some potential in the two innovative approaches, and perhaps you might want to try some new ideas in your module. While a time machine has not really been invented yet, with the increasing rich and fine-grained data about our students and our learning practices we are getting closer to understand what really drives our students
Towards information profiling: data lake content metadata management
There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently a lack of a systematic approach for such kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this.We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case-study from the OpenML DL, which showcases the value and feasibility of our approach.Peer ReviewedPostprint (author's final draft
Managing data through the lens of an ontology
Ontology-based data management aims at managing data through the lens of an ontology, that is, a conceptual representation of the domain of interest in the underlying information system. This new paradigm provides several interesting features, many of which have already been proved effective in managing complex information systems. This article introduces the notion of ontology-based data management, illustrating the main ideas underlying the paradigm, and pointing out the importance of knowledge representation and automated reasoning for addressing the technical challenges it introduces
A Data Science Course for Undergraduates: Thinking with Data
Data science is an emerging interdisciplinary field that combines elements of
mathematics, statistics, computer science, and knowledge in a particular
application domain for the purpose of extracting meaningful information from
the increasingly sophisticated array of data available in many settings. These
data tend to be non-traditional, in the sense that they are often live, large,
complex, and/or messy. A first course in statistics at the undergraduate level
typically introduces students with a variety of techniques to analyze small,
neat, and clean data sets. However, whether they pursue more formal training in
statistics or not, many of these students will end up working with data that is
considerably more complex, and will need facility with statistical computing
techniques. More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal arts
environment that provides students with the tools necessary to apply data
science. The course emphasizes modern, practical, and useful skills that cover
the full data analysis spectrum, from asking an interesting question to
acquiring, managing, manipulating, processing, querying, analyzing, and
visualizing data, as well communicating findings in written, graphical, and
oral forms.Comment: 21 pages total including supplementary material
DFS: A Dataset File System for Data Discovering Users
Many research questions can be answered quickly and efficiently using data
already collected for previous research. This practice is called secondary data
analysis (SDA), and has gained popularity due to lower costs and improved
research efficiency. In this paper we propose DFS, a file system to standardize
the metadata representation of datasets, and DDU, a scalable architecture based
on DFS for semi-automated metadata generation and data recommendation on the
cloud. We discuss how DFS and DDU lays groundwork for automatic dataset
aggregation, how it integrates with existing data wrangling and machine
learning tools, and explores their implications on datasets stored in digital
libraries
- …