Search CORE

732 research outputs found

Curriculum Guidelines for Undergraduate Programs in Data Science

Author: Agarwal Mahesh
Averett Maia
Baumer Benjamin
Bray Andrew
Bressoud Thomas
Bryant Lance
Cheng Lei
De Veaux Richard
Francis Amanda
Gould Robert
Kim Albert Y.
Kretchmar Matt
Lu Qin
Moskol Ann
Nolan Deborah
Pelayo Roberto
Raleigh Sean
Sethi Ricky J.
Sondjaja Mutiara
Tiruviluamala Neelesh
Uhlig Paul
Washington Talitha
Wesley Curtis
White David
Ye Ping
Publication venue: 'Annual Reviews'
Publication date: 01/01/2017
Field of study

The Park City Math Institute (PCMI) 2016 Summer Undergraduate Faculty Program met for the purpose of composing guidelines for undergraduate programs in Data Science. The group consisted of 25 undergraduate faculty from a variety of institutions in the U.S., primarily from the disciplines of mathematics, statistics and computer science. These guidelines are meant to provide some structure for institutions planning for or revising a major in Data Science

arXiv.org e-Print Archive

Smith College: Smith ScholarWorks

Recommended from our members

Scholarly insight Spring 2018: a Data wrangler perspective

Author: Calder Kathleen
Clow Doug
Coughlan Tim
Cross Simon
Edwards Chris
Evans Gerald
Gaved Mark
Herodotou Christothea
Hidalgo Rafael
Jones Edwina
Lay Stephanie
Lowe Sue
Mangafa Chrysoula
Rienties Bart
Ullmann Thomas
Publication venue: Open University UK
Publication date: 01/09/2018
Field of study

In the movie classic Back to the Future a young Michael J. Fox is able to explore the past by a time machine developed by the slightly bizarre but exquisite Dr Brown. Unexpectedly by some small intervention the course of history was changed a bit along Fox’s adventures. In this fourth Scholarly Insight Report we have explored two innovative approaches to learn from OU data of the past, which hopefully in the future will make a large difference in how we support our students and design and implement our teaching and learning practices. In Chapter 1, we provide an in-depth analysis of 50 thousands comments expressed by students through the Student Experience on a Module (SEAM) questionnaire. By analysing over 2.5 million words using big data approaches, our Scholarly insights indicate that not all student voices are heard. Furthermore, our big data analysis indicate useful potential insights to explore how student voices change over time, and for which particular modules emergent themes might arise. In Chapter 2 we provide our second innovative approach of a proof-of-concept of qualification path way using graph approaches. By exploring existing data of one qualification (i.e., Psychology), we show that students make a range of pathway choices during their qualification, some of which are more successful than others. As highlighted in our previous Scholarly Insight Reports, getting data from a qualification perspective within the OU is a difficult and challenging process, and the proof-of-concept provided in Chapter 2 might provide a way forward to better understand and support the complex choices our students make. In Chapter 3, we provide a slightly more practically-oriented and perhaps down to earth approach focussing on the lessons-learned with Analytics4Action. Over the last four years nearly a hundred modules have worked with more active use of data and insights into module presentation to support their students. In Chapter 3 several good-practices are described by the LTI/TEL learning design team, as well as three innovative case-studies which we hope will inspire you to try something new as well. Working organically in various Faculty sub-group meetings and LTI Units and in a google doc with various key stakeholders in the Faculties, we hope that our Scholarly insights can help to inform our staff, but also spark some ideas how to further improve our module designs and qualification pathways. Of course we are keen to hear what other topics require Scholarly insight. We hope that you see some potential in the two innovative approaches, and perhaps you might want to try some new ideas in your module. While a time machine has not really been invented yet, with the increasing rich and fine-grained data about our students and our learning practices we are getting closer to understand what really drives our students

Open Research Online (The Open University)

Towards information profiling: data lake content metadata management

Author: Abelló Gamazo Alberto
Al-serafi Ayman Mounir Mohamed
Calders Toon
Romero Moral Óscar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

There is currently a burst of Big Data (BD) processed and stored in huge raw data repositories, commonly called Data Lakes (DL). These BD require new techniques of data integration and schema alignment in order to make the data usable by its consumers and to discover the relationships linking their content. This can be provided by metadata services which discover and describe their content. However, there is currently a lack of a systematic approach for such kind of metadata discovery and management. Thus, we propose a framework for the profiling of informational content stored in the DL, which we call information profiling. The profiles are stored as metadata to support data analysis. We formally define a metadata management process which identifies the key activities required to effectively handle this.We demonstrate the alternative techniques and performance of our process using a prototype implementation handling a real-life case-study from the OpenML DL, which showcases the value and feasibility of our approach.Peer ReviewedPostprint (author's final draft

UPCommons. Portal del coneixement obert de la UPC

Managing data through the lens of an ontology

Author: Lenzerini Maurizio
Publication venue: 'Association for the Advancement of Artificial Intelligence (AAAI)'
Publication date: 01/01/2018
Field of study

Ontology-based data management aims at managing data through the lens of an ontology, that is, a conceptual representation of the domain of interest in the underlying information system. This new paradigm provides several interesting features, many of which have already been proved effective in managing complex information systems. This article introduces the notion of ontology-based data management, illustrating the main ideas underlying the paradigm, and pointing out the importance of knowledge representation and automated reasoning for addressing the technical challenges it introduces

Archivio della ricerca- Università di Roma La Sapienza

Association for the Advancement of Artificial Intelligence: AAAI Publications

A Data Science Course for Undergraduates: Thinking with Data

Author: Baumer Ben
Publication venue
Publication date: 18/03/2015
Field of study

Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students with a variety of techniques to analyze small, neat, and clean data sets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that is considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well communicating findings in written, graphical, and oral forms.Comment: 21 pages total including supplementary material

arXiv.org e-Print Archive

Smith College: Smith ScholarWorks

DFS: A Dataset File System for Data Discovering Users

Author: Jayarathna Sampath
Jayawardana Yasith
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/05/2019
Field of study

Many research questions can be answered quickly and efficiently using data already collected for previous research. This practice is called secondary data analysis (SDA), and has gained popularity due to lower costs and improved research efficiency. In this paper we propose DFS, a file system to standardize the metadata representation of datasets, and DDU, a scalable architecture based on DFS for semi-automated metadata generation and data recommendation on the cloud. We discuss how DFS and DDU lays groundwork for automatic dataset aggregation, how it integrates with existing data wrangling and machine learning tools, and explores their implications on datasets stored in digital libraries

arXiv.org e-Print Archive

Crossref