Teaching Data Science
We describe an introductory data science course, entitled Introduction to
Data Science, offered at the University of Illinois at Urbana-Champaign. The
course introduced general programming concepts by using the Python programming
language with an emphasis on data preparation, processing, and presentation.
The course had no prerequisites, and students were not expected to have any
programming experience. This introductory course was designed to cover a wide
range of topics, from the nature of data, to storage, to visualization, to
probability and statistical analysis, to cloud and high performance computing,
without becoming overly focused on any one subject. We conclude this article
with a discussion of lessons learned and our plans to develop new data science
courses.
Comment: 10 pages, 4 figures, International Conference on Computational
Science (ICCS 2016)
Data Science and Ebola
Data Science: Today, everybody and everything produces data. People produce
large amounts of data in social networks and in commercial transactions.
Medical, corporate, and government databases continue to grow. Sensors continue
to get cheaper and are increasingly connected, creating an Internet of Things,
and generating even more data. In every discipline, large, diverse, and rich
data sets are emerging, from astrophysics, to the life sciences, to the
behavioral sciences, to finance and commerce, to the humanities and to the
arts. In every discipline people want to organize, analyze, optimize and
understand their data to answer questions and to deepen insights. The science
that is transforming this ocean of data into a sea of knowledge is called data
science. This lecture will discuss how data science has changed the way in
which one of the most visible challenges to public health is handled, the 2014
Ebola outbreak in West Africa.
Comment: Inaugural lecture Leiden Universit
Indonesia embraces the Data Science
The information era is a time when information is not only generated at scale
but also extensively processed in order to extract and generate more
information. The complex nature of modern living is represented by various
kinds of data. Data can take the form of signals, images, texts, or manifolds
resembling the horizon of observation. The task of the emerging data sciences
is to extract information from data so that people gain new insights into the
complex world. The insights may come from new ways of representing the data,
be it visualization, mapping, or another form. The insights may
also come from mathematical analysis and/or computational
processing that reveals the states of nature represented by
the data. Both approaches rely on methodologies that reduce the dimensionality of
the data. The relations between the two functions, representation and analysis,
are at the heart of how information in data is transformed mathematically and
computationally into new information. The paper discusses some practices, along
with various data drawn from social life in Indonesia, to gain new insights
about Indonesia in the emerging data sciences. Data science in Indonesia
has made possible Indonesian Data Cartograms, Indonesian Celebrity Sentiment Mapping,
Ethno-Clustering Maps, social media community detection, and much more to
come. All of these are presented as examples of how
data science has become an integral part of the technology bringing data closer to
people.
Comment: Paper presented at the South East Asian Mathematical Society (SEAMS) 7th
Conference, 10 pages, 7 figures
Teaching Stats for Data Science
“Data science” is a useful catchword for methods and concepts original to the field of statistics, but typically being applied to large, multivariate, observational records. Such datasets call for techniques not often part of an introduction to statistics: modeling, consideration of covariates, sophisticated visualization, and causal reasoning. This article re-imagines introductory statistics as an introduction to data science and proposes a sequence of 10 blocks that together compose a suitable course for extracting information from contemporary data. Recent extensions to the mosaic packages for R together with tools from the “tidyverse” provide a concise and readable notation for wrangling, visualization, model-building, and model interpretation: the fundamental computational tasks of data science