Data Science and Ebola
Data Science---Today, everybody and everything produces data. People produce
large amounts of data in social networks and in commercial transactions.
Medical, corporate, and government databases continue to grow. Sensors continue
to get cheaper and are increasingly connected, creating an Internet of Things,
and generating even more data. In every discipline, large, diverse, and rich
data sets are emerging, from astrophysics, to the life sciences, to the
behavioral sciences, to finance and commerce, to the humanities and to the
arts. In every discipline people want to organize, analyze, optimize and
understand their data to answer questions and to deepen insights. The science
that is transforming this ocean of data into a sea of knowledge is called data
science. This lecture discusses how data science has changed the handling of
one of the most visible public-health challenges: the 2014 Ebola outbreak in
West Africa.
Comment: Inaugural lecture, Leiden University
Mechanism Design for Data Science
Good economic mechanisms depend on the preferences of participants in the
mechanism. For example, the revenue-optimal auction for selling an item is
parameterized by a reserve price, and the appropriate reserve price depends on
how much the bidders are willing to pay. A mechanism designer can potentially
learn about the participants' preferences by observing historical data from the
mechanism; the designer could then update the mechanism in response to learned
preferences to improve its performance. The challenge of such an approach is
that the data corresponds to the actions of the participants and not their
preferences. Preferences can potentially be inferred from actions but the
degree of inference possible depends on the mechanism. In the optimal auction
example, it is impossible to learn anything about preferences of bidders who
are not willing to pay the reserve price. These bidders will not cast bids in
the auction and, from historical bid data, the auctioneer could never learn
that lowering the reserve price would give a higher revenue (even if it would).
To address this impossibility, the auctioneer could sacrifice revenue
optimality in the initial auction to obtain better inference properties so that
the auction's parameters can be adapted to changing preferences in the future.
This paper develops the theory for optimal mechanism design subject to good
inferability.
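The censoring problem described in this abstract can be illustrated with a small simulation. The setup below is a hypothetical example, not taken from the paper: two bidders with uniform values on [0, 1] in a second-price auction with a reserve price. Bidders whose value falls below the reserve never bid, so the auctioneer's bid log contains no record of them, and revenue at lower reserves cannot be read off from historical data.

```python
import random

random.seed(0)

def run_auction(values, reserve):
    """Second-price auction with a reserve price. Bidders whose value is
    below the reserve do not bid at all, so their values never appear in
    the observed bid log."""
    bids = sorted((v for v in values if v >= reserve), reverse=True)
    if not bids:
        return None, []                       # no sale, and no data logged
    price = max(reserve, bids[1]) if len(bids) > 1 else reserve
    return price, bids                        # revenue and the observed bids

def revenue(reserve, rounds=20000):
    """Average revenue over many rounds with two uniform [0, 1] bidders."""
    total = 0.0
    for _ in range(rounds):
        price, _ = run_auction([random.random(), random.random()], reserve)
        total += price or 0.0
    return total / rounds

# For uniform [0, 1] values the optimal reserve is 0.5. An auctioneer who
# has only ever run the auction at reserve 0.8 logs only bids >= 0.8; that
# censored log cannot reveal that lowering the reserve would raise revenue.
print(revenue(0.5))   # near the optimal revenue
print(revenue(0.8))   # lower revenue, invisible from the bid log alone
```

This is the impossibility the abstract points to: the gap between the two revenues is real, but estimating it requires data the reserve-0.8 auction never generates, which motivates sacrificing some revenue for better inference.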
Teaching Stats for Data Science
“Data science” is a useful catchword for methods and concepts original to the field of statistics but typically applied to large, multivariate, observational records. Such datasets call for techniques not often part of an introduction to statistics: modeling, consideration of covariates, sophisticated visualization, and causal reasoning. This article re-imagines introductory statistics as an introduction to data science and proposes a sequence of 10 blocks that together compose a suitable course for extracting information from contemporary data. Recent extensions to the mosaic packages for R, together with tools from the “tidyverse”, provide a concise and readable notation for wrangling, visualization, model-building, and model interpretation: the fundamental computational tasks of data science.
Data Science and Big Data in Energy Forecasting
This editorial summarizes the performance of the special issue entitled Data Science and Big Data in Energy Forecasting, published in MDPI’s Energies journal. The special issue ran in 2017 and accepted a total of 13 papers from 7 different countries. Electrical, solar, and wind energy forecasting were the most analyzed topics, introducing new methods with applications of utmost relevance.
Funding: Ministerio de Competitividad TIN2014-55894-C2-R, TIN2017-88209-C2-
Using Data in Undergraduate Science Classrooms
Provides pedagogical insight concerning the skill of using data. The resource being annotated is: http://www.dlese.org/dds/catalog_DATA-CLASS-000-000-000-007.htm
Managing Research Data in Big Science
The project which led to this report was funded by JISC in 2010--2011 as part of its 'Managing Research Data' programme, to examine the way in which Big Science data is managed and to produce any recommendations which may be appropriate. Big science data is different: it comes in large volumes, and it is shared and exploited in ways which may differ from other disciplines. This project has explored these differences using as a case study Gravitational Wave data generated by the LSC, and has produced recommendations intended to be useful variously to JISC, the funding council (STFC) and the LSC community. In Sect. 1 we define what we mean by 'big science' and describe the overall data culture there, laying stress on how it necessarily or contingently differs from other disciplines. In Sect. 2 we discuss the benefits of a formal data-preservation strategy, and the cases for open data and for well-preserved data that follow from that. This leads to our recommendations that, in essence, funders should adopt rather light-touch prescriptions regarding data preservation planning: normal data management practice, in the areas under study, corresponds to notably good practice in most other areas, so the only change we suggest is to make this planning more formal, which makes it more easily auditable and more amenable to constructive criticism. In Sect. 3 we briefly discuss the LIGO data management plan, and pull together whatever information is available on the estimation of digital preservation costs. The report is informed, throughout, by the OAIS reference model for an open archive.
