A Data Science Course for Undergraduates: Thinking with Data
Data science is an emerging interdisciplinary field that combines elements of
mathematics, statistics, computer science, and knowledge in a particular
application domain for the purpose of extracting meaningful information from
the increasingly sophisticated array of data available in many settings. These
data tend to be non-traditional, in the sense that they are often live, large,
complex, and/or messy. A first course in statistics at the undergraduate level
typically introduces students to a variety of techniques for analyzing small,
neat, and clean data sets. However, whether they pursue more formal training
in statistics or not, many of these students will end up working with data
that are
considerably more complex, and will need facility with statistical computing
techniques. More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal arts
environment that provides students with the tools necessary to apply data
science. The course emphasizes modern, practical, and useful skills that cover
the full data analysis spectrum, from asking an interesting question to
acquiring, managing, manipulating, processing, querying, analyzing, and
visualizing data, as well as communicating findings in written, graphical,
and oral forms.
Comment: 21 pages total including supplementary material
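As a concrete illustration of the acquire/manage/manipulate/visualize
workflow described above, here is a minimal R sketch in the tidyverse style;
it is our own example (the dplyr, ggplot2, and nycflights13 packages are
illustrative choices, not necessarily the ones used in the course):

```r
# A minimal sketch of the data analysis spectrum: query, manipulate, and
# visualize a moderately large, messy data set, then report the result.
library(dplyr)
library(ggplot2)
library(nycflights13)  # example data: all flights departing NYC in 2013

# Manage/manipulate: mean departure delay per carrier, dropping missing values
delays <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(dep_delay), n = n())

# Visualize/communicate: carriers ordered by mean departure delay
ggplot(delays, aes(x = reorder(carrier, mean_delay), y = mean_delay)) +
  geom_col() +
  labs(x = "Carrier", y = "Mean departure delay (minutes)")
```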
Scalable visualisation methods for modern Generalized Additive Models
In the last two decades the growth of computational resources has made it
possible to handle Generalized Additive Models (GAMs) that formerly were too
costly for serious applications. However, the growth in model complexity has
not been matched by improved visualisations for model development and results
presentation. Motivated by an industrial application in electricity load
forecasting, we identify the areas where the lack of modern visualisation tools
for GAMs is particularly severe, and we address the shortcomings of existing
methods by proposing a set of visual tools that a) are fast enough for
interactive use, b) exploit the additive structure of GAMs, c) scale to large
data sets and d) can be used in conjunction with a wide range of response
distributions. All the new visual methods proposed in this work are implemented
by the mgcViz R package, which can be found on the Comprehensive R Archive
Network.
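For readers new to the package, the following minimal sketch shows the kind
of layered visualisation mgcViz provides (our illustration of the CRAN
release's interface, not code taken from the paper):

```r
# Fit a GAM on simulated data, then build a layered plot of one smooth term.
library(mgcv)
library(mgcViz)

dat <- gamSim(eg = 1, n = 1000, dist = "normal", scale = 2)  # toy data
b   <- gam(y ~ s(x0) + s(x1) + s(x2) + s(x3), data = dat)

v <- getViz(b)        # convert the fitted GAM for mgcViz plotting
o <- plot(sm(v, 1))   # visualisation of the first smooth term, s(x0)
print(o + l_fitLine(colour = "red") +  # layer: fitted curve
          l_ciLine() +                 # layer: confidence bands
          l_points(alpha = 0.1))       # layer: partial residuals
```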
Neutrino Mass from Laboratory: Contribution of Double Beta Decay to the Neutrino Mass Matrix
Double beta decay is, together with oscillation experiments, indispensable
for solving the question of the neutrino mass matrix. The most sensitive
experiment - for eight years now the HEIDELBERG-MOSCOW experiment in Gran
Sasso - already practically excludes, with its sub-eV experimental limit on
the effective neutrino mass ⟨m⟩, degenerate mass scenarios allowing neutrinos
as hot dark matter in the universe for the small-angle MSW solution of the
solar neutrino problem.
It probes cosmological models including hot dark matter already now on the
level of future satellite experiments MAP and PLANCK. It further probes many
topics of physics beyond the Standard Model at the TeV scale. Future
experiments should give access to the multi-TeV range and complement in many
ways the search for new physics at future colliders such as the LHC and NLC.
For neutrino physics, some of them (GENIUS) will allow testing of almost all
neutrino mass scenarios allowed by the present neutrino oscillation
experiments.
Comment: 5 pages, revtex, 6 figures. Talk presented at the International
Europhysics Neutrino Oscillation Workshop, Conca Specchiulla (Otranto,
Italy), September 9-16, 2000; to be published in Nucl. Phys. B (2001). Home
page of the HEIDELBERG-MOSCOW experiment: http://www.mpi-hd.mpg.de/non_acc
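For orientation, the quantity constrained by neutrinoless double beta decay
is the (ee) entry of the neutrino mass matrix, i.e. the effective Majorana
mass; the standard relation (a textbook formula, not reproduced from this
paper) is:

```latex
% Effective Majorana neutrino mass probed by neutrinoless double beta decay:
\[
  \langle m \rangle \;=\; \lvert m_{ee} \rvert
  \;=\; \Bigl\lvert \sum_{i=1}^{3} U_{ei}^{2}\, m_i \Bigr\rvert ,
\]
% where U is the leptonic (PMNS) mixing matrix, constrained by oscillation
% experiments, and the m_i are the neutrino mass eigenvalues.
```

This is why the abstract stresses complementarity: oscillation experiments
fix the mixing matrix U and the mass splittings, while double beta decay
constrains the absolute scale entering ⟨m⟩.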
The dynamics of computerization in a social science research team: a case study of infrastructure, strategies, and skills
This paper examines the dynamics of computerization in a PC-oriented research group through a case study. The time and skill required to integrate computing into the labor processes of research are often significant "hidden costs" of computerization. Computing infrastructure, which plays a key role in reducing these costs, may be enhanced by careful organization. We illustrate computerization strategies that we have found to be productive and unproductive. Appropriate computerization strategies depend as much on the structuring of resources and interests in the larger social setting as on a technical characterization of tasks.
Selection of Statistical Software for Solving Big Data Problems: A Guide for Businesses, Students, and Universities
The need for analysts with expertise in big data software is becoming more apparent in today’s society. Unfortunately, the demand for these analysts far exceeds the number available. A potential way to combat this shortage is to identify the software taught in colleges and universities. This article examines four data analysis software packages (Excel add-ins, SPSS, SAS, and R) and outlines the cost, training, and statistical methods/tests/uses for each. It further explains the implications for universities and future students.