Hack Weeks as a model for Data Science Education and Collaboration
Across almost all scientific disciplines, the instruments that record our
experimental data and the methods required for storage and data analysis are
rapidly increasing in complexity. This gives rise to the need for scientific
communities to adapt on shorter time scales than traditional university
curricula allow for, and therefore requires new modes of knowledge transfer.
The universal applicability of data science tools to a broad range of problems
has generated new opportunities to foster exchange of ideas and computational
workflows across disciplines. In recent years, hack weeks have emerged as an
effective tool for fostering these exchanges by providing training in modern
data analysis workflows. While there are variations in hack week
implementation, all events consist of a common core of three components:
tutorials in state-of-the-art methodology, peer-learning and project work in a
collaborative environment. In this paper, we present the concept of a hack week
in the larger context of scientific meetings and point out similarities and
differences to traditional conferences. We motivate the need for such an event
and present in detail its strengths and challenges. We find that hack weeks are
successful at cultivating collaboration and the exchange of knowledge.
Participants self-report that these events help them both in their day-to-day
research and in their careers. Based on our results, we conclude that hack
weeks present an effective, easy-to-implement, fairly low-cost tool to
positively impact data analysis literacy in academic disciplines, foster
collaboration and cultivate best practices.
Comment: 15 pages, 2 figures, submitted to PNAS, all relevant code available
at https://github.com/uwescience/HackWeek-Writeu
Classification of Stellar Spectra with LLE
We investigate the use of dimensionality reduction techniques for the
classification of stellar spectra selected from the SDSS. Using local linear
embedding (LLE), a technique that preserves the local (and possibly non-linear)
structure within high dimensional data sets, we show that the majority of
stellar spectra can be represented as a one dimensional sequence within a three
dimensional space. The position along this sequence is highly correlated with
spectral temperature. Deviations from this "stellar locus" are indicative of
spectra with strong emission lines (including misclassified galaxies) or broad
absorption lines (e.g. Carbon stars). Based on this analysis, we propose a
hierarchical classification scheme using LLE that progressively identifies and
classifies stellar spectra in a manner that requires no feature extraction and
that can reproduce the classic MK classifications to an accuracy of one type.
Comment: 15 pages, 13 figures; accepted for publication in The Astronomical
Journal
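The embedding step described in this abstract can be sketched with scikit-learn's LocallyLinearEmbedding. This is only an illustration under stated assumptions, not the authors' pipeline: the "spectra" are synthetic curves whose peak shifts with a single hidden parameter standing in for temperature, and the helper names make_toy_spectra and embed_spectra, the neighbor count, and the noise level are all hypothetical choices.

```python
import numpy as np
from sklearn.manifold import LocallyLinearEmbedding


def make_toy_spectra(n_spectra=300, n_bins=50, seed=0):
    """Build synthetic 'spectra' controlled by one hidden parameter.

    Assumption: a Gaussian bump whose position shifts with a
    temperature-like parameter stands in for real SDSS spectra.
    """
    rng = np.random.default_rng(seed)
    temperature = rng.uniform(0.0, 1.0, size=n_spectra)
    wavelength = np.linspace(0.0, 1.0, n_bins)
    # The peak position moves smoothly as the hidden parameter changes.
    peak = 0.2 + 0.6 * temperature[:, None]
    flux = np.exp(-0.5 * ((wavelength[None, :] - peak) / 0.15) ** 2)
    # Small Gaussian noise so the data are not perfectly one dimensional.
    flux += 0.01 * rng.normal(size=flux.shape)
    return flux, temperature


def embed_spectra(flux, n_neighbors=15, n_components=3):
    """Project each spectrum into a low-dimensional space with LLE."""
    lle = LocallyLinearEmbedding(
        n_neighbors=n_neighbors,
        n_components=n_components,
        random_state=0,
    )
    return lle.fit_transform(flux)


flux, temperature = make_toy_spectra()
embedding = embed_spectra(flux)
print(embedding.shape)  # one 3-dimensional point per input spectrum
```

No feature extraction is applied before the embedding: the raw flux vectors are passed in directly, which mirrors the property the abstract emphasizes. Whether the resulting points trace a clean one dimensional sequence depends on the data and on n_neighbors, so the embedding should be inspected before it is used for classification.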
SNANA: A Public Software Package for Supernova Analysis
We describe a general analysis package for supernova (SN) light curves,
called SNANA, that contains a simulation, light curve fitter, and cosmology
fitter. The software is designed with the primary goal of using SNe Ia as
distance indicators for the determination of cosmological parameters, but it
can also be used to study efficiencies for analyses of SN rates, estimate
contamination from non-Ia SNe, and optimize future surveys. Several SN models
are available within the same software architecture, allowing technical
features such as K-corrections to be consistently used among multiple models,
and thus making it easier to make detailed comparisons between models. New and
improved light-curve models can be easily added. The software works with
arbitrary surveys and telescopes and has already been used by several
collaborations, leading to more robust and easy-to-use code. This software is
not intended as a final product release, but rather it is designed to undergo
continual improvements from the community as more is learned about SNe. Below
we give an overview of the SNANA capabilities, as well as some of its
limitations. Interested users can find software downloads and more detailed
information from the manuals at http://www.sdss.org/supernova/SNANA.html .
Comment: Accepted for publication in PAS
Tests of Modified Gravity with Dwarf Galaxies
In modified gravity theories that seek to explain cosmic acceleration, dwarf
galaxies in low density environments can be subject to enhanced forces. The
class of scalar-tensor theories, which includes f(R) gravity, predicts such a
force enhancement (massive galaxies like the Milky Way can evade it through a
screening mechanism that protects the interior of the galaxy from this "fifth"
force). We study observable deviations from GR in the disks of late-type dwarf
galaxies moving under gravity. The fifth force acts on the dark matter and HI
gas disk, but not on the stellar disk owing to the self-screening of main
sequence stars. We find four distinct observable effects in such disk galaxies:
1. A displacement of the stellar disk from the HI disk. 2. Warping of the
stellar disk along the direction of the external force. 3. Enhancement of the
rotation curve measured from the HI gas compared to that of the stellar disk.
4. Asymmetry in the rotation curve of the stellar disk. We estimate that the
spatial effects can be up to 1 kpc and the rotation velocity effects about 10
km/s in infalling dwarf galaxies. Such deviations are measurable: we expect
that with a careful analysis of a sample of nearby dwarf galaxies one can
improve astrophysical constraints on gravity theories by over three orders of
magnitude, and even solar system constraints by one order of magnitude. Thus
effective tests of gravity along the lines suggested by Hui et al. (2009) and
Jain (2011) can be carried out with low-redshift galaxies, though care must be
exercised in understanding possible complications from astrophysical effects.
Comment: 26 pages, 9 figures
API design for machine learning software: experiences from the scikit-learn project
Scikit-learn is an increasingly popular machine learning library. Written
in Python, it is designed to be simple and efficient, accessible to
non-experts, and reusable in various contexts. In this paper, we present and
discuss our design choices for the application programming interface (API) of
the project. In particular, we describe the simple and elegant interface shared
by all learning and processing units in the library and then discuss its
advantages in terms of composition and reusability. The paper also comments on
implementation details specific to the Python ecosystem and analyzes obstacles
faced by users and developers of the library.
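The shared interface and the composition it enables can be illustrated with a short sketch. The abstract does not name particular estimators, so the ones used here (StandardScaler, LogisticRegression), the synthetic dataset, and the step names "scale" and "clf" are assumptions chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A processing unit: fit() learns parameters, transform() applies them.
scaler = StandardScaler()
X_scaled = scaler.fit(X_train).transform(X_train)

# Composition: a Pipeline chains units and is itself an estimator,
# so it exposes the same fit / predict / score methods.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])
model.fit(X_train, y_train)
predictions = model.predict(X_test)
accuracy = model.score(X_test, y_test)

print(predictions.shape)
print(round(accuracy, 3))
```

Because the pipeline behaves like any single estimator, it can be passed to tools such as cross-validation or grid search without special handling, which is one practical sense in which a uniform interface supports reuse.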
Scikit-learn: Machine Learning in Python
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. This package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. Emphasis is put on ease of use, performance, documentation, and API consistency. It has minimal dependencies and is distributed under the simplified BSD license, encouraging its use in both academic and commercial settings. Source code, binaries, and documentation can be downloaded from http://scikit-learn.sourceforge.net