A Data Science Course for Undergraduates: Thinking with Data
Data science is an emerging interdisciplinary field that combines elements of
mathematics, statistics, computer science, and knowledge in a particular
application domain for the purpose of extracting meaningful information from
the increasingly sophisticated array of data available in many settings. These
data tend to be non-traditional, in the sense that they are often live, large,
complex, and/or messy. A first course in statistics at the undergraduate level
typically introduces students to a variety of techniques to analyze small,
neat, and clean data sets. However, whether they pursue more formal training in
statistics or not, many of these students will end up working with data that is
considerably more complex, and will need facility with statistical computing
techniques. More importantly, these students require a framework for thinking
structurally about data. We describe an undergraduate course in a liberal arts
environment that provides students with the tools necessary to apply data
science. The course emphasizes modern, practical, and useful skills that cover
the full data analysis spectrum, from asking an interesting question to
acquiring, managing, manipulating, processing, querying, analyzing, and
visualizing data, as well as communicating findings in written, graphical, and
oral forms. Comment: 21 pages total including supplementary material
A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data
Many interesting data sets available on the Internet are of a medium
size---too big to fit into a personal computer's memory, but not so large that
they won't fit comfortably on its hard disk. In the coming years, data sets of
this magnitude will inform vital research in a wide array of application
domains. However, due to a variety of constraints they are cumbersome to
ingest, wrangle, analyze, and share in a reproducible fashion. These
obstructions hamper thorough peer-review and thus disrupt the forward progress
of science. We propose a predictable and pipeable framework for R (the
state-of-the-art statistical computing environment) that leverages SQL (the
venerable database architecture and query language) to make reproducible
research on medium data a painless reality. Comment: 30 pages, plus supplementary material
Camera systematics and three-point correlations in modern photometric galaxy surveys
The goal of modern cosmology, broadly speaking, is to understand the behavior of the Universe at large scales, including the evolution of dark matter and dark energy over cosmic time. In the context of the modern paradigm of a universe dominated by dark energy and cold dark matter (LCDM), the goal is to detect deviations from LCDM predictions (new physics), and in the absence of those, to infer the values of the LCDM parameters. Advances in this endeavor will require both improved constraints on systematic errors in raw astronomical data and improved statistical methods for extracting cosmological information from galaxy catalogs. Toward these ends, the first half of this thesis discusses methods for improving our ability to make precise and accurate measurements of galaxies in the universe using astronomical CCD imaging cameras. The second half of this thesis discusses a novel application of a statistical probe of the cosmic web of dark matter, the galaxy three-point correlation function, to photometric galaxy surveys, which allows us to extract more information of cosmological interest from the observed galaxy distribution. Both lines of research discussed in this thesis will be useful in future analyses of data from upcoming optical galaxy surveys, including the Large Synoptic Survey Telescope.
Determining the effects of methanol, ethanol, isopropanol, and glycerol on both thermal stability and catalytic activity of Rv0045c, an enzyme from M. tuberculosis
Tuberculosis (TB) is a highly infectious respiratory disease contracted through the inhalation of Mycobacterium tuberculosis. Serine hydrolases are abundant in M. tuberculosis and serve as a model for studying the inhibition of TB. Rv0045c is one such enzyme, and little is known about its biological function. Rv0045c was exposed to methanol, ethanol, isopropanol, or glycerol, and the effects of varying concentrations of these alcohols on the catalytic efficiency and thermal stability of the enzyme were determined. The thermal stability of Rv0045c was found to decrease with increasing concentrations of methanol, ethanol, or isopropanol. The opposite was true when the enzyme was exposed to increasing concentrations of glycerol. The effects of the alcohols on enzyme kinetics, however, were much less straightforward. The data suggest that a concentration of 10% alcohol by volume is optimal for catalytic activity.
Bioenergy Crops for Ecosystem Health and Sustainability
Growing crops for bioenergy has been subject to much recent criticism for taking away land that could be used for food production or biodiversity conservation. This book challenges some commonly held ideas about biofuels, bioenergy and energy cropping, particularly the idea that energy crops pose an inherent threat to ecosystems, which must be mitigated.
The book recognises that certain energy crops (e.g. oil palm for biodiesel) have generated sustainability concerns, but also asks whether there is a better way: using energy crops to strategically enhance ecosystem functions. It draws on numerous case studies, including cases where energy crops have had negative outcomes as well as cases where energy crops have produced benefits for ecosystem health, such as soil and water protection from the cropping of willow and poplar in Europe and the use of mallee eucalypts to fight salinity in Western Australia. While exploring this central argument, the volume also provides a systematic overview of the socio-economic sustainability issues surrounding bioenergy.
Lessons from Between the White Lines for Isolated Data Scientists
Many current and future data scientists will be “isolated”, working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports industry and teaching in academia, I discuss troubled waters likely to be encountered by newly minted data scientists and offer advice about how to navigate them. Neither the issues raised nor the advice given is particular to sports, and both should be applicable to a wide range of knowledge domains.
Christians, Pagans, and Death
It is commonly believed that Christianity was a new and original religion, but in fact, many pagan religions contributed to Christianity's ideology. In my talk, I will focus on how these religions influenced Christian views of the afterlife, and with that, their views on what the soul is and what it means to be a good person.