3,466 research outputs found

    A Data Science Course for Undergraduates: Thinking with Data

    Data science is an emerging interdisciplinary field that combines elements of mathematics, statistics, computer science, and knowledge in a particular application domain for the purpose of extracting meaningful information from the increasingly sophisticated array of data available in many settings. These data tend to be non-traditional, in the sense that they are often live, large, complex, and/or messy. A first course in statistics at the undergraduate level typically introduces students to a variety of techniques to analyze small, neat, and clean data sets. However, whether they pursue more formal training in statistics or not, many of these students will end up working with data that are considerably more complex, and will need facility with statistical computing techniques. More importantly, these students require a framework for thinking structurally about data. We describe an undergraduate course in a liberal arts environment that provides students with the tools necessary to apply data science. The course emphasizes modern, practical, and useful skills that cover the full data analysis spectrum, from asking an interesting question to acquiring, managing, manipulating, processing, querying, analyzing, and visualizing data, as well as communicating findings in written, graphical, and oral forms. Comment: 21 pages total including supplementary material

    A Grammar for Reproducible and Painless Extract-Transform-Load Operations on Medium Data

    Many interesting data sets available on the Internet are of medium size: too big to fit into a personal computer's memory, but not so large that they won't fit comfortably on its hard disk. In the coming years, data sets of this magnitude will inform vital research in a wide array of application domains. However, due to a variety of constraints, they are cumbersome to ingest, wrangle, analyze, and share in a reproducible fashion. These obstructions hamper thorough peer review and thus disrupt the forward progress of science. We propose a predictable and pipeable framework for R (the state-of-the-art statistical computing environment) that leverages SQL (the venerable database architecture and query language) to make reproducible research on medium data a painless reality. Comment: 30 pages, plus supplementary material

    Camera systematics and three-point correlations in modern photometric galaxy surveys

    The goal of modern cosmology, broadly speaking, is to understand the behavior of the Universe at large scales, including the evolution of dark matter and dark energy over cosmic time. In the context of the modern paradigm of a universe dominated by dark energy and cold dark matter (LCDM), the goal is to detect deviations from LCDM predictions (new physics), and in the absence of those, to infer the values of the LCDM parameters. Advances in this endeavor will require both improved constraints on systematic errors in raw astronomical data and improved statistical methods for extracting cosmological information from galaxy catalogs. Toward these ends, the first half of this thesis discusses methods for improving our ability to make precise and accurate measurements of galaxies in the universe using astronomical CCD imaging cameras. The second half of this thesis discusses a novel application of a statistical probe of the cosmic web of dark matter, the galaxy three-point correlation function, to photometric galaxy surveys, which allows us to extract more information of cosmological interest from the observed galaxy distribution. Both lines of research discussed in this thesis will be useful in future analyses of data from upcoming optical galaxy surveys, including the Large Synoptic Survey Telescope.

    Determining the effects of methanol, ethanol, isopropanol, and glycerol on both thermal stability and catalytic activity of Rv0045c, an enzyme from M. tuberculosis

    Tuberculosis (TB) is a highly infectious respiratory disease contracted through the inhalation of Mycobacterium tuberculosis. Serine hydrolases are abundant in M. tuberculosis and serve as a model for studying the inhibition of TB. Rv0045c is one such enzyme, and little is known about its biological function. Rv0045c was exposed to methanol, ethanol, isopropanol, or glycerol, and the effects of varying concentrations of these alcohols on the catalytic efficiency and thermal stability of the enzyme were determined. The thermal stability of Rv0045c was found to decrease with increasing concentration of methanol, ethanol, or isopropanol. The opposite was true of the thermal stability when the enzyme was exposed to increasing concentrations of glycerol. The effects of the alcohols on enzyme kinetics, however, were much less straightforward. The data suggest that a concentration of 10% alcohol by volume is optimal for catalytic activity.

    Bioenergy Crops for Ecosystem Health and Sustainability

    The growing of crops for bioenergy has been subject to much recent criticism, as taking away land which could be used for food production or biodiversity conservation. This book challenges some commonly held ideas about biofuels, bioenergy and energy cropping, particularly that energy crops pose an inherent threat to ecosystems, which must be mitigated. The book recognises that certain energy crops (e.g. oil palm for biodiesel) have generated sustainability concerns, but also asks the question "is there a better way?" of using energy crops to strategically enhance ecosystem functions. It draws on numerous case studies, including cases where energy crops have had negative outcomes as well as cases where energy crops have produced benefits for ecosystem health, such as soil and water protection from the cropping of willow and poplar in Europe and the use of mallee eucalypts to fight salinity in Western Australia. While exploring this central argument, the volume also provides a systematic overview of the socio-economic sustainability issues surrounding bioenergy.

    Lessons from Between the White Lines for Isolated Data Scientists

    Many current and future data scientists will be “isolated”: working alone or in small teams within a larger organization. This isolation brings certain challenges as well as freedoms. Drawing on my considerable experience both working in the professional sports industry and teaching in academia, I discuss troubled waters likely to be encountered by newly minted data scientists and offer advice about how to navigate them. Neither the issues raised nor the advice given is particular to sports, and both should be applicable to a wide range of knowledge domains.

    Christians, Pagans, and Death

    It is commonly believed that Christianity was a new and original religion, but in fact, many pagan religions contributed to Christianity's ideology. In my talk, I will focus on how these religions influenced Christian views of the afterlife, and with that, their views on what the soul is and what it means to be a good person.