7 research outputs found

    Big Universe, Big Data: Machine Learning and Image Analysis for Astronomy

    Astrophysics and cosmology are rich with data. The advent of wide-area digital cameras on large-aperture telescopes has led to ever more ambitious surveys of the sky. Data volumes that constituted an entire survey a decade ago can now be acquired in a single night, and real-time analysis is often desired. Modern astronomy therefore requires big-data know-how; in particular, it demands highly efficient machine learning and image analysis algorithms. But scalability is not the only challenge: astronomy applications touch several current machine learning research questions, such as learning from biased data and dealing with label and measurement noise. We argue that this makes astronomy a great domain for computer science research, as it pushes the boundaries of data analysis. In the following, we present this exciting application area for data scientists. We focus on exemplary results, discuss main challenges, and highlight some recent methodological advancements in machine learning and image analysis triggered by astronomical applications.
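
    The point about learning from biased data can be made concrete: spectroscopic training sets are rarely drawn from the same distribution as the photometric samples a model is later applied to. One standard remedy (a generic technique, not something specific to this paper) is covariate-shift correction via importance weighting. Below is a minimal sketch of the classifier-based density-ratio estimate on synthetic data; all names and numbers are chosen purely for illustration.

        # Hypothetical sketch: covariate-shift correction by importance
        # weighting. Synthetic data; not code from the paper.
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(0)

        # Biased "training" sample (e.g. bright galaxies) vs. the broader
        # "target" sample the model will actually be applied to.
        X_train = rng.normal(loc=0.0, scale=1.0, size=(1000, 1))
        X_target = rng.normal(loc=1.0, scale=1.5, size=(1000, 1))

        # Classifier-based density-ratio trick: p_target(x) / p_train(x)
        # is proportional to P(target | x) / P(train | x) for equal-size samples.
        X = np.vstack([X_train, X_target])
        labels = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_target))])
        clf = LogisticRegression().fit(X, labels)
        p = clf.predict_proba(X_train)[:, 1]
        weights = p / (1.0 - p)  # importance weights for the training sample

        # These weights can be passed as sample_weight to most scikit-learn
        # estimators so the fit emphasises regions the target sample occupies.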

    The Universe's Big Data


    Recycling on a Cosmic Scale: Extracting New Information from Old Data Sets


    Instructions for setting up and running the experiments

    Instructions for setting up and running the experiments, as well as how to interpret the output

    Script to run experiments from Stensbo-Smidt et al. (2016)

    Script to run the experiments from Stensbo-Smidt et al. (2016), estimating photometric redshifts and specific star formation rates for galaxies in SDSS using only magnitudes as inputs.

    The script requires numpy, pandas and scikit-learn to run. Also, for feature selection, you will need the speedynn package: https://github.com/gieseke/speedynn
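
    Since the actual script depends on SDSS catalogue data and the speedynn package, a minimal stand-in can still show the shape of the task: regress redshift on SDSS-like magnitudes with scikit-learn. Everything below (the synthetic catalogue, the choice of k-nearest-neighbour regression, all settings) is illustrative, not the script's actual configuration.

        # Minimal stand-in for a magnitudes-only photometric-redshift pipeline.
        # Synthetic data; column names and all settings are illustrative.
        import numpy as np
        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.neighbors import KNeighborsRegressor
        from sklearn.metrics import mean_absolute_error

        rng = np.random.default_rng(42)

        # Fake catalogue: five SDSS bands (u, g, r, i, z) and a "true" redshift
        # correlated with one colour, purely for demonstration.
        n = 5000
        mags = pd.DataFrame(rng.uniform(16.0, 24.0, size=(n, 5)),
                            columns=["u", "g", "r", "i", "z"])
        redshift = 0.1 * (mags["g"] - mags["r"]).clip(lower=0) \
            + rng.normal(0.0, 0.02, n)

        X_train, X_test, y_train, y_test = train_test_split(
            mags.values, redshift.values, test_size=0.25, random_state=0)

        model = KNeighborsRegressor(n_neighbors=10).fit(X_train, y_train)
        print("MAE:", mean_absolute_error(y_test, model.predict(X_test)))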

    Adaptive Cholesky Gaussian Processes

    We present a method to approximate Gaussian process regression models for large datasets by considering only a subset of the data. Our approach is novel in that the size of the subset is selected on the fly during exact inference with little computational overhead. From an empirical observation that the log-marginal likelihood often exhibits a linear trend once a sufficient subset of a dataset has been observed, we conclude that many large datasets contain redundant information that only slightly affects the posterior. Based on this, we provide probabilistic bounds on the full model evidence that can identify such subsets. Remarkably, these bounds are largely composed of terms that appear in intermediate steps of the standard Cholesky decomposition, allowing us to modify the algorithm to adaptively stop the decomposition once enough data have been observed.
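
    A minimal sketch of the core idea, assuming a plain RBF kernel and a crude windowed-mean test as a stand-in for the paper's probabilistic bounds: run the standard Cholesky recursion one row at a time, accumulate each point's contribution to the log-marginal likelihood, and stop once recent contributions have a stable mean (the "linear trend" observation). Window size, tolerance, and data are illustrative assumptions.

        # Toy illustration of adaptively stopping a Cholesky decomposition
        # (not the paper's algorithm or its bounds; all settings are stand-ins).
        import numpy as np
        from scipy.linalg import solve_triangular

        rng = np.random.default_rng(1)
        X = rng.uniform(-3, 3, size=(2000, 1))
        y = np.sin(X[:, 0]) + rng.normal(0.0, 0.1, size=len(X))

        # A real implementation would evaluate kernel entries lazily; for
        # brevity we precompute the full noisy RBF kernel matrix here.
        d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        K = np.exp(-0.5 * d2) + 1e-2 * np.eye(len(X))

        n = len(X)
        L = np.zeros((n, n))
        z = np.zeros(n)   # z = L^{-1} y, built row by row
        contrib = []      # per-point contributions to the log-marginal likelihood
        W = 200           # window for the (illustrative) linear-trend test

        for i in range(n):
            # One row of the standard Cholesky recursion.
            if i > 0:
                L[i, :i] = solve_triangular(L[:i, :i], K[:i, i], lower=True)
            L[i, i] = np.sqrt(K[i, i] - L[i, :i] @ L[i, :i])
            z[i] = (y[i] - L[i, :i] @ z[:i]) / L[i, i]

            # log N(y | 0, K) decomposes into per-point terms under the
            # Cholesky factorisation: -0.5*z_i^2 - log(L_ii) - 0.5*log(2*pi).
            contrib.append(-0.5 * z[i] ** 2 - np.log(L[i, i])
                           - 0.5 * np.log(2 * np.pi))

            # Toy stopping rule: if two consecutive windows of contributions
            # have (nearly) the same mean, the log-marginal likelihood is
            # growing linearly and the remaining data are treated as redundant.
            if i + 1 >= 2 * W:
                recent, previous = contrib[-W:], contrib[-2 * W:-W]
                if abs(np.mean(recent) - np.mean(previous)) < 0.1:
                    print(f"stopped after {i + 1} of {n} points")
                    break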