    Eigengalaxies: describing galaxy morphology using principal components in image space

    This article has been accepted for publication in Monthly Notices of the Royal Astronomical Society. © 2020 The Author(s). Published by Oxford University Press on behalf of the Royal Astronomical Society. We demonstrate how galaxy morphologies can be represented by weighted sums of "eigengalaxies" and how eigengalaxies can be used in a probabilistic framework to enable principled and simplified approaches in a variety of applications. Eigengalaxies can be derived from a Principal Component Analysis (PCA) of sets of single- or multi-band images. They encode the image-space equivalent of basis vectors that can be combined to describe the structural properties of large samples of galaxies in a massively reduced manner. As an illustration, we show how a sample of 10,243 galaxies in the Hubble Space Telescope CANDELS survey can be represented by just 12 eigengalaxies. We show in some detail how this image space may be derived and tested. We also describe a probabilistic extension to PCA (PPCA) which enables the eigengalaxy framework to assign probabilities to galaxies. We present four practical applications of the probabilistic eigengalaxy framework that are particularly relevant for the next generation of large imaging surveys: we (i) show how low-likelihood galaxies make natural candidates for outlier detection; (ii) demonstrate how missing data can be predicted; (iii) show how a similarity search can be performed on exemplars; and (iv) demonstrate how unsupervised clustering of objects can be implemented. Peer reviewed.
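
    As a rough illustration of the eigengalaxy idea, the sketch below runs a PCA over flattened image cutouts and reconstructs each galaxy from 12 component weights. The random images array, cutout size, and (absent) preprocessing are placeholders, not the paper's actual pipeline.

        # Minimal sketch: PCA on flattened galaxy cutouts (placeholder data).
        import numpy as np
        from sklearn.decomposition import PCA

        rng = np.random.default_rng(0)
        images = rng.random((10243, 64, 64))   # stand-in for galaxy cutouts
        X = images.reshape(len(images), -1)    # (n_galaxies, n_pixels)

        pca = PCA(n_components=12)
        weights = pca.fit_transform(X)         # 12 weights per galaxy
        eigengalaxies = pca.components_.reshape(-1, 64, 64)

        # Each galaxy is approximated as mean + weighted sum of eigengalaxies:
        recon = pca.mean_ + weights @ pca.components_
        print("12-component reconstruction RMS:",
              np.sqrt(np.mean((X - recon) ** 2)))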

    Transient-optimized real-bogus classification with Bayesian convolutional neural networks - sifting the GOTO candidate stream

    Large-scale sky surveys have played a transformative role in our understanding of astrophysical transients, made possible only by increasingly powerful machine learning-based filtering to accurately sift through the vast quantities of incoming data. In this paper, we present a new real-bogus classifier based on a Bayesian convolutional neural network that provides nuanced, uncertainty-aware classification of transient candidates in difference imaging, and demonstrate its application to the data stream from the GOTO wide-field optical survey. Not only are candidates assigned a well-calibrated probability of being real, but also an associated confidence that can be used to prioritize human vetting efforts and inform future model optimization via active learning. To fully realize the potential of this architecture, we present a fully automated training-set generation method which requires no human labelling, incorporating a novel data-driven augmentation method to significantly improve the recovery of faint and nuclear transient sources. We achieve competitive classification accuracy (FPR and FNR both below 1 percent) compared against classifiers trained with fully human-labelled data sets, while being significantly quicker and less labour-intensive to build. This data-driven approach is uniquely scalable to the upcoming challenges and data needs of next-generation transient surveys. We make our data generation and model training codes available to the community.
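
    The paper's Bayesian CNN is not reproduced here; the sketch below instead shows Monte Carlo dropout, a common approximation to Bayesian inference in neural networks that yields the same kind of uncertainty-aware output: dropout stays active at inference, and the spread of repeated stochastic forward passes gives the confidence score. The architecture and stamp size are illustrative assumptions.

        # Sketch: uncertainty-aware real-bogus scoring via MC dropout (PyTorch).
        import torch
        import torch.nn as nn

        class RealBogusCNN(nn.Module):
            def __init__(self):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                    nn.Flatten(), nn.Dropout(0.5),
                    nn.Linear(32 * 8 * 8, 1), nn.Sigmoid(),
                )

            def forward(self, x):
                return self.net(x)

        def mc_predict(model, stamps, n_samples=50):
            """Mean over stochastic passes ~ P(real); std ~ confidence."""
            model.train()                      # keep dropout active at test time
            with torch.no_grad():
                draws = torch.stack([model(stamps) for _ in range(n_samples)])
            return draws.mean(0), draws.std(0)

        model = RealBogusCNN()
        stamps = torch.randn(4, 1, 32, 32)     # stand-in difference-image stamps
        p_real, confidence = mc_predict(model, stamps)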

    Computational studies of genome evolution and regulation

    This thesis takes on the challenge of extracting information from large volumes of biological data produced with newly established experimental techniques. The different types of information present in a particular dataset have been carefully identified to maximise the information gained from the data. This also precludes attempts to infer types of information that are not present in the data.

    In the first part of the thesis I examined the evolutionary origins of de novo taxonomically restricted genes (TRGs) in the Drosophila subgenus. De novo TRGs are genes that have originated after the speciation of a particular clade from previously non-coding regions: functional ncRNA, introns or alternative frames of older protein-coding genes, or intergenic sequences. TRGs are clade-specific tool-kits that are likely to contain proteins with as-yet undocumented functions and new protein folds that are yet to be discovered. One of the main challenges in studying de novo TRGs is the trade-off between false positives (non-functional open reading frames) and false negatives (true TRGs with properties distinct from well-established genes). Here I identified two de novo TRG families in the Drosophila subgenus that have not previously been reported as de novo originated genes; to our knowledge they are the best candidates identified so far for experimental studies aimed at elucidating the properties of de novo genes.

    In the second part of the thesis I examined the information contained in single-cell RNA sequencing (scRNA-seq) data and proposed a method for extracting biological knowledge from these data using generative neural networks. The main challenge is the noisiness of scRNA-seq data: the number of transcripts sequenced is not proportional to the number of mRNAs present in the cell. I used an autoencoder to reduce the dimensionality of the data without making untestable assumptions about them. This embedding into a lower-dimensional space, alongside the features learned by the autoencoder, contains information about cell populations, differentiation trajectories and the regulatory relationships between genes. Unlike most methods currently in use, an autoencoder does not assume that these regulatory relationships are the same in all cells in the data set. The main advantages of our approach are that it makes minimal assumptions about the data, it is robust to noise, and its performance can be assessed.

    In the final part of the thesis I summarise lessons learnt from analysing various types of biological data and make suggestions for the future direction of similar computational studies.
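
    A minimal sketch of the autoencoder approach described above, assuming a normalised expression matrix and illustrative layer sizes: each cell is compressed to a low-dimensional embedding whose structure can then be inspected for populations and trajectories.

        # Sketch: autoencoder embedding of scRNA-seq profiles (placeholder data).
        import torch
        import torch.nn as nn

        n_genes, n_latent = 2000, 32
        encoder = nn.Sequential(nn.Linear(n_genes, 256), nn.ReLU(),
                                nn.Linear(256, n_latent))
        decoder = nn.Sequential(nn.Linear(n_latent, 256), nn.ReLU(),
                                nn.Linear(256, n_genes))
        opt = torch.optim.Adam(list(encoder.parameters()) +
                               list(decoder.parameters()), lr=1e-3)

        counts = torch.rand(512, n_genes)      # stand-in normalised counts

        for _ in range(100):                   # minimal reconstruction loop
            z = encoder(counts)                # low-dimensional cell embedding
            loss = nn.functional.mse_loss(decoder(z), counts)
            opt.zero_grad(); loss.backward(); opt.step()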

    EuCAPT White Paper: Opportunities and Challenges for Theoretical Astroparticle Physics in the Next Decade

    Astroparticle physics is undergoing a profound transformation, due to a series of extraordinary new results, such as the discovery of high-energy cosmic neutrinos with IceCube, the direct detection of gravitational waves with LIGO and Virgo, and many others. This white paper is the result of a collaborative effort that involved hundreds of theoretical astroparticle physicists and cosmologists, under the coordination of the European Consortium for Astroparticle Theory (EuCAPT). Addressed to the whole astroparticle physics community, it explores upcoming theoretical opportunities and challenges for our field of research, with particular emphasis on the possible synergies among different subfields, and the prospects for solving the most fundamental open questions with multi-messenger observations.
    Comment: White paper of the European Consortium for Astroparticle Theory (EuCAPT). 135 authors, 400 endorsers, 133 pages, 1382 references.

    Accelerating inference in cosmology and seismology with generative models

    Statistical analyses in many physical sciences require running simulations of the system being examined. Such simulations provide information complementary to theoretical analytic models, and represent an invaluable tool for investigating the dynamics of complex systems. However, running simulations is often computationally expensive, and the large number of mocks required to obtain sufficient statistical precision often makes the problem intractable. In recent years, machine learning has emerged as a possible solution to speed up the generation of scientific simulations. Machine learning generative models usually rely on iteratively feeding true simulations to the algorithm until it learns their important common features and is capable of producing accurate simulations in a fraction of the time. In this thesis, advanced machine learning algorithms are explored and applied to the challenge of accelerating physical simulations. Various techniques are applied to problems in cosmology and seismology, showing the benefits and limitations of such an approach through a critical analysis. The algorithms are applied to compelling problems in these fields, including surrogate models for the seismic wave equation, the emulation of cosmological summary statistics, and the fast generation of large simulations of the Universe. These problems are formulated within a relevant statistical framework and tied to real data-analysis pipelines. In the conclusions, a critical overview of the results is provided, together with an outlook on possible future extensions of the work presented in the thesis.
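
    The emulation idea reads, in miniature, like the sketch below: train a regressor on a modest set of true simulator runs, then evaluate the cheap surrogate instead of the simulator. The toy simulator, parameter ranges, and network size are all assumptions made for illustration.

        # Sketch: a neural-network emulator for an expensive simulator (toy).
        import numpy as np
        from sklearn.neural_network import MLPRegressor

        def toy_simulator(theta):
            """Stand-in for an expensive run returning a summary statistic."""
            k = np.linspace(0.1, 1.0, 50)
            return theta[0] * np.exp(-k / theta[1])

        rng = np.random.default_rng(1)
        thetas = rng.uniform([0.5, 0.2], [2.0, 1.0], size=(300, 2))
        summaries = np.array([toy_simulator(t) for t in thetas])

        emulator = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000)
        emulator.fit(thetas, summaries)        # train once on true simulations
        fast_pred = emulator.predict([[1.2, 0.5]])  # cheap surrogate evaluation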

    Gravitational Lensing

    Gravitational lensing has developed into one of the most powerful tools for the analysis of the dark universe. This review summarises the theory of gravitational lensing, its main current applications and representative results achieved so far. It has two parts. In the first, starting from the equation of geodesic deviation, the equations of thin and extended gravitational lensing are derived. In the second, gravitational lensing by stars and planets, galaxies, galaxy clusters and large-scale structures is discussed and summarised.
    Comment: Invited review article to appear in Classical and Quantum Gravity. 85 pages, 15 figures.
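
    For orientation, the central thin-lens relations that such a derivation arrives at can be written in their standard textbook form (not quoted from the article):

        % Lens equation: a source at angular position \beta appears at \theta
        \[
          \beta = \theta - \alpha(\theta), \qquad
          \alpha(\theta) = \frac{D_{ds}}{D_{s}}\,\hat{\alpha}(D_{d}\theta),
        \]
        % with the point-mass (star/planet) deflection angle
        \[
          \hat{\alpha} = \frac{4GM}{c^{2}\xi},
        \]
        % where \xi is the impact parameter and D_d, D_s, D_{ds} are the
        % angular-diameter distances to the lens, to the source, and
        % between lens and source.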

    Deep Learning, Shallow Dips: Transit light curves have never been so trendy

    At the crossroads between photometry and time-domain astronomy, light curves are invaluable data objects for studying distant events and sources of light even when they cannot be spatially resolved. In particular, the field of exoplanet science has benefited tremendously from acquired stellar light curves to detect and characterise the majority of the outer worlds that we know today. Yet their analysis is challenged by astrophysical and instrumental noise that often dilutes the signals of interest. For instance, the detection of shallow dips caused by transiting exoplanets in stellar light curves typically requires a precision of the order of 1 ppm to 100 ppm in units of stellar flux, and their study depends directly upon our capacity to correct for instrumental and stellar trends. The increasing number of light curves acquired from space- and ground-based telescopes (of the order of billions) opens up the possibility for global, efficient, automated processing algorithms to replace individual, parametric and hard-coded ones. Luckily, the field of deep learning is also progressing fast, revolutionising time-series problems and applications. This reinforces the incentive to develop data-driven approaches hand-in-hand with existing scientific models and expertise. With the study of exoplanetary transits in focus, I developed automated approaches to learn and correct for the time-correlated noise in and across light curves. In particular, I present (i) a deep recurrent model trained via a forecasting objective to detrend individual transit light curves (e.g. from the Spitzer space telescope); (ii) the power of a Transformer-based model leveraging whole datasets of light curves (e.g. from large transit surveys) to learn the trend via a masked objective; and (iii) a hybrid and flexible framework to combine neural networks with transit physics.
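
    A sketch of the forecasting-based detrending in point (i), assuming placeholder light curves and a generic recurrent architecture rather than the thesis implementation: the model learns to predict the next flux point from the past, so its prediction tracks slow trends while the residuals retain sharp features such as transit dips.

        # Sketch: detrending a light curve with a one-step-ahead GRU forecaster.
        import torch
        import torch.nn as nn

        class FluxForecaster(nn.Module):
            def __init__(self, hidden=32):
                super().__init__()
                self.gru = nn.GRU(1, hidden, batch_first=True)
                self.out = nn.Linear(hidden, 1)

            def forward(self, flux):           # flux: (batch, time, 1)
                h, _ = self.gru(flux)
                return self.out(h)             # next-point prediction

        model = FluxForecaster()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        flux = torch.randn(8, 200, 1)          # stand-in light curves

        for _ in range(50):                    # forecasting objective
            loss = nn.functional.mse_loss(model(flux[:, :-1]), flux[:, 1:])
            opt.zero_grad(); loss.backward(); opt.step()

        detrended = flux[:, 1:] - model(flux[:, :-1]).detach()  # residuals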