13,663 research outputs found

    The IPAC Image Subtraction and Discovery Pipeline for the intermediate Palomar Transient Factory

    Get PDF
    We describe the near real-time transient-source discovery engine for the intermediate Palomar Transient Factory (iPTF), currently in operations at the Infrared Processing and Analysis Center (IPAC), Caltech. We coin this system the IPAC/iPTF Discovery Engine (or IDE). We review the algorithms used for PSF-matching, image subtraction, detection, photometry, and machine-learned (ML) vetting of extracted transient candidates. We also review the performance of our ML classifier. For a limiting signal-to-noise ratio of 4 in relatively unconfused regions, "bogus" candidates from processing artifacts and imperfect image subtractions outnumber real transients by ~ 10:1. This can be considerably higher for image data with inaccurate astrometric and/or PSF-matching solutions. Despite this occasionally high contamination rate, the ML classifier is able to identify real transients with an efficiency (or completeness) of ~ 97% for a maximum tolerable false-positive rate of 1% when classifying raw candidates. All subtraction-image metrics, source features, ML probability-based real-bogus scores, contextual metadata from other surveys, and possible associations with known Solar System objects are stored in a relational database for retrieval by the various science working groups. We review our efforts in mitigating false-positives and our experience in optimizing the overall system in response to the multitude of science projects underway with iPTF.Comment: 66 pages, 21 figures, 7 tables, accepted by PAS

    A method for tailoring the information content of a software process model

    Get PDF
    The framework is defined for a general method for selecting a necessary and sufficient subset of a general software life cycle's information products, to support new software development process. Procedures for characterizing problem domains in general and mapping to a tailored set of life cycle processes and products is presented. An overview of the method is shown using the following steps: (1) During the problem concept definition phase, perform standardized interviews and dialogs between developer and user, and between user and customer; (2) Generate a quality needs profile of the software to be developed, based on information gathered in step 1; (3) Translate the quality needs profile into a profile of quality criteria that must be met by the software to satisfy the quality needs; (4) Map the quality criteria to set of accepted processes and products for achieving each criterion; (5) Select the information products which match or support the accepted processes and product of step 4; and (6) Select the design methodology which produces the information products selected in step 5

    Bayesian optimization for materials design

    Full text link
    We introduce Bayesian optimization, a technique developed for optimizing time-consuming engineering simulations and for fitting machine learning models on large datasets. Bayesian optimization guides the choice of experiments during materials design and discovery to find good material designs in as few experiments as possible. We focus on the case when materials designs are parameterized by a low-dimensional vector. Bayesian optimization is built on a statistical technique called Gaussian process regression, which allows predicting the performance of a new design based on previously tested designs. After providing a detailed introduction to Gaussian process regression, we introduce two Bayesian optimization methods: expected improvement, for design problems with noise-free evaluations; and the knowledge-gradient method, which generalizes expected improvement and may be used in design problems with noisy evaluations. Both methods are derived using a value-of-information analysis, and enjoy one-step Bayes-optimality

    The Zwicky Transient Facility: Data Processing, Products, and Archive

    Get PDF
    The Zwicky Transient Facility (ZTF) is a new robotic time-domain survey currently in progress using the Palomar 48-inch Schmidt Telescope. ZTF uses a 47 square degree field with a 600 megapixel camera to scan the entire northern visible sky at rates of ~3760 square degrees/hour to median depths of g ~ 20.8 and r ~ 20.6 mag (AB, 5sigma in 30 sec). We describe the Science Data System that is housed at IPAC, Caltech. This comprises the data-processing pipelines, alert production system, data archive, and user interfaces for accessing and analyzing the products. The realtime pipeline employs a novel image-differencing algorithm, optimized for the detection of point source transient events. These events are vetted for reliability using a machine-learned classifier and combined with contextual information to generate data-rich alert packets. The packets become available for distribution typically within 13 minutes (95th percentile) of observation. Detected events are also linked to generate candidate moving-object tracks using a novel algorithm. Objects that move fast enough to streak in the individual exposures are also extracted and vetted. The reconstructed astrometric accuracy per science image with respect to Gaia is typically 45 to 85 milliarcsec. This is the RMS per axis on the sky for sources extracted with photometric S/N >= 10. The derived photometric precision (repeatability) at bright unsaturated fluxes varies between 8 and 25 millimag. Photometric calibration accuracy with respect to Pan-STARRS1 is generally better than 2%. The products support a broad range of scientific applications: fast and young supernovae, rare flux transients, variable stars, eclipsing binaries, variability from active galactic nuclei, counterparts to gravitational wave sources, a more complete census of Type Ia supernovae, and Solar System objects.Comment: 30 pages, 16 figures, Published in PASP Focus Issue on the Zwicky Transient Facility (doi: 10.1088/1538-3873/aae8ac

    Visualisation of bioinformatics datasets

    Get PDF
    Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection etc. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high dimensional biological dataset by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants both for synthetic and electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the project map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM). We also propose to extend GGTM model to estimate feature saliencies while training a visualisation model and this is called GGTM with feature saliency (GGTM-FS). We demonstrate effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as distance distortion measure and rank based measures: trustworthiness, continuity, mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use quality metrics of KL divergence and nearest neighbour classifications error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset
    corecore