1,816,247 research outputs found

    The LSST Data Mining Research Agenda

    Full text link
    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night); multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.Comment: 5 pages, Presented at the "Classification and Discovery in Large Astronomical Surveys" meeting, Ringberg Castle, 14-17 October, 200

    Large-Scale Plant Classification with Deep Neural Networks

    Full text link
    This paper discusses the potential of applying deep learning techniques for plant classification and its usage for citizen science in large-scale biodiversity monitoring. We show that plant classification using near state-of-the-art convolutional network architectures like ResNet50 achieves significant improvements in accuracy compared to the most widespread plant classification application in test sets composed of thousands of different species labels. We find that the predictions can be confidently used as a baseline classification in citizen science communities like iNaturalist (or its Spanish fork, Natusfera) which in turn can share their data with biodiversity portals like GBIF.Comment: 5 pages, 3 figures, 1 table. Published at Proocedings of ACM Computing Frontiers Conference 201

    Construction of a Pragmatic Base Line for Journal Classifications and Maps Based on Aggregated Journal-Journal Citation Relations

    Full text link
    A number of journal classification systems have been developed in bibliometrics since the launch of the Citation Indices by the Institute of Scientific Information (ISI) in the 1960s. These systems are used to normalize citation counts with respect to field-specific citation patterns. The best known system is the so-called "Web-of-Science Subject Categories" (WCs). In other systems papers are classified by algorithmic solutions. Using the Journal Citation Reports 2014 of the Science Citation Index and the Social Science Citation Index (n of journals = 11,149), we examine options for developing a new system based on journal classifications into subject categories using aggregated journal-journal citation data. Combining routines in VOSviewer and Pajek, a tree-like classification is developed. At each level one can generate a map of science for all the journals subsumed under a category. Nine major fields are distinguished at the top level. Further decomposition of the social sciences is pursued for the sake of example with a focus on journals in information science (LIS) and science studies (STS). The new classification system improves on alternative options by avoiding the problem of randomness in each run that has made algorithmic solutions hitherto irreproducible. Limitations of the new system are discussed (e.g. the classification of multi-disciplinary journals). The system's usefulness for field-normalization in bibliometrics should be explored in future studies.Comment: accepted for publication in the Journal of Informetrics, 20 July 201

    Classification methods for noise transients in advanced gravitational-wave detectors

    Get PDF
    Noise of non-astrophysical origin will contaminate science data taken by the Advanced Laser Interferometer Gravitational-wave Observatory (aLIGO) and Advanced Virgo gravitational-wave detectors. Prompt characterization of instrumental and environmental noise transients will be critical for improving the sensitivity of the advanced detectors in the upcoming science runs. During the science runs of the initial gravitational-wave detectors, noise transients were manually classified by visually examining the time-frequency scan of each event. Here, we present three new algorithms designed for the automatic classification of noise transients in advanced detectors. Two of these algorithms are based on Principal Component Analysis. They are Principal Component Analysis for Transients (PCAT), and an adaptation of LALInference Burst (LIB). The third algorithm is a combination of an event generator called Wavelet Detection Filter (WDF) and machine learning techniques for classification. We test these algorithms on simulated data sets, and we show their ability to automatically classify transients by frequency, SNR and waveform morphology

    A Two-stage Classification Method for High-dimensional Data and Point Clouds

    Full text link
    High-dimensional data classification is a fundamental task in machine learning and imaging science. In this paper, we propose a two-stage multiphase semi-supervised classification method for classifying high-dimensional data and unstructured point clouds. To begin with, a fuzzy classification method such as the standard support vector machine is used to generate a warm initialization. We then apply a two-stage approach named SaT (smoothing and thresholding) to improve the classification. In the first stage, an unconstraint convex variational model is implemented to purify and smooth the initialization, followed by the second stage which is to project the smoothed partition obtained at stage one to a binary partition. These two stages can be repeated, with the latest result as a new initialization, to keep improving the classification quality. We show that the convex model of the smoothing stage has a unique solution and can be solved by a specifically designed primal-dual algorithm whose convergence is guaranteed. We test our method and compare it with the state-of-the-art methods on several benchmark data sets. The experimental results demonstrate clearly that our method is superior in both the classification accuracy and computation speed for high-dimensional data and point clouds.Comment: 21 pages, 4 figure

    Data-Rich Astronomy: Mining Sky Surveys with PhotoRApToR

    Full text link
    In the last decade a new generation of telescopes and sensors has allowed the production of a very large amount of data and astronomy has become a data-rich science. New automatic methods largely based on machine learning are needed to cope with such data tsunami. We present some results in the fields of photometric redshifts and galaxy classification, obtained using the MLPQNA algorithm available in the DAMEWARE (Data Mining and Web Application Resource) for the SDSS galaxies (DR9 and DR10). We present PhotoRApToR (Photometric Research Application To Redshift): a Java based desktop application capable to solve regression and classification problems and specialized for photo-z estimation.Comment: proceedings of the IAU Symposium, Vol. 306, Cambridge University Pres
    • …
    corecore