    An application of data mining to fruit and vegetable sample identification using Gas Chromatography-Mass Spectrometry

    One of the uses of Gas Chromatography-Mass Spectrometry (GC-MS) is in the detection of pesticide residues in fruit and vegetables. In a high throughput laboratory there is the potential for sample swaps or mislabelling, as once a sample has been pre-processed to be injected into the GC-MS analyser, it is no longer distinguishable by eye. Possible consequences of such mistakes are the destruction of large amounts of actually safe produce or pesticide-contaminated produce reaching the consumer. For the purposes of food safety and traceability, it can also be extremely valuable to know the source (country of origin) of a food product. This can help uncover fraudulent attempts to sell food originating from countries deemed unsafe. In this study, we use the workflow environment ADAMS to examine whether we can determine the fruit/vegetable type and the country of origin of a sample from a GC-MS chromatogram. A workflow is used to generate data sets using different data pre-processing methods and data representations from a database of over 8000 GC-MS chromatograms, consisting of more than 100 types of fruit and vegetables from more than 120 countries. A variety of classification algorithms are evaluated using the WEKA data mining workbench. We demonstrate excellent results, both for the determination of fruit/vegetable type and for the country of origin, using a histogram of ion counts, and Classification by Regression using Random Regression Forest with PLS-transformed data.
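The abstract above does not define its "histogram of ion counts", so the following is only a minimal sketch of one plausible reading: ion intensities from every scan of a chromatogram are summed into fixed-width m/z bins and normalised. The function name, the `(mz, intensity)` scan layout, and the bin parameters are all assumptions made for illustration, not the authors' ADAMS workflow.

```python
import math


def ion_count_histogram(scans, mz_min, mz_max, bin_width=1.0):
    """Sum ion intensities over all scans into fixed-width m/z bins.

    `scans` is a hypothetical layout: a list of scans, each a list of
    (mz, intensity) pairs. Returns a list of bin fractions summing to 1
    (or all zeros if no ion falls inside [mz_min, mz_max)).
    """
    n_bins = math.ceil((mz_max - mz_min) / bin_width)
    hist = [0.0] * n_bins
    for scan in scans:
        for mz, intensity in scan:
            if mz_min <= mz < mz_max:
                # Clamp guards against floating-point edge cases at the top bin.
                idx = min(int((mz - mz_min) / bin_width), n_bins - 1)
                hist[idx] += intensity
    total = sum(hist)
    return [h / total for h in hist] if total else hist


# Two toy scans; retention time is discarded, only m/z is binned.
example_scans = [
    [(50.2, 10.0), (51.7, 30.0)],
    [(50.9, 20.0), (54.0, 40.0)],
]
features = ion_count_histogram(example_scans, mz_min=50.0, mz_max=55.0)
```

A fixed-length vector like `features` could then be passed to any classifier; discarding retention time keeps the representation insensitive to small shifts in elution time, at the cost of losing peak-order information.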

    A Method to Distinguish Quiescent and Dusty Star-forming Galaxies with Machine Learning

    Large photometric surveys provide a rich source of observations of quiescent galaxies, including a surprisingly large population at z > 1. However, identifying large, but clean, samples of quiescent galaxies has proven difficult because of their near-degeneracy with interlopers such as dusty, star-forming galaxies. We describe a new technique for selecting quiescent galaxies based upon t-distributed stochastic neighbor embedding (t-SNE), an unsupervised machine-learning algorithm for dimensionality reduction. This t-SNE selection provides an improvement both over UVJ, removing interlopers that otherwise would pass color selection, and over photometric template fitting, more strongly toward high redshift. Due to the similarity between the colors of high- and low-redshift quiescent galaxies, under our assumptions, t-SNE outperforms template fitting in 63% of trials at redshifts where a large training sample already exists. It also may be able to select quiescent galaxies more efficiently at higher redshifts than the training sample

    Finding rare objects and building pure samples: Probabilistic quasar classification from low resolution Gaia spectra

    We develop and demonstrate a probabilistic method for classifying rare objects in surveys with the particular goal of building very pure samples. It works by modifying the output probabilities from a classifier so as to accommodate our expectation (priors) concerning the relative frequencies of different classes of objects. We demonstrate our method using the Discrete Source Classifier, a supervised classifier currently based on Support Vector Machines, which we are developing in preparation for the Gaia data analysis. DSC classifies objects using their very low resolution optical spectra. We look in detail at the problem of quasar classification, because identification of a pure quasar sample is necessary to define the Gaia astrometric reference frame. By varying a posterior probability threshold in DSC we can trade off sample completeness and contamination. We show, using our simulated data, that it is possible to achieve a pure sample of quasars (upper limit on contamination of 1 in 40,000) with a completeness of 65% at magnitudes of G=18.5, and 50% at G=20.0, even when quasars have a frequency of only 1 in every 2000 objects. The star sample completeness is simultaneously 99% with a contamination of 0.7%. Including parallax and proper motion in the classifier barely changes the results. We further show that not accounting for class priors in the target population leads to serious misclassifications and poor predictions for sample completeness and contamination. (Truncated) Comment: MNRAS accepte
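The idea of modifying classifier output probabilities to reflect class priors can be illustrated with the standard prior-shift correction: each posterior is reweighted by the ratio of target-population prior to training-set prior, then renormalised. This is a minimal sketch under that assumption; it is not claimed to be the exact procedure used in DSC, and the function names and toy numbers are hypothetical.

```python
def adjust_for_priors(posteriors, train_priors, target_priors):
    """Reweight posteriors from training-set priors to target-population priors."""
    weighted = {
        cls: p * target_priors[cls] / train_priors[cls]
        for cls, p in posteriors.items()
    }
    total = sum(weighted.values())
    return {cls: w / total for cls, w in weighted.items()}


def completeness_and_contamination(true_labels, probs, target, threshold):
    """Completeness and contamination of the sample with prob >= threshold.

    completeness  = selected true targets / all true targets
    contamination = selected non-targets / all selected objects
    """
    selected = [t for t, p in zip(true_labels, probs) if p >= threshold]
    n_target = sum(1 for t in true_labels if t == target)
    n_true_pos = sum(1 for t in selected if t == target)
    completeness = n_true_pos / n_target if n_target else 0.0
    contamination = (len(selected) - n_true_pos) / len(selected) if selected else 0.0
    return completeness, contamination


# A classifier trained on balanced classes reports 0.9 for "quasar",
# but quasars are assumed to be 1 in 2000 objects in the target population.
adjusted = adjust_for_priors(
    posteriors={"quasar": 0.9, "star": 0.1},
    train_priors={"quasar": 0.5, "star": 0.5},
    target_priors={"quasar": 1 / 2000, "star": 1999 / 2000},
)

# Toy evaluation of a probability threshold on four labelled objects.
comp, cont = completeness_and_contamination(
    true_labels=["quasar", "quasar", "star", "star"],
    probs=[0.9, 0.4, 0.6, 0.1],
    target="quasar",
    threshold=0.5,
)
```

In this toy case the adjusted quasar probability falls well below 1%, which shows why ignoring the priors would overstate the purity of a rare-object sample. Raising `threshold` trades completeness for lower contamination.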

    A Data Science Approach to Understanding Residential Water Contamination in Flint

    When the residents of Flint learned that lead had contaminated their water system, the local government made water-testing kits available to them free of charge. The city government published the results of these tests, creating a valuable dataset that is key to understanding the causes and extent of the lead contamination event in Flint. This is the nation's largest dataset on lead in a municipal water system. In this paper, we predict the lead contamination for each household's water supply, and we study several related aspects of Flint's water troubles, many of which generalize well beyond this one city. For example, we show that elevated lead risks can be (weakly) predicted from observable home attributes. Then we explore the factors associated with elevated lead. These risk assessments were developed in part via a crowdsourced prediction challenge at the University of Michigan. To inform Flint residents of these assessments, they have been incorporated into a web and mobile application funded by Google.org. We also explore questions of self-selection in the residential testing program, examining which factors are linked to when and how frequently residents voluntarily sample their water. Comment: Applied Data Science track paper at KDD 2017. For associated promotional video, see https://www.youtube.com/watch?v=0g66ImaV8A

    Use of habitat suitability modeling in the integrated urban water system modeling of the Drava River (Varazdin, Croatia)

    The development of practical tools for providing accurate ecological assessment of rivers and species conditions is necessary to preserve habitats and species, stop degradation and restore water quality. An understanding of the causal mechanisms and processes that affect the ecological water quality and shape macroinvertebrate communities at a local scale has important implications for conservation management and river restoration. This study integrated wastewater treatment, river water quality and ecological assessment models to study the upgrading of a wastewater treatment plant (WWTP) and its ecological effects on the receiving river. The WWTP and the water quality and quantity of the Drava river in Croatia were modelled in the software WEST. For the ecological modeling, habitat suitability and ecological assessment models were built from classification trees. This technique allows the biological water quality to be predicted in terms of the occurrence of macroinvertebrates and the river status according to ecological water quality indices. The ecological models developed were satisfactory, showing good predictive performance and good discrimination capacity. Using the integrated ecological model for the Drava river, three scenarios were run and evaluated. The scenario assessment showed that the water management of the Drava river requires an integrated approach, one that considers upgrading the WWTP with nitrogen and phosphorus removal and treating other diffuse pollution and point sources (including the overflow of the WWTP). Additionally, if the minimum instream flow after the dams were increased, a higher dilution capacity and a higher self-cleaning capability could be obtained. The results showed that integrated models like the one presented here have an added value for decision support in water management. This kind of integrated approach is useful for gaining insight into aquatic ecosystems and for assessing investments in the sanitation infrastructure of urban wastewater systems, considering both compliance with legal physicochemical emission limits and the ecological state of the receiving waters.