39,138 research outputs found

VarSight: prioritizing clinically reported variants with binary classification algorithms.

Author: Anderson Julie A
Birch Camille L
Brown Donna M
Gajapathy Manavalan
Harris Jeremy M
Holt James M
Kelly Jacob M
Moss Alexander C
Shaterferdosian Fariba
Sosonkina Nadiya
Undiagnosed Diseases Network
Uno-Antonison Angelina E
Weborg Arthur
Wilk Brandon
Wilk Melissa A
Worthey Elizabeth A
Publication venue: eScholarship, University of California
Publication date: 01/10/2019

Background: When applying genomic medicine to a rare disease patient, the primary goal is to identify one or more genomic variants that may explain the patient's phenotypes. Typically, this is done through annotation, filtering, and then prioritization of variants for manual curation. However, prioritization of variants in rare disease patients remains a challenging task due to the high degree of variability in phenotype presentation and molecular source of disease. Thus, methods that can identify and/or prioritize variants to be clinically reported in the presence of such variability are of critical importance. Methods: We tested the application of classification algorithms that ingest variant annotations along with phenotype information for predicting whether a variant will ultimately be clinically reported and returned to a patient. To test the classifiers, we performed a retrospective study on variants that were clinically reported to 237 patients in the Undiagnosed Diseases Network. Results: We treated the classifiers as variant prioritization systems and compared them to four variant prioritization algorithms and two single-measure controls. We showed that the trained classifiers outperformed all other tested methods, with the best classifiers ranking 72% of all reported variants and 94% of reported pathogenic variants in the top 20. Conclusions: We demonstrated how freely available binary classification algorithms can be used to prioritize variants even in the presence of real-world variability. Furthermore, these classifiers outperformed all other tested methods, suggesting that they may be well suited for working with real rare disease patient datasets.
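
The abstract evaluates classifiers as ranking systems by asking how many reported variants land in the top 20. The following is a minimal sketch of that kind of top-k measure, not the authors' implementation: the function names, the variant IDs, and the scores are all hypothetical, and the scores stand in for whatever probability a trained binary classifier would assign.

```python
from typing import Dict, List, Sequence, Set, Tuple

# One case = (classifier score per variant ID, set of clinically reported IDs).
# This layout is an assumption made for illustration.
Case = Tuple[Dict[str, float], Set[str]]


def rank_variants(scores: Dict[str, float]) -> List[str]:
    """Order variant IDs from highest to lowest score (ties broken by ID)."""
    return sorted(scores, key=lambda v: (-scores[v], v))


def top_k_recall(cases: Sequence[Case], k: int = 20) -> float:
    """Fraction of reported variants ranked within the top k of their own case."""
    found = 0
    total = 0
    for scores, reported in cases:
        top = set(rank_variants(scores)[:k])
        found += sum(1 for v in reported if v in top)
        total += len(reported)
    return found / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical toy data: two patients, one reported variant each.
    cases = [
        ({"v1": 0.91, "v2": 0.15, "v3": 0.62, "v4": 0.05}, {"v3"}),
        ({"v5": 0.20, "v6": 0.80, "v7": 0.45}, {"v5"}),
    ]
    print(top_k_recall(cases, k=2))
```

Ranking within each case, rather than pooling all variants, reflects the setting in which an analyst reviews one patient's filtered variant list at a time.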

eScholarship - University of California

PASS: a simple classifier system for data analysis

Author: Muruzábal Jorge
Publication date: 01/09/1993

Let x be a vector of predictors and y a scalar response associated with it. Consider the regression problem of inferring the relationship between predictors and response on the basis of a sample of observed pairs (x,y). This is a familiar problem for which a variety of methods are available. This paper describes a new method based on the classifier system approach to problem solving. Classifier systems provide a rich framework for learning and induction, and they have been successfully applied in the artificial intelligence literature for some time. The present method enriches the simplest classifier system architecture with some new heuristics and explores its potential in a purely inferential context. A prototype called PASS (Predictive Adaptative Sequential System) has been built to test these ideas empirically. Preliminary Monte Carlo experiments indicate that PASS is able to discover the structure imposed on the data in a wide array of cases.

Universidad Carlos III de Madrid e-Archivo

Presymptomatic risk assessment for chronic non-communicable diseases

Author: AC Morrison
AJ MacGregor
AS Daar
Badri Padhukasahasram
C Kyogoku
C Tysk
CG Loftus
D Bentley
Daryl J. Thomas
DE Reich
Dietrich A. Stephan
E Zeggini
EF Remmers
Elana Silver
Eran Halperin
Heather Trumbower
JC Barrett
Jennifer Wessel
JK Pritchard
JM Chan
JT Salonen
KM Narayan
KM Small
LB Lusted
LJ Scott
MF Doran
Michele Cargill
MS Sandhu
NR Wray
O Kempthorne
Q Lu
Q Yang
R King
S Kathiresan
Thorkild I. A. Sorensen
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2010

The prevalence of common chronic non-communicable diseases (CNCDs) far overshadows the prevalence of both monogenic and infectious diseases combined. All CNCDs, also called complex genetic diseases, have a heritable genetic component that can be used for pre-symptomatic risk assessment. Common single nucleotide polymorphisms (SNPs) that tag risk haplotypes across the genome currently account for a non-trivial portion of the germ-line genetic risk, and we will likely continue to identify the remaining missing heritability in the form of rare variants, copy number variants and epigenetic modifications. Here, we describe a novel measure for calculating the lifetime risk of a disease, called the genetic composite index (GCI), and demonstrate its predictive value as a clinical classifier. The GCI only considers summary statistics of the effects of genetic variation and hence does not require the results of large-scale studies simultaneously assessing multiple risk factors. Combining GCI scores with environmental risk information provides an additional tool for clinical decision-making. The GCI can be populated with heritable risk information of any type, and thus represents a framework for CNCD pre-symptomatic risk assessment that can be populated as additional risk information is identified through next-generation technologies. Comment: PLoS ONE paper. Previous version was withdrawn to be updated by the journal's pdf versio

arXiv.org e-Print Archive

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Hyperspectral classification of Cyperus esculentus clones and morphologically similar weeds

Author: Cool Simon R.
De Cauwer Benny
Lauwers Marlies
Nuyttens David
Pieters Jan
Publication venue: 'MDPI AG'
Publication date: 01/01/2020

Cyperus esculentus (yellow nutsedge) is one of the world's worst weeds, as it can cause great damage to crops and crop production. To eradicate C. esculentus, early detection is key, but it is a challenging task because the species is often confused with other Cyperaceae and displays wide genetic variability. In this study, the objective was to classify C. esculentus clones and morphologically similar weeds. Hyperspectral reflectance between 500 and 800 nm was tested as a measure to discriminate between (I) C. esculentus and morphologically similar Cyperaceae weeds, and between (II) different clonal populations of C. esculentus using three classification models: random forest (RF), regularized logistic regression (RLR) and partial least squares-discriminant analysis (PLS-DA). RLR performed better than RF and PLS-DA, and was able to adequately classify the samples. The possibility of creating an affordable multispectral sensing tool for precise in-field recognition of C. esculentus plants, based on fewer spectral bands, was tested. Results of this study were compared against simulated results from a commercially available multispectral camera with four spectral bands. The model created with customized bands performed almost as well as the original PLS-DA or RLR model, and much better than the model describing multispectral image data from a commercially available camera. These results open up the opportunity to develop a dedicated robust tool for C. esculentus recognition based on four spectral bands and an appropriate classification model.

Ghent University Academic Bibliography

Automating biomedical data science through tree-based pipeline optimization

Author: Andrews Peter C.
Kidd La Creis
Lavender Nicole A.
Moore Jason H.
Olson Randal S.
Urbanowicz Ryan J.
Publication date: 27/01/2016

Over the past decade, data science and machine learning have grown from a mysterious art form to a staple tool across a variety of fields in academia, business, and government. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement a Tree-based Pipeline Optimization Tool (TPOT) and demonstrate its effectiveness on a series of simulated and real-world genetic data sets. In particular, we show that TPOT can build machine learning pipelines that achieve competitive classification accuracy and discover novel pipeline operators, such as synthetic feature constructors, that significantly improve classification accuracy on these data sets. We also highlight the current challenges to pipeline optimization, such as the tendency to produce pipelines that overfit the data, and suggest future research paths to overcome these challenges. As such, this work represents an early step toward fully automating machine learning pipeline design. Comment: 16 pages, 5 figures, to appear in EvoBIO 2016 proceeding

arXiv.org e-Print Archive

Scipedia

Inference in classifier systems

Author: Muruzábal Jorge
Publication date: 01/09/1993

Classifier systems (CSs) provide a rich framework for learning and induction, and they have been successfully applied in the artificial intelligence literature for some time. In this paper, both the architecture and the inferential mechanisms in general CSs are reviewed, and a number of limitations and extensions of the basic approach are summarized. A system based on the CS approach that is capable of quantitative data analysis is outlined and some of its peculiarities discussed.

Universidad Carlos III de Madrid e-Archivo

Genetic Classification of Populations using Supervised Learning

Author: A Motsinger-Reif
A Seretti
Aiden Corvin
B North
C Bailer-Jones
C Chang
Carlos Pinto
Colm O'Dushlaine
D Curtis
D Reich
Daniel J. Kliebenstein
Derek Morris
E Jaynes
Elizabeth A. Heron
J Baik
M Leshno
M Nelis
Michael Bridges
Michael Gill
N Patterson
O Lao
Ricardo Segurado
S Gull
S Penco
S Purcell
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 16/12/2010

There are many instances in genetics in which we wish to determine whether two candidate populations are distinguishable on the basis of their genetic structure. Examples include populations which are geographically separated, case-control studies and quality control (when participants in a study have been genotyped at different laboratories). This latter application is of particular importance in the era of large scale genome wide association studies, when collections of individuals genotyped at different locations are being merged to provide increased power. The traditional method for detecting structure within a population is some form of exploratory technique such as principal components analysis. Such methods, which do not utilise our prior knowledge of the membership of the candidate populations, are termed unsupervised. Supervised methods, on the other hand, are able to utilise this prior knowledge when it is available. In this paper we demonstrate that in such cases modern supervised approaches are a more appropriate tool for detecting genetic differences between populations. We apply two such methods (neural networks and support vector machines) to the classification of three populations (two from Scotland and one from Bulgaria). The sensitivity exhibited by both these methods is considerably higher than that attained by principal components analysis and in fact comfortably exceeds a recently conjectured theoretical limit on the sensitivity of unsupervised methods. In particular, our methods can distinguish between the two Scottish populations, where principal components analysis cannot. We suggest, on the basis of our results, that a supervised learning approach should be the method of choice when classifying individuals into pre-defined populations, particularly in quality control for large scale genome wide association studies. Comment: Accepted PLOS On

arXiv.org e-Print Archive

Aberdeen University Research

CiteSeerX

Crossref

Online Research @ Cardiff

Research Repository UCD

Directory of Open Access Journals

Irish Universities

UCL Discovery

PubMed Central

University of Melbourne Institutional Repository

Classification systems offer a microcosm of issues in conceptual processing: A commentary on Kemmerer (2016)

Author: Barsalou Lawrence W.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2017

This is a commentary on Kemmerer (2016), Categories of Object Concepts Across Languages and Brains: The Relevance of Nominal Classification Systems to Cognitive Neuroscience, DOI: 10.1080/23273798.2016.1198819

Crossref

Enlighten