14,823 research outputs found

Variable selection and updating in model-based discriminant analysis for high dimensional data with food authenticity applications

Author: Adrian E. Raftery
Brendan Murphy
Nema Dean
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2010

Food authenticity studies are concerned with determining if food samples have been correctly labelled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-supervised manner using both labeled and unlabeled data. The method is shown to give excellent classification performance on several high-dimensional multiclass food authenticity datasets with more variables than observations. The variables selected by the proposed method provide information about which variables are meaningful for classification purposes. A headlong search strategy for variable selection is shown to be efficient in terms of computation and achieves excellent classification performance. In applications to several food authenticity datasets, our proposed method outperformed default implementations of Random Forests, AdaBoost, transductive SVMs and Bayesian Multinomial Regression by substantial margins.
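The abstract names a headlong search but does not spell it out. As a rough illustration only, the Python sketch below shows the general idea of a headlong strategy: instead of scoring every candidate and taking the best, it accepts the first variable whose evidence clears a threshold, then drops the first selected variable whose evidence no longer does. The function `headlong_select`, the `gain` callback, the thresholds and the toy scores are all hypothetical; the criterion the authors actually use is not given in this abstract.

```python
def headlong_select(variables, gain, add_threshold=0.0, remove_threshold=0.0, max_rounds=100):
    """Greedy 'headlong' variable selection (illustrative sketch).

    gain(selected, v) is a user-supplied score for the evidence that
    variable v is useful given the variables already selected.
    """
    selected = []
    for _ in range(max_rounds):  # guard against cycling with odd gain functions
        changed = False
        # Inclusion step: take the FIRST candidate that clears the threshold.
        for v in variables:
            if v not in selected and gain(selected, v) > add_threshold:
                selected.append(v)
                changed = True
                break
        # Removal step: drop the FIRST selected variable that no longer helps.
        for v in list(selected):
            rest = [u for u in selected if u != v]
            if gain(rest, v) <= remove_threshold:
                selected.remove(v)
                changed = True
                break
        if not changed:
            break
    return selected


# Toy, made-up scores: "b" is redundant once "a" is in the model.
usefulness = {"a": 2.0, "b": 1.5, "c": 0.5, "noise": -1.0}

def toy_gain(selected, v):
    if v == "b" and "a" in selected:
        return -0.5
    return usefulness[v]

chosen = headlong_select(["noise", "b", "a", "c"], toy_gain)
print(chosen)
```

Because the search stops at the first acceptable variable, each round evaluates only part of the candidate list, which is where the computational saving over a best-of-all greedy search would come from.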

arXiv.org e-Print Archive

CiteSeerX

Crossref

Research Repository UCD

PubMed Central

Enlighten

Prediction of dementia patients: A comparative approach using parametric vs. non parametric classifiers

Author: Guerreiro Manuela
Maroco João
Mendonça Alexandre de
Santana Isabel
Silva Dina Lúcia Gomes da
Publication venue: Sociedade Portuguesa de Estatística
Publication date: 01/01/2012

In this paper, we report a comparison of 7 non-parametric classifiers (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) with Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression, tested in a real-data application: predicting conversion to dementia in elderly patients with mild cognitive impairment. When classification results are compared on overall accuracy, specificity and sensitivity, Linear Discriminant Analysis and Random Forests rank first among all the classifiers.
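The three criteria named in this abstract have standard definitions in terms of a binary confusion matrix. The short Python sketch below computes them; the function name `binary_metrics` and the example labels are made up for illustration and are not data from the study.

```python
def binary_metrics(y_true, y_pred):
    """Overall accuracy, sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # share of true cases detected
        "specificity": tn / (tn + fp),  # share of non-cases correctly cleared
    }


# Hypothetical labels: 1 = converted to dementia, 0 = did not convert.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting all three matters when classes are imbalanced, since a classifier can reach high overall accuracy while missing most of the minority class.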

Repositório do ISPA

Localized Regression

Author: Binder Harald
Tutz Gerhard
Publication date: 01/01/2004

The main problem with localized discriminant techniques is the curse of dimensionality, which seems to restrict their use to the case of few variables. This restriction does not hold if localization is combined with a reduction of dimension. In particular, it is shown that localization yields powerful classifiers even in higher dimensions if it is combined with locally adaptive selection of predictors. A robust localized logistic regression (LLR) method is developed for which all tuning parameters are chosen data-adaptively. In an extended simulation study, we evaluate the potential of the proposed procedure for various types of data and compare it to other classification procedures. In addition, we demonstrate that automatic choice of localization, predictor selection and penalty parameters based on cross-validation works well. Finally, the method is applied to real data sets and its real-world performance is compared to alternative procedures.
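To make the idea of localization concrete, here is a minimal Python sketch of a kernel-weighted logistic fit around a single query point. It is not the authors' LLR method: it uses one predictor, a fixed Gaussian bandwidth and a fixed ridge penalty on the slope, and it omits the predictor selection and cross-validated tuning that the abstract describes. All names and parameter values are assumptions chosen for illustration.

```python
import math

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def local_logistic_predict(xs, ys, x0, bandwidth=0.5, penalty=0.1, steps=500, lr=0.5):
    """Estimate P(y = 1 | x0) from a logistic fit weighted towards x0."""
    # Gaussian kernel weights: observations near x0 dominate the fit.
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(w)
    b0, b1 = 0.0, 0.0  # local intercept and slope, centred at x0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y, wi in zip(xs, ys, w):
            r = y - sigmoid(b0 + b1 * (x - x0))
            g0 += wi * r
            g1 += wi * r * (x - x0)
        # Gradient ascent on the weighted log-likelihood, ridge penalty on the slope.
        b0 += lr * (g0 / total)
        b1 += lr * (g1 / total - penalty * b1)
    return sigmoid(b0)  # the linear predictor at x = x0 is just b0


# Toy data: class 1 only in the middle band, so no single global
# logistic curve in x can separate the classes.
xs = [-3 + 0.25 * i for i in range(25)]
ys = [1 if abs(x) < 1 else 0 for x in xs]

p_center = local_logistic_predict(xs, ys, x0=0.0)
p_edge = local_logistic_predict(xs, ys, x0=2.5)
print(round(p_center, 3), round(p_edge, 3))
```

The toy data show why a local fit can succeed where a global one cannot: the model is refitted for each query point, so it only has to be approximately linear within the kernel window.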

Open Access LMU

How to Find More Supernovae with Less Work: Object Classification Techniques for Difference Imaging

Author: B. A. Weaver
Becker A. C.
C. Aragon
D. Wong
Fisher R. A.
Freund Y.
R. C. Thomas
R. Romano
S. Bailey
Zahn C. T.
Publication venue: 'University of Chicago Press'
Publication date: 02/05/2007

We present the results of applying new object classification techniques to difference images in the context of the Nearby Supernova Factory supernova search. Most current supernova searches subtract reference images from new images, identify objects in these difference images, and apply simple threshold cuts on parameters such as statistical significance, shape, and motion to reject objects such as cosmic rays, asteroids, and subtraction artifacts. Although most static objects subtract cleanly, even a very low false positive detection rate can lead to hundreds of non-supernova candidates which must be vetted by human inspection before triggering additional followup. In comparison to simple threshold cuts, more sophisticated methods such as Boosted Decision Trees, Random Forests, and Support Vector Machines provide dramatically better object discrimination. At the Nearby Supernova Factory, we reduced the number of non-supernova candidates by a factor of 10 while increasing our supernova identification efficiency. Methods such as these will be crucial for maintaining a reasonable false positive rate in the automated transient alert pipelines of upcoming projects such as PanSTARRS and LSST.

Comment: 25 pages; 6 figures; submitted to Ap

arXiv.org e-Print Archive

Crossref

UNT Digital Library

Financial-distress prediction of Islamic banks using tree-based stochastic techniques

Author: Gepp Adrian
Halteh Khaled
Kumar Kuldeep
Publication venue: 'Emerald'
Publication date: 22/06/2018

Bond University Research Portal

Statistical Workflow for Feature Selection in Human Metabolomics Data.

Author: Antonelli Joseph
Cheng Susan
Claggett Brian L
Demler Olga V
Deng Katherine
Henglin Mir
Hushcha Pavel V
Jain Mohit
Kim Andy
Kim Nicole
Lagerborg Kim A
Mora Samia
Niiranen Teemu J
Ovsak Gavin
Pereira Alexandre C
Rao Kevin
Tyagi Octavia
Watrous Jeramie D
Publication venue: eScholarship, University of California
Publication date: 01/07/2019

High-throughput metabolomics investigations, when conducted in large human cohorts, represent a potentially powerful tool for elucidating the biochemical diversity underlying human health and disease. Large-scale metabolomics data sources, generated using either targeted or nontargeted platforms, are becoming more common. Appropriate statistical analysis of these complex high-dimensional data will be critical for extracting meaningful results from such large-scale human metabolomics studies. Therefore, we consider the statistical analytical approaches that have been employed in prior human metabolomics studies. Based on the lessons learned and collective experience to date in the field, we offer a step-by-step framework for pursuing statistical analyses of cohort-based human metabolomics data, with a focus on feature selection. We discuss the range of options and approaches that may be employed at each stage of data management, analysis, and interpretation and offer guidance on the analytical decisions that need to be considered over the course of implementing a data analysis workflow. Certain pervasive analytical challenges facing the field warrant ongoing focused research. Addressing these challenges, particularly those related to analyzing human metabolomics data, will allow for more standardization of, as well as advances in, how research in the field is practiced. In turn, such major analytical advances will lead to substantial improvements in the overall contributions of human metabolomics investigations.

eScholarship - University of California