
    Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space

    We present a framework for discriminative sequence classification in which the learner works directly in the high-dimensional predictor space of all subsequences in the training set. This is made possible by a new coordinate-descent algorithm that bounds the magnitude of the gradient to select discriminative subsequences quickly. We characterize the loss functions to which our generic learning algorithm applies and present concrete implementations for logistic regression (binomial log-likelihood loss) and support vector machines (squared hinge loss). Applying our algorithm to protein remote homology detection and remote fold recognition yields performance comparable to that of state-of-the-art methods (e.g., kernel support vector machines). Unlike those classifiers, the resulting classification models are simply lists of weighted discriminative subsequences and can therefore be interpreted and related to the biological problem.
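
As a hedged illustration of the coordinate-descent idea this abstract describes, the sketch below performs greedy coordinate descent over subsequence indicator features under logistic loss, updating only the coordinate whose gradient has the largest magnitude. The toy sequences, the restriction to contiguous subsequences, and the fixed step size are simplifying assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: greedy coordinate descent over subsequence indicator
# features with logistic (binomial log-likelihood) loss. The data,
# contiguous-subsequence restriction, and step size are illustrative
# assumptions, not the paper's exact method.
import math

def subsequence_features(seq, max_len=3):
    """All contiguous subsequences up to max_len (a simplification)."""
    feats = set()
    for i in range(len(seq)):
        for j in range(i + 1, min(i + max_len, len(seq)) + 1):
            feats.add(seq[i:j])
    return feats

def fit(seqs, labels, iters=50, step=0.5):
    vocab = sorted(set().union(*(subsequence_features(s) for s in seqs)))
    X = [[1.0 if f in subsequence_features(s) else 0.0 for f in vocab]
         for s in seqs]
    w = [0.0] * len(vocab)
    for _ in range(iters):
        # predicted probabilities under the current weights
        p = [1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
             for x in X]
        grad = [sum((pi - yi) * x[j] for pi, yi, x in zip(p, labels, X))
                for j in range(len(vocab))]
        # update only the coordinate with the largest gradient magnitude
        j = max(range(len(vocab)), key=lambda k: abs(grad[k]))
        w[j] -= step * grad[j]
    return {f: wi for f, wi in zip(vocab, w) if wi != 0.0}

model = fit(["ACGT", "ACGG", "TTGG", "TTGT"], [1, 1, 0, 0])
```

The returned model is just a dictionary of weighted subsequences, which matches the interpretability claim in the abstract.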

    Automating the Construction of Jet Observables with Machine Learning

    Machine-learning-assisted jet substructure tagging techniques have the potential to significantly improve searches for new particles and Standard Model measurements in hadronic final states. Techniques with simple analytic forms are particularly useful for establishing robustness and gaining physical insight. We introduce a procedure to automate the construction of a large class of observables that are chosen to completely specify M-body phase space. The procedure is validated on the task of distinguishing H → bb̄ from g → bb̄, where M = 3 and previous brute-force approaches to constructing an optimal product observable for the M-body phase space have established the baseline performance. We then use the new method to design tailored observables for the boosted Z′ search, where M = 4 and brute-force methods are intractable. The new classifiers outperform standard 2-prong tagging observables, illustrating the power of the new optimization method for improving searches and measurements at the LHC and beyond. (Comment: 15 pages, 8 tables, 12 figures)
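
The brute-force product-observable baseline the abstract mentions can be sketched on toy data: scan integer exponents (a, b) of a product observable O = x1^a · x2^b and keep the pair with the best AUC. The two "substructure" features and the Gaussian toy events below are invented stand-ins, not LHC data or the paper's observables.

```python
# Hedged sketch of a brute-force product-observable scan: grid-search
# exponents (a, b) of O = x1**a * x2**b and keep the pair with the best
# AUC on labelled toy events. All numbers here are synthetic.
import random

random.seed(0)
signal = [(random.gauss(2, 0.5), random.gauss(1, 0.5)) for _ in range(200)]
backgr = [(random.gauss(1, 0.5), random.gauss(2, 0.5)) for _ in range(200)]

def auc(scores_sig, scores_bkg):
    """Probability a random signal event outscores a random background one."""
    wins = sum(s > b for s in scores_sig for b in scores_bkg)
    ties = sum(s == b for s in scores_sig for b in scores_bkg)
    return (wins + 0.5 * ties) / (len(scores_sig) * len(scores_bkg))

best = max(
    ((a, b) for a in range(-2, 3) for b in range(-2, 3)),
    key=lambda ab: auc(
        [abs(x) ** ab[0] * abs(y) ** ab[1] for x, y in signal],
        [abs(x) ** ab[0] * abs(y) ** ab[1] for x, y in backgr],
    ),
)
```

Since the signal peaks at high x1 and low x2, the scan favors a positive exponent on x1 and a negative one on x2; the abstract's point is that such scans stop being tractable as the number of phase-space variables grows.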

    Validation procedures in radiological diagnostic models. Neural network and logistic regression

    The objective of this paper is to compare the performance of two predictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for LR and NN models. Both models were developed with cross validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with error rate and the area under receiver operating characteristic curves (Az). The neural network obtained statistically higher Az than LR with cross validation. The remaining resampling validation methods did not reveal statistically significant differences between LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used. Keywords: skull, neoplasms, logistic regression, neural networks, receiver operating characteristic curve, statistics, resampling.
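
The abstract's point that different resampling schemes can yield different error estimates can be illustrated with a hedged, stdlib-only sketch. A nearest-centroid classifier stands in for the paper's LR and NN models, and the data are synthetic; only the three resampling procedures (k-fold, leave-one-out, out-of-bag bootstrap) mirror the abstract.

```python
# Hedged illustration of resampling-based error estimation using a
# nearest-centroid stand-in classifier on synthetic 1-D data; the
# paper's actual LR and NN models are not reproduced here.
import random

random.seed(1)
data = [([random.gauss(c, 1.0)], c) for c in (0, 3) for _ in range(30)]

def nearest_centroid_error(train, test):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x[0])
    cents = {y: sum(v) / len(v) for y, v in groups.items()}
    wrong = sum(min(cents, key=lambda y: abs(cents[y] - x[0])) != y
                for x, y in test)
    return wrong / len(test)

def k_fold_error(data, k=3):
    folds = [data[i::k] for i in range(k)]
    errs = [nearest_centroid_error(sum(folds[:i] + folds[i + 1:], []),
                                   folds[i])
            for i in range(k)]
    return sum(errs) / k

def loo_error(data):
    return sum(nearest_centroid_error(data[:i] + data[i + 1:], [data[i]])
               for i in range(len(data))) / len(data)

def bootstrap_error(data, reps=20):
    errs = []
    for _ in range(reps):
        train = [random.choice(data) for _ in data]
        test = [d for d in data if d not in train]  # out-of-bag samples
        if test:
            errs.append(nearest_centroid_error(train, test))
    return sum(errs) / len(errs)
```

On well-separated classes all three estimates are small, but they generally disagree, which is exactly why the paper's choice of validation procedure affects which model looks better.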

    Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

    Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered as an extension of BBQ, a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression. ENIR is designed to address the key limitation of isotonic regression, which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be combined with many existing classification models. We demonstrate the performance of ENIR on synthetic and real datasets for the commonly used binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in O(N log N) time, where N is the number of samples.
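
The isotonic-regression baseline that ENIR extends can be sketched in a few lines via the pool-adjacent-violators algorithm (PAVA), which fits a monotone map from classifier scores to calibrated probabilities. This shows only the strict-monotonicity baseline; ENIR's near-isotonic relaxation and ensembling are not reproduced here.

```python
# Hedged sketch of isotonic-regression calibration via the
# pool-adjacent-violators algorithm (PAVA). ENIR relaxes the strict
# monotonicity enforced below; that relaxation is not shown.
def pava(scores, labels):
    """Fit isotonic calibration; returns (sorted_scores, fitted_probs)."""
    pairs = sorted(zip(scores, labels))
    xs = [s for s, _ in pairs]
    # each block holds [sum_of_labels, count]
    blocks = [[y, 1] for _, y in pairs]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            # adjacent violator: pool the two blocks, then re-check backwards
            blocks[i][0] += blocks[i + 1][0]
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return xs, fitted
```

Post-processing a classifier then amounts to mapping each new score onto this fitted monotone step function, leaving the classifier's ranking (and hence its discrimination power) unchanged.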

    Supervised Classification: Quite a Brief Overview

    The original problem of supervised classification considers the task of automatically assigning objects to their respective classes on the basis of numerical measurements derived from these objects. Classifiers are the tools that implement the actual functional mapping from these measurements---also called features or inputs---to the so-called class label---or output. The fields of pattern recognition and machine learning study ways of constructing such classifiers. The main idea behind supervised methods is that of learning from examples: given a number of example input-output relations, to what extent can the general mapping be learned that takes any new and unseen feature vector to its correct class? This chapter provides a basic introduction to the underlying ideas of how to approach a supervised classification problem. In addition, it provides an overview of some specific classification techniques, delves into the issues of object representation and classifier evaluation, and (very) briefly covers some variations on the basic supervised classification task that may also be of interest to the practitioner.
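
The "learning from examples" idea above can be made concrete with a minimal classifier: 1-nearest-neighbour maps a new feature vector to the class of its closest training example. The feature vectors and labels below are made up for illustration.

```python
# Minimal instance of supervised classification: 1-nearest-neighbour.
# Training examples and labels are invented for illustration.
def predict(train, x):
    """Return the label of the training point nearest to x."""
    nearest = min(train,
                  key=lambda ex: sum((a - b) ** 2
                                     for a, b in zip(ex[0], x)))
    return nearest[1]

examples = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
            ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]

print(predict(examples, (1.1, 1.0)))   # → A
print(predict(examples, (4.1, 4.1)))   # → B
```

Even this tiny example already exhibits the chapter's central question: whether a mapping induced from a few input-output pairs generalizes to feature vectors it has never seen.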

    Nonparametric liquefaction triggering and postliquefaction deformations

    This study evaluates granular liquefaction triggering case-history data using a nonparametric approach. This approach assumes no functional form in the relationship between liquefied and nonliquefied cases as measured using cone penetration test (CPT) data. From a statistical perspective, this allows for an estimate of the threshold of liquefaction triggering unbiased by prior functional forms, and also provides a platform for testing existing published methods for accuracy and precision. The resulting threshold exhibits some unique trends, which are then interpreted based on postliquefaction deformation behavior. The range of postliquefaction deformations is differentiated into three zones: (1) large deformations associated with metastable conditions; (2) medium deformations associated with cyclic strain failure; and (3) small deformations associated with cyclic stress failure. Deformations are further defined based on the absence or presence of static driving shear stresses. This work presents a single simplified framework that provides quantitative guidance on triggering and qualitative guidance on deformation potential for quick assessment of risks associated with seismic soil liquefaction failure.
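
The nonparametric idea of estimating a triggering threshold without assuming a functional form can be sketched with a k-nearest-neighbour probability estimate along a single CPT resistance axis. The case-history numbers below are fabricated for illustration and the one-dimensional simplification is an assumption; the study works with full CPT case-history data.

```python
# Hedged sketch of a nonparametric triggering threshold: a k-NN
# estimate of liquefaction probability versus a single CPT resistance
# value, with no assumed functional form. Data are fabricated.
def knn_probability(cases, q, k=5):
    """Fraction of liquefied outcomes among the k nearest case histories."""
    nearest = sorted(cases, key=lambda c: abs(c[0] - q))[:k]
    return sum(y for _, y in nearest) / k

# (cpt_resistance, liquefied?) pairs -- illustrative, not real data
cases = [(60, 1), (70, 1), (80, 1), (90, 1), (100, 1),
         (110, 0), (115, 1), (120, 0), (130, 0), (140, 0),
         (150, 0), (160, 0)]

# crude threshold: first resistance where estimated probability drops below 0.5
grid = range(60, 161, 5)
threshold = next(q for q in grid if knn_probability(cases, q) < 0.5)
```

Because no curve shape is imposed, the estimated boundary simply follows where liquefied and nonliquefied case histories actually interleave, which is the sense in which the study's threshold is unbiased by prior functional forms.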

    Value Focused Thinking Applications to Supervised Pattern Classification with Extensions to Hyperspectral Anomaly Detection Algorithms

    Hyperspectral imaging (HSI) is an emerging analytical tool with flexible applications in different target detection and classification environments, including military intelligence, environmental conservation, etc. Algorithms are being developed at a rapid rate, solving various related detection problems under certain assumptions. At the core of these algorithms is the concept of supervised pattern classification, which fits an algorithm to data with enough generalizability that it can be applied to multiple instances of data. It is necessary to develop a logical methodology that can weigh responses and provide an output value that helps determine an optimum algorithm. This research focuses on the comparison of supervised learning classification algorithms through the development of a value-focused thinking (VFT) hierarchy. This hierarchy represents a fusion of qualitative/quantitative parameter values developed with a priori information from subject-matter experts. Parameters include a fusion of bias/variance values decomposed from quadratic and zero/one loss functions, and a comparison of cross-validation methodologies and resulting error. This methodology is utilized to compare the aforementioned classifiers as applied to hyperspectral imaging data. Conclusions reached include a proof of concept of the credibility and applicability of the value-focused thinking process to determine an optimal algorithm under various conditions.
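
The core scoring step of such a VFT hierarchy reduces to a weighted sum of normalized measure values per candidate algorithm. The weights and measure values below are invented placeholders for what the research elicits from subject-matter experts, not numbers from the thesis.

```python
# Hedged sketch of value-focused-thinking scoring: each candidate
# algorithm receives a weighted sum of normalized measure values.
# Weights and scores are invented placeholders, not elicited data.
weights = {"bias": 0.25, "variance": 0.25, "cv_error": 0.35, "runtime": 0.15}

# measure values on a common 0-1 "more is better" scale (illustrative)
algorithms = {
    "LDA":  {"bias": 0.6, "variance": 0.9, "cv_error": 0.7, "runtime": 0.9},
    "SVM":  {"bias": 0.8, "variance": 0.6, "cv_error": 0.8, "runtime": 0.5},
    "k-NN": {"bias": 0.7, "variance": 0.5, "cv_error": 0.6, "runtime": 0.8},
}

def vft_score(measures):
    """Weighted additive value model over the hierarchy's measures."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(weights[m] * v for m, v in measures.items())

ranked = sorted(algorithms, key=lambda a: vft_score(algorithms[a]),
                reverse=True)
```

The additive model is the simplest choice; the research's contribution lies in how the hierarchy's measures (bias/variance decompositions, cross-validation error) and expert weights are constructed, which this sketch only gestures at.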