
    Bounded Coordinate-Descent for Biological Sequence Classification in High Dimensional Predictor Space

    We present a framework for discriminative sequence classification in which the learner works directly in the high-dimensional predictor space of all subsequences in the training set. This is made possible by a new coordinate-descent algorithm that bounds the magnitude of the gradient to select discriminative subsequences quickly. We characterize the loss functions to which our generic learning algorithm applies and present concrete implementations for logistic regression (binomial log-likelihood loss) and support vector machines (squared hinge loss). Applying our algorithm to protein remote homology detection and remote fold recognition yields performance comparable to that of state-of-the-art methods (e.g., kernel support vector machines). Unlike those classifiers, the resulting classification models are simply lists of weighted discriminative subsequences and can therefore be interpreted and related to the biological problem.
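
As a hedged illustration of the coordinate-descent idea this abstract describes, the sketch below performs greedy coordinate descent over subsequence indicator features under logistic loss, updating only the coordinate whose gradient has the largest magnitude. The toy sequences, the restriction to contiguous subsequences, and the fixed step size are simplifying assumptions, not the paper's exact algorithm.

```python
# Hedged sketch: greedy coordinate descent over subsequence indicator
# features with logistic (binomial log-likelihood) loss. The data,
# contiguous-subsequence restriction, and step size are illustrative
# assumptions, not the paper's exact method.
import math

def subsequence_features(seq, max_len=3):
    """All contiguous subsequences up to max_len (a simplification)."""
    feats = set()
    for i in range(len(seq)):
        for j in range(i + 1, min(i + max_len, len(seq)) + 1):
            feats.add(seq[i:j])
    return feats

def fit(seqs, labels, iters=50, step=0.5):
    vocab = sorted(set().union(*(subsequence_features(s) for s in seqs)))
    X = [[1.0 if f in subsequence_features(s) else 0.0 for f in vocab]
         for s in seqs]
    w = [0.0] * len(vocab)
    for _ in range(iters):
        # predicted probabilities under the current weights
        p = [1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
             for x in X]
        grad = [sum((pi - yi) * x[j] for pi, yi, x in zip(p, labels, X))
                for j in range(len(vocab))]
        # update only the coordinate with the largest gradient magnitude
        j = max(range(len(vocab)), key=lambda k: abs(grad[k]))
        w[j] -= step * grad[j]
    return {f: wi for f, wi in zip(vocab, w) if wi != 0.0}

model = fit(["ACGT", "ACGG", "TTGG", "TTGT"], [1, 1, 0, 0])
```

The returned model is just a dictionary of weighted subsequences, which matches the interpretability claim in the abstract.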

    Automating the Construction of Jet Observables with Machine Learning

    Machine-learning-assisted jet substructure tagging techniques have the potential to significantly improve searches for new particles and Standard Model measurements in hadronic final states. Techniques with simple analytic forms are particularly useful for establishing robustness and gaining physical insight. We introduce a procedure to automate the construction of a large class of observables that are chosen to completely specify M-body phase space. The procedure is validated on the task of distinguishing H → bb̄ from g → bb̄, where M = 3 and previous brute-force approaches to constructing an optimal product observable for the M-body phase space have established the baseline performance. We then use the new method to design tailored observables for the boosted Z′ search, where M = 4 and brute-force methods are intractable. The new classifiers outperform standard 2-prong tagging observables, illustrating the power of the new optimization method for improving searches and measurements at the LHC and beyond. (Comment: 15 pages, 8 tables, 12 figures)
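
The brute-force product-observable baseline the abstract mentions can be sketched on toy data: scan integer exponents (a, b) of a product observable O = x1^a · x2^b and keep the pair with the best AUC. The two "substructure" features and the Gaussian toy events below are invented stand-ins, not LHC data or the paper's observables.

```python
# Hedged sketch of a brute-force product-observable scan: grid-search
# exponents (a, b) of O = x1**a * x2**b and keep the pair with the best
# AUC on labelled toy events. All numbers here are synthetic.
import random

random.seed(0)
signal = [(random.gauss(2, 0.5), random.gauss(1, 0.5)) for _ in range(200)]
backgr = [(random.gauss(1, 0.5), random.gauss(2, 0.5)) for _ in range(200)]

def auc(scores_sig, scores_bkg):
    """Probability a random signal event outscores a random background one."""
    wins = sum(s > b for s in scores_sig for b in scores_bkg)
    ties = sum(s == b for s in scores_sig for b in scores_bkg)
    return (wins + 0.5 * ties) / (len(scores_sig) * len(scores_bkg))

best = max(
    ((a, b) for a in range(-2, 3) for b in range(-2, 3)),
    key=lambda ab: auc(
        [abs(x) ** ab[0] * abs(y) ** ab[1] for x, y in signal],
        [abs(x) ** ab[0] * abs(y) ** ab[1] for x, y in backgr],
    ),
)
```

Since the signal peaks at high x1 and low x2, the scan favors a positive exponent on x1 and a negative one on x2; the abstract's point is that such scans stop being tractable as the number of phase-space variables grows.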

    Validation procedures in radiological diagnostic models. Neural network and logistic regression

    The objective of this paper is to compare the performance of two predictive radiological models, logistic regression (LR) and neural network (NN), with five different resampling methods. One hundred and sixty-seven patients with proven calvarial lesions as the only known disease were enrolled. Clinical and CT data were used for LR and NN models. Both models were developed with cross validation, leave-one-out and three different bootstrap algorithms. The final results of each model were compared with error rate and the area under receiver operating characteristic curves (Az). The neural network obtained statistically higher Az than LR with cross validation. The remaining resampling validation methods did not reveal statistically significant differences between LR and NN rules. The neural network classifier performs better than the one based on logistic regression. This advantage is well detected by three-fold cross-validation, but remains unnoticed when leave-one-out or bootstrap algorithms are used. Keywords: skull, neoplasms, logistic regression, neural networks, receiver operating characteristic curve, statistics, resampling.
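
The abstract's point that different resampling schemes can yield different error estimates can be illustrated with a hedged, stdlib-only sketch. A nearest-centroid classifier stands in for the paper's LR and NN models, and the data are synthetic; only the three resampling procedures (k-fold, leave-one-out, out-of-bag bootstrap) mirror the abstract.

```python
# Hedged illustration of resampling-based error estimation using a
# nearest-centroid stand-in classifier on synthetic 1-D data; the
# paper's actual LR and NN models are not reproduced here.
import random

random.seed(1)
data = [([random.gauss(c, 1.0)], c) for c in (0, 3) for _ in range(30)]

def nearest_centroid_error(train, test):
    groups = {}
    for x, y in train:
        groups.setdefault(y, []).append(x[0])
    cents = {y: sum(v) / len(v) for y, v in groups.items()}
    wrong = sum(min(cents, key=lambda y: abs(cents[y] - x[0])) != y
                for x, y in test)
    return wrong / len(test)

def k_fold_error(data, k=3):
    folds = [data[i::k] for i in range(k)]
    errs = [nearest_centroid_error(sum(folds[:i] + folds[i + 1:], []),
                                   folds[i])
            for i in range(k)]
    return sum(errs) / k

def loo_error(data):
    return sum(nearest_centroid_error(data[:i] + data[i + 1:], [data[i]])
               for i in range(len(data))) / len(data)

def bootstrap_error(data, reps=20):
    errs = []
    for _ in range(reps):
        train = [random.choice(data) for _ in data]
        test = [d for d in data if d not in train]  # out-of-bag samples
        if test:
            errs.append(nearest_centroid_error(train, test))
    return sum(errs) / len(errs)
```

On well-separated classes all three estimates are small, but they generally disagree, which is exactly why the paper's choice of validation procedure affects which model looks better.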

    Binary Classifier Calibration using an Ensemble of Near Isotonic Regression Models

    Learning accurate probabilistic models from data is crucial in many practical tasks in data mining. In this paper we present a new non-parametric calibration method called ensemble of near-isotonic regression (ENIR). The method can be considered as an extension of BBQ, a recently proposed calibration method, as well as the commonly used calibration method based on isotonic regression. ENIR is designed to address the key limitation of isotonic regression, which is the monotonicity assumption of the predictions. Similar to BBQ, the method post-processes the output of a binary classifier to obtain calibrated probabilities. Thus it can be combined with many existing classification models. We demonstrate the performance of ENIR on synthetic and real datasets for the commonly used binary classification models. Experimental results show that the method outperforms several common binary classifier calibration methods. In particular on the real data, ENIR commonly performs statistically significantly better than the other methods, and never worse. It is able to improve the calibration power of classifiers, while retaining their discrimination power. The method is also computationally tractable for large-scale datasets, as it runs in O(N log N) time, where N is the number of samples.
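
The isotonic-regression baseline that ENIR extends can be sketched in a few lines via the pool-adjacent-violators algorithm (PAVA), which fits a monotone map from classifier scores to calibrated probabilities. This shows only the strict-monotonicity baseline; ENIR's near-isotonic relaxation and ensembling are not reproduced here.

```python
# Hedged sketch of isotonic-regression calibration via the
# pool-adjacent-violators algorithm (PAVA). ENIR relaxes the strict
# monotonicity enforced below; that relaxation is not shown.
def pava(scores, labels):
    """Fit isotonic calibration; returns (sorted_scores, fitted_probs)."""
    pairs = sorted(zip(scores, labels))
    xs = [s for s, _ in pairs]
    # each block holds [sum_of_labels, count]
    blocks = [[y, 1] for _, y in pairs]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] / blocks[i][1] > blocks[i + 1][0] / blocks[i + 1][1]:
            # adjacent violator: pool the two blocks, then re-check backwards
            blocks[i][0] += blocks[i + 1][0]
            blocks[i][1] += blocks[i + 1][1]
            del blocks[i + 1]
            i = max(i - 1, 0)
        else:
            i += 1
    fitted = []
    for s, n in blocks:
        fitted.extend([s / n] * n)
    return xs, fitted
```

Post-processing a classifier then amounts to mapping each new score onto this fitted monotone step function, leaving the classifier's ranking (and hence its discrimination power) unchanged.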

    Supervised Classification: Quite a Brief Overview

    The original problem of supervised classification considers the task of automatically assigning objects to their respective classes on the basis of numerical measurements derived from these objects. Classifiers are the tools that implement the actual functional mapping from these measurements---also called features or inputs---to the so-called class label---or output. The fields of pattern recognition and machine learning study ways of constructing such classifiers. The main idea behind supervised methods is that of learning from examples: given a number of example input-output relations, to what extent can the general mapping be learned that takes any new and unseen feature vector to its correct class? This chapter provides a basic introduction to the underlying ideas of how to approach a supervised classification problem. In addition, it provides an overview of some specific classification techniques, delves into the issues of object representation and classifier evaluation, and (very) briefly covers some variations on the basic supervised classification task that may also be of interest to the practitioner.
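
The "learning from examples" idea above can be made concrete with a minimal classifier: 1-nearest-neighbour maps a new feature vector to the class of its closest training example. The feature vectors and labels below are made up for illustration.

```python
# Minimal instance of supervised classification: 1-nearest-neighbour.
# Training examples and labels are invented for illustration.
def predict(train, x):
    """Return the label of the training point nearest to x."""
    nearest = min(train,
                  key=lambda ex: sum((a - b) ** 2
                                     for a, b in zip(ex[0], x)))
    return nearest[1]

examples = [((1.0, 1.0), "A"), ((1.2, 0.9), "A"),
            ((4.0, 4.2), "B"), ((3.8, 4.0), "B")]

print(predict(examples, (1.1, 1.0)))   # → A
print(predict(examples, (4.1, 4.1)))   # → B
```

Even this tiny example already exhibits the chapter's central question: whether a mapping induced from a few input-output pairs generalizes to feature vectors it has never seen.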

    Nonparametric liquefaction triggering and postliquefaction deformations

    This study evaluates granular liquefaction triggering case-history data using a nonparametric approach. This approach assumes no functional form in the relationship between liquefied and nonliquefied cases as measured using cone penetration test (CPT) data. From a statistical perspective, this allows for an estimate of the threshold of liquefaction triggering unbiased by prior functional forms, and also provides a platform for testing existing published methods for accuracy and precision. The resulting threshold exhibits some unique trends, which are then interpreted based on postliquefaction deformation behavior. The range of postliquefaction deformations is differentiated into three zones: (1) large deformations associated with metastable conditions; (2) medium deformations associated with cyclic strain failure; and (3) small deformations associated with cyclic stress failure. Deformations are further defined based on the absence or presence of static driving shear stresses. This work presents a single simplified framework that provides quantitative guidance on triggering and qualitative guidance on deformation potential for quick assessment of risks associated with seismic soil liquefaction failure.
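
The nonparametric idea of estimating a triggering threshold without assuming a functional form can be sketched with a k-nearest-neighbour probability estimate along a single CPT resistance axis. The case-history numbers below are fabricated for illustration and the one-dimensional simplification is an assumption; the study works with full CPT case-history data.

```python
# Hedged sketch of a nonparametric triggering threshold: a k-NN
# estimate of liquefaction probability versus a single CPT resistance
# value, with no assumed functional form. Data are fabricated.
def knn_probability(cases, q, k=5):
    """Fraction of liquefied outcomes among the k nearest case histories."""
    nearest = sorted(cases, key=lambda c: abs(c[0] - q))[:k]
    return sum(y for _, y in nearest) / k

# (cpt_resistance, liquefied?) pairs -- illustrative, not real data
cases = [(60, 1), (70, 1), (80, 1), (90, 1), (100, 1),
         (110, 0), (115, 1), (120, 0), (130, 0), (140, 0),
         (150, 0), (160, 0)]

# crude threshold: first resistance where estimated probability drops below 0.5
grid = range(60, 161, 5)
threshold = next(q for q in grid if knn_probability(cases, q) < 0.5)
```

Because no curve shape is imposed, the estimated boundary simply follows where liquefied and nonliquefied case histories actually interleave, which is the sense in which the study's threshold is unbiased by prior functional forms.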

    Value Focused Thinking Applications to Supervised Pattern Classification with Extensions to Hyperspectral Anomaly Detection Algorithms

    Hyperspectral imaging (HSI) is an emerging analytical tool with flexible applications in different target detection and classification environments, including military intelligence, environmental conservation, etc. Algorithms are being developed at a rapid rate, solving various related detection problems under certain assumptions. At the core of these algorithms is the concept of supervised pattern classification, which fits an algorithm to data with enough generalizability that it can be applied to multiple instances of data. It is necessary to develop a logical methodology that can weigh responses and provide an output value that helps determine an optimum algorithm. This research focuses on the comparison of supervised learning classification algorithms through the development of a value-focused thinking (VFT) hierarchy. This hierarchy represents a fusion of qualitative/quantitative parameter values developed with a priori information from subject-matter experts. Parameters include a fusion of bias/variance values decomposed from quadratic and zero/one loss functions, and a comparison of cross-validation methodologies and resulting error. This methodology is utilized to compare the aforementioned classifiers as applied to hyperspectral imaging data. Conclusions reached include a proof of concept of the credibility and applicability of the value-focused thinking process to determine an optimal algorithm under various conditions.
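
The core scoring step of such a VFT hierarchy reduces to a weighted sum of normalized measure values per candidate algorithm. The weights and measure values below are invented placeholders for what the research elicits from subject-matter experts, not numbers from the thesis.

```python
# Hedged sketch of value-focused-thinking scoring: each candidate
# algorithm receives a weighted sum of normalized measure values.
# Weights and scores are invented placeholders, not elicited data.
weights = {"bias": 0.25, "variance": 0.25, "cv_error": 0.35, "runtime": 0.15}

# measure values on a common 0-1 "more is better" scale (illustrative)
algorithms = {
    "LDA":  {"bias": 0.6, "variance": 0.9, "cv_error": 0.7, "runtime": 0.9},
    "SVM":  {"bias": 0.8, "variance": 0.6, "cv_error": 0.8, "runtime": 0.5},
    "k-NN": {"bias": 0.7, "variance": 0.5, "cv_error": 0.6, "runtime": 0.8},
}

def vft_score(measures):
    """Weighted additive value model over the hierarchy's measures."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights must sum to 1
    return sum(weights[m] * v for m, v in measures.items())

ranked = sorted(algorithms, key=lambda a: vft_score(algorithms[a]),
                reverse=True)
```

The additive model is the simplest choice; the research's contribution lies in how the hierarchy's measures (bias/variance decompositions, cross-validation error) and expert weights are constructed, which this sketch only gestures at.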