275 research outputs found

Learning Dynamic Feature Selection for Fast Sequential Prediction

Author: McCallum Andrew
Silverstein Kate
Strubell Emma
Vilnis Luke
Publication date: 01/01/2015

We present paired learning and inference algorithms for significantly reducing computation and increasing speed of the vector dot products in the classifiers that are at the heart of many NLP components. This is accomplished by partitioning the features into a sequence of templates which are ordered such that high confidence can often be reached using only a small fraction of all features. Parameter estimation is arranged to maximize accuracy and early confidence in this sequence. Our approach is simpler and better suited to NLP than other related cascade methods. We present experiments in left-to-right part-of-speech tagging, named entity recognition, and transition-based dependency parsing. On the typical benchmarking datasets we can preserve POS tagging accuracy above 97% and parsing LAS above 88.5%, both with over a five-fold reduction in run-time, and NER F1 above 88 with more than 2x increase in speed. Comment: Appears in The 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, July 201
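The inference idea in this abstract, scoring feature templates in order and stopping once the prediction is confident, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `predict_with_templates`, the representation of each template as a dictionary of per-label scores, and the use of a top-two score margin as the confidence test are all assumptions made for the example.

```python
from typing import Dict, Sequence, Tuple


def predict_with_templates(
    template_scores: Sequence[Dict[str, float]],
    margin: float,
) -> Tuple[str, int]:
    """Return (predicted label, number of templates actually scored).

    Each entry of template_scores holds the per-label score contributed by
    one feature template (hypothetical representation). Templates are
    consumed in order, and scoring stops as soon as the best label leads
    the runner-up by at least `margin` (assumed confidence test).
    """
    if not template_scores:
        raise ValueError("need at least one feature template")
    totals: Dict[str, float] = {}
    used = 0
    for scores in template_scores:
        used += 1
        # Accumulate this template's contribution to every label score.
        for label, score in scores.items():
            totals[label] = totals.get(label, 0.0) + score
        ranked = sorted(totals.values(), reverse=True)
        # Stop early when the top label is far enough ahead.
        if len(ranked) > 1 and ranked[0] - ranked[1] >= margin:
            break
    best = max(totals, key=lambda label: totals[label])
    return best, used
```

With a small margin the loop tends to exit after the first templates, so the later and typically more expensive templates are never evaluated; with a large margin it falls back to using all of them.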

arXiv.org e-Print Archive

Crossref

Training for Fast Sequential Prediction Using Dynamic Feature Selection

Author: McCallum Andrew
Strubell Emma
Vilnis Luke
Publication date: 19/12/2014

arXiv.org e-Print Archive

CiteSeerX

Efficient least angle regression for identification of linear-in-the-parameters models

Author: Beach Thomas H.
Rezgui Yacine
Zhao Wanqing
Publication venue: 'The Royal Society'
Publication date: 01/02/2017

Least angle regression is a promising model selection method that differs from conventional stepwise and stagewise methods in being neither too greedy nor too slow. It is closely related to L1-norm optimization, which achieves low prediction variance by sacrificing some model bias in order to improve generalization. In this paper, we propose an efficient least angle regression algorithm for model selection over a large class of linear-in-the-parameters models, with the aim of accelerating the model selection process. The entire algorithm works recursively: the correlations between model terms and residuals, the evolving directions and other pertinent variables are derived explicitly and updated successively at every subset selection step. The model coefficients are computed only when the algorithm finishes, which avoids direct matrix inversions. A detailed computational complexity analysis indicates that the proposed algorithm is significantly more efficient than the original approach, in which the well-known efficient Cholesky decomposition is used to solve least angle regression. Three artificial and real-world examples are employed to demonstrate the effectiveness, efficiency and numerical stability of the proposed algorithm.

Online Research @ Cardiff

E-space: Manchester Metropolitan University's Research Repository

PubMed Central

University of East Anglia digital repository

Boosting with early stopping: Convergence and consistency

Author: Yu Bin
Zhang Tong
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 16/08/2005

Boosting is one of the most significant advances in machine learning for classification and regression. In its original and computationally flexible version, boosting seeks to minimize empirically a loss function in a greedy fashion. The resulting estimator takes an additive function form and is built iteratively by applying a base estimator (or learner) to updated samples depending on the previous iterations. An unusual regularization technique, early stopping, is employed based on CV or a test set. This paper studies numerical convergence, consistency and statistical rates of convergence of boosting with early stopping, when it is carried out over the linear span of a family of basis functions. For general loss functions, we prove the convergence of boosting's greedy optimization to the infimum of the loss function over the linear span. Using the numerical convergence result, we find early-stopping strategies under which boosting is shown to be consistent based on i.i.d. samples, and we obtain bounds on the rates of convergence for boosting estimators. Simulation studies are also presented to illustrate the relevance of our theoretical results for providing insights to practical aspects of boosting. As a side product, these results also reveal the importance of restricting the greedy search step-sizes, as known in practice through the work of Friedman and others. Moreover, our results lead to a rigorous proof that for a linearly separable problem, AdaBoost with $\epsilon \to 0$ step-size becomes an $L^1$-margin maximizer when left to run to convergence. Comment: Published at http://dx.doi.org/10.1214/009053605000000255 in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of Mathematical Statistics (http://www.imstat.org
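A small sketch may help make the setting concrete: greedy boosting over the linear span of basis functions, with a restricted step size and early stopping on held-out data. It is an illustration under stated assumptions rather than the procedure analysed in the paper: squared loss is chosen for simplicity, each input column is treated as one basis function, and the names `boost_with_early_stopping`, `step` and `patience` are hypothetical, as is the patience-based stopping rule.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def predict(w: Sequence[float], row: Sequence[float]) -> float:
    """Additive model: weighted sum of basis-function values."""
    return sum(wj * xj for wj, xj in zip(w, row))


def mean_squared_loss(X: Matrix, y: Sequence[float], w: Sequence[float]) -> float:
    return sum((predict(w, row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


def boost_with_early_stopping(
    X_train: Matrix,
    y_train: Sequence[float],
    X_val: Matrix,
    y_val: Sequence[float],
    step: float = 0.1,      # restricted step size (assumed value)
    max_rounds: int = 500,
    patience: int = 10,     # assumed stopping rule: rounds without improvement
) -> Tuple[List[float], float]:
    """Return the weights with the lowest held-out loss, and that loss."""
    n_features = len(X_train[0])
    norms = [sum(row[j] ** 2 for row in X_train) for j in range(n_features)]
    w = [0.0] * n_features
    best_w, best_val = list(w), mean_squared_loss(X_val, y_val, w)
    rounds_since_best = 0
    for _ in range(max_rounds):
        residual = [yi - predict(w, row) for row, yi in zip(X_train, y_train)]
        corr = [
            sum(row[j] * r for row, r in zip(X_train, residual))
            for j in range(n_features)
        ]
        # Greedy choice: the basis function giving the largest loss decrease.
        gains = [c * c / n if n > 0 else 0.0 for c, n in zip(corr, norms)]
        j = max(range(n_features), key=lambda k: gains[k])
        if gains[j] == 0.0:
            break  # training residual is orthogonal to every basis function
        # Take only a fraction `step` of the optimal coordinate update.
        w[j] += step * corr[j] / norms[j]
        val = mean_squared_loss(X_val, y_val, w)
        if val < best_val:
            best_w, best_val, rounds_since_best = list(w), val, 0
        else:
            rounds_since_best += 1
            if rounds_since_best >= patience:
                break  # early stopping on the held-out set
    return best_w, best_val
```

Returning the best held-out weights rather than the final ones is a design choice of this sketch; it keeps the result from degrading during the last `patience` rounds before the loop exits.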

arXiv.org e-Print Archive

Crossref

Small sample size learning in bioinformatics

Author: Bedo Justin
Publication date: 21/11/2018

The Australian National University

An adaptive multiclass nearest neighbor classifier

Author: Puchkin Nikita
Spokoiny Vladimir
Publication venue: 'EDP Sciences'
Publication date: 03/11/2019

We consider a problem of multiclass classification, where the training sample $S_n = \{(X_i, Y_i)\}_{i=1}^n$ is generated from the model $\mathbb P(Y = m | X = x) = \eta_m(x)$, $1 \leq m \leq M$, and $\eta_1(x), \dots, \eta_M(x)$ are unknown $\alpha$-Hölder continuous functions. Given a test point $X$, our goal is to predict its label. A widely used $\mathsf k$-nearest-neighbors classifier constructs estimates of $\eta_1(X), \dots, \eta_M(X)$ and uses a plug-in rule for the prediction. However, it requires a proper choice of the smoothing parameter $\mathsf k$, which may become tricky in some situations. In our solution, we fix several integers $n_1, \dots, n_K$, compute corresponding $n_k$-nearest-neighbor estimates for each $m$ and each $n_k$ and apply an aggregation procedure. We study an algorithm, which constructs a convex combination of these estimates such that the aggregated estimate behaves approximately as well as an oracle choice. We also provide a non-asymptotic analysis of the procedure, prove its adaptation to the unknown smoothness parameter $\alpha$ and to the margin and establish rates of convergence under mild assumptions. Comment: Accepted in ESAIM: Probability & Statistics. The original publication is available at www.esaim-ps.or
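The overall shape of the approach (nearest-neighbor estimates of the class probabilities for several neighborhood sizes, a convex combination of those estimates, then a plug-in rule) can be sketched as below. The paper's aggregation procedure that chooses the weights is not reproduced here; uniform weights are swapped in as a placeholder, and the names `knn_class_probs` and `aggregated_plug_in` as well as the Euclidean distance are assumptions of this example.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


def knn_class_probs(
    train: Sequence[Sample], x: Sequence[float], k: int, labels: Sequence[int]
) -> Dict[int, float]:
    """k-nearest-neighbor estimate of P(Y = m | X = x) for each label m."""
    nearest = sorted(
        train, key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], x))
    )[:k]
    counts = Counter(label for _, label in nearest)
    return {m: counts[m] / len(nearest) for m in labels}


def aggregated_plug_in(
    train: Sequence[Sample],
    x: Sequence[float],
    ks: Sequence[int],
    weights: Optional[Sequence[float]] = None,
) -> Tuple[int, Dict[int, float]]:
    """Combine estimates for several k convexly, then predict the argmax.

    Uniform weights are a placeholder (assumption); the paper selects the
    convex weights with a dedicated aggregation procedure.
    """
    labels: List[int] = sorted({label for _, label in train})
    if weights is None:
        weights = [1.0 / len(ks)] * len(ks)
    aggregated = {m: 0.0 for m in labels}
    for k, weight in zip(ks, weights):
        estimate = knn_class_probs(train, x, k, labels)
        for m in labels:
            aggregated[m] += weight * estimate[m]
    # Plug-in rule: predict the label with the largest estimated probability.
    return max(labels, key=lambda m: aggregated[m]), aggregated
```

Because the weights are non-negative and sum to one, the aggregated values remain a valid probability vector over the labels.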

arXiv.org e-Print Archive

Crossref

EDP Sciences OAI-PMH repository (1.2.0)

Gradient boosting models for photovoltaic power estimation under partial shading conditions

Author: Batzelis E
Brown G
Nikolaou N
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

The energy yield estimation of a photovoltaic (PV) system operating under partially shaded conditions is a challenging task and a very active area of research. In this paper, we attack this problem with the aid of machine learning techniques. Using data simulated by the equivalent circuit of a PV string operating under partial shading, we train and evaluate three different gradient boosted regression tree models to predict the global maximum power point (MPP). Our results show that all three approaches improve upon the state-of-the-art closed-form estimates, in terms of both average and worst-case performance. Moreover, we show that even a small number of training examples is sufficient to achieve improved global MPP estimation. The methods proposed are fast to train and deploy and allow for further improvements in performance should more computational resources be available.

Southampton (e-Prints Soton)

UCL Discovery

Spiral - Imperial College Digital Repository

Tree Boosting Data Competitions with XGBoost

Author: Bort Escabias Carlos
Publication venue: 'Edicions de la Universitat de Barcelona'
Publication date: 01/01/2017

The objective of this Master's degree thesis is to provide an understanding of how to approach a supervised learning prediction problem and to illustrate it with a statistical/machine learning algorithm, Tree Boosting. Tree methodology is reviewed in order to trace its evolution, from Classification and Regression Trees through Bagging and Random Forest to present-day Tree Boosting. The methodology is explained following the XGBoost implementation, which achieved state-of-the-art results in several data competitions. A framework for applied predictive modelling is explained with its key concepts: objective function, regularization term, overfitting, hyperparameter tuning, k-fold cross validation and feature engineering. All these concepts are illustrated with a real dataset of video game churn, used in a datathon competition.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC