11,337 research outputs found

    Committee-Based Sample Selection for Probabilistic Classifiers

    Full text link
    In many real-world learning tasks, it is expensive to acquire a sufficient number of labeled examples for training. This paper investigates methods for reducing annotation cost by `sample selection'. In this approach, during training the learning program examines many unlabeled examples and selects for labeling only those that are most informative at each stage. This avoids redundantly labeling examples that contribute little new information. Our work follows on previous research on Query By Committee, extending the committee-based paradigm to the context of probabilistic classification. We describe a family of empirical methods for committee-based sample selection in probabilistic classification models, which evaluate the informativeness of an example by measuring the degree of disagreement between several model variants. These variants (the committee) are drawn randomly from a probability distribution conditioned by the training set labeled so far. The method was applied to the real-world natural language processing task of stochastic part-of-speech tagging. We find that all variants of the method achieve a significant reduction in annotation cost, although their computational efficiency differs. In particular, the simplest variant, a two member committee with no parameters to tune, gives excellent results. We also show that sample selection yields a significant reduction in the size of the model used by the tagger

    Machine Learning in Automated Text Categorization

    Full text link
    The automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last ten years, due to the increased availability of documents in digital form and the ensuing need to organize them. In the research community the dominant approach to this problem is based on machine learning techniques: a general inductive process automatically builds a classifier by learning, from a set of preclassified documents, the characteristics of the categories. The advantages of this approach over the knowledge engineering approach (consisting in the manual definition of a classifier by domain experts) are a very good effectiveness, considerable savings in terms of expert manpower, and straightforward portability to different domains. This survey discusses the main approaches to text categorization that fall within the machine learning paradigm. We will discuss in detail issues pertaining to three different problems, namely document representation, classifier construction, and classifier evaluation.Comment: Accepted for publication on ACM Computing Survey

    Efficient Version-Space Reduction for Visual Tracking

    Full text link
    Discrminative trackers, employ a classification approach to separate the target from its background. To cope with variations of the target shape and appearance, the classifier is updated online with different samples of the target and the background. Sample selection, labeling and updating the classifier is prone to various sources of errors that drift the tracker. We introduce the use of an efficient version space shrinking strategy to reduce the labeling errors and enhance its sampling strategy by measuring the uncertainty of the tracker about the samples. The proposed tracker, utilize an ensemble of classifiers that represents different hypotheses about the target, diversify them using boosting to provide a larger and more consistent coverage of the version-space and tune the classifiers' weights in voting. The proposed system adjusts the model update rate by promoting the co-training of the short-memory ensemble with a long-memory oracle. The proposed tracker outperformed state-of-the-art trackers on different sequences bearing various tracking challenges.Comment: CRV'17 Conferenc

    Active Collaborative Ensemble Tracking

    Full text link
    A discriminative ensemble tracker employs multiple classifiers, each of which casts a vote on all of the obtained samples. The votes are then aggregated in an attempt to localize the target object. Such method relies on collective competence and the diversity of the ensemble to approach the target/non-target classification task from different views. However, by updating all of the ensemble using a shared set of samples and their final labels, such diversity is lost or reduced to the diversity provided by the underlying features or internal classifiers' dynamics. Additionally, the classifiers do not exchange information with each other while striving to serve the collective goal, i.e., better classification. In this study, we propose an active collaborative information exchange scheme for ensemble tracking. This, not only orchestrates different classifier towards a common goal but also provides an intelligent update mechanism to keep the diversity of classifiers and to mitigate the shortcomings of one with the others. The data exchange is optimized with regard to an ensemble uncertainty utility function, and the ensemble is updated via co-training. The evaluations demonstrate promising results realized by the proposed algorithm for the real-world online tracking.Comment: AVSS 2017 Submissio

    Decentralized learning with budgeted network load using Gaussian copulas and classifier ensembles

    Get PDF
    We examine a network of learners which address the same classification task but must learn from different data sets. The learners cannot share data but instead share their models. Models are shared only one time so as to preserve the network load. We introduce DELCO (standing for Decentralized Ensemble Learning with COpulas), a new approach allowing to aggregate the predictions of the classifiers trained by each learner. The proposed method aggregates the base classifiers using a probabilistic model relying on Gaussian copulas. Experiments on logistic regressor ensembles demonstrate competing accuracy and increased robustness in case of dependent classifiers. A companion python implementation can be downloaded at https://github.com/john-klein/DELC

    Ensemble Committees for Stock Return Classification and Prediction

    Full text link
    This paper considers a portfolio trading strategy formulated by algorithms in the field of machine learning. The profitability of the strategy is measured by the algorithm's capability to consistently and accurately identify stock indices with positive or negative returns, and to generate a preferred portfolio allocation on the basis of a learned model. Stocks are characterized by time series data sets consisting of technical variables that reflect market conditions in a previous time interval, which are utilized produce binary classification decisions in subsequent intervals. The learned model is constructed as a committee of random forest classifiers, a non-linear support vector machine classifier, a relevance vector machine classifier, and a constituent ensemble of k-nearest neighbors classifiers. The Global Industry Classification Standard (GICS) is used to explore the ensemble model's efficacy within the context of various fields of investment including Energy, Materials, Financials, and Information Technology. Data from 2006 to 2012, inclusive, are considered, which are chosen for providing a range of market circumstances for evaluating the model. The model is observed to achieve an accuracy of approximately 70% when predicting stock price returns three months in advance.Comment: 15 pages, 4 figures, Neukom Institute Computational Undergraduate Research prize - second plac
    • …
    corecore