
    Generalized Boosting Algorithms for Convex Optimization

    Boosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective, and introduce into this setting a new measure of weak-learner performance which generalizes existing work. Under this new measure of performance, we present weak-to-strong learning guarantees for existing gradient boosting algorithms on strongly-smooth, strongly-convex objectives, and also demonstrate that this approach fails for non-smooth objectives. To address this issue, we present new algorithms which extend the boosting approach to arbitrary convex loss functions and give corresponding weak-to-strong convergence results. In addition, we present experimental results that support our analysis and demonstrate the need for the new algorithms.
    Comment: Extended version of paper presented at the International Conference on Machine Learning, 2011. 9 pages + appendix with proofs
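    As a sketch of the framework this abstract builds on, the snippet below implements generic functional gradient boosting: each round fits a weak learner to the negative gradient of an arbitrary differentiable convex loss. The loss, learning rate, and choice of regression trees as weak learners are illustrative assumptions, not the paper's algorithms.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, loss_grad, n_rounds=100, lr=0.1):
    """Fit an additive model F(x) = sum_t lr * h_t(x) by descending the
    functional gradient of a differentiable convex loss (illustrative sketch)."""
    F = np.zeros(len(y))                          # current ensemble predictions
    learners = []
    for _ in range(n_rounds):
        residual = -loss_grad(F, y)               # negative functional gradient
        h = DecisionTreeRegressor(max_depth=3)    # weak learner (assumed choice)
        h.fit(X, residual)
        F += lr * h.predict(X)                    # gradient step in function space
        learners.append(h)
    return learners

# Example: squared error, whose functional gradient is F - y.
squared_error_grad = lambda F, y: F - y
```

    For a non-smooth objective such as the hinge loss, `loss_grad` could only return a subgradient, which is exactly the regime where the abstract reports that plain gradient boosting fails and its new algorithms are needed.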

    Wide Boosting

    Gradient boosting (GB) is a popular methodology used to solve prediction problems through minimization of a differentiable loss function, L. GB is especially performant on low- and medium-dimensional problems. This paper presents a simple adjustment to GB motivated in part by artificial neural networks. Specifically, our adjustment inserts a square or rectangular matrix multiplication between the output of a GB model and the loss, L. This allows the output of a GB model to have increased dimension prior to being fed into the loss, and is thus "wider" than standard GB implementations. We provide performance comparisons on several publicly available datasets. When using the same tuning methodology and the same maximum number of boosting rounds, Wide Boosting outperforms standard GB on every dataset we try.
    Comment: Gradient Boosting, Wide Neural Network
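    A minimal sketch of the widening idea, under assumed shapes and a squared-error loss (not the paper's implementation): the GB model emits a d-dimensional output per example, a matrix W maps it into the k-dimensional space in which the loss is computed, and the functional gradient for the next boosting round is pulled back through W by the chain rule.

```python
import numpy as np

def widened_loss_grad(F, Y, W):
    """F: (n, d) raw GB outputs; W: (d, k) widening matrix; Y: (n, k) targets.
    Returns the loss and its gradient w.r.t. F for the next boosting round."""
    P = F @ W                        # predictions in the loss space, shape (n, k)
    loss = 0.5 * np.sum((P - Y) ** 2)
    grad_F = (P - Y) @ W.T           # chain rule: dL/dF = (dL/dP) W^T
    return loss, grad_F
```

    Standard GB corresponds to W being the identity (d = k); taking d > k is what gives the GB model a "wider" output than the loss itself requires.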

    Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost

    The past decade has witnessed the field of machine learning establish itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted, and cost must be strictly accounted for during test-time. A typical problem is: if an application consumes x additional units of cost during test-time but will improve accuracy by y percent, should the additional x resources be allocated? The core of this problem is a trade-off between accuracy and cost. In this thesis, we examine the components of test-time cost and develop different strategies to manage this trade-off.
    We first investigate test-time cost and discover that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, in particular for nonparametric algorithms. We then propose three strategies to explicitly trade off accuracy against the two components of test-time cost during classifier training.
    To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser incorporates the extraction cost information during classifier training to explicitly minimize the test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVMs) into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.
    We then introduce the Cost-Sensitive Tree of Classifiers (CSTC) and the Cost-Sensitive Cascade of Classifiers (CSCC), which share a common strategy that trades off accuracy against the amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each of which is optimized for a specific sub-partition of the input space and extracts a different, specialized subset of features. CSCC extends CSTC and builds a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Since both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while high accuracy is maintained. Both approaches outperform the current state-of-the-art on real-world data sets.
    To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel SVMs, whose test-time evaluation cost is typically substantial when they are learned from large training sets. CVM is a post-processing algorithm which compresses the learned SVM model by reducing and optimizing the support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing the test-time evaluation cost by several orders of magnitude.
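    The sketch below illustrates the amortized test-time cost idea behind cascades such as CSCC, under simplifying assumptions of our own: a cheap model classifies confident inputs immediately, and only uncertain inputs pay for expensive feature extraction. The confidence threshold, the two-stage split, and the scikit-learn-style `predict_proba` interface are hypothetical simplifications, not the thesis's training procedure.

```python
import numpy as np

def cascade_predict(x_cheap, extract_expensive, cheap_model, full_model,
                    threshold=0.9):
    """Classify one input, paying for expensive features only when needed.
    Returns (predicted_label, cost_tier)."""
    p = cheap_model.predict_proba(x_cheap.reshape(1, -1))[0]
    if p.max() >= threshold:                 # confident: stop early, cheap cost only
        return int(np.argmax(p)), "cheap"
    x_full = np.concatenate([x_cheap, extract_expensive()])  # pay extraction cost
    p_full = full_model.predict_proba(x_full.reshape(1, -1))[0]
    return int(np.argmax(p_full)), "cheap+expensive"
```

    Averaged over a test set in which most inputs stop at the first stage, the expensive extraction cost is paid only rarely, which is the sense in which the test-time cost is amortized.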

    Property price prediction: a model utilising sentiment analysis

    The increase in the use of social media has led many researchers and companies to investigate potential uses of the data generated by these platforms. This research study investigates how sentiment variables, obtained from the social media platform Twitter, can be used to augment housing transfer data in order to develop a predictive model. The Design Science Research (DSR) methodology was followed, guided by a Social Media Framework. Experimentation was required within the Design Cycle of the DSR methodology, which led to the adoption of the Experimental Research methodology within this cycle.
    An initial literature review identified regression models for property price prediction. Through experimentation, Gradient Boosting regression was identified as an optimal regression model for this purpose. Thereafter, a review of sentiment analysis models was conducted, which resulted in the proposal of a CNN-LSTM model for the classification of Tweets. Initial experimentation with this proposed model achieved an accuracy comparable to that of the top-performing sentiment analysis models identified. A dataset obtained through SemEval, a series of evaluations of computational semantic analysis systems, was used for this phase.
    For the final experimentation, the CNN-LSTM model was used to obtain sentiment variables from Tweets collected in the Western Cape Province in 2017. The property dataset was augmented with these sentiment variables, after which experimentation was conducted by applying Gradient Boosting regression. The augmentation was done in two ways: based either on the suburb pertaining to the property or on the month in which the property was transferred. The results indicate that a model for property price prediction utilising sentiment analysis demonstrates a small improvement when suburb-based sentiment, obtained from Tweets with a minimum threshold per suburb, is utilised. An important finding was that, when geo-coordinates are removed from the dataset, the sentiment variables replace them in the regression results, producing the same level of accuracy as when the coordinates are included.
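    A minimal sketch of the suburb-based augmentation step, with hypothetical column names: per-suburb sentiment scores (which the thesis derives from Tweets via the CNN-LSTM model) are joined onto the property transfer records before fitting a Gradient Boosting regressor.

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

def fit_price_model(properties: pd.DataFrame, suburb_sentiment: pd.DataFrame):
    """properties: one row per transfer, numeric features plus 'suburb' and 'price'.
    suburb_sentiment: per-suburb sentiment, columns ['suburb', 'sentiment']."""
    data = properties.merge(suburb_sentiment, on="suburb", how="left")
    X = data.drop(columns=["price", "suburb"])   # numeric features + sentiment
    model = GradientBoostingRegressor()
    return model.fit(X, data["price"])
```

    The month-based variant described above would follow the same pattern, merging on a transfer-month key instead of the suburb.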

    Tensor Representations for Object Classification and Detection

    A key problem in object recognition is finding a suitable object representation. For historical and computational reasons, vector descriptions that encode particular statistical properties of the data have been broadly applied. However, a tensor representation can describe the interactions of the multiple factors inherent to image formation. One of the most convenient uses of tensors is to represent complex objects in order to build a discriminative description. This thesis makes several main contributions, focusing on visual data detection (e.g. of heads or pedestrians) and classification (e.g. of head or human body orientation) in still images, and on machine learning techniques for analysing tensor data. These applications are among the most studied in computer vision and are typically formulated as binary or multi-class classification problems.
    The application context of this thesis is video surveillance, where classification and detection tasks can be very hard due to the low resolution and the noise characterising sensor data. The main goal in that context is therefore to design algorithms that can characterise different objects of interest, especially when immersed in a cluttered background and captured at low resolution. Among the many machine learning approaches, ensembles of classifiers have been demonstrated to reach excellent classification accuracy, good generalisation ability, and robustness to noisy data. For these reasons, some approaches in that class have been adopted as the basic classification frameworks with which to build robust classifiers and detectors. Kernel machines have also been exploited for classification purposes, since they represent a natural learning framework for tensors.
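    The abstract does not fix a particular descriptor, so as one concrete illustration of a tensor representation commonly used in surveillance settings, the sketch below computes a region covariance descriptor: each image patch is summarized by the covariance matrix of per-pixel features. The feature choice (intensity, coordinates, gradients) is an assumption for illustration, not necessarily the thesis's representation.

```python
import numpy as np

def region_covariance(patch):
    """patch: (h, w) grayscale array. Returns a (5, 5) covariance descriptor,
    a symmetric positive semi-definite matrix summarizing the region."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]                   # pixel coordinates
    gy, gx = np.gradient(patch.astype(float))     # intensity gradients
    feats = np.stack([patch.ravel(), xs.ravel(), ys.ravel(),
                      gx.ravel(), gy.ravel()])    # (5, h*w) feature matrix
    return np.cov(feats)                          # covariance over the region
```

    Such descriptors live on a matrix manifold rather than in a plain vector space, which is one reason the thesis turns to ensemble methods and kernel machines designed to handle tensor-valued inputs.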