Generalized Boosting Algorithms for Convex Optimization
Boosting is a popular way to derive powerful learners from simpler hypothesis
classes. Following previous work (Mason et al., 1999; Friedman, 2000) on
general boosting frameworks, we analyze gradient-based descent algorithms for
boosting with respect to any convex objective and introduce a new measure of
weak learner performance into this setting which generalizes existing work. We
present the weak to strong learning guarantees for the existing gradient
boosting work for strongly-smooth, strongly-convex objectives under this new
measure of performance, and also demonstrate that this work fails for
non-smooth objectives. To address this issue, we present new algorithms which
extend this boosting approach to arbitrary convex loss functions and give
corresponding weak to strong convergence results. In addition, we present
experimental results that support our analysis and demonstrate the need for the
new algorithms we introduce.
Comment: Extended version of paper presented at the International Conference on Machine Learning, 2011. 9 pages + appendix with proofs.
Wide Boosting
Gradient boosting (GB) is a popular methodology used to solve prediction
problems through minimization of a differentiable loss function. GB is
especially performant in low- and medium-dimension problems. This paper presents
a simple adjustment to GB motivated in part by artificial neural networks.
Specifically, our adjustment inserts a square or rectangular matrix
multiplication between the output of a GB model and the loss. This allows
the output of a GB model to have increased dimension prior to being fed into
the loss and is thus "wider" than standard GB implementations. We provide
performance comparisons on several publicly available datasets. When using the
same tuning methodology and same maximum boosting rounds, Wide Boosting
outperforms standard GB in every dataset we try.
Comment: Gradient Boosting, Wide Neural Network
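The widening adjustment can be illustrated with a minimal numerical sketch, assuming a per-example squared-error loss; the array names and dimensions are illustrative, not from the paper. A matrix W maps the d-dimensional GB output into k dimensions before the loss, so the gradients each boosting round fits flow back through W:

```python
import numpy as np

# Sketch: a GB model emits a d-dimensional output F(x); a matrix W (d x k)
# widens it to k dimensions before the loss is applied.
rng = np.random.default_rng(1)
n, d, k = 100, 2, 5
F = rng.normal(size=(n, d))      # stand-in for the GB model's outputs
W = rng.normal(size=(d, k))      # widening matrix (learned in practice)
y = rng.normal(size=(n, k))

Z = F @ W                                # widened output fed to the loss
loss = 0.5 * np.sum((Z - y) ** 2) / n    # per-example squared error
grad_F = ((Z - y) / n) @ W.T             # chain rule through W: boosting rounds
print(loss, grad_F.shape)                # would fit trees to -grad_F
```

The key point is that `grad_F` has the narrow dimension d, so the base trees stay as cheap as in standard GB while the loss sees a wider output.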
Supervised Machine Learning Under Test-Time Resource Constraints: A Trade-off Between Accuracy and Cost
The past decade has witnessed how the field of machine learning has established itself as a necessary component in several multi-billion-dollar industries. The real-world industrial setting introduces an interesting new problem to machine learning research: computational resources must be budgeted and cost must be strictly accounted for during test-time. A typical problem is that if an application consumes x additional units of cost during test-time, but will improve accuracy by y percent, should the additional x resources be allocated? The core of this problem is a trade-off between accuracy and cost. In this thesis, we examine components of test-time cost, and develop different strategies to manage this trade-off.
We first investigate test-time cost and discover that it typically consists of two parts: feature extraction cost and classifier evaluation cost. The former reflects the computational effort of transforming data instances into feature vectors, and can be highly variable when features are heterogeneous. The latter reflects the effort of evaluating a classifier, which can be substantial, in particular for nonparametric algorithms. We then propose three strategies to explicitly trade off accuracy against the two components of test-time cost during classifier training.
To budget the feature extraction cost, we first introduce two algorithms: GreedyMiser and Anytime Representation Learning (AFR). GreedyMiser employs a strategy that incorporates the extraction cost information during classifier training to explicitly minimize the test-time cost. AFR extends GreedyMiser to learn a cost-sensitive feature representation rather than a classifier, and turns traditional Support Vector Machines (SVM) into test-time cost-sensitive anytime classifiers. GreedyMiser and AFR are evaluated on two real-world data sets from two different application domains, and both achieve record performance.
We then introduce Cost Sensitive Tree of Classifiers (CSTC) and Cost Sensitive Cascade of Classifiers (CSCC), which share a common strategy that trades off accuracy and the amortized test-time cost. CSTC introduces a tree structure and directs test inputs along different tree traversal paths, each of which is optimized for a specific sub-partition of the input space, extracting different, specialized subsets of features. CSCC extends CSTC and builds a linear cascade, instead of a tree, to cope with class-imbalanced binary classification tasks. Since both CSTC and CSCC extract different features for different inputs, the amortized test-time cost is greatly reduced while maintaining high accuracy. Both approaches outperform the current state-of-the-art on real-world data sets.
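The amortized-cost idea behind a cascade can be sketched as a hypothetical early-exit pipeline (an illustration of the general strategy, not the thesis's actual CSCC algorithm): cheap stages run first, and an input exits as soon as its accumulated score is confident, so costlier features are extracted only for hard inputs.

```python
import numpy as np

def cascade_predict(x, stages, thresholds):
    """Early-exit cascade sketch: each stage may use progressively
    costlier features; an input exits as soon as a stage's accumulated
    score clears that stage's confidence threshold, which keeps the
    *amortized* test-time cost low on easy inputs."""
    score = 0.0
    for stage, thr in zip(stages, thresholds):
        score += stage(x)            # stage score (costlier features later)
        if abs(score) >= thr:        # confident enough: exit early
            break
    return 1 if score >= 0 else -1

# Toy stages: linear scores on progressively "more expensive" features
stages = [lambda x: x[0], lambda x: 0.5 * x[1], lambda x: 0.25 * x[2]]
# An easy input exits at stage 1; a hard one runs all three stages
print(cascade_predict(np.array([2.0, -1.0, 0.3]), stages, [1.5, 1.0, 0.0]))
```

In a trained cascade the stage functions and thresholds would be learned jointly against the accuracy/cost trade-off rather than fixed by hand as here.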
To trade off accuracy against the high classifier evaluation cost of nonparametric classifiers, we propose a model compression strategy and develop Compressed Vector Machines (CVM). CVM focuses on nonparametric kernel Support Vector Machines (SVM), whose test-time evaluation cost is typically substantial when learned from large training sets. CVM is a post-processing algorithm which compresses the learned SVM model by reducing and optimizing support vectors. On several benchmark data sets, CVM maintains high test accuracy while reducing the test-time evaluation cost by several orders of magnitude.
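As a hedged sketch of the model-compression idea (not the CVM algorithm itself), one can truncate a learned RBF-SVM's kernel expansion to its heaviest support vectors, so that test-time evaluation cost scales with the retained set rather than the full one:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Naive compression sketch: keep only the support vectors with the
# largest |dual coefficient| and evaluate the reduced kernel expansion.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
svm = SVC(kernel="rbf", gamma="scale").fit(X, y)

alpha = svm.dual_coef_.ravel()
keep = np.argsort(np.abs(alpha))[-20:]       # retain the 20 "heaviest" SVs
sv, a = svm.support_vectors_[keep], alpha[keep]

gamma = 1.0 / (X.shape[1] * X.var())         # what gamma="scale" computes

def compressed_decision(x):
    # Kernel expansion over the retained support vectors only
    k = np.exp(-gamma * np.sum((sv - x) ** 2, axis=1))
    return float(np.sum(a * k) + svm.intercept_[0])

print(len(svm.support_), "->", len(keep))    # evaluation cost drops accordingly
```

CVM goes further by also re-optimizing the retained vectors, rather than simply pruning as above; this sketch only shows where the evaluation-cost saving comes from.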
Property price prediction: a model utilising sentiment analysis
The increase in the use of social media has led many researchers and companies to investigate the potential uses of the data generated by these platforms. This research study investigates how sentiment variables, obtained from the social media platform Twitter, can be used to augment housing transfer data in order to develop a predictive model. The Design Science Research (DSR) methodology was followed, guided by a Social Media Framework. Experimentation was required within the Design Cycle of the DSR methodology, which led to the adoption of the Experimental Research methodology within this cycle. An initial literature review identified regression models for property price prediction. Through experimentation, Gradient Boosting regression was identified as an optimal regression model for this purpose. Thereafter, a review of sentiment analysis models was conducted, which resulted in the proposal of a CNN-LSTM model for the classification of Tweets. Initial experimentation conducted with this proposed model obtained an accuracy comparable to the top-performing sentiment analysis models identified. A dataset obtained through SemEval, a series of evaluations of computational semantic analysis systems, was used for this phase. For the final experimentation, the CNN-LSTM model was used to obtain sentiment variables from Tweets collected from the Western Cape Province in 2017. The property dataset was augmented with the sentiment variables, after which experimentation was conducted by applying Gradient Boosting regression. The augmentation was done in two ways: based either on the suburb pertaining to the property, or on the month in which the property was transferred. The results indicate that a model for Property Price Prediction Utilising Sentiment Analysis demonstrates a small improvement when suburb-based sentiment, obtained from Tweets with a minimum threshold per suburb, is utilised.
An important finding was the fact that, when geo-coordinates are removed from the dataset, the sentiment variables replace them in the regression results, producing the same level of accuracy as when the coordinates are included.
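The modelling setup can be sketched as follows, with synthetic stand-ins for the housing-transfer features and a hypothetical per-property sentiment score (none of this is the thesis's actual data; it only shows how a sentiment variable enters a Gradient Boosting regression):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for housing-transfer features plus a sentiment score
rng = np.random.default_rng(42)
n = 500
size = rng.uniform(50, 300, n)            # property size (m^2), illustrative
rooms = rng.integers(1, 7, n)
sentiment = rng.uniform(-1, 1, n)         # hypothetical suburb sentiment score
price = (5000 * size + 20000 * rooms
         + 30000 * sentiment + rng.normal(0, 10000, n))

X = np.column_stack([size, rooms, sentiment])
X_tr, X_te, y_tr, y_te = train_test_split(X, price, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
r2 = model.score(X_te, y_te)              # R^2 on held-out transfers
print(round(r2, 3))
```

In the thesis's setting the sentiment column would come from the CNN-LSTM classifier aggregated per suburb or per month, rather than being drawn at random as here.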
Tensor Representations for Object Classification and Detection
A key problem in object recognition is finding a suitable object representation.
For historical and computational reasons, vector descriptions that encode particular
statistical properties of the data have been broadly applied. However, a
tensor representation can describe the interactions of multiple factors
inherent to image formation. One of the most convenient uses for tensors is to represent
complex objects in order to build a discriminative description.
This thesis has several main contributions, focusing on visual data detection (e.g. of heads or pedestrians) and classification (e.g. of head or human body orientation) in still images, and on machine learning techniques to analyse tensor data. These applications are among the most studied in computer vision and are typically formulated as binary or multi-class classification problems.
The application context of this thesis is video surveillance, where classification and detection tasks
can be very hard due to the low resolution and the noise characterising
sensor data. Therefore, the main goal in that context is to design algorithms that can
characterise different objects of interest, especially when immersed in a cluttered
background and captured at low resolution.
Among the many machine learning approaches, ensembles of classifiers have demonstrated
excellent classification accuracy, good generalisation ability, and robustness to noisy data. For these
reasons, some approaches in that class have been adopted as the basic classification
frameworks to build robust classifiers and detectors. Moreover,
kernel machines have also been exploited for classification purposes,
since they represent a natural learning framework for tensors.
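As one concrete instance of the kind of representation discussed, a region covariance descriptor summarises an image patch by the covariance of per-pixel feature vectors, yielding a symmetric positive-definite matrix; the particular feature choice below is an illustrative assumption, not the thesis's exact descriptor:

```python
import numpy as np

# Region covariance sketch: describe a patch by the covariance of
# per-pixel features (coordinates, intensity, gradient magnitudes),
# giving a small symmetric positive-semidefinite tensor descriptor.
rng = np.random.default_rng(0)
patch = rng.random((16, 16))              # stand-in for a grayscale patch

gy, gx = np.gradient(patch)               # vertical/horizontal gradients
ys, xs = np.mgrid[0:16, 0:16]
# Per-pixel feature vector: (x, y, intensity, |dI/dx|, |dI/dy|)
feats = np.stack([xs, ys, patch, np.abs(gx), np.abs(gy)],
                 axis=-1).reshape(-1, 5)
C = np.cov(feats, rowvar=False)           # 5x5 covariance descriptor
print(C.shape)
```

Descriptors of this form live on the manifold of symmetric positive-definite matrices, which is one reason kernel machines with suitable kernels provide a natural learning framework for them.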