Grinding wheel condition monitoring with boosted classifiers
In this thesis, two data sets collected from a grinding process under different cutting and wheel conditions were studied. One consists of cutting forces in three directions (X, Y and Z) collected under two different cutting conditions. The other consists of acoustic emission (AE) signals collected under different wheel conditions (sharp and dull). For grinding wheel condition monitoring, a regression model with autocorrelated errors proved effective and was used to extract features from the signals; the model coefficients served as the features for the classification step, which employed boosting. Based on the AdaBoost and A-boosting algorithms, which handle only two-class problems, two improved boosting methods, AdaBoost-M and A-boosting-M, are proposed for multiclass classification. On the force data set, we compared AdaBoost-M and A-boosting-M against the traditional AdaBoost.M1 and the corresponding weak learners (KNN and prototype); AdaBoost-M and A-boosting-M achieved higher accuracies than AdaBoost.M1 and the weak learners in our application. On the AE data set, the goal is to distinguish signals collected from dull wheels from signals collected from sharp wheels, using AdaBoost, A-boosting and the corresponding weak learners (KNN and prototype). The results indicate that (i) boosting does not improve the effectiveness of the k-nearest-neighbor classifier but greatly improves that of the prototype classifier, (ii) depending on the data, either AdaBoost or A-boosting may produce higher classification accuracy, and (iii) for the better classifiers, the false-positive error is higher than the false-negative error. Based on this study, the combined use of AR models for feature extraction and boosting algorithms for classification proves a viable approach for grinding wheel condition monitoring.
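The feature-extraction step can be illustrated with a short sketch. The thesis fits regression models with autocorrelated errors; the version below is a plain least-squares AR fit (the model order and test signal are made up for illustration) whose coefficients would serve as the feature vector passed to the boosted classifier.

```python
import numpy as np

def ar_features(signal, order=4):
    """Least-squares AR(p) coefficients of a 1-D signal, used as features.
    A sketch of the feature-extraction idea only; the thesis uses regression
    models with autocorrelated errors, and the order here is illustrative."""
    x = np.asarray(signal, dtype=float)
    # design matrix of lagged values: x[t] ~ sum_k a_k * x[t-k]
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    y = x[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# made-up test signal standing in for a force or AE channel
rng = np.random.default_rng(0)
sig = np.sin(0.2 * np.arange(500)) + 0.1 * rng.standard_normal(500)
feats = ar_features(sig)
print(feats.shape)  # (4,)
```

Each signal segment is thus reduced to a fixed-length coefficient vector, which is what makes a standard classifier applicable downstream.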
Adabook and Multibook: adaptive boosting with chance correction
There has been considerable interest in boosting and bagging, including the combination of the adaptive
techniques of AdaBoost with the random selection with replacement techniques of Bagging. At the same
time there has been a revisiting of the way we evaluate, with chance-corrected measures like Kappa,
Informedness, Correlation or ROC AUC being advocated. This leads to the question of whether learning
algorithms can do better by optimizing an appropriate chance corrected measure. Indeed, it is possible for a
weak learner to optimize Accuracy to the detriment of the more realistic chance-corrected measures, and
when this happens the booster can give up too early. This phenomenon is known to occur with conventional
Accuracy-based AdaBoost, and the MultiBoost algorithm has been developed to overcome such problems
using restart techniques based on bagging. This paper thus complements the theoretical work showing the
necessity of using chance-corrected measures for evaluation, with empirical work showing how use of a
chance-corrected measure can improve boosting. We show that the early surrender problem occurs in
MultiBoost too, in multiclass situations, so that the chance-corrected AdaBook and MultiBook can beat
standard MultiBoost or AdaBoost, and we further identify which chance-corrected measures to use when.
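The motivation for chance correction can be seen in a few lines: on imbalanced data, a degenerate majority-class predictor scores high Accuracy yet zero on a chance-corrected measure such as Cohen's Kappa (a toy sketch; the paper also considers Informedness, Correlation and ROC AUC).

```python
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: agreement corrected for chance agreement p_e."""
    n = len(y_true)
    po = accuracy(y_true, y_pred)
    t_counts, p_counts = Counter(y_true), Counter(y_pred)
    pe = sum(t_counts[c] * p_counts[c] for c in t_counts) / (n * n)
    return (po - pe) / (1 - pe) if pe < 1 else 0.0

# a majority-class guesser on a 9:1 imbalanced set (made-up data):
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 100
print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))  # 0.9 0.0
```

A booster driven by Accuracy can be satisfied by such a learner; one driven by the chance-corrected measure is not, which is the "early surrender" issue the paper addresses.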
Cost-Sensitive Boosting for Classification of Imbalanced Data
The classification of data with imbalanced class distributions has
posed a significant challenge to most well-developed classification
systems, which assume relatively balanced class distributions. This
problem is especially crucial
in many application domains, such as medical diagnosis, fraud
detection, network intrusion, etc., which are of great importance
in machine learning and data mining.
This thesis explores meta-techniques which are applicable to most
classifier learning algorithms, with the aim to advance the
classification of imbalanced data. Boosting is a powerful
meta-technique to learn an ensemble of weak models with a promise
of improving classification accuracy. AdaBoost is widely regarded
as the most successful boosting algorithm. This thesis starts by
applying AdaBoost to an associative classifier for both learning
time reduction and accuracy improvement. However, improved
accuracy means little in the context of the class imbalance
problem, where accuracy itself is a misleading measure. The insight
gained from a comprehensive analysis on the boosting strategy of
AdaBoost leads to the investigation of cost-sensitive boosting
algorithms, which are developed by introducing cost items into the
learning framework of AdaBoost. The cost items are used to denote
the uneven identification importance among classes, such that the
boosting strategies can intentionally bias the learning towards
classes associated with higher identification importance and
eventually improve the identification performance on them. Given
an application domain, cost values with respect to different types
of samples are usually unavailable for applying the proposed
cost-sensitive boosting algorithms. To set up the effective cost
values, empirical methods are used for bi-class applications and
heuristic searching of the Genetic Algorithm is employed for
multi-class applications.
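As a rough sketch of the idea (not the thesis's exact algorithms), a cost item can enter AdaBoost's exponential weight update so that examples from high-cost classes retain more weight between rounds; the function name, cost values and data below are all illustrative.

```python
import math

def cost_boost_update(labels, preds, weights, alpha, cost):
    """One round of a cost-weighted AdaBoost-style weight update (hypothetical
    AdaC2-like sketch): each example's weight is scaled by its class cost
    before the usual exponential update, so later rounds are biased toward
    classes with higher identification importance."""
    new = [cost[y] * w * math.exp(-alpha * y * h)
           for y, h, w in zip(labels, preds, weights)]
    z = sum(new)                      # renormalize to a distribution
    return [w / z for w in new]

labels = [1, 1, -1, -1, -1]           # minority positive class costs more
preds  = [1, -1, -1, -1, 1]           # weak learner: one error in each class
w0     = [0.2] * 5
w1 = cost_boost_update(labels, preds, w0, alpha=0.5, cost={1: 2.0, -1: 1.0})
print(w1)  # misclassified positive (index 1) now carries the largest weight
```

With equal costs this reduces to the ordinary AdaBoost update; unequal costs are what let the booster intentionally bias learning toward the minority class.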
This thesis also covers the implementation of the proposed
cost-sensitive boosting algorithms. It ends with a discussion on
the experimental results of classification of real-world
imbalanced data. Compared with existing algorithms, the new
algorithms this thesis presents achieve better results on the
evaluation measures aligned with the learning objectives.
Totally Corrective Multiclass Boosting with Binary Weak Learners
In this work, we propose a new optimization framework for multiclass boosting
learning. In the literature, AdaBoost.MO and AdaBoost.ECC are the two
successful multiclass boosting algorithms, which can use binary weak learners.
We explicitly derive these two algorithms' Lagrange dual problems based on
their regularized loss functions. We show that the Lagrange dual formulations
enable us to design totally-corrective multiclass algorithms by using the
primal-dual optimization technique. Experiments on benchmark data sets suggest
that our multiclass boosting achieves generalization capability comparable
to the state of the art, while converging much faster than stage-wise
gradient-descent boosting. In other words, the new totally corrective
algorithms can maximize the margin more aggressively.
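The binary-weak-learner reduction underlying AdaBoost.MO can be sketched via output coding: each class receives a binary code word, one boosted binary learner is trained per code bit, and a joint prediction is decoded to the nearest code word. The code matrix below is made up for illustration, and the paper's totally corrective optimization is not shown.

```python
codes = {  # hypothetical code matrix with pairwise Hamming distance >= 3
    "a": (0, 0, 0, 0, 0),
    "b": (1, 1, 1, 0, 0),
    "c": (0, 0, 1, 1, 1),
}

def decode(bits):
    """Class whose code word is nearest in Hamming distance to the
    binary learners' joint prediction."""
    return min(codes, key=lambda c: sum(x != y for x, y in zip(codes[c], bits)))

# one binary learner wrong (first bit of class "a"'s code word flipped):
print(decode((1, 0, 0, 0, 0)))  # a
```

Because the code words are spread apart in Hamming distance, a single wrong binary learner still decodes to the correct class, which is what makes the multiclass problem solvable with binary weak learners.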
Features and Algorithms for Visual Parsing of Handwritten Mathematical Expressions
Math expressions are an essential part of scientific documents. Handwritten math expression recognition can benefit human-computer interaction, especially in the education domain, and is a critical part of document recognition and analysis.
Parsing the spatial arrangement of symbols is an essential part of math expression recognition. A variety of parsing techniques have been developed over the past three decades, falling into two groups. The first group is graph-based parsing, which selects a path or sub-graph obeying some rule to form a possible interpretation of the given expression. The second group is grammar-driven parsing, in which grammars and related parameters are defined manually for different tasks. The time complexity of both groups is high, and they often impose strict constraints to reduce computation.
The aim of this thesis is to build a straightforward and effective parser with as few constraints as possible. First, we propose using a line-of-sight graph to represent the layout of strokes and symbols in math expressions. It achieves a higher F-score than other graph representations and reduces the search space for parsing. Second, we modify the shape context feature with Parzen window density estimation. This feature set works well for symbol segmentation, symbol classification and symbol layout analysis. We obtain a higher symbol segmentation F-score than other systems on the CROHME 2014 dataset. Finally, we develop a Maximum Spanning Tree (MST) based parser using Edmonds' algorithm, which extracts an MST from the directed line-of-sight graph in two passes: first symbols are segmented, and then symbols and spatial relationships are labeled. The time complexity of our MST-based parsing is lower than that of CYK parsing with context-free grammars. Our MST-based parsing also obtains higher structure and expression rates than CYK parsing when symbol segmentation is accurate. Correct structure means the structure of the symbol layout tree is correct, even if some edge labels are wrong. The performance of our math expression recognition system with MST-based parsing is competitive on the CROHME 2012 and 2014 datasets.
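The MST extraction step can be sketched in miniature. With hypothetical relationship scores on a tiny directed line-of-sight graph, each symbol keeps its highest-scoring incoming edge; the full Chu-Liu/Edmonds algorithm additionally contracts any cycles this greedy choice creates, but none arise in this toy example.

```python
# Hypothetical line-of-sight graph: nodes are segmented symbols, each directed
# edge carries a relationship score from a classifier (numbers are made up).
edges = [  # (parent, child, score)
    ("x", "2", 0.9),   # e.g. a superscript relation
    ("x", "+", 0.8),
    ("+", "y", 0.85),
    ("2", "+", 0.3),
    ("y", "x", 0.1),
]
root = "x"

# Greedy step of Chu-Liu/Edmonds: every non-root symbol keeps its single
# highest-scoring incoming edge. (Cycle contraction omitted; none occur here.)
best = {}
for u, v, w in edges:
    if v != root and (v not in best or w > best[v][1]):
        best[v] = (u, w)

tree = sorted((u, v) for v, (u, w) in best.items())
print(tree)  # [('+', 'y'), ('x', '+'), ('x', '2')]
```

The resulting arborescence is the symbol layout tree: segmentation fixes the nodes, and the kept edges carry the spatial-relationship labels.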
Future work includes incorporating symbol classifier results into MST-based parsing and correcting segmentation errors within it.