10 research outputs found

    Exploiting Universum data in AdaBoost using gradient descent

    Full text link
    Recently, Universum data that does not belong to any class of the training data, has been applied for training better classifiers. In this paper, we address a novel boosting algorithm called UAdaBoost that can improve the classification performance of AdaBoost with Universum data. UAdaBoost chooses a function by minimizing the loss for labeled data and Universum data. The cost function is minimized by a greedy, stagewise, functional gradient procedure. Each training stage of UAdaBoost is fast and efficient. The standard AdaBoost weights labeled samples during training iterations while UAdaBoost gives an explicit weighting scheme for Universum samples as well. In addition, this paper describes the practical conditions for the effectiveness of Universum learning. These conditions are based on the analysis of the distribution of ensemble predictions over training samples. Experiments on handwritten digits classification and gender classification problems are presented. As exhibited by our experimental results, the proposed method can obtain superior performances over the standard AdaBoost by selecting proper Universum data. © 2014 Elsevier B.V

    Review on Machine Learning Algorithms for Weather Forecasting Issues

    Get PDF
    Machine leaning is a ground of recent research that officially focuses on the theory, performance, and properties of learning systems and algorithms. It is particularly cross disciplinary field building upon ideas from many different kinds of fields such as artificial intelligence, optimization theory, information theory, statistics, cognitive science, optimal control, and many other disciplines of science, engineering, and mathematics. Since implementation in a wide range of applications, machine learning has covered almost every scientific domain, which has brought great impact on the science and society. Machine learning techniques has been used on a variety of problems, including recommendation engines, recognition systems, informatics and data mining, and autonomous control systems. This research paper compared different machine algorithms for classification. Classification is used when the desired output is a discrete label

    Semi-supervised machine learning techniques for classification of evolving data in pattern recognition

    Get PDF
    The amount of data recorded and processed over recent years has increased exponentially. To create intelligent systems that can learn from this data, we need to be able to identify patterns hidden in the data itself, learn these pattern and predict future results based on our current observations. If we think about this system in the context of time, the data itself evolves and so does the nature of the classification problem. As more data become available, different classification algorithms are suitable for a particular setting. At the beginning of the learning cycle when we have a limited amount of data, online learning algorithms are more suitable. When truly large amounts of data become available, we need algorithms that can handle large amounts of data that might be only partially labeled as a result of the bottleneck in the learning pipeline from human labeling of the data. An excellent example of evolving data is gesture recognition, and it is present throughout our work. We need a gesture recognition system to work fast and with very few examples at the beginning. Over time, we are able to collect more data and the system can improve. As the system evolves, the user expects it to work better and not to have to become involved when the classifier is unsure about decisions. This latter situation produces additional unlabeled data. Another example of an application is medical classification, where experts’ time is a rare resource and the amount of received and labeled data disproportionately increases over time. Although the process of data evolution is continuous, we identify three main discrete areas of contribution in different scenarios. When the system is very new and not enough data are available, online learning is used to learn after every single example and to capture the knowledge very fast. With increasing amounts of data, offline learning techniques are applicable. Once the amount of data is overwhelming and the teacher cannot provide labels for all the data, we have another setup that combines labeled and unlabeled data. These three setups define our areas of contribution; and our techniques contribute in each of them with applications to pattern recognition scenarios, such as gesture recognition and sketch recognition. An online learning setup significantly restricts the range of techniques that can be used. In our case, the selected baseline technique is the Evolving TS-Fuzzy Model. The semi-supervised aspect we use is a relation between rules created by this model. Specifically, we propose a transductive similarity model that utilizes the relationship between generated rules based on their decisions about a query sample during the inference time. The activation of each of these rules is adjusted according to the transductive similarity, and the new decision is obtained using the adjusted activation. We also propose several new variations to the transductive similarity itself. Once the amount of data increases, we are not limited to the online learning setup, and we can take advantage of the offline learning scenario, which normally performs better than the online one because of the independence of sample ordering and global optimization with respect to all samples. We use generative methods to obtain data outside of the training set. Specifically, we aim to improve the previously mentioned TS Fuzzy Model by incorporating semi-supervised learning in the offline learning setup without unlabeled data. We use the Universum learning approach and have developed a method called UFuzzy. This method relies on artificially generated examples with high uncertainty (Universum set), and it adjusts the cost function of the algorithm to force the decision boundary to be close to the Universum data. We were able to prove the hypothesis behind the design of the UFuzzy classifier that Universum learning can improve the TS Fuzzy Model and have achieved improved performance on more than two dozen datasets and applications. With increasing amounts of data, we use the last scenario, in which the data comprises both labeled data and additional non-labeled data. This setting is one of the most common ones for semi-supervised learning problems. In this part of our work, we aim to improve the widely popular tecjniques of self-training (and its successor help-training) that are both meta-frameworks over regular classifier methods but require probabilistic representation of output, which can be hard to obtain in the case of discriminative classifiers. Therefore, we develop a new algorithm that uses the modified active learning technique Query-by-Committee (QbC) to sample data with high certainty from the unlabeled set and subsequently embed them into the original training set. Our new method allows us to achieve increased performance over both a range of datasets and a range of classifiers. These three works are connected by gradually relaxing the constraints on the learning setting in which we operate. Although our main motivation behind the development was to increase performance in various real-world tasks (gesture recognition, sketch recognition), we formulated our work as general methods in such a way that they can be used outside a specific application setup, the only restriction being that the underlying data evolve over time. Each of these methods can successfully exist on its own. The best setting in which they can be used is a learning problem where the data evolve over time and it is possible to discretize the evolutionary process. Overall, this work represents a significant contribution to the area of both semi-supervised learning and pattern recognition. It presents new state-of-the-art techniques that overperform baseline solutions, and it opens up new possibilities for future research

    Online incremental learning from scratch with application to handwritten gesture recognition

    Get PDF
    In recent years we have witnessed a significant growth of data to be processed by machines in order to extract all the knowledge from those data. For agile decision making, the arrival of new data is not bounded by time, and their characteristics may change over time. The knowledge stored in machines needs to be kept up to date. In traditional machine learning, this can be done by re-learning on new data. However, with the constant occurrence of high data volume, the process becomes overloaded, and this kind of task becomes unfeasible. Online learning methods have been proposed not only to help with the processing of high data volume, but also to become more agile and adaptive towards the changing nature of data. Online learning brings simplicity to the updates of the model by decoupling a single update from the whole updating process. Such a learning process is often referred to as incremental learning, where the machine learns by small increments to build the whole knowledge. The aim of this research is to contribute to online incremental learning for pattern recognition and more specifically, by handwritten symbol recognition. In this work, we focus on specific problems of online learning related to the stability-plasticity dilemma and fast processing. Driven by the application and the focus of this research, we apply our solutions to Neuro-Fuzzy models. The concept of Neuro-Fuzzy modeling is based on dividing the space of inseparable and overlying classes into sub-problems governed by fuzzy rules, where the classification is handled by a simple linear model and then used for a combined result of the model, i.e., only a partially linear one. Each handwritten symbol can be represented by a number of sets of symbols of one class that resemble each other but vary internally. Thus, each class appears difficult to be described by a simple model. By dividing it into a number of simpler problems, the task of describing the class is more feasible, which is our main motivation for choosing Neuro-Fuzzy models. To contribute to real-time processing, while at the same time maintaining a high recognition rate, we developed Incremental Similarity, a similarity measure using incremental learning and, most importantly, simple updates. Our solution has been applied to a number of models and has shown superior results. Often, the distribution of the classes is not uniform, i.e., there are blocks of occurrences and non-occurrences of some classes. As a result, if any given class is not used for a period of time, it will be forgotten. Since the models used in our research use Recursive Least Squares, we proposed Elastic Memory Learning, a method for this kind of optimization and, we have achieved significantly better results. The use of hyper-parameters tends to be a necessity for many models. However, the fixing of these parameters is performed by a cross-validation. In online learning, cross-validation is not possible, especially for real-time learning. In our work we have developed a new model that instead of fixing its hyper-parameters uses them as parameters that are learned in an automatic way according to the changing trends in the data. From there, the whole structure of the model needs to be self-adapted each time, which yields our proposal of a self-organized Neuro-Fuzzy model (SO-ARTIST). Since online incremental learning learns from one data point at a time, at the beginning there is only one. This is referred to as starting from scratch and leads to low generalization of the model at the beginning of the learning process, i.e., high variance. In our work we integrated a model based on kinematic theory to generate synthetic data into the online learning pipeline, and this has led to significantly lower variance at the initial stage of the learning process. Altogether, this work has contributed a number of novel methods in the area of online learning that have been published in international journals and presented at international conferences. The main goal of this thesis has been fulfilled and all the objectives have been tackled. Our results have shown significant impact in this area

    Tensor Representations for Object Classification and Detection

    Get PDF
    A key problem in object recognition is finding a suitable object representation. For historical and computational reasons, vector descriptions that encode particular statistical properties of the data have been broadly applied. However, employing tensor representation can describe the interactions of multiple factors inherent to image formation. One of the most convenient uses for tensors is to represent complex objects in order to build a discriminative description. Thus thesis has several main contributions, focusing on visual data detection (e.g. of heads or pedestrians) and classification (e.g. of head or human body orientation) in still images and on machine learning techniques to analyse tensor data. These applications are among the most studied in computer vision and are typically formulated as binary or multi-class classification problems. The applicative context of this thesis is the video surveillance, where classification and detection tasks can be very hard, due to the scarce resolution and the noise characterising sensor data. Therefore, the main goal in that context is to design algorithms that can characterise different objects of interest, especially when immersed in a cluttered background and captured at low resolution. In the different amount of machine learning approaches, the ensemble-of-classifiers demonstrated to reach excellent classification accuracy, good generalisation ability, and robustness of noisy data. For these reasons, some approaches in that class have been adopted as basic machine classification frameworks to build robust classifiers and detectors. Moreover, also kernel machines has been exploited for classification purposes, since they represent a natural learning framework for tensors

    Drawing, Handwriting Processing Analysis: New Advances and Challenges

    No full text
    International audienceDrawing and handwriting are communicational skills that are fundamental in geopolitical, ideological and technological evolutions of all time. drawingand handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems like those related to the manner in which drawing and handwriting become an efficient way to command various connected objects; or to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline

    Pattern Recognition

    Get PDF
    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications and others. The signals processed are commonly one, two or three dimensional, the processing is done in real- time or takes hours and days, some systems look for one narrow object class, others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and comprehends several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. Authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition

    Cost-sensitive classification based on Bregman divergences

    Get PDF
    The main object of this PhD. Thesis is the identification, characterization and study of new loss functions to address the so-called cost-sensitive classification. Many decision problems are intrinsically cost-sensitive. However, the dominating preference for cost-insensitive methods in the machine learning literature is a natural consequence of the fact that true costs in real applications are di fficult to evaluate. Since, in general, uncovering the correct class of the data is less costly than any decision error, designing low error decision systems is a reasonable (but suboptimal) approach. For instance, consider the classification of credit applicants as either being good customers (will pay back the credit) or bad customers (will fail to pay o part of the credit). The cost of classifying one risky borrower as good could be much higher than the cost of classifying a potentially good customer as bad. Our proposal relies on Bayes decision theory where the goal is to assign instances to the class with minimum expected cost. The decision is made involving both costs and posterior probabilities of the classes. Obtaining calibrated probability estimates at the classifier output requires a suitable learning machine, a large enough representative data set as well as an adequate loss function to be minimized during learning. The design of the loss function can be aided by the costs: classical decision theory shows that cost matrices de ne class boundaries determined by posterior class probability estimates. Strictly speaking, in order to make optimal decisions, accurate probability estimates are only required near the decision boundaries. It is key to point out that the election of the loss function becomes especially relevant when the prior knowledge about the problem is limited or the available training examples are somehow unsuitable. In those cases, different loss functions lead to dramatically different posterior probabilities estimates. We focus our study on the set of Bregman divergences. These divergences offer a rich family of proper losses that has recently become very popular in the machine learning community [Nock and Nielsen, 2009, Reid and Williamson, 2009a]. The first part of the Thesis deals with the development of a novel parametric family of multiclass Bregman divergences which captures the information in the cost matrix, so that the loss function is adapted to each specific problem. Multiclass costsensitive learning is one of the main challenges in cost-sensitive learning and, through this parametric family, we provide a natural framework to successfully overcome binary tasks. Following this idea, two lines are explored: Cost-sensitive supervised classification: We derive several asymptotic results. The first analysis guarantees that the proposed Bregman divergence has maximum sensitivity to changes at probability vectors near the decision regions. Further analysis shows that the optimization of this Bregman divergence becomes equivalent to minimizing the overall cost regret in non-separable problems, and to maximizing a margin in separable problems. Cost-sensitive semi-supervised classification: When labeled data is scarce but unlabeled data is widely available, semi-supervised learning is an useful tool to make the most of the unlabeled data. We discuss an optimization problem relying on the minimization of our parametric family of Bregman divergences, using both labeled and unlabeled data, based on what is called the Entropy Minimization principle. We propose the rst multiclass cost-sensitive semi-supervised algorithm, under the assumption that inter-class separation is stronger than intra-class separation. The second part of the Thesis deals with the transformation of this parametric family of Bregman divergences into a sequence of Bregman divergences. Work along this line can be further divided into two additional areas: Foundations of sequences of Bregman divergences: We generalize some previous results about the design and characterization of Bregman divergences that are suitable for learning and their relationship with convexity. In addition, we aim to broaden the subset of Bregman divergences that are interesting for cost-sensitive learning. Under very general conditions, we nd sequences of (cost-sensitive) Bregman divergences, whose minimization provides minimum (cost-sensitive) risk for non-separable problems and some type of maximum margin classifiers in separable cases. Learning with example-dependent costs: A strong assumption is widespread through most cost-sensitive learning algorithms: misclassification costs are the same for all examples. In many cases this statement is not true. We claim that using the example-dependent costs directly is more natural and will lead to the production of more accurate classifiers. For these reasons, we consider the extension of cost-sensitive sequences of Bregman losses to example-dependent cost scenarios to generate finely tuned posterior probability estimates
    corecore