1,158 research outputs found
Bag-Level Aggregation for Multiple Instance Active Learning in Instance Classification Problems
A growing number of applications, e.g. video surveillance and medical image
analysis, require training recognition systems from large amounts of weakly
annotated data while some targeted interactions with a domain expert are
allowed to improve the training process. In such cases, active learning (AL)
can reduce labeling costs for training a classifier by querying the expert to
provide the labels of most informative instances. This paper focuses on AL
methods for instance classification problems in multiple instance learning
(MIL), where data is arranged into sets, called bags, that are weakly labeled.
Most AL methods focus on single instance learning problems. These methods are
not suitable for MIL problems because they cannot account for the bag structure
of data. In this paper, new methods for bag-level aggregation of instance
informativeness are proposed for multiple instance active learning (MIAL). The
\textit{aggregated informativeness} method identifies the most informative
instances based on classifier uncertainty, and queries bags incorporating the
most information. The other proposed method, called \textit{cluster-based
aggregative sampling}, clusters data hierarchically in the instance space. The
informativeness of instances is assessed by considering bag labels, inferred
instance labels, and the proportion of labels that remain to be discovered in
clusters. Both proposed methods significantly outperform reference methods in
extensive experiments using benchmark data from several application domains.
Results indicate that using an appropriate strategy to address MIAL problems
yields a significant reduction in the number of queries needed to achieve the
same level of performance as single instance AL methods
Machine Learning in Resource-constrained Devices: Algorithms, Strategies, and Applications
The ever-increasing growth of technologies is changing people's everyday life. As a major consequence: 1) the amount of available data is growing and 2) several applications rely on battery supplied devices that are required to process data in real time. In this scenario the need for ad-hoc strategies for the development of low-power and low-latency intelligent systems capable of learning inductive rules from data using a modest mount of computational resources is becoming vital. At the same time, one needs to develop specic methodologies to manage complex patterns such as text and images.
This Thesis presents different approaches and techniques for the development of fast learning models explicitly designed to be hosted on embedded systems. The proposed methods proved able to achieve state-of-the-art performances in term of the trade-off between generalization capabilities and area requirements when implemented in low-cost digital devices. In addition, advanced strategies for ecient sentiment analysis in text and images are proposed
Evolutionary Optimization Of Support Vector Machines
Support vector machines are a relatively new approach for creating classifiers that have become increasingly popular in the machine learning community. They present several advantages over other methods like neural networks in areas like training speed, convergence, complexity control of the classifier, as well as a stronger mathematical background based on optimization and statistical learning theory. This thesis deals with the problem of model selection with support vector machines, that is, the problem of finding the optimal parameters that will improve the performance of the algorithm. It is shown that genetic algorithms provide an effective way to find the optimal parameters for support vector machines. The proposed algorithm is compared with a backpropagation Neural Network in a dataset that represents individual models for electronic commerce
Longitudinal tracking of physiological state with electromyographic signals.
Electrophysiological measurements have been used in recent history to classify instantaneous physiological configurations, e.g., hand gestures. This work investigates the feasibility of working with changes in physiological configurations over time (i.e., longitudinally) using a variety of algorithms from the machine learning domain. We demonstrate a high degree of classification accuracy for a binary classification problem derived from electromyography measurements before and after a 35-day bedrest. The problem difficulty is increased with a more dynamic experiment testing for changes in astronaut sensorimotor performance by taking electromyography and force plate measurements before, during, and after a jump from a small platform. A LASSO regularization is performed to observe changes in relationship between electromyography features and force plate outcomes. SVM classifiers are employed to correctly identify the times at which these experiments are performed, which is important as these indicate a trajectory of adaptation
Estimating Confidences for Classifier Decisions using Extreme Value Theory
Classifiers generally lack a mechanism to compute decision confidences. As humans, when we sense that the confidence for a decision is low, we either conduct additional actions to improve our confidence or dismiss the decision. While this reasoning is natural to us, it is currently missing in most common decision algorithms (i.e., classifiers) used in computer vision or machine learning. This limits the capability for a machine to take further actions to either improve a result or dismiss the decision. In this thesis, we design algorithms for estimating the confidence for decisions made by classifiers such as nearest-neighbor or support vector machines. We developed these algorithms leveraging the theory of extreme values. We use the statistical models that this theory provides for modeling the classifier's decision scores for correct and incorrect outcomes. Our proposed algorithms exploit these statistical models in order to compute a correctness belief: the probability that the classifier's decision is correct. In this work, we show how these beliefs can be used to filter bad classifications and to speed up robust estimations via sample and consensus algorithms, which are used in computer vision for estimating camera motions and for reconstructing the scene's 3D structure. Moreover, we show how these beliefs improve the classification accuracy of one-class support vector machines. In conclusion, we show that extreme value theory leads to powerful mechanisms that can predict the correctness of a classifier's decision
Investigation into the Application of Personality Insights and Language Tone Analysis in Spam Classification
Due to its persistence spam remains as one of the biggest problems facing users and suppliers of email communication services. Machine learning techniques have been very successful at preventing many spam mails from arriving in user mailboxes, however they still account for over 50% of all emails sent. Despite this relative success the economic cost of spam has been estimated as high as 20 billion so spam can still be considered a considerable problem. In essence a spam email is a commercial communication trying to entice the receiver to take some positive action. This project uses the text from emails and creates personality insight and language tone scores through the use of IBM Watsons’ Tone Analyzer API. Those scores are used to investigate whether the language used in emails can be transformed into useful features that can be used to correctly classify them as spam or genuine emails. And during the course of this investigation a range of machine learning techniques are applied. Results from this experiment found that where just the personality insight and language tone features are used in the model some promising results with one dataset were shown. However over all datasets results were inconclusive with this model. Furthermore it was found that in a model where these features were used in combination with a normalised term-frequency feature-set no real improvement in the classification performance was shown
- …