Support matrix machine: A review
Support vector machine (SVM) is one of the most studied paradigms in the
realm of machine learning for classification and regression problems. It relies
on vectorized input data. However, a significant portion of the real-world data
exists in matrix format, which is given as input to SVM by reshaping the
matrices into vectors. The process of reshaping disrupts the spatial
correlations inherent in the matrix data. Also, converting matrices into
vectors results in input data with a high dimensionality, which introduces
significant computational complexity. To overcome these issues in classifying
matrix input data, the support matrix machine (SMM) was proposed. It represents one
of the emerging methodologies tailored for handling matrix input data. The SMM
method preserves the structural information of the matrix data by using the
spectral elastic net penalty, which is a combination of the nuclear norm and the
Frobenius norm. This article provides the first in-depth analysis of the
development of the SMM model, which can be used as a thorough summary by both
novices and experts. We discuss numerous SMM variants, such as robust, sparse,
class imbalance, and multi-class classification models. We also analyze the
applications of the SMM model and conclude the article by outlining potential
future research avenues and possibilities that may motivate academics to
advance the SMM algorithm.
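As a concrete illustration of the regularizer described above, the spectral elastic net can be sketched as a weighted sum of the nuclear norm and the squared Frobenius norm of the weight matrix. The function name and the trade-off weights `tau` and `lam` below are illustrative choices, not the exact formulation of any particular SMM paper:

```python
import numpy as np

def spectral_elastic_net(W, tau=1.0, lam=1.0):
    """Spectral elastic net penalty: a combination of the nuclear norm
    (sum of singular values, which encourages low rank and thus preserves
    structural information) and the squared Frobenius norm (a standard
    ridge-style term). tau and lam are illustrative trade-off weights."""
    singular_values = np.linalg.svd(W, compute_uv=False)
    nuclear = singular_values.sum()
    frobenius_sq = np.sum(W ** 2)
    return tau * nuclear + 0.5 * lam * frobenius_sq

# Example: for a rank-1 matrix the nuclear norm equals the Frobenius norm
W = np.outer([1.0, 2.0], [3.0, 4.0])
print(spectral_elastic_net(W, tau=1.0, lam=0.0))  # -> sqrt(125), about 11.18
```

Minimizing a hinge-type loss plus this penalty over a matrix of weights is what distinguishes the SMM objective from a vectorized SVM with a plain ridge penalty.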
Semi-supervised machine learning techniques for classification of evolving data in pattern recognition
The amount of data recorded and processed over recent years has increased exponentially. To create intelligent systems that can learn from this data, we need to be able to identify patterns hidden in the data itself, learn these patterns, and predict future results based on our current observations. If we think about such a system in the context of time, the data itself evolves and so does the nature of the classification problem. As more data become available, different classification algorithms become suitable for a particular setting. At the beginning of the learning cycle, when we have a limited amount of data, online learning algorithms are more suitable. When truly large amounts of data become available, we need algorithms that can handle data that might be only partially labeled, as a result of the bottleneck that human labeling creates in the learning pipeline.
An excellent example of evolving data is gesture recognition, and it is present throughout our work. We need a gesture recognition system to work fast and with very few examples at the beginning. Over time, we are able to collect more data and the system can improve. As the system evolves, the user expects it to work better and not to have to become involved when the classifier is unsure about decisions. This latter situation produces additional unlabeled data. Another example of an application is medical classification, where experts' time is a scarce resource and the amount of received and labeled data increases disproportionately over time.
Although the process of data evolution is continuous, we identify three main discrete areas of contribution in different scenarios. When the system is very new and not enough data are available, online learning is used to learn after every single example and to capture the knowledge very fast. With increasing amounts of data, offline learning techniques are applicable. Once the amount of data is overwhelming and the teacher cannot provide labels for all the data, we have another setup that combines labeled and unlabeled data. These three setups define our areas of contribution, and our techniques contribute in each of them with applications to pattern recognition scenarios, such as gesture recognition and sketch recognition.
An online learning setup significantly restricts the range of techniques that can be used. In our case, the selected baseline technique is the Evolving TS-Fuzzy Model. The semi-supervised aspect we use is a relation between rules created by this model. Specifically, we propose a transductive similarity model that utilizes the relationship between generated rules based on their decisions about a query sample during the inference time. The activation of each of these rules is adjusted according to the transductive similarity, and the new decision is obtained using the adjusted activation. We also propose several new variations to the transductive similarity itself.
Once the amount of data increases, we are not limited to the online learning setup, and we can take advantage of the offline learning scenario, which normally performs better than the online one because of its independence of sample ordering and its global optimization with respect to all samples. We use generative methods to obtain data outside of the training set. Specifically, we aim to improve the previously mentioned TS Fuzzy Model by incorporating semi-supervised learning in the offline learning setup without unlabeled data. We use the Universum learning approach and have developed a method called UFuzzy. This method relies on artificially generated examples with high uncertainty (the Universum set), and it adjusts the cost function of the algorithm to force the decision boundary to be close to the Universum data. We confirmed the hypothesis behind the design of the UFuzzy classifier, namely that Universum learning can improve the TS Fuzzy Model, and achieved improved performance on more than two dozen datasets and applications.
With increasing amounts of data, we use the last scenario, in which the data comprise both labeled data and additional non-labeled data. This setting is one of the most common ones for semi-supervised learning problems. In this part of our work, we aim to improve the widely popular techniques of self-training (and its successor, help-training), which are both meta-frameworks over regular classifier methods but require a probabilistic representation of the output, which can be hard to obtain in the case of discriminative classifiers. Therefore, we develop a new algorithm that uses the modified active learning technique Query-by-Committee (QbC) to sample data with high certainty from the unlabeled set and subsequently embed them into the original training set. Our new method allows us to achieve increased performance over both a range of datasets and a range of classifiers.
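The general shape of such a QbC-driven self-training loop can be sketched as follows. This is an illustration of the idea under simplifying assumptions, not the thesis's actual algorithm: the committee members here are tiny nearest-centroid classifiers built on bootstrap resamples, and "high certainty" is approximated by a unanimous committee vote.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    """Tiny illustrative committee member: per-class centroids."""
    classes = np.unique(y)
    return classes, np.stack([X[y == c].mean(axis=0) for c in classes])

def nearest_centroid_predict(model, X):
    classes, centroids = model
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

def qbc_self_training(X_lab, y_lab, X_unl, n_committee=5, seed=0):
    """QbC-style self-training sketch: a bootstrap committee votes on the
    unlabeled samples; the unanimously agreed (high-certainty) ones are
    pseudo-labeled and merged into the training set before a final fit."""
    rng = np.random.RandomState(seed)
    votes = []
    for _ in range(n_committee):
        idx = rng.randint(0, len(y_lab), size=len(y_lab))  # bootstrap sample
        member = nearest_centroid_fit(X_lab[idx], y_lab[idx])
        votes.append(nearest_centroid_predict(member, X_unl))
    votes = np.stack(votes)
    sure = np.all(votes == votes[0], axis=0)               # unanimous vote
    X_new = np.vstack([X_lab, X_unl[sure]])
    y_new = np.concatenate([y_lab, votes[0][sure]])
    return nearest_centroid_fit(X_new, y_new)
```

In the actual method the committee wraps an arbitrary base classifier, which is precisely what makes the approach applicable to discriminative models without calibrated probabilities.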
These three works are connected by gradually relaxing the constraints on the learning setting in which we operate. Although our main motivation behind the development was to increase performance in various real-world tasks (gesture recognition, sketch recognition), we formulated our work as general methods in such a way that they can be used outside a specific application setup, the only restriction being that the underlying data evolve over time. Each of these methods can successfully exist on its own. The best setting in which they can be used is a learning problem where the data evolve over time and it is possible to discretize the evolutionary process.
Overall, this work represents a significant contribution to the areas of both semi-supervised learning and pattern recognition. It presents new state-of-the-art techniques that outperform baseline solutions, and it opens up new possibilities for future research.
Using Interior Point Methods for Large-scale Support Vector Machine training
Support Vector Machines (SVMs) are powerful machine learning techniques for classification
and regression, but the training stage involves a convex quadratic optimization program
that is most often computationally expensive. Traditionally, active-set methods have been
used rather than interior point methods, due to the Hessian in the standard dual formulation
being completely dense. But as active-set methods are essentially sequential, they may not
be adequate for machine learning challenges of the future. Additionally, training time may be
limited, or data may grow so large that cluster-computing approaches need to be considered.
Interior point methods have the potential to answer these concerns directly. They scale
efficiently, they can provide good early approximations, and they are suitable for parallel
and multi-core environments. To apply them to SVM training, it is necessary to address
directly the most computationally expensive aspect of the algorithm. We therefore present an
exact reformulation of the standard linear SVM training optimization problem that exploits
separability of terms in the objective. By so doing, per-iteration computational complexity
is reduced from O(n³) to O(n). We show how this reformulation can be applied to many
machine learning problems in the SVM family.
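The structural fact that makes such reformulations pay off is that the dense dual Hessian of a linear SVM is a low-rank update of a diagonal matrix. As an illustration of the general idea (not the thesis's exact separable reformulation), the Sherman-Morrison-Woodbury identity lets each Newton-type system be solved in O(nd²) for n samples and d features, rather than the O(n³) of a dense factorization:

```python
import numpy as np

def solve_low_rank(D, V, b):
    """Solve (diag(D) + V V^T) x = b via the Sherman-Morrison-Woodbury
    identity. Only a small d x d system is factorized; every other step
    is a diagonal solve or a thin matrix product, so the per-iteration
    cost scales linearly in the number of samples n."""
    Dinv_b = b / D                           # O(n) diagonal solve
    Dinv_V = V / D[:, None]                  # O(nd)
    S = np.eye(V.shape[1]) + V.T @ Dinv_V    # small d x d capacitance matrix
    correction = Dinv_V @ np.linalg.solve(S, V.T @ Dinv_b)
    return Dinv_b - correction
```

In an interior point method D changes at every iteration (it carries the barrier terms), but V, built from the training data, is fixed, so the thin products dominate and parallelize naturally, which is what the cluster implementation described below exploits.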
Implementation issues relating to specializing the algorithm are explored through extensive
numerical experiments. They show that the performance of our algorithm for large dense
or noisy data sets is consistent and highly competitive, and in some cases can outperform all
other approaches by a large margin. Unlike active set methods, performance is largely unaffected
by noisy data. We also show how, by exploiting the block structure of the augmented
system matrix, a hybrid MPI/OpenMP implementation of the algorithm enables data and
linear algebra computations to be efficiently partitioned amongst parallel processing nodes
in a clustered computing environment.
The applicability of our technique is extended to nonlinear SVMs by low-rank approximation
of the kernel matrix. We develop a heuristic designed to represent clusters using a
small number of features. Additionally, an early approximation scheme reduces the number of samples that need to be considered. Both elements improve the computational efficiency
of the training phase.
Taken as a whole, this thesis shows that with suitable problem formulation and efficient
implementation techniques, interior point methods are a viable optimization technology to
apply to large-scale SVM training, and are able to provide state-of-the-art performance.
Drawing, Handwriting Processing Analysis: New Advances and Challenges
Drawing and handwriting are communication skills that have been fundamental in geopolitical, ideological, and technological evolutions throughout history. Drawing and handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems, such as those related to the manner in which drawing and handwriting become an efficient way to command various connected objects, or to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities, and their limits from birth to decline.
Versification and Authorship Attribution
The field known as contemporary stylometry uses different methods, including machine learning, to discover a poem's author based on features like the frequencies of words and character n-grams. However, there is one potential textual fingerprint stylometry tends to ignore: versification, or the very making of language into verse. Using poetic texts in three different languages (Czech, German, and Spanish), Petr Plecháč asks whether versification features like rhythm patterns and types of rhyme can help determine authorship. He then tests his findings on two unsolved literary mysteries. In the first, Plecháč distinguishes the parts of the Elizabethan verse play The Two Noble Kinsmen written by William Shakespeare from those written by his coauthor, John Fletcher. In the second, he seeks to solve a case of suspected forgery: how authentic was a group of poems first published as the work of the nineteenth-century Russian author Gavriil Stepanovich Batenkov? This book of poetic investigation should appeal to literary sleuths the world over.
SAFE AND PRACTICAL MACHINE LEARNING
As increasingly sensitive decision making problems become automated using models trained by machine learning algorithms, it is important for machine learning researchers to design training algorithms that provide assurance that the models they produce will be well behaved. While significant progress has been made toward designing safe machine learning algorithms, there are several obstacles that prevent these strategies from being useful in practice. In this defense, I will highlight two of these challenges, and provide methods and results demonstrating that they can be overcome.
First, for many applications, the user must be able to easily specify general and potentially complex definitions of unsafe behavior. While most existing safe machine learning algorithms make strong assumptions about how unsafe behavior is defined, I will describe a flexible interface that allows the user to specify their definitions in a straightforward way at training time, and that is general enough to enforce a wide range of commonly used definitions.
Second, users often require guarantees to hold even when a trained model is deployed into an environment that differs from the training environment. In these settings, the safety guarantees provided by existing methods are no longer valid when the environment changes, presenting significant risk. I will consider two instances of this problem. In the first instance, I will provide algorithms with safety guarantees that hold when the differences between the training and deployment environments are caused by a change in the probability of encountering certain classes of observations. These algorithms are particularly useful in social applications, where the distribution of protected attributes, such as race or sex, may change over time. Next, I will provide algorithms with safety guarantees that hold in more general settings, in which the differences between the training and deployment environments are more challenging to describe. In both settings, I will present experiments showing that the guarantees provided by these algorithms are valid in practice, even when these changes are made antagonistically.
Towards Comprehensive Foundations of Computational Intelligence
Abstract. Although computational intelligence (CI) covers a vast variety of different methods, it still lacks an integrative theory. Several proposals for CI foundations are discussed: computing and cognition as compression, meta-learning as search in the space of data models, (dis)similarity based methods providing a framework for such meta-learning, and a more general approach based on chains of transformations. Many useful transformations that extract information from features are discussed. Heterogeneous adaptive systems are presented as a particular example of transformation-based systems, and the goal of learning is redefined to facilitate the creation of simpler data models. The need to understand data structures leads to techniques for logical and prototype-based rule extraction, and to the generation of multiple alternative models, while the need to increase the predictive power of adaptive models leads to committees of competent models. Learning from partial observations is a natural extension towards reasoning based on perceptions, and an approach to intuitive solving of such problems is presented. Throughout the paper neurocognitive inspirations are frequently used and are especially important in the modeling of higher cognitive functions. Promising directions such as liquid and laminar computing are identified and many open problems are presented.
Cost-sensitive classification based on Bregman divergences
The main objective of this PhD Thesis is the identification, characterization, and
study of new loss functions to address the so-called cost-sensitive classification. Many
decision problems are intrinsically cost-sensitive. However, the dominating preference
for cost-insensitive methods in the machine learning literature is a natural consequence
of the fact that true costs in real applications are difficult to evaluate.
Since, in general, uncovering the correct class of the data is less costly than any
decision error, designing low error decision systems is a reasonable (but suboptimal)
approach. For instance, consider the classification of credit applicants as either being good customers (will pay back the credit) or bad customers (will fail to pay off part of the credit). The cost of classifying one risky borrower as good could be much higher than the cost of classifying a potentially good customer as bad.
Our proposal relies on Bayes decision theory where the goal is to assign instances
to the class with minimum expected cost. The decision is made involving both costs and posterior probabilities of the classes. Obtaining calibrated probability
estimates at the classifier output requires a suitable learning machine, a large enough
representative data set as well as an adequate loss function to be minimized during
learning. The design of the loss function can be aided by the costs: classical decision
theory shows that cost matrices define class boundaries determined by posterior class
probability estimates. Strictly speaking, in order to make optimal decisions, accurate
probability estimates are only required near the decision boundaries. It is key to
point out that the choice of the loss function becomes especially relevant when
the prior knowledge about the problem is limited or the available training examples
are somehow unsuitable. In those cases, different loss functions lead to dramatically
different posterior probability estimates. We focus our study on the set of Bregman
divergences. These divergences offer a rich family of proper losses that has recently
become very popular in the machine learning community [Nock and Nielsen, 2009,
Reid and Williamson, 2009a].
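The Bayes decision rule described above, assigning each instance to the class with minimum expected cost given the posterior estimates, can be sketched in a few lines. The cost values in the credit example below are hypothetical, chosen only to make the asymmetry concrete:

```python
import numpy as np

def min_expected_cost_decision(posteriors, cost_matrix):
    """Bayes decision rule: pick the class whose expected cost, averaged
    over the posterior class probabilities, is smallest.
    cost_matrix[i, j] = cost of deciding class i when the truth is j."""
    expected_costs = cost_matrix @ posteriors
    return int(np.argmin(expected_costs))

# Hypothetical credit example: accepting a truly bad customer (decide 0,
# truth 1) costs 10; rejecting a good one (decide 1, truth 0) costs 1.
C = np.array([[0.0, 10.0],
              [1.0,  0.0]])
# Even an applicant judged 80% likely to be good is rejected:
print(min_expected_cost_decision(np.array([0.8, 0.2]), C))  # -> 1
```

This is also why accurate posterior estimates matter most near the cost-induced decision boundaries: far from them, even crude estimates yield the same minimum-cost decision.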
The first part of the Thesis deals with the development of a novel parametric family of multiclass Bregman divergences which captures the information in the cost
matrix, so that the loss function is adapted to each specific problem. Multiclass cost-sensitive learning is one of the main challenges in cost-sensitive learning and, through this parametric family, we provide a natural framework to go beyond
binary tasks. Following this idea, two lines are explored:
Cost-sensitive supervised classification: We derive several asymptotic results.
The first analysis guarantees that the proposed Bregman divergence has maximum sensitivity to changes at probability vectors near the decision regions. Further analysis shows that the optimization of this Bregman divergence becomes equivalent to minimizing the overall cost regret in non-separable problems, and to maximizing a margin in separable problems.
Cost-sensitive semi-supervised classification: When labeled data is
scarce but unlabeled data is widely available, semi-supervised learning is a
useful tool to make the most of the unlabeled data. We discuss an optimization
problem relying on the minimization of our parametric family of Bregman divergences, using both labeled and unlabeled data, based on what is called the Entropy Minimization principle. We propose the first multiclass cost-sensitive semi-supervised algorithm, under the assumption that inter-class separation is stronger than intra-class separation.
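To make the Entropy Minimization principle concrete, the following sketch shows the general shape of such an objective: a supervised loss on labeled data plus a penalty on the entropy of the posteriors predicted for unlabeled data, which pushes decision boundaries into low-density regions. Plain cross-entropy and the weight `lam` are illustrative stand-ins; the thesis's actual objective minimizes its parametric family of cost-sensitive Bregman divergences instead.

```python
import numpy as np

def semi_supervised_objective(p_labeled, y_labeled, p_unlabeled, lam=0.5):
    """Entropy Minimization sketch: labeled cross-entropy plus lam times
    the mean entropy of posteriors predicted for unlabeled samples.
    Confident (low-entropy) unlabeled predictions lower the objective."""
    eps = 1e-12  # numerical floor to keep log() finite
    ce = -np.mean(np.log(p_labeled[np.arange(len(y_labeled)), y_labeled] + eps))
    ent = -np.mean(np.sum(p_unlabeled * np.log(p_unlabeled + eps), axis=1))
    return ce + lam * ent
```

The inter-class versus intra-class separation assumption mentioned above is what justifies the entropy term: if classes are well separated, confident predictions on unlabeled points are likely correct.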
The second part of the Thesis deals with the transformation of this parametric family of Bregman divergences into a sequence of Bregman divergences. Work along this line can be further divided into two additional areas:
Foundations of sequences of Bregman divergences: We generalize some
previous results about the design and characterization of Bregman divergences
that are suitable for learning and their relationship with convexity. In addition,
we aim to broaden the subset of Bregman divergences that are interesting for
cost-sensitive learning. Under very general conditions, we find sequences of (cost-sensitive) Bregman divergences whose minimization provides minimum (cost-sensitive) risk for non-separable problems and some type of maximum-margin classifier in separable cases.
Learning with example-dependent costs: A strong assumption is widespread throughout most cost-sensitive learning algorithms: misclassification costs are the same for all examples. In many cases this assumption does not hold.
We claim that using the example-dependent costs directly is more natural and will lead to the production of more accurate classifiers. For these reasons, we consider the extension of cost-sensitive sequences of Bregman losses to example-dependent cost scenarios to generate finely tuned posterior probability estimates.
Leveraging Structures of the Data in Deep Learning
The performance of deep learning frameworks could be significantly improved by considering the particular underlying structures of each dataset. In this thesis, I summarize our three works on boosting the performance of deep learning models by leveraging structures of the data. In the first work, we theoretically justify that, for convolutional neural networks (CNNs), the neighborhood of a pixel should be redefined as its most correlated spatial locations, in order to achieve a lower generalization error. Based on the correlation pattern, we propose a data-driven approach to design multiple layers of different customized filter shapes by repeatedly solving lasso problems. In the second work, we address the problem of scale-invariance in deep learning. We propose ScaleNet to predict object scales. By recursively applying ScaleNet and rescaling, pretrained deep networks can identify objects with scales significantly different from the training set. In the last work, we perform an extensive study on PointConv-based frameworks to tackle the problems of scale and rotation invariance in point cloud convolution. PointConv is a novel convolution operation that can be directly applied on point clouds, and achieves parity with 2D CNNs in terms of formulation and performance. It takes coordinates of points as inputs to generate corresponding weights for convolution. We identify two effective strategies. First, for point clouds converted from regular 2D raster images, we replace the multi-layer perceptron (MLP) based weight function with much simpler cubic polynomials, and achieve more robustness and better performance than traditional 2D CNNs on the MNIST dataset. Next, for 3D point clouds, we introduce a novel viewpoint-invariant (VI) descriptor utilizing geometric properties between a center point and its local neighbors, as an additional input to the weight function.
Integrated with the VI descriptor, we not only significantly improve the robustness of PointConv but also achieve comparable or better performance in comparison to state-of-the-art point-based approaches on both SemanticKITTI and ScanNet.
Pattern Recognition
Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion, applications, and others. The signals processed are commonly one, two, or three dimensional; the processing is done in real time or takes hours and days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms, and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.