Automated system for training a semi-supervised support vector machine
Thesis: 101 p., 9 tabl., 46 fig., 2 appendices, 37 sources
The work examines and analyzes semi-supervised learning methods, in
particular the semi-supervised support vector machine and various approaches to its
implementation. The chosen approach was applied to and evaluated on practical
tasks, namely the classification of two-dimensional point datasets of various
shapes, as well as binary and multi-class text classification.
Research object: semi-supervised learning methods as a way to overcome the
problem of data labeling.
Research subject: the support vector machine and its modification for
semi-supervised learning tasks.
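The low-density-separation idea behind the semi-supervised SVM can be sketched as a linear S3VM trained in the primal by subgradient descent. This is a minimal illustration under stated assumptions, not the implementation developed in the thesis: the model is linear, unlabelled points are penalised with the symmetric hinge max(0, 1 - |f(x)|), no class-balance constraint is enforced, and the function and parameter names (`train_s3vm`, `lam`, `lam_u`, `lr`, `epochs`) are hypothetical.

```python
import numpy as np

def train_s3vm(X_l, y_l, X_u, lam=0.01, lam_u=0.5, lr=0.1, epochs=500):
    """Linear S3VM fitted by full-batch subgradient descent (illustrative sketch).

    Minimises  lam/2 * ||w||^2
             + mean over labelled points of max(0, 1 - y * f(x))
             + lam_u * mean over unlabelled points of max(0, 1 - |f(x)|),
    with f(x) = w.x + b and labels y in {-1, +1}. The last term penalises
    unlabelled points inside the margin, pushing the boundary into low-density
    regions. No class-balance constraint is enforced here (assumption).
    """
    w = np.zeros(X_l.shape[1])
    b = 0.0
    n_l, n_u = len(X_l), len(X_u)
    for _ in range(epochs):
        # Hinge loss on labelled points: active where y * f(x) < 1.
        f_l = X_l @ w + b
        act_l = y_l * f_l < 1
        grad_w = lam * w - (y_l[act_l][:, None] * X_l[act_l]).sum(axis=0) / n_l
        grad_b = -y_l[act_l].sum() / n_l
        # Symmetric hinge on unlabelled points: active where |f(x)| < 1.
        f_u = X_u @ w + b
        act_u = np.abs(f_u) < 1
        s = np.sign(f_u[act_u])
        grad_w -= lam_u * (s[:, None] * X_u[act_u]).sum(axis=0) / n_u
        grad_b -= lam_u * s.sum() / n_u
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Two 2-D Gaussian blobs, one labelled point per class, 200 unlabelled points.
rng = np.random.default_rng(0)
X_u = np.vstack([rng.normal([-2.0, 0.0], 0.5, (100, 2)),
                 rng.normal([2.0, 0.0], 0.5, (100, 2))])
y_u_true = np.r_[-np.ones(100), np.ones(100)]
X_l = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_l = np.array([-1.0, 1.0])
w, b = train_s3vm(X_l, y_l, X_u)
acc = np.mean(np.sign(X_u @ w + b) == y_u_true)
print(f"accuracy on unlabelled points: {acc:.2f}")
```

Because the unlabelled term is non-convex, the result depends on initialisation; starting from w = 0 lets the labelled hinge loss set the initial direction before the unlabelled term takes effect.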
On the optimal usage of labelled examples in semi-supervised multi-class classification problems
In recent years, the performance of semi-supervised learning has been investigated theoretically. However, most of this theoretical work has focussed on binary classification problems. In this paper, we take it a step further by extending the work of Castelli and Cover [1] [2] to the multi-class paradigm. In particular, we consider the key semi-supervised learning problem of classifying an unseen instance x into one of K different classes, using a training dataset sampled from a mixture density distribution and composed of l labelled records and u unlabelled examples. Even if the mixture is assumed to be identifiable and infinitely many unlabelled examples are available, labelled records are still needed to determine the K decision regions. Therefore, we first investigate the minimum number of labelled examples needed to accomplish that task. Then, we propose an optimal multi-class learning algorithm which generalises the optimal procedure proposed in the literature for binary problems. Finally, we use this generalisation to study the probability of error when the binary-class constraint is relaxed.
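Why labelled records remain necessary can be illustrated with a small sketch: once the K mixture components are known from unlabelled data, the only missing information is which component belongs to which class, and the labelled examples select that mapping. The sketch below assumes 1-D Gaussian components and picks the class-to-component permutation with maximum labelled likelihood; this is an assumed maximum-likelihood rule in the spirit of Castelli and Cover, not necessarily the paper's optimal algorithm, and all names are hypothetical.

```python
import itertools
import math

def log_gauss(x, mu, sigma):
    """Log-density of a 1-D Gaussian N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

def assign_components(components, labelled):
    """Pick the class-to-component mapping that maximises the labelled likelihood.

    components: list of (weight, mu, sigma), assumed already identified from
                unlabelled data (hypothetical input format).
    labelled:   list of (x, c) pairs with class indices c in 0..K-1.
    Returns perm, where perm[c] is the component index assigned to class c.
    """
    K = len(components)
    best_perm, best_ll = None, -math.inf
    for perm in itertools.permutations(range(K)):  # K! candidate mappings
        ll = 0.0
        for x, c in labelled:
            w, mu, sigma = components[perm[c]]
            ll += math.log(w) + log_gauss(x, mu, sigma)
        if ll > best_ll:
            best_perm, best_ll = perm, ll
    return best_perm

def classify(x, components, perm):
    """Assign x to the class whose component has the largest weighted density."""
    def score(c):
        w, mu, sigma = components[perm[c]]
        return math.log(w) + log_gauss(x, mu, sigma)
    return max(range(len(perm)), key=score)

components = [(1 / 3, -4.0, 1.0), (1 / 3, 0.0, 1.0), (1 / 3, 4.0, 1.0)]
labelled = [(0.1, 2), (3.9, 0)]  # only two of the three classes are observed
perm = assign_components(components, labelled)
print(perm)  # (2, 0, 1): class 0 -> mu=4, class 1 -> mu=-4, class 2 -> mu=0
print(classify(-3.5, components, perm), classify(4.2, components, perm))  # 1 0
```

Enumerating all K! permutations is only practical for small K; the example also shows that, here, labelled points from K - 1 classes already fix the whole mapping.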
Semi-supervised binary classification with latent distance learning
Binary classification (BC) is a practical task that is ubiquitous in
real-world problems, such as distinguishing healthy and unhealthy objects in
biomedical diagnostics and defective and non-defective products in
manufacturing inspections. Nonetheless, fully annotated data are commonly
required to effectively solve this problem, and their collection by domain
experts is a tedious and expensive procedure. For multi-class classification,
by contrast, several significant semi-supervised learning techniques that rely
heavily on stochastic data augmentation have been devised. In this study, we
demonstrate that stochastic data augmentation is less suitable for typical BC
problems because it can omit crucial features that strictly distinguish
between positive and negative samples. To address this issue, we propose a new
representation learning approach that solves the BC problem with only a few
labels, using a random k-pair cross-distance learning mechanism. First, by
harnessing a few labeled samples,
the encoder network learns the projection of positive and negative samples in
angular spaces to maximize and minimize their inter-class and intra-class
distances, respectively. Second, the classifier learns to discriminate between
positive and negative samples using on-the-fly labels generated based on the
angular space and labeled samples to solve BC tasks. Extensive experiments were
conducted using four real-world publicly available BC datasets. With few labels
and without any data augmentation techniques, the proposed method outperformed
state-of-the-art semi-supervised and self-supervised learning methods.
Moreover, with only 10% of the data labeled, our semi-supervised classifier
achieved accuracy competitive with that of a fully supervised setting.
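The two stages described above (shaping an angular embedding space from a few labels, then generating on-the-fly labels in that space) can be sketched with plain NumPy. This is a hypothetical, generic illustration, not the paper's loss or labelling rule: the cosine-based pair penalties, the prototype-based labelling, and the names `k_pair_angular_loss`, `on_the_fly_labels` and `margin` are all assumptions, and in practice the embeddings `z` would come from the encoder network.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    """Project embeddings onto the unit sphere so dot products are cosines."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def k_pair_angular_loss(z, y, k, rng, margin=0.0):
    """Hypothetical k-pair loss on labelled embeddings z (n, d) with labels y (n,).

    Samples k random index pairs; same-class pairs are penalised by (1 - cos),
    cross-class pairs by max(0, cos - margin), so minimising the loss shrinks
    intra-class and enlarges inter-class angular distances.
    """
    z = l2_normalize(z)
    i = rng.integers(0, len(z), size=k)
    j = rng.integers(0, len(z), size=k)
    cos = np.sum(z[i] * z[j], axis=1)
    same = y[i] == y[j]
    intra = np.sum(1.0 - cos[same])
    inter = np.sum(np.maximum(0.0, cos[~same] - margin))
    return (intra + inter) / k

def on_the_fly_labels(z_u, z_l, y_l):
    """Label unlabelled embeddings by the closer class prototype in angular space.

    y_l uses 1 for positive and 0 for negative samples.
    """
    z_u, z_l = l2_normalize(z_u), l2_normalize(z_l)
    proto_pos = l2_normalize(z_l[y_l == 1].mean(axis=0, keepdims=True))
    proto_neg = l2_normalize(z_l[y_l == 0].mean(axis=0, keepdims=True))
    return ((z_u @ proto_pos.T) > (z_u @ proto_neg.T)).ravel().astype(int)

rng = np.random.default_rng(0)
y = np.array([1, 1, 0, 0])
good = np.array([[1.0, 0.0], [1.0, 0.0], [-1.0, 0.0], [-1.0, 0.0]])
bad = np.ones((4, 2))  # every sample mapped to the same direction
print(k_pair_angular_loss(good, y, k=50, rng=rng))  # close to 0
print(k_pair_angular_loss(bad, y, k=50, rng=rng))   # clearly positive
z_l = np.array([[1.0, 0.0], [0.9, 0.1], [-1.0, 0.0], [-0.9, -0.1]])
print(on_the_fly_labels(np.array([[2.0, 0.5], [-3.0, 0.2]]), z_l, y))  # [1 0]
```

Working on the unit sphere makes the loss depend only on angles, so the classifier trained on the generated labels sees a space where positives and negatives are separated by direction rather than magnitude.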
Multi-task Self-Supervised Learning for Human Activity Detection
Deep learning methods are successfully used in applications pertaining to
ubiquitous computing, health, and well-being. In particular, the area of human
activity recognition (HAR) has been largely transformed by convolutional and
recurrent neural networks, thanks to their ability to learn semantic
representations from raw input. However, extracting generalizable features
requires massive amounts of well-curated data, which are notoriously
difficult to collect because of privacy issues and annotation costs. Therefore,
unsupervised representation learning is of prime importance to leverage the
vast amount of unlabeled data produced by smart devices. In this work, we
propose a novel self-supervised technique for feature learning from sensory
data that does not require access to any form of semantic labels. We learn a
multi-task temporal convolutional network to recognize transformations applied
to an input signal. By exploiting these transformations, we demonstrate that
simple binary classification auxiliary tasks provide a strong supervisory
signal for extracting useful features for the downstream task. We
extensively evaluate the proposed approach on several publicly available
datasets for smartphone-based HAR in unsupervised, semi-supervised, and
transfer learning settings. Our method achieves performance levels superior to
or comparable with fully-supervised networks, and it performs significantly
better than autoencoders. Notably, for the semi-supervised case, the
self-supervised features substantially boost the detection rate by attaining a
kappa score between 0.7 and 0.8 with only 10 labeled examples per class. We
obtain similarly strong performance even when the features are transferred from
a different data source. While this paper focuses on HAR as the application
domain, the proposed technique is general and could be applied to a wide
variety of problems in other areas.
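The way transformation recognition turns unlabelled sensor windows into supervised binary tasks can be sketched as follows. The transformation set, its parameters and all function names are illustrative assumptions, not necessarily those used in the paper; each resulting (X, y) pair would train one binary head on a shared temporal convolutional encoder, which is not shown here.

```python
import numpy as np

# Hypothetical transformations for windows shaped (time_steps, channels).
def add_noise(x, rng):
    return x + rng.normal(0.0, 0.1, x.shape)  # additive Gaussian jitter

def scale(x, rng):
    return x * rng.uniform(0.5, 1.5, size=(1, x.shape[1]))  # per-channel gain

def negate(x, rng):
    return -x  # flip signal polarity

def time_flip(x, rng):
    return x[::-1].copy()  # reverse along the time axis

TRANSFORMS = {"noise": add_noise, "scale": scale, "negate": negate, "time_flip": time_flip}

def make_self_supervised_tasks(windows, rng):
    """Build one binary task per transformation from unlabelled windows (n, T, C).

    Label 0 marks an original window and label 1 its transformed copy, so no
    activity labels are needed.
    """
    tasks = {}
    for name, fn in TRANSFORMS.items():
        transformed = np.stack([fn(w, rng) for w in windows])
        X = np.concatenate([windows, transformed])
        y = np.concatenate([np.zeros(len(windows), dtype=int),
                            np.ones(len(windows), dtype=int)])
        tasks[name] = (X, y)
    return tasks

# 8 unlabelled, accelerometer-like windows: 50 time steps, 3 axes.
windows = np.random.default_rng(0).normal(size=(8, 50, 3))
tasks = make_self_supervised_tasks(windows, np.random.default_rng(1))
for name, (X, y) in tasks.items():
    print(name, X.shape, y.sum())
```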
A Unifying Framework in Vector-valued Reproducing Kernel Hilbert Spaces for Manifold Regularization and Co-Regularized Multi-view Learning
This paper presents a general framework in vector-valued reproducing kernel
Hilbert spaces (RKHS) for the problem of learning an unknown functional dependency
between a structured input space and a structured output space. Our formulation
encompasses both Vector-valued Manifold Regularization and Co-regularized
Multi-view Learning, providing in particular a unifying framework linking these
two important learning approaches. In the case of the least squares loss
function, we provide a closed form solution, which is obtained by solving a
system of linear equations. In the case of Support Vector Machine (SVM)
classification, our formulation generalizes in particular both the binary
Laplacian SVM to the multi-class, multi-view settings and the multi-class
Simplex Cone SVM to the semi-supervised, multi-view settings. The solution is
obtained by solving a single quadratic optimization problem, as in standard
SVM, via the Sequential Minimal Optimization (SMO) approach. Empirical results
obtained on the task of object recognition, using several challenging datasets,
demonstrate the competitiveness of our algorithms compared with other
state-of-the-art methods.
Comment: 72 pages
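The closed-form least-squares case can be illustrated by the scalar-valued, single-view setting that a vector-valued manifold regularization framework encompasses: Laplacian Regularized Least Squares (Belkin, Niyogi and Sindhwani), whose expansion coefficients also come from one system of linear equations. This sketch is not the paper's vector-valued, multi-view solution; the Gaussian kernel, the fully connected similarity graph and all parameter names are assumptions.

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian kernel matrix exp(-gamma * ||a - b||^2) between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def lap_rls(X_l, y_l, X_u, gamma_A=1e-3, gamma_I=1.0, kernel_gamma=1.0, graph_gamma=1.0):
    """Scalar-valued Laplacian RLS: coefficients from a single linear system.

    Minimises (1/l) sum_i (y_i - f(x_i))^2 + gamma_A ||f||_K^2
              + gamma_I / n^2 * f^T L f,  with f = K @ alpha over all n = l + u points.
    Setting the gradient to zero gives (J K + gamma_A l I + gamma_I l / n^2 L K) alpha = Y.
    """
    X = np.vstack([X_l, X_u])
    l, n = len(X_l), len(X_l) + len(X_u)
    K = rbf(X, X, kernel_gamma)
    W = rbf(X, X, graph_gamma)  # fully connected similarity graph (assumption)
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W  # unnormalised graph Laplacian
    J = np.diag(np.r_[np.ones(l), np.zeros(n - l)])  # selects labelled points
    Y = np.r_[y_l, np.zeros(n - l)]
    A = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    alpha = np.linalg.solve(A, Y)
    return lambda Z: rbf(Z, X, kernel_gamma) @ alpha

rng = np.random.default_rng(1)
X_u = np.vstack([rng.normal([-2.0, 0.0], 0.5, (30, 2)),
                 rng.normal([2.0, 0.0], 0.5, (30, 2))])
X_l = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_l = np.array([-1.0, 1.0])
f = lap_rls(X_l, y_l, X_u)
preds = f(np.array([[2.0, 0.3], [-2.0, -0.3]]))
print(np.sign(preds))  # positive near the +1 cluster, negative near the -1 cluster
```

Using `np.linalg.solve` instead of forming an explicit inverse is the usual choice for a one-off linear system; it is cheaper and numerically more stable.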
Conflicts, Villains, Resolutions: Towards models of Narrative Media Framing
Despite increasing interest in the automatic detection of media frames in
NLP, the problem is typically simplified to single-label classification with a
topic-like view of frames, leaving the broader document-level narrative
unmodelled. In this work, we revisit a widely used
conceptualization of framing from the communication sciences which explicitly
captures elements of narratives, including conflict and its resolution, and
integrate it with the narrative framing of key entities in the story as heroes,
victims or villains. We adapt an effective annotation paradigm that breaks a
complex annotation task into a series of simpler binary questions, and present
an annotated data set of English news articles, and a case study on the framing
of climate change in articles from news outlets across the political spectrum.
Finally, we explore automatic multi-label prediction of our frames with
supervised and semi-supervised approaches, and present a novel retrieval-based
method which is both effective and transparent in its predictions. We conclude
with a discussion of opportunities and challenges for future work on
document-level models of narrative framing.
Comment: To appear in ACL 2023 (main conference)
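The general idea of predicting frames by retrieving similar, already-annotated articles can be sketched as a k-nearest-neighbour multi-label vote over bag-of-words cosine similarity. This is an assumed, generic stand-in, not the paper's retrieval-based method; the toy corpus, the frame label names, `k` and `threshold` are hypothetical. Returning the retrieved neighbours alongside the prediction illustrates how such a method can be transparent.

```python
from collections import Counter
import math

def bow(text):
    """Bag-of-words term counts (lower-cased whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve_and_predict(query, corpus, k=3, threshold=0.5):
    """corpus: list of (text, set_of_frame_labels).

    Predicts every label carried by at least `threshold` of the k most similar
    documents and returns those documents with their scores as the explanation.
    """
    q = bow(query)
    scored = sorted(((cosine(q, bow(t)), t, labels) for t, labels in corpus),
                    key=lambda s: s[0], reverse=True)[:k]
    votes = Counter(lab for _, _, labels in scored for lab in labels)
    predicted = {lab for lab, n in votes.items() if n / len(scored) >= threshold}
    return predicted, [(t, s) for s, t, _ in scored]

corpus = [
    ("government fails to act on climate crisis", {"conflict", "villain:government"}),
    ("activists demand climate action from government", {"conflict", "hero:activists"}),
    ("new solar farm opens ahead of schedule", {"resolution"}),
    ("wind power investment brings local jobs", {"resolution"}),
]
predicted, neighbours = retrieve_and_predict("government blamed as climate crisis worsens", corpus)
print(predicted)      # {'conflict'}
print(neighbours[0])  # the most similar annotated article and its score
```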
- …