Exhaustive and Efficient Constraint Propagation: A Semi-Supervised Learning Perspective and Its Applications
This paper presents a novel pairwise constraint propagation approach by
decomposing the challenging constraint propagation problem into a set of
independent semi-supervised learning subproblems which can be solved in
quadratic time using label propagation based on k-nearest neighbor graphs.
Considering that this time cost is proportional to the number of all possible
pairwise constraints, our approach actually provides an efficient solution for
exhaustively propagating pairwise constraints throughout the entire dataset.
The resulting exhaustive set of propagated pairwise constraints is further
used to adjust the similarity matrix for constrained spectral clustering. Beyond
traditional constraint propagation on single-source data, our approach
is also extended to more challenging constraint propagation on multi-source
data where each pairwise constraint is defined over a pair of data points from
different sources. This multi-source constraint propagation has an important
application to cross-modal multimedia retrieval. Extensive results have shown
the superior performance of our approach.
Comment: The short version of this paper appears as an oral paper in ECCV 201
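The idea in the abstract above, treating constraint propagation as a set of independent label propagation subproblems on a k-nearest neighbor graph, can be illustrated with a minimal sketch of a single subproblem. This is not the authors' implementation: the function names `knn_graph` and `propagate`, the binary edge weights, the damping parameter `alpha`, and the toy data are all assumptions made for illustration. One column of the pairwise constraint matrix (must-link = +1, cannot-link = -1, unknown = 0) is treated as the initial label vector.

```python
import math

def knn_graph(points, k):
    """Symmetric k-NN adjacency matrix with binary weights (an assumption)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for _, j in dists[:k]:
            W[i][j] = W[j][i] = 1.0
    return W

def propagate(W, y, alpha=0.8, iters=200):
    """Iterate f <- alpha * S f + (1 - alpha) * y, with S = D^-1/2 W D^-1/2."""
    n = len(W)
    deg = [sum(row) for row in W]
    S = [
        [W[i][j] / math.sqrt(deg[i] * deg[j]) if W[i][j] else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    f = list(y)
    for _ in range(iters):
        f = [
            alpha * sum(S[i][j] * f[j] for j in range(n)) + (1 - alpha) * y[i]
            for i in range(n)
        ]
    return f

# Toy data: two well-separated groups on a line (hypothetical example).
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
W = knn_graph(points, k=2)

# Constraints relative to point 0: must-link with point 1, cannot-link with point 3.
y = [0.0, 1.0, 0.0, -1.0, 0.0, 0.0]
scores = propagate(W, y)

# Positive scores suggest must-link with point 0, negative scores cannot-link.
print([round(s, 3) for s in scores])
```

Repeating this for every column yields a dense matrix of propagated constraints, which is the sense in which the propagation is exhaustive. How those scores are then used to adjust the similarity matrix is described in the paper itself and is not reproduced here.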
TK-KNN: A Balanced Distance-Based Pseudo Labeling Approach for Semi-Supervised Intent Classification
The ability to detect intent in dialogue systems has become increasingly
important in modern technology. These systems often generate a large amount of
unlabeled data, and manually labeling this data requires substantial human
effort. Semi-supervised methods attempt to remedy this cost by using a model
trained on a few labeled examples and then assigning pseudo-labels to a
further subset of unlabeled examples whose model prediction confidence is
higher than a certain threshold. However, one particularly perilous consequence
of these methods is the risk of picking an imbalanced set of examples across
classes, which could lead to poor labels. In the present work, we describe
Top-K K-Nearest Neighbor (TK-KNN), which uses a more robust pseudo-labeling
approach based on distance in the embedding space while maintaining a balanced
set of pseudo-labeled examples across classes through a ranking-based approach.
Experiments on several datasets show that TK-KNN outperforms existing models,
particularly when labeled data is scarce on popular datasets such as CLINC150
and Banking77. Code is available at https://github.com/ServiceNow/tk-knn
Comment: 9 pages, 6 figures, 4 table
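The contrast between threshold-based selection and a balanced, ranking-based selection can be sketched as follows. This is a simplified illustration rather than the TK-KNN code from the repository above: the function name `balanced_top_k`, the nearest-labeled-neighbor scoring rule, and the toy embeddings are assumptions. Each unlabeled embedding is assigned the class of its nearest labeled example, and only the `k` closest candidates per class are kept, so no class can dominate the pseudo-labeled set.

```python
import math
from collections import defaultdict

def balanced_top_k(labeled, unlabeled, k):
    """Select at most k pseudo-labeled examples per class.

    labeled:   list of (embedding, label) pairs
    unlabeled: list of embeddings
    Returns {label: [indices into unlabeled]}, closest candidates first.
    """
    candidates = defaultdict(list)
    for idx, emb in enumerate(unlabeled):
        # Assumed scoring rule: distance to the nearest labeled example.
        dist, label = min((math.dist(emb, e), lab) for e, lab in labeled)
        candidates[label].append((dist, idx))
    # Rank candidates within each class and keep the k nearest.
    return {
        label: [i for _, i in sorted(cands)[:k]]
        for label, cands in candidates.items()
    }

# Hypothetical 2-D embeddings: class "a" has many nearby unlabeled points,
# class "b" has only one.
labeled = [((0.0, 0.0), "a"), ((10.0, 10.0), "b")]
unlabeled = [(0.1, 0.0), (0.5, 0.5), (1.0, 1.0), (2.0, 2.0), (9.0, 9.0)]

selected = balanced_top_k(labeled, unlabeled, k=2)
print(selected)  # {'a': [0, 1], 'b': [4]}
```

A pure confidence threshold on this toy data would admit four examples for class "a" and one for class "b"; capping each class at `k` keeps the per-round additions closer to balanced.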
Graph Based Semi-supervised Learning with Convolution Neural Networks to Classify Crisis Related Tweets
During time-critical situations such as natural disasters, rapid
classification of data posted on social networks by affected people is useful
for humanitarian organizations to gain situational awareness and to plan
response efforts. However, the scarcity of labeled data in the early hours of a
crisis hinders machine learning tasks and thus delays crisis response. In this
work, we propose to use an inductive semi-supervised technique to utilize
unlabeled data, which is often abundant at the onset of a crisis event, along
with a smaller amount of labeled data. Specifically, we adopt a graph-based deep learning
framework to learn an inductive semi-supervised model. We use two real-world
crisis datasets from Twitter to evaluate the proposed approach. Our results
show significant improvements using unlabeled data as compared to only using
labeled data.
Comment: 5 pages. arXiv admin note: substantial text overlap with
arXiv:1805.0515