Knowledge Base Population using Semantic Label Propagation
A crucial aspect of a knowledge base population system that extracts new
facts from text corpora is the generation of training data for its relation
extractors. In this paper, we present a method that maximizes the effectiveness
of newly trained relation extractors at a minimal annotation cost. Manual
labeling can be significantly reduced by Distant Supervision, which is a method
to construct training data automatically by aligning a large text corpus with
an existing knowledge base of known facts. For example, all sentences
mentioning both 'Barack Obama' and 'US' may serve as positive training
instances for the relation born_in(subject,object). However, distant
supervision typically results in a highly noisy training set: many training
sentences do not really express the intended relation. We propose to combine
distant supervision with minimal manual supervision in a technique called
feature labeling, to eliminate noise from the large and noisy initial training
set, resulting in a significant increase of precision. We further improve on
this approach by introducing the Semantic Label Propagation method, which uses
the similarity between low-dimensional representations of candidate training
instances to extend the training set in order to increase recall while
maintaining high precision. Our proposed strategy for generating training data
is studied and evaluated on an established test collection designed for
knowledge base population tasks. The experimental results show that the
Semantic Label Propagation strategy leads to substantial performance gains when
compared to existing approaches, while requiring an almost negligible manual
annotation effort.
Comment: Submitted to Knowledge Based Systems, special issue on Knowledge
Bases for Natural Language Processing
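A minimal Python sketch of the two ideas in this abstract, under loudly stated assumptions: distant supervision labels any sentence that contains both entities of a knowledge-base fact (reproducing the noisy 'Barack Obama'/'US' example), and the training set is then extended with unlabeled candidates whose embedding has a cosine similarity of at least a threshold to some clean seed instance. The `Sentence` class, the toy 2-dimensional vectors, the 0.9 threshold and all function names are hypothetical; this is a similarity-based extension in the spirit of Semantic Label Propagation, not the authors' implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    vector: tuple  # hypothetical low-dimensional embedding of the sentence


def distant_labels(sentences, kb_facts):
    """Distant supervision: label a sentence with a relation if it mentions
    both the subject and the object of a known knowledge-base fact."""
    labeled = []
    for s in sentences:
        for relation, subj, obj in kb_facts:
            if subj in s.text and obj in s.text:
                labeled.append((s, relation))
    return labeled


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def extend_by_similarity(seeds, candidates, threshold=0.9):
    """Keep the clean seeds and add every candidate whose cosine similarity
    to at least one seed is >= threshold (assumed extension rule)."""
    return list(seeds) + [
        c for c in candidates
        if any(cosine(c.vector, s.vector) >= threshold for s in seeds)
    ]


kb = [("born_in", "Barack Obama", "US")]
corpus = [
    Sentence("Barack Obama was born in Hawaii, US.", (0.9, 0.1)),
    Sentence("Barack Obama addressed the US Congress.", (0.1, 0.9)),  # noise
    Sentence("Michelle Obama visited Paris.", (0.5, 0.5)),
]
noisy = distant_labels(corpus, kb)  # matches both Barack Obama / US sentences

seeds = [corpus[0]]  # assume this instance survived manual feature labeling
unlabeled = [
    Sentence("Obama grew up in Honolulu.", (0.88, 0.15)),
    Sentence("Obama met US senators.", (0.2, 0.95)),
]
training_set = extend_by_similarity(seeds, unlabeled)
```

The second corpus sentence is a false positive of distant supervision; in the abstract's pipeline, feature labeling is the step that removes such noise before the seed set is extended.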
Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language
We address the problem of efficient acoustic-model refinement (continuous
retraining) using semi-supervised and active learning for a low resource Indian
language, where the low-resource constraints are i) a small labeled
corpus from which to train a baseline 'seed' acoustic model and ii) a large
training corpus without orthographic labeling, from which data can be
selected for manual labeling at low cost. The proposed semi-supervised
learning decodes the large unlabeled training corpus using the seed model and,
through various protocols, selects the decoded utterances with high reliability
using confidence levels (which correlate with the WER of the decoded utterances)
and iterative bootstrapping. The proposed active learning protocol uses a
confidence-level-based metric to select decoded utterances from the large
unlabeled corpus for further labeling. Starting from a poorly trained seed
model, the semi-supervised learning protocols can achieve as much as 50% of the
best WER reduction (relative to the seed model's WER) that would be realizable
if the large corpus were labeled and used for acoustic-model training. The
active learning protocols need only 60% of the training corpus to be manually
labeled to reach the same performance as labeling the entire corpus.
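The two confidence-based selection protocols described above can be sketched as follows. This is an illustration under assumptions, not the paper's protocols: each decoded utterance is assumed to carry a single confidence score in [0, 1], self-training keeps utterances at or above a threshold, and active learning is assumed to pick the least confident utterances for manual transcription. `Decoded` and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Decoded:
    utt_id: str
    hypothesis: str    # transcript produced by the seed acoustic model
    confidence: float  # assumed utterance-level confidence in [0, 1]


def select_for_self_training(decoded, threshold):
    """Semi-supervised step: keep decoded utterances reliable enough to be
    added to the training set with their automatic transcripts."""
    return [d for d in decoded if d.confidence >= threshold]


def select_for_manual_labeling(decoded, budget):
    """Active-learning step (assumed policy): send the `budget` least
    confident utterances to human transcribers."""
    return sorted(decoded, key=lambda d: d.confidence)[:budget]


batch = [
    Decoded("u1", "hyp 1", 0.95),
    Decoded("u2", "hyp 2", 0.40),
    Decoded("u3", "hyp 3", 0.80),
    Decoded("u4", "hyp 4", 0.10),
]
pseudo_labeled = select_for_self_training(batch, threshold=0.8)
to_transcribe = select_for_manual_labeling(batch, budget=2)
```

In iterative bootstrapping, the model retrained on `pseudo_labeled` would re-decode the remaining pool and the selection would repeat; that outer loop is omitted here.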
A Machine Learning Based Analytical Framework for Semantic Annotation Requirements
The Semantic Web is an extension of the current web in which information is
given well-defined meaning. The vision of the Semantic Web is to improve the
quality and intelligence of the current web by converting its contents into a
machine-understandable form. Semantic-level information is therefore one of
the cornerstones of the Semantic Web. The process of adding semantic metadata
to web resources is called Semantic Annotation. Semantic Annotation faces many
obstacles, such as multilinguality, scalability, and issues related to the
diversity and inconsistency of content across web pages. Given the wide range
of domains and dynamic environments in which Semantic Annotation systems must
operate, automating the annotation process is a significant challenge in this
domain. To
overcome this problem, various machine learning approaches have been used,
including supervised and unsupervised learning and more recent ones such as
semi-supervised learning and active learning. In this paper, we present a
comprehensive layered classification of Semantic Annotation challenges and
discuss the most important issues in this field. We also review and analyze
machine learning applications for solving semantic annotation problems. To
this end, the article closely studies and categorizes related research, both
to understand it better and to arrive at a framework that maps machine
learning techniques to Semantic Annotation challenges and requirements.
Efficient Asymmetric Co-Tracking using Uncertainty Sampling
Adaptive tracking-by-detection approaches are popular for tracking arbitrary
objects. They treat the tracking problem as a classification task and use
online learning techniques to update the object model. However, these
approaches depend heavily on the efficiency and effectiveness of their
detectors. Evaluating a massive number of samples in each frame (e.g.,
obtained by a sliding window) forces the detector to trade accuracy for
speed. Furthermore, misclassification of borderline samples by the detector
introduces accumulating errors into tracking. In this study, we propose a
co-tracking method based on the efficient cooperation of two detectors: a rapid
adaptive exemplar-based detector and a more sophisticated but slower
detector with long-term memory. Sampling, labeling, and co-learning of the
detectors are conducted by an uncertainty sampling unit, which improves the
speed and accuracy of the system. We also introduce a budgeting mechanism that
prevents unbounded growth in the number of examples in the first detector
to maintain its rapid response. Experiments demonstrate the efficiency and
effectiveness of the proposed tracker against its baselines and its superior
performance against state-of-the-art trackers on various benchmark videos.
Comment: Submitted to IEEE ICSIPA'201
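A toy sketch of the uncertainty sampling unit and the budget, assuming 2-D feature vectors, a nearest-exemplar score for the fast detector, a fixed margin for "uncertain", and oldest-first eviction as the budget policy (all assumptions, not details from the paper). Only samples the fast detector is unsure about are passed to the slow detector, stood in for here by a hypothetical `slow_label` function, and its answers are added to the fast detector's exemplar set.

```python
import math
from collections import deque


class FastExemplarDetector:
    """Hypothetical rapid exemplar-based detector with a fixed budget."""

    def __init__(self, budget):
        # deque(maxlen=...) evicts the oldest exemplar once the budget is full
        self.exemplars = deque(maxlen=budget)

    def add(self, x, label):
        self.exemplars.append((x, label))

    def score(self, x):
        """Positive if x is closer to an object exemplar than to a background one."""
        pos = [math.dist(x, e) for e, y in self.exemplars if y == 1]
        neg = [math.dist(x, e) for e, y in self.exemplars if y == 0]
        if not pos or not neg:
            return 0.0  # no evidence for one class: treat as fully uncertain
        return min(neg) - min(pos)


def co_track_step(fast, slow_label, samples, margin=1.0):
    """Label one frame's samples; only uncertain ones reach the slow detector."""
    labels, queried = [], 0
    for x in samples:
        s = fast.score(x)
        if abs(s) < margin:
            y = slow_label(x)   # expensive query to the long-memory detector
            fast.add(x, y)      # co-learning: the fast detector keeps the answer
            queried += 1
        else:
            y = 1 if s > 0 else 0
        labels.append(y)
    return labels, queried


fast = FastExemplarDetector(budget=4)
fast.add((0.0, 0.0), 1)  # object exemplar
fast.add((5.0, 5.0), 0)  # background exemplar


def slow_label(x):
    # hypothetical stand-in for the slower, more accurate detector
    return 1 if x[0] < 2.5 else 0


labels, queried = co_track_step(fast, slow_label, [(0.2, 0.1), (4.8, 5.1), (2.4, 2.6)])
```

Because `deque(maxlen=budget)` silently discards the oldest entry on overflow, the fast detector's memory and scoring cost stay bounded however long the sequence runs.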
Efficient Version-Space Reduction for Visual Tracking
Discriminative trackers employ a classification approach to separate the
target from its background. To cope with variations in the target's shape and
appearance, the classifier is updated online with different samples of the
target and the background. Sample selection, labeling, and updating the
classifier are prone to various sources of error that cause the tracker to
drift. We introduce an efficient version-space shrinking strategy to reduce
labeling errors, and we enhance the sampling strategy by measuring the
tracker's uncertainty about the samples. The proposed tracker utilizes an
ensemble of classifiers that represent different hypotheses about the target,
diversifies them using boosting to provide larger and more consistent coverage
of the version space, and tunes the classifiers' weights in voting. The proposed
system adjusts the model update rate by promoting the co-training of the
short-memory ensemble with a long-memory oracle. The proposed tracker
outperformed state-of-the-art trackers on different sequences exhibiting various
tracking challenges.
Comment: CRV'17 Conference
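A query-by-committee style sketch of the uncertainty measurement and version-space shrinking, assuming linear hypotheses `(w, b)`, a weighted vote in [-1, 1] whose magnitude measures agreement, and a multiplicative penalty `beta` for hypotheses that contradict the long-memory oracle. These are illustrative choices; the boosting-based diversification in the abstract is not modeled, and `oracle` and the function names are hypothetical.

```python
def predict(h, x):
    """Linear hypothesis h = (w, b): returns +1 (target) or -1 (background)."""
    w, b = h
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


def vote(ensemble, weights, x):
    """Weighted vote in [-1, 1]; values near 0 mean the ensemble disagrees."""
    return sum(a * predict(h, x) for h, a in zip(ensemble, weights)) / sum(weights)


def shrink_version_space(ensemble, weights, samples, oracle, margin=0.5, beta=0.5):
    """Query the oracle only on uncertain samples and down-weight every
    hypothesis that contradicts the oracle's label."""
    queried = 0
    for x in samples:
        if abs(vote(ensemble, weights, x)) < margin:
            y = oracle(x)
            queried += 1
            weights = [a * (beta if predict(h, x) != y else 1.0)
                       for h, a in zip(ensemble, weights)]
    return weights, queried


ensemble = [((1.0, 0.0), 0.0), ((0.0, 1.0), 0.0), ((1.0, 1.0), -1.0)]


def oracle(x):
    # hypothetical long-memory oracle
    return 1 if x[0] > 0 else -1


weights, queried = shrink_version_space(
    ensemble, [1.0, 1.0, 1.0], [(1.0, -1.0), (2.0, 2.0)], oracle
)
```

Only the first sample splits the committee, so it is the only one sent to the oracle; the two hypotheses that contradicted the oracle then carry half their previous weight, which is one simple way to concentrate the vote on the part of the version space consistent with the labels seen so far.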