Weakly supervised segment annotation via expectation kernel density estimation
Since the labelling of positive images/videos is ambiguous in weakly
supervised segment annotation, negative-mining-based methods that use only
intra-class information have emerged. In these methods, negative instances are
used to penalize unknown instances in order to rank their likelihood of being an
object, which can be viewed as similarity-based voting. However,
these methods 1) ignore the information contained in positive bags, and 2) only
rank the likelihood and cannot generate an explicit decision function. In this
paper, we propose a voting scheme involving not only the definite negative
instances but also the ambiguous positive instances to make use of the extra
useful information in the weakly labelled positive bags. In the scheme, each
instance votes for its label with a magnitude arising from the similarity, and
the ambiguous positive instances are assigned soft labels that are iteratively
updated during the voting. This overcomes the limitations of voting with only
the negative bags. We also propose an expectation kernel density estimation
(eKDE) algorithm to gain further insight into the voting mechanism.
Experimental results demonstrate the superiority of our scheme over the
baselines.
Comment: 9 pages, 2 figures
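To make the voting idea concrete, here is a minimal NumPy sketch. It assumes a Gaussian kernel as the similarity, clamps negative instances to label 0, and re-estimates the soft labels of positive-bag instances as the similarity-weighted average of all other instances' labels. The per-bag rescaling step and the function names (`vote_soft_labels`, `decision_function`) are illustrative assumptions, not the authors' eKDE algorithm; the final kernel-weighted average does show how an explicit decision function can fall out of the voting.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Pairwise Gaussian similarities between the rows of a and the rows of b."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dist / (2.0 * bandwidth ** 2))


def vote_soft_labels(X, bag_ids, is_negative, bandwidth=1.0, n_iter=20):
    """Iteratively re-estimate soft labels of positive-bag instances by kernel-weighted voting.

    Negative instances are clamped to label 0; positive-bag instances start at 1.
    """
    y = np.where(is_negative, 0.0, 1.0)
    K = gaussian_kernel(X, X, bandwidth)
    np.fill_diagonal(K, 0.0)  # an instance does not vote for its own label
    pos = ~is_negative
    for _ in range(n_iter):
        votes = K @ y / (K.sum(axis=1) + 1e-12)  # similarity-weighted label average
        y_new = y.copy()
        y_new[pos] = votes[pos]
        # Assumed multi-instance constraint: rescale each positive bag so its most
        # object-like instance keeps label 1 (prevents collapse to all-negative).
        for b in np.unique(bag_ids[pos]):
            members = pos & (bag_ids == b)
            y_new[members] /= y_new[members].max() + 1e-12
        y = y_new
    return y


def decision_function(x_new, X, y, bandwidth=1.0):
    """Explicit decision value: label-weighted kernel density divided by total density at x_new."""
    K = gaussian_kernel(np.atleast_2d(x_new), X, bandwidth)
    return K @ y / (K.sum(axis=1) + 1e-12)
```

On a toy set with negatives near the origin and a positive bag that mixes one background point with three points near (5, 5), the background instance's soft label drops to almost 0 while the object instances stay near 1, so thresholding `decision_function` at 0.5 separates the two regions.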
Multi-Label Classifier Chains for Bird Sound
Bird sound data collected with unattended microphones for automatic surveys,
or mobile devices for citizen science, typically contain multiple
simultaneously vocalizing birds of different species. However, few works have
considered the multi-label structure in birdsong. We propose to use an ensemble
of classifier chains combined with a histogram-of-segments representation for
multi-label classification of birdsong. The proposed method is compared with
binary relevance and three multi-instance multi-label learning (MIML)
algorithms from prior work (which focus more on structure in the sound, and
less on structure in the label sets). Experiments are conducted on two
real-world birdsong datasets, and show that the proposed method usually
outperforms binary relevance (using the same features and base-classifier), and
is better in some cases and worse in others compared to the MIML algorithms.
Comment: 6 pages, 1 figure, submission to ICML 2013 workshop on bioacoustics.
Note: this is a minor revision; the blind submission format has been replaced
with one that shows author names, and a few corrections have been made
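The classifier-chain idea can be sketched as follows, assuming each recording has already been turned into a fixed-length feature vector (for example, a histogram-of-segments). A tiny logistic regression stands in for the base classifier, and each chain in the ensemble uses a random label order; unlike some ensemble-of-chains variants, this sketch does not resample the training data. All class and function names are hypothetical.

```python
import numpy as np


class LogisticBase:
    """Stand-in base classifier: L2-regularised logistic regression fitted by gradient descent."""

    def __init__(self, lr=0.5, n_iter=500, l2=1e-3):
        self.lr, self.n_iter, self.l2 = lr, n_iter, l2

    @staticmethod
    def _with_bias(X):
        return np.hstack([X, np.ones((len(X), 1))])

    def fit(self, X, y):
        Xb = self._with_bias(X)
        self.w = np.zeros(Xb.shape[1])
        for _ in range(self.n_iter):
            p = 1.0 / (1.0 + np.exp(-Xb @ self.w))
            self.w -= self.lr * (Xb.T @ (p - y) / len(y) + self.l2 * self.w)
        return self

    def predict_proba(self, X):
        return 1.0 / (1.0 + np.exp(-self._with_bias(X) @ self.w))


def fit_chain(X, Y, order):
    """Train one binary model per label; model k also sees the true labels earlier in the chain."""
    return [LogisticBase().fit(np.hstack([X, Y[:, order[:k]]]), Y[:, label])
            for k, label in enumerate(order)]


def predict_chain(models, X, order):
    """Predict labels in chain order, feeding each prediction to the later models."""
    Y_hat = np.zeros((len(X), len(order)))
    for k, (label, model) in enumerate(zip(order, models)):
        Y_hat[:, label] = model.predict_proba(np.hstack([X, Y_hat[:, order[:k]]])) >= 0.5
    return Y_hat


def ensemble_of_chains(X_train, Y_train, X_test, n_chains=5, seed=0):
    """Majority vote over chains with random label orders (no bootstrap resampling here)."""
    rng = np.random.default_rng(seed)
    n_labels = Y_train.shape[1]
    votes = np.zeros((len(X_test), n_labels))
    for _ in range(n_chains):
        order = rng.permutation(n_labels)
        votes += predict_chain(fit_chain(X_train, Y_train, order), X_test, order)
    return (votes / n_chains >= 0.5).astype(int)
```

The chain lets a later label's model condition on earlier labels (for example, on which species co-occur), which binary relevance, with one independent model per label, cannot do; randomizing the order across chains reduces sensitivity to any single ordering.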
Multi-Instance Multi-Label Learning
In this paper, we propose the MIML (Multi-Instance Multi-Label learning)
framework where an example is described by multiple instances and associated
with multiple class labels. Compared to traditional learning frameworks, the
MIML framework is more convenient and natural for representing complicated
objects which have multiple semantic meanings. To learn from MIML examples, we
propose the MimlBoost and MimlSvm algorithms based on a simple degeneration
strategy, and experiments show that solving problems involving complicated
objects with multiple semantic meanings in the MIML framework can lead to good
performance. Considering that the degeneration process may lose information, we
propose the D-MimlSvm algorithm which tackles MIML problems directly in a
regularization framework. Moreover, we show that even when we do not have
access to the real objects and thus cannot capture more information from real
objects by using the MIML representation, MIML is still useful. We propose the
InsDif and SubCod algorithms. InsDif works by transforming single-instance examples
into the MIML representation for learning, while SubCod works by transforming
single-label examples into the MIML representation for learning. Experiments
show that in some tasks they achieve better performance than learning from
the single-instance or single-label examples directly.
Comment: 64 pages, 10 figures; Artificial Intelligence, 201
Active learning in annotating micro-blogs dealing with e-reputation
Elections unleash strong political views on Twitter, but what do people
really think about politics? Opinion and trend mining on micro blogs dealing
with politics has recently attracted researchers in several fields including
Information Retrieval and Machine Learning (ML). Since the performance of ML
and Natural Language Processing (NLP) approaches is limited by the amount and
quality of data available, one promising alternative for some tasks is the
automatic propagation of expert annotations. This paper intends to develop a
so-called active learning process for automatically annotating French language
tweets that deal with the image (i.e., representation, web reputation) of
politicians. Our main focus is on the methodology followed to build an original
annotated dataset expressing opinions about two French politicians over time. We
therefore review state-of-the-art NLP-based ML algorithms to automatically
annotate tweets, using a manual initiation step as a bootstrap. This paper focuses
on key issues in active learning when building a large annotated dataset from
noisy data, where noise is introduced by human annotators, the abundance of data,
and the label distribution across data and entities. In turn, we show that Twitter
characteristics such as the author's name or hashtags can serve as a basis
both to improve automatic systems for Opinion Mining (OM) and
Topic Classification and to reduce noise in human annotations. However, a
subsequent, more thorough analysis shows that reducing noise might induce the loss of
crucial information.
Comment: Journal of Interdisciplinary Methodologies and Issues in Science -
Vol 3 - Contextualisation digitale - 201
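A generic pool-based active-learning loop with uncertainty sampling illustrates the bootstrap-then-query process described above; it is not the pipeline used in the paper. Tweets are assumed to be already vectorized, `oracle` stands in for a human annotator, and the distance-to-centroid scorer is a hypothetical placeholder for a real opinion classifier. The seed set must contain at least one example of each class.

```python
import numpy as np


def centroid_scores(X_lab, y_lab, X_pool):
    """Placeholder binary classifier: P(positive) from distances to the two class centroids."""
    c_pos = X_lab[y_lab == 1].mean(axis=0)
    c_neg = X_lab[y_lab == 0].mean(axis=0)
    d_pos = np.linalg.norm(X_pool - c_pos, axis=1)
    d_neg = np.linalg.norm(X_pool - c_neg, axis=1)
    return d_neg / (d_pos + d_neg + 1e-12)  # near 1 when close to the positive centroid


def active_learning(X, seed_idx, oracle, n_rounds=5, batch=2):
    """Pool-based uncertainty sampling bootstrapped from a manually labelled seed set."""
    labelled = list(seed_idx)
    labels = {i: oracle(i) for i in labelled}  # manual initiation step
    for _ in range(n_rounds):
        pool = np.array([i for i in range(len(X)) if i not in labels])
        if len(pool) == 0:
            break
        y_lab = np.array([labels[i] for i in labelled])
        p = centroid_scores(X[labelled], y_lab, X[pool])
        # Query the unlabelled items whose predicted probability is closest to 0.5.
        query = pool[np.argsort(np.abs(p - 0.5))[:batch]]
        for i in query:
            labels[int(i)] = oracle(int(i))
            labelled.append(int(i))
    return labels
```

Querying the most uncertain items spends annotator effort where the current model is least sure; as the abstract notes, the resulting labels are only as clean as the annotators who provide them.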
Name Disambiguation from link data in a collaboration graph using temporal and topological features
In a social community, multiple persons may share the same name, phone number
or some other identifying attributes. This, along with other phenomena such as
name abbreviation, name misspelling, and human error, leads to erroneous
aggregation of records of multiple persons under a single reference. Such
mistakes degrade the performance of document retrieval, web search, and database
integration and, more importantly, lead to improper attribution of credit (or blame).
The task of entity disambiguation partitions the records belonging to multiple
persons with the objective that each decomposed partition is composed of
records of a unique person. Existing solutions to this task use either
biographical attributes, or auxiliary features that are collected from external
sources, such as Wikipedia. However, for many scenarios, such auxiliary
features are not available, or they are costly to obtain. Moreover, collecting
biographical or external data carries a risk of privacy
violation. In this work, we propose a method for solving the entity disambiguation
task using link information obtained from a collaboration network. Our method does
not intrude on privacy, as it uses only the time-stamped graph topology of an
anonymized network. Experimental results on two real-life academic
collaboration networks show that the proposed method has satisfactory
performance.
Comment: The short version of this paper has been accepted to ASONAM 201
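As a toy illustration of disambiguating from time-stamped topology alone, the heuristic below splits the collaborators of an ambiguous name into groups. It merges two collaborators only when they have collaborated directly (a topological cue) and their collaboration years with the ambiguous name fall within `max_gap` years of each other (a temporal cue); each resulting group is read as one distinct person. The rule, the threshold, and the function name `split_ambiguous_node` are assumptions for illustration, not the method proposed in the paper.

```python
from collections import defaultdict


def split_ambiguous_node(edges, name, max_gap=2):
    """Group the collaborators of `name`; each group is guessed to be one distinct person.

    edges: iterable of (u, v, year) collaboration records from an anonymized graph.
    """
    years = defaultdict(list)  # collaborator -> years they worked with `name`
    adj = defaultdict(set)     # undirected collaboration graph
    for u, v, year in edges:
        adj[u].add(v)
        adj[v].add(u)
        if u == name:
            years[v].append(year)
        elif v == name:
            years[u].append(year)

    collaborators = sorted(years)
    parent = {c: c for c in collaborators}

    def find(c):
        # Union-find root lookup with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for i, a in enumerate(collaborators):
        for b in collaborators[i + 1:]:
            linked = b in adj[a]  # topological cue: a and b collaborated directly
            close = min(abs(x - y) for x in years[a] for y in years[b]) <= max_gap
            if linked and close:
                parent[find(a)] = find(b)

    groups = defaultdict(set)
    for c in collaborators:
        groups[find(c)].add(c)
    return sorted(groups.values(), key=sorted)
```

Union-find keeps the grouping transitive, so a chain of pairwise links merges a whole research circle into one inferred identity.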
- …