660 research outputs found
Crowdsourcing Semantic Label Propagation in Relation Classification
Distant supervision is a popular method for performing relation extraction
from text that is known to produce noisy labels. Most progress in relation
extraction and classification has been made with crowdsourced corrections to
distant-supervised labels, and there is evidence that indicates still more
would be better. In this paper, we explore the problem of propagating human
annotation signals gathered for open-domain relation classification through the
CrowdTruth methodology for crowdsourcing, that captures ambiguity in
annotations by measuring inter-annotator disagreement. Our approach propagates
annotations to sentences that are similar in a low dimensional embedding space,
expanding the number of labels by two orders of magnitude. Our experiments show
significant improvement in a sentence-level multi-class relation classifier.Comment: In publication at the First Workshop on Fact Extraction and
Verification (FeVer) at EMNLP 201
Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis
The clustering ensemble technique aims to combine multiple clusterings into a
probably better and more robust clustering and has been receiving an increasing
attention in recent years. There are mainly two aspects of limitations in the
existing clustering ensemble approaches. Firstly, many approaches lack the
ability to weight the base clusterings without access to the original data and
can be affected significantly by the low-quality, or even ill clusterings.
Secondly, they generally focus on the instance level or cluster level in the
ensemble system and fail to integrate multi-granularity cues into a unified
model. To address these two limitations, this paper proposes to solve the
clustering ensemble problem via crowd agreement estimation and
multi-granularity link analysis. We present the normalized crowd agreement
index (NCAI) to evaluate the quality of base clusterings in an unsupervised
manner and thus weight the base clusterings in accordance with their clustering
validity. To explore the relationship between clusters, the source aware
connected triple (SACT) similarity is introduced with regard to their common
neighbors and the source reliability. Based on NCAI and multi-granularity
information collected among base clusterings, clusters, and data instances, we
further propose two novel consensus functions, termed weighted evidence
accumulation clustering (WEAC) and graph partitioning with multi-granularity
link analysis (GP-MGLA) respectively. The experiments are conducted on eight
real-world datasets. The experimental results demonstrate the effectiveness and
robustness of the proposed methods.Comment: The MATLAB source code of this work is available at:
https://www.researchgate.net/publication/28197031
Quality of Information in Mobile Crowdsensing: Survey and Research Challenges
Smartphones have become the most pervasive devices in people's lives, and are
clearly transforming the way we live and perceive technology. Today's
smartphones benefit from almost ubiquitous Internet connectivity and come
equipped with a plethora of inexpensive yet powerful embedded sensors, such as
accelerometer, gyroscope, microphone, and camera. This unique combination has
enabled revolutionary applications based on the mobile crowdsensing paradigm,
such as real-time road traffic monitoring, air and noise pollution, crime
control, and wildlife monitoring, just to name a few. Differently from prior
sensing paradigms, humans are now the primary actors of the sensing process,
since they become fundamental in retrieving reliable and up-to-date information
about the event being monitored. As humans may behave unreliably or
maliciously, assessing and guaranteeing Quality of Information (QoI) becomes
more important than ever. In this paper, we provide a new framework for
defining and enforcing the QoI in mobile crowdsensing, and analyze in depth the
current state-of-the-art on the topic. We also outline novel research
challenges, along with possible directions of future work.Comment: To appear in ACM Transactions on Sensor Networks (TOSN
Socializing the Semantic Gap: A Comparative Survey on Image Tag Assignment, Refinement and Retrieval
Where previous reviews on content-based image retrieval emphasize on what can
be seen in an image to bridge the semantic gap, this survey considers what
people tag about an image. A comprehensive treatise of three closely linked
problems, i.e., image tag assignment, refinement, and tag-based image retrieval
is presented. While existing works vary in terms of their targeted tasks and
methodology, they rely on the key functionality of tag relevance, i.e.
estimating the relevance of a specific tag with respect to the visual content
of a given image and its social context. By analyzing what information a
specific method exploits to construct its tag relevance function and how such
information is exploited, this paper introduces a taxonomy to structure the
growing literature, understand the ingredients of the main works, clarify their
connections and difference, and recognize their merits and limitations. For a
head-to-head comparison between the state-of-the-art, a new experimental
protocol is presented, with training sets containing 10k, 100k and 1m images
and an evaluation on three test sets, contributed by various research groups.
Eleven representative works are implemented and evaluated. Putting all this
together, the survey aims to provide an overview of the past and foster
progress for the near future.Comment: to appear in ACM Computing Survey
- …