Tweet, but Verify: Epistemic Study of Information Verification on Twitter
While Twitter provides an unprecedented opportunity to learn about breaking
news and current events as they happen, it often produces skepticism among
users, since not all of the information is accurate and hoaxes are sometimes
spread. Although avoiding the diffusion of hoaxes is a major concern during
fast-paced events such as natural disasters, the study of how users trust and
verify information from tweets in these contexts has received little attention
so far. We survey users on their credibility perceptions of witness pictures
posted on Twitter related to Hurricane Sandy. By examining credibility
perceptions on features suggested for information verification in the field of
Epistemology, we evaluate users' accuracy in determining whether pictures were
real or fake, compared against professional evaluations performed by experts. Our
study unveils insights about tweet presentation, as well as features that users
should look at when assessing the veracity of tweets in the context of
fast-paced events. Among our main findings: while author details not readily
available in Twitter feeds should be emphasized in order to facilitate
verification of tweets, showing multiple tweets that corroborate a fact
misleads users into trusting what is actually a hoax. We contrast some of the
behavioral patterns found in tweets with the Psychology literature.
Comment: Pre-print of paper accepted to Social Network Analysis and Mining (Springer)
Cross-document Cross-lingual Information Extraction and Tracking
Most current information extraction analyzes documents in isolation. The net result is a set of disconnected, inaccurate, and often redundant annotations, because events are repeated in many news stories. In this talk we will present a new task of cross-document cross-lingual information extraction and tracking, along with its evaluation metrics. From an enormous collection of multilingual documents, we identify important person entities that are frequently involved in events as ‘centroid entities’. Then we link the events involving the same centroid entity along a timeline. We will also present a system performing this task and our current approaches to the main research challenges. We will discuss how we can take advantage of redundancy to improve the accuracy of relation and event annotation, by means of:
- Cross-document event coreference resolution
- Event ranking by salience and novelty
- Event organization by participant, time, and place
- Name translation
- Knowledge discovery from Google Ngrams
- Domain adaptation techniques for applying information extraction to scientific literature
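To make the tracking step concrete, here is a minimal Python sketch (not the talk's actual system; the Event fields and example data are hypothetical) of grouping extracted events by their centroid entity and ordering each group along a timeline:

    from collections import defaultdict
    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class Event:
        centroid: str      # person entity the event centers on
        time: date         # normalized event time
        description: str   # short event mention

    def build_timelines(events):
        """Group extracted events by centroid entity; order each group by time."""
        timelines = defaultdict(list)
        for ev in events:
            timelines[ev.centroid].append(ev)
        for chain in timelines.values():
            chain.sort(key=lambda ev: ev.time)
        return timelines

    # Hypothetical extracted events from two news stories
    events = [
        Event("Jane Doe", date(2011, 3, 12), "visited Japan"),
        Event("Jane Doe", date(2011, 2, 4), "met with cabinet"),
    ]
    for centroid, chain in build_timelines(events).items():
        print(centroid, "->", [ev.description for ev in chain])

In a real pipeline, the coreference and ranking steps listed above would first decide which extracted mentions map to the same event before this grouping runs.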
Accelerated Method for Stochastic Composition Optimization with Nonsmooth Regularization
Stochastic composition optimization has drawn much attention recently and has
been successful in many emerging applications of machine learning, statistical
analysis, and reinforcement learning. In this paper, we focus on the
composition problem with a nonsmooth regularization penalty. Previous works
either have a slow convergence rate or do not provide a complete convergence
analysis for the general problem. We tackle these two issues by
proposing a new stochastic composition optimization method for the composition
problem with a nonsmooth regularization penalty. In our method, we apply a
variance reduction technique to accelerate convergence. To the best of our
knowledge, our method admits the fastest convergence rate for stochastic
composition optimization: for the strongly convex composition problem, our
algorithm is proved to admit linear convergence; for the general composition
problem, our algorithm significantly improves the state-of-the-art convergence
rate from $O(K^{-4/9})$ to $O((n_1+n_2)^{2/3}K^{-1})$. Finally, we apply
our proposed algorithm to portfolio management and policy evaluation in
reinforcement learning. Experimental results verify our theoretical analysis.
Comment: AAAI 2018
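As a concrete illustration of the ingredient the method builds on, here is a minimal Python sketch of variance-reduced proximal gradient (Prox-SVRG style) for the simpler single-level problem min_x (1/2n)||Ax - b||^2 + lam*||x||_1; it is not the paper's two-level composition estimator, and all names and parameters are illustrative:

    import numpy as np

    def soft_threshold(x, t):
        """Proximal operator of t*||.||_1 (handles the nonsmooth penalty)."""
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    def prox_svrg(A, b, lam, eta=1e-2, epochs=20, seed=0):
        """Prox-SVRG for min_x (1/2n)||Ax - b||^2 + lam*||x||_1."""
        rng = np.random.default_rng(seed)
        n, d = A.shape
        x = np.zeros(d)
        for _ in range(epochs):
            x_ref = x.copy()
            full_grad = A.T @ (A @ x_ref - b) / n        # full gradient at snapshot
            for _ in range(n):
                i = rng.integers(n)
                g = A[i] * (A[i] @ x - b[i])             # stochastic gradient at x
                g_ref = A[i] * (A[i] @ x_ref - b[i])     # same sample at snapshot
                v = g - g_ref + full_grad                # variance-reduced estimator
                x = soft_threshold(x - eta * v, eta * lam)  # proximal step
        return x

The key design point mirrors the abstract: reusing a periodically refreshed full gradient makes the estimator's variance shrink as x approaches the snapshot, which is what enables the faster rates.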
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of retrieving, from
a large database, the data items whose distances to a query item are smallest.
Various methods have been developed to address this problem, and recently a lot
of effort has been devoted to approximate search. In this paper, we present a
survey of one of the main solutions, hashing, which has been widely studied
since the pioneering work on locality sensitive hashing. We divide the hashing
algorithms into two main categories: locality sensitive hashing, which designs
hash functions without exploring the data distribution, and learning to hash,
which learns hash functions according to the data distribution. We review them
from various aspects, including hash function design, and the distance measure
and search scheme in the hash coding space.
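For illustration, here is a minimal random-hyperplane LSH sketch in Python for cosine similarity, the classic data-independent construction from the first category; names and parameters are illustrative, not drawn from the survey:

    import numpy as np

    class CosineLSH:
        """Random-hyperplane LSH: sign patterns approximate cosine similarity."""
        def __init__(self, dim, n_bits=16, seed=0):
            rng = np.random.default_rng(seed)
            self.planes = rng.standard_normal((n_bits, dim))

        def hash(self, x):
            # Each bit records which side of a random hyperplane x falls on;
            # similar vectors agree on most bits with high probability.
            bits = (self.planes @ x) >= 0
            return bits.astype(np.uint8).tobytes()

    rng = np.random.default_rng(1)
    lsh = CosineLSH(dim=64)
    q = rng.standard_normal(64)
    near = q + 0.05 * rng.standard_normal(64)   # small perturbation of q
    far = rng.standard_normal(64)               # unrelated vector
    print(lsh.hash(q) == lsh.hash(near))        # likely True
    print(lsh.hash(q) == lsh.hash(far))         # likely False

A learning-to-hash method would instead fit self.planes (or a nonlinear encoder) to the data distribution rather than drawing it at random.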
Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
We construct a multilingual common semantic space based on distributional
semantics, where words from multiple languages are projected into a shared
space to enable knowledge and resource transfer across languages. Beyond word
alignment, we introduce multiple cluster-level alignments and enforce the word
clusters to be consistently distributed across multiple languages. We exploit
three signals for clustering: (1) neighbor words in the monolingual word
embedding space; (2) character-level information; and (3) linguistic properties
(e.g., apposition, locative suffix) derived from linguistic structure knowledge
bases available for thousands of languages. We introduce a new
cluster-consistent correlational neural network to construct the common
semantic space by aligning words as well as clusters. Intrinsic evaluation on
monolingual and multilingual QVEC tasks shows our approach achieves
significantly higher correlation with linguistic features than state-of-the-art
multi-lingual embedding learning methods do. Using low-resource language name
tagging as a case study for extrinsic evaluation, we achieve up to a 24.5%
absolute F-score gain over the state of the art.
Comment: 10 pages
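For context on the word-level alignment step, here is a minimal orthogonal Procrustes sketch in Python, a standard baseline for mapping one embedding space onto another given paired rows; it is not the paper's cluster-consistent correlational neural network, and the toy data is illustrative:

    import numpy as np

    def procrustes_align(X_src, X_tgt):
        """Orthogonal W minimizing ||X_src @ W - X_tgt||_F.

        X_src, X_tgt: embeddings of translation pairs, shape (n, d).
        """
        U, _, Vt = np.linalg.svd(X_src.T @ X_tgt)
        return U @ Vt

    # Toy check: recover a known rotation from paired embeddings.
    rng = np.random.default_rng(0)
    d = 50
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # ground-truth rotation
    X = rng.standard_normal((200, d))
    W = procrustes_align(X, X @ Q)
    print(np.allclose(W, Q))  # True up to numerical precision

The paper's cluster-level alignments add constraints beyond what this per-word linear map captures.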