31,241 research outputs found
Dagstuhl Reports : Volume 1, Issue 2, February 2011
Online Privacy: Towards Informational Self-Determination on the Internet (Dagstuhl Perspectives Workshop 11061) : Simone Fischer-HĂŒbner, Chris Hoofnagle, Kai Rannenberg, Michael Waidner, Ioannis Krontiris and Michael Marhöfer Self-Repairing Programs (Dagstuhl Seminar 11062) : Mauro PezzĂ©, Martin C. Rinard, Westley Weimer and Andreas Zeller Theory and Applications of Graph Searching Problems (Dagstuhl Seminar 11071) : Fedor V. Fomin, Pierre Fraigniaud, Stephan Kreutzer and Dimitrios M. Thilikos Combinatorial and Algorithmic Aspects of Sequence Processing (Dagstuhl Seminar 11081) : Maxime Crochemore, Lila Kari, Mehryar Mohri and Dirk Nowotka Packing and Scheduling Algorithms for Information and Communication Services (Dagstuhl Seminar 11091) Klaus Jansen, Claire Mathieu, Hadas Shachnai and Neal E. Youn
Hierarchical Losses and New Resources for Fine-grained Entity Typing and Linking
Extraction from raw text to a knowledge base of entities and fine-grained
types is often cast as prediction into a flat set of entity and type labels,
neglecting the rich hierarchies over types and entities contained in curated
ontologies. Previous attempts to incorporate hierarchical structure have
yielded little benefit and are restricted to shallow ontologies. This paper
presents new methods using real and complex bilinear mappings for integrating
hierarchical information, yielding substantial improvement over flat
predictions in entity linking and fine-grained entity typing, and achieving new
state-of-the-art results for end-to-end models on the benchmark FIGER dataset.
We also present two new human-annotated datasets containing wide and deep
hierarchies which we will release to the community to encourage further
research in this direction: MedMentions, a collection of PubMed abstracts in
which 246k mentions have been mapped to the massive UMLS ontology; and TypeNet,
which aligns Freebase types with the WordNet hierarchy to obtain nearly 2k
entity types. In experiments on all three datasets we show substantial gains
from hierarchy-aware training.Comment: ACL 201
Answering Complex Questions by Joining Multi-Document Evidence with Quasi Knowledge Graphs
Direct answering of questions that involve multiple entities and relations is a challenge for text-based QA. This problem is most pronounced when answers can be found only by joining evidence from multiple documents. Curated knowledge graphs (KGs) may yield good answers, but are limited by their inherent incompleteness and potential staleness. This paper presents QUEST, a method that can answer complex questions directly from textual sources on-the-fly, by computing similarity joins over partial results from different documents. Our method is completely unsupervised, avoiding training-data bottlenecks and being able to cope with rapidly evolving ad hoc topics and formulation style in user questions. QUEST builds a noisy quasi KG with node and edge weights, consisting of dynamically retrieved entity names and relational phrases. It augments this graph with types and semantic alignments, and computes the best answers by an algorithm for Group Steiner Trees. We evaluate QUEST on benchmarks of complex questions, and show that it substantially outperforms state-of-the-art baselines
Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints
Unsupervised estimation of latent variable models is a fundamental problem
central to numerous applications of machine learning and statistics. This work
presents a principled approach for estimating broad classes of such models,
including probabilistic topic models and latent linear Bayesian networks, using
only second-order observed moments. The sufficient conditions for
identifiability of these models are primarily based on weak expansion
constraints on the topic-word matrix, for topic models, and on the directed
acyclic graph, for Bayesian networks. Because no assumptions are made on the
distribution among the latent variables, the approach can handle arbitrary
correlations among the topics or latent factors. In addition, a tractable
learning method via optimization is proposed and studied in numerical
experiments.Comment: 38 pages, 6 figures, 2 tables, applications in topic models and
Bayesian networks are studied. Simulation section is adde
Adaptive Representations for Tracking Breaking News on Twitter
Twitter is often the most up-to-date source for finding and tracking breaking
news stories. Therefore, there is considerable interest in developing filters
for tweet streams in order to track and summarize stories. This is a
non-trivial text analytics task as tweets are short, and standard retrieval
methods often fail as stories evolve over time. In this paper we examine the
effectiveness of adaptive mechanisms for tracking and summarizing breaking news
stories. We evaluate the effectiveness of these mechanisms on a number of
recent news events for which manually curated timelines are available.
Assessments based on ROUGE metrics indicate that an adaptive approaches are
best suited for tracking evolving stories on Twitter.Comment: 8 Pag
- âŠ