Living Knowledge
Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project treats diversity as an asset rather than a problem. Within the project, foundational ideas emerged from the synergistic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies, and these flowed into concrete diversity-aware applications such as the Future Predictor and the Media Content Analyser, which provide users with better-structured information while coping with Web-scale complexity. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social-sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that brings them together has become the core of the automatic-extraction tools and of the way they interact. In particular, the conceptual architecture has been implemented in the Media Content Analyser application. The scientific and technological results obtained are described in the following.
TimeMachine: Timeline Generation for Knowledge-Base Entities
We present a method called TIMEMACHINE to generate a timeline of events and
relations for entities in a knowledge base. For example, for an actor, such a
timeline should show the most important professional and personal milestones
and relationships such as works, awards, collaborations, and family
relationships. We develop three orthogonal timeline quality criteria that an
ideal timeline should satisfy: (1) it shows events that are relevant to the
entity; (2) it shows events that are temporally diverse, so they distribute
along the time axis, avoiding visual crowding and allowing for easy user
interaction, such as zooming in and out; and (3) it shows events that are
content diverse, so they contain many different types of events (e.g., for an
actor, it should show movies and marriages and awards, not just movies). We
present an algorithm to generate such timelines for a given time period and
screen size, based on submodular optimization and web-co-occurrence statistics
with provable performance guarantees. A series of user studies using Mechanical
Turk shows that all three quality criteria are crucial to produce quality
timelines and that our algorithm significantly outperforms various baseline and
state-of-the-art methods.

Comment: To appear at ACM SIGKDD KDD'15. 12 pages, 7 figures, with appendix. Demo and other info available at http://cs.stanford.edu/~althoff/timemachine
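The greedy submodular selection underlying this kind of timeline generation can be sketched as follows. This is a generic illustration, not the paper's exact objective: the discounting scheme, event fields, and scores below are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Event:
    name: str
    year: int
    kind: str         # e.g. "movie", "award", "marriage"
    relevance: float  # importance score, e.g. from co-occurrence statistics

def objective(selected):
    """Toy submodular score: total relevance, with diminishing returns
    for repeated decades (temporal diversity) and repeated event types
    (content diversity)."""
    score = 0.0
    decades, kinds = {}, {}
    for e in sorted(selected, key=lambda e: -e.relevance):
        d, k = e.year // 10, e.kind
        # Each repeat in the same decade/type contributes half as much.
        penalty = 0.5 ** decades.get(d, 0) * 0.5 ** kinds.get(k, 0)
        score += e.relevance * penalty
        decades[d] = decades.get(d, 0) + 1
        kinds[k] = kinds.get(k, 0) + 1
    return score

def greedy_timeline(events, k):
    """Greedily pick the event with the largest marginal gain; for
    monotone submodular objectives this attains the classic (1 - 1/e)
    approximation guarantee."""
    selected, candidates = [], list(events)
    while candidates and len(selected) < k:
        best = max(candidates,
                   key=lambda e: objective(selected + [e]) - objective(selected))
        selected.append(best)
        candidates.remove(best)
    return sorted(selected, key=lambda e: e.year)
```

Because the marginal value of another movie (or another 1990s event) shrinks as more of them are selected, the greedy loop naturally trades raw relevance against temporal and content diversity, which is the intuition behind the three quality criteria above.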
The Factual Inconsistency Problem in Abstractive Text Summarization: A Survey
Recently, various neural encoder-decoder models pioneered by the Seq2Seq
framework have been proposed to achieve the goal of generating more abstractive
summaries by learning to map input text to output text. At a high level, such
neural models can freely generate summaries without any constraint on the words
or phrases used. Moreover, their format is closer to human-edited summaries and
output is more readable and fluent. However, the neural model's abstraction
ability is a double-edged sword. A commonly observed problem with the generated
summaries is the distortion or fabrication of factual information in the
article. This inconsistency between the original text and the summary has
raised concerns about the applicability of such models, and previous evaluation
methods for text summarization do not capture this issue. In response to
the above problems, the current research direction is predominantly divided
into two categories, one is to design fact-aware evaluation metrics to select
outputs without factual inconsistency errors, and the other is to develop new
summarization systems towards factual consistency. In this survey, we focus on
presenting a comprehensive review of these fact-specific evaluation methods and
text summarization models.

Comment: 9 pages, 5 figures
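One simple family of fact-aware checks measures whether the salient facts mentioned in a summary are supported by the source. The sketch below is a toy illustration only: real metrics use named-entity recognition, question answering, or entailment models rather than the crude capitalized-word-and-number heuristic invented here.

```python
import re

def salient_tokens(text):
    """Crude stand-in for NER: capitalized words and numbers."""
    return set(re.findall(r"\b(?:[A-Z][a-z]+|\d+(?:\.\d+)?)\b", text))

def fact_precision(source, summary):
    """Fraction of the summary's salient tokens that also appear in the
    source. Low values flag likely fabricated entities, dates, or figures."""
    facts = salient_tokens(summary)
    if not facts:
        return 1.0
    return len(facts & salient_tokens(source)) / len(facts)
```

A metric of this shape can be used either to filter candidate summaries (the first research direction above) or as a training signal for consistency-oriented summarizers (the second).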
Linked Data Entity Summarization
On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of a single entity often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions comprise two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
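At its simplest, entity summarization means selecting the top-k statements about an entity under some relevance score. The sketch below uses an inverse-frequency heuristic (rarer predicates are more informative); this is one common baseline idea, not the thesis's method, and the triples are invented.

```python
from collections import Counter

def summarize_entity(triples, entity, k=3):
    """Rank an entity's (subject, predicate, object) statements by the
    informativeness of the predicate: predicates that are rare across
    the whole dataset score higher."""
    pred_counts = Counter(p for _, p, _ in triples)
    own = [t for t in triples if t[0] == entity]
    own.sort(key=lambda t: pred_counts[t[1]])  # rarest predicate first
    return own[:k]
```

Under this heuristic a ubiquitous statement like `rdf:type Person` is pushed below distinctive facts such as a specific award, which matches the intuition that a summary should surface what sets the entity apart.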