Document Filtering for Long-tail Entities
Filtering relevant documents with respect to entities is an essential task in
the context of knowledge base construction and maintenance. It entails
processing a time-ordered stream of documents that might be relevant to an
entity in order to select only those that contain vital information.
State-of-the-art approaches to document filtering for popular entities are
entity-dependent: they rely on and are also trained on the specifics of
differentiating features for each specific entity. Moreover, these approaches
tend to use so-called extrinsic information such as Wikipedia page views and
related entities, which is typically available only for popular head
entities. Entity-dependent approaches based on such signals are therefore
ill-suited as filtering methods for long-tail entities. In this paper we
propose a document filtering method for long-tail entities that is
entity-independent and thus also generalizes to unseen or rarely seen entities.
It is based on intrinsic features, i.e., features that are derived from the
documents in which the entities are mentioned. We propose a set of features
that capture informativeness, entity-saliency, and timeliness. In particular,
we introduce features based on entity aspect similarities, relation patterns,
and temporal expressions and combine these with standard features for document
filtering. Experiments following the TREC KBA 2014 setup on a publicly
available dataset show that our model is able to improve the filtering
performance for long-tail entities over several baselines. Results of applying
the model to unseen entities are promising, indicating that the model is able
to learn the general characteristics of a vital document. The overall
performance across all entities---i.e., not just long-tail entities---improves
upon the state-of-the-art without depending on any entity-specific training
data.
Comment: CIKM2016, Proceedings of the 25th ACM International Conference on
Information and Knowledge Management. 201
CHORUS Deliverable 3.3: Vision Document - Intermediate version
The goal of the CHORUS vision document is to create a high level vision on audio-visual search engines in order to give guidance to the future R&D work in this area (in line with the mandate of CHORUS as a Coordination Action).
This current intermediate draft of the CHORUS vision document (D3.3) is based on the previous CHORUS vision documents D3.1 to D3.2 and on the results of the six CHORUS Think-Tank meetings held in March, September and November 2007 as well as in April, July and October 2008, and on the feedback from other CHORUS events.
The outcome of the six Think-Tank meetings will benefit not only the participants, who are stakeholders and experts from academia and industry; CHORUS, as a coordination action of the EC, will also feed back the findings (see Summary) to the projects under its purview and, via its website, to the whole community working in the domain of AV content search.
A few subsections of this deliverable are to be completed after the eighth (and presumably last) Think-Tank meeting in spring 2009.
Explicit diversification of event aspects for temporal summarization
During major events, such as emergencies and disasters, a large volume of information is reported on newswire and social media platforms. Temporal summarization (TS) approaches are used to automatically produce concise overviews of such events by extracting text snippets from related articles over time. Current TS approaches rely on a combination of event relevance and textual novelty for snippet selection. However, for events that span multiple days, textual novelty is often a poor criterion for selecting snippets, since many snippets are textually unique but semantically redundant or non-informative. In this article, we propose a framework for the diversification of snippets using explicit event aspects, building on recent work in search result diversification. In particular, we first propose two techniques to identify explicit aspects that a user might want to see covered in a summary for different types of event. We then extend a state-of-the-art explicit diversification framework to maximize the coverage of these aspects when selecting summary snippets for unseen events. Through experimentation over the TREC TS 2013, 2014, and 2015 datasets, we show that explicit diversification for temporal summarization significantly outperforms classical novelty-based diversification, as the use of explicit event aspects reduces the number of redundant and off-topic snippets returned, while also increasing summary timeliness.
Metadata
Metadata, or data about data, play a crucial role in the social sciences in ensuring that high-quality documentation and community knowledge are properly captured and accompany the data across its entire life cycle, from the early stages of production to secondary analysis by researchers or use by policy makers and other key stakeholders. The paper provides an overview of the social sciences metadata landscape, best practices and related information technologies. It particularly focuses on two specifications, the Data Documentation Initiative (DDI) and the Statistical Data and Metadata Exchange Standard (SDMX), seen as central to a global metadata management framework for social data and official statistics. It also highlights current directions, outlines typical integration challenges, and provides a set of high-level recommendations for producers, archives, researchers and sponsors in order to foster the adoption of metadata standards and best practices in the years to come.
Keywords: social sciences, metadata, data, statistics, documentation, data quality, XML, DDI, SDMX, archive, preservation, production, access, dissemination, analysis
Analysis of source code metrics from ns-2 and ns-3 network simulators
Ns-2 and its successor ns-3 are discrete-event simulators which are closely related to each
other, as they share a common background, concepts and similar aims. Ns-3 is still under
development, but it offers some interesting characteristics for developers while ns-2 still
has a large user base. While other studies have compared different network simulators,
focusing on performance measurements, in this paper we adopted a different approach
by focusing on technical characteristics and using software metrics to obtain useful conclusions.
We chose ns-2 and ns-3 for our case study because of the popularity of the former in
research and the increasing use of the latter. This reflects the current situation where ns-3
has emerged as a viable alternative to ns-2 due to its features and design. The paper
assesses the current state of both projects and their respective evolution supported by
the measurements obtained from a broad set of software metrics. By considering other
qualitative characteristics we obtained a summary of technical features of both simulators
including architectural design, software dependencies, and documentation policies.
Ministerio de Ciencia e Innovación TEC2009-10639-C04-0
Design Features for the Social Web: The Architecture of Deme
We characterize the "social Web" and argue for several features that are
desirable for users of socially oriented web applications. We describe the
architecture of Deme, a web content management system (WCMS) and extensible
framework, and show how it implements these desired features. We then compare
Deme on our desiderata with other web technologies: traditional HTML, previous
open source WCMSs (illustrated by Drupal), commercial Web 2.0 applications, and
open-source, object-oriented web application frameworks. The analysis suggests
that a WCMS can be well suited to building social websites if it makes more of
the features of object-oriented programming, such as polymorphism and class
inheritance, available to non-programmers in an accessible vocabulary.
Comment: Appeared in Luis Olsina, Oscar Pastor, Daniel Schwabe, Gustavo Rossi,
and Marco Winckler (Editors), Proceedings of the 8th International Workshop
on Web-Oriented Software Technologies (IWWOST 2009), CEUR Workshop
Proceedings, Volume 493, August 2009, pp. 40-51; 12 pages, 2 figures, 1 table
A Study of Realtime Summarization Metrics
Unexpected news events, such as natural disasters or other human tragedies, create a large volume of dynamic text data from official news media as well as less formal social media. Automatic real-time text summarization has become an important tool for quickly transforming this overabundance of text into clear, useful information for end-users including affected individuals, crisis responders, and interested third parties. Despite the importance of real-time summarization systems, their evaluation is not well understood as classic methods for text summarization are inappropriate for real-time and streaming conditions.
The TREC 2013-2015 Temporal Summarization (TREC-TS) track was one of the first evaluation campaigns to tackle the challenges of real-time summarization evaluation, introducing new metrics, a ground-truth generation methodology, and a dataset. In this paper, we present a study of the TREC-TS track evaluation methodology, with the aim of documenting its design, analyzing its effectiveness, and identifying improvements and best practices for the evaluation of temporal summarization systems.