3,261 research outputs found
Exploiting Social Annotation for Automatic Resource Discovery
Information integration applications, such as mediators or mashups, that
require access to information resources currently rely on users manually
discovering and integrating them in the application. Manual resource discovery
is a slow process, requiring the user to sift through results obtained via
keyword-based search. Although search methods have advanced to include evidence
from document contents, its metadata and the contents and link structure of the
referring pages, they still do not adequately cover information sources --
often called ``the hidden Web''-- that dynamically generate documents in
response to a query. The recently popular social bookmarking sites, which allow
users to annotate and share metadata about various information sources, provide
rich evidence for resource discovery. In this paper, we describe a
probabilistic model of the user annotation process in a social bookmarking
system del.icio.us. We then use the model to automatically find resources
relevant to a particular information domain. Our experimental results on data
obtained from \emph{del.icio.us} show this approach as a promising method for
helping automate the resource discovery task.Comment: 6 pages, submitted to AAAI07 workshop on Information Integration on
the We
Hypermedia-based discovery for source selection using low-cost linked data interfaces
Evaluating federated Linked Data queries requires consulting multiple sources on the Web. Before a client can execute queries, it must discover data sources, and determine which ones are relevant. Federated query execution research focuses on the actual execution, while data source discovery is often marginally discussed-even though it has a strong impact on selecting sources that contribute to the query results. Therefore, the authors introduce a discovery approach for Linked Data interfaces based on hypermedia links and controls, and apply it to federated query execution with Triple Pattern Fragments. In addition, the authors identify quantitative metrics to evaluate this discovery approach. This article describes generic evaluation measures and results for their concrete approach. With low-cost data summaries as seed, interfaces to eight large real-world datasets can discover each other within 7 minutes. Hypermedia-based client-side querying shows a promising gain of up to 50% in execution time, but demands algorithms that visit a higher number of interfaces to improve result completeness
A Bayesian Approach to Discovering Truth from Conflicting Sources for Data Integration
In practical data integration systems, it is common for the data sources
being integrated to provide conflicting information about the same entity.
Consequently, a major challenge for data integration is to derive the most
complete and accurate integrated records from diverse and sometimes conflicting
sources. We term this challenge the truth finding problem. We observe that some
sources are generally more reliable than others, and therefore a good model of
source quality is the key to solving the truth finding problem. In this work,
we propose a probabilistic graphical model that can automatically infer true
records and source quality without any supervision. In contrast to previous
methods, our principled approach leverages a generative process of two types of
errors (false positive and false negative) by modeling two different aspects of
source quality. In so doing, ours is also the first approach designed to merge
multi-valued attribute types. Our method is scalable, due to an efficient
sampling-based inference algorithm that needs very few iterations in practice
and enjoys linear time complexity, with an even faster incremental variant.
Experiments on two real world datasets show that our new method outperforms
existing state-of-the-art approaches to the truth finding problem.Comment: VLDB201
- …