A General Framework for Anytime Approximation in Probabilistic Databases
Anytime approximation algorithms that compute the probabilities of queries
over probabilistic databases can be of great use to statistical learning tasks.
So far, these approaches have been based on either (i) sampling or (ii)
branch-and-bound with model-based bounds. We present a more general
branch-and-bound framework that extends the set of possible bounds by using
'dissociation', which yields tighter bounds.
Comment: 3 pages, 2 figures, submitted to StarAI 2018 Workshop
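The branch-and-bound idea can be made concrete with a minimal Python sketch. This is not the paper's algorithm. It assumes that the query lineage is a monotone DNF over independent tuple variables. The sketch branches on the most frequent variable (Shannon expansion). At each leaf it uses a simple max-clause lower bound and a dissociation-style upper bound, 1 - prod(1 - P(clause)), which holds for monotone DNFs. The names `bounds`, `condition`, and `anytime_probability` are hypothetical. Both bounds stay valid after every step, so the loop can be stopped at any time.

```python
from collections import Counter
from math import prod

# Hypothetical representation (an assumption, not from the paper): a monotone DNF
# lineage is a frozenset of clauses, each clause a frozenset of tuple-variable
# names; p maps each variable to its independent marginal probability.

def bounds(dnf, p):
    """Cheap (lower, upper) bounds on P(dnf) for a monotone DNF."""
    if any(len(c) == 0 for c in dnf):
        return 1.0, 1.0          # an empty clause is always true
    if not dnf:
        return 0.0, 0.0          # no clauses left: the formula is false
    clause_probs = [prod(p[v] for v in c) for c in dnf]
    lower = max(clause_probs)    # P(OR) >= probability of any single clause
    # Dissociation-style bound: treating clauses as independent over-estimates
    # P(OR) for monotone DNFs, because their clauses are positively correlated.
    upper = 1.0 - prod(1.0 - q for q in clause_probs)
    return lower, upper


def condition(dnf, var, value):
    """Shannon cofactor of the DNF with var set to True or False."""
    if value:
        return frozenset(c - {var} for c in dnf)
    return frozenset(c for c in dnf if var not in c)


def anytime_probability(dnf, p, eps=1e-6, max_steps=10_000):
    """Refine (lower, upper) until the gap is <= eps or the step budget runs out."""
    leaves = [(1.0, dnf, *bounds(dnf, p))]   # (weight, formula, lower, upper)
    for _ in range(max_steps):
        lower = sum(w * lo for w, _, lo, _ in leaves)
        upper = sum(w * hi for w, _, _, hi in leaves)
        if upper - lower <= eps:
            break
        # Only leaves with a positive gap are non-trivial and can be split.
        open_leaves = [j for j, (_, _, lo, hi) in enumerate(leaves) if hi > lo]
        if not open_leaves:
            break
        # Expand the leaf that contributes most to the overall gap.
        i = max(open_leaves, key=lambda j: leaves[j][0] * (leaves[j][3] - leaves[j][2]))
        w, f, _, _ = leaves.pop(i)
        var = Counter(v for c in f for v in c).most_common(1)[0][0]
        for value, weight in ((True, p[var]), (False, 1.0 - p[var])):
            g = condition(f, var, value)
            leaves.append((w * weight, g, *bounds(g, p)))
    lower = sum(w * lo for w, _, lo, _ in leaves)
    upper = sum(w * hi for w, _, _, hi in leaves)
    return lower, upper
```

Here is a worked input. Take the lineage x1∧y1 ∨ x1∧y2 ∨ x2∧y3 with every probability 0.5. Its exact probability is 1 - (1 - 0.375)(1 - 0.25) = 0.53125. The initial bounds are [0.25, 0.578125], and each expansion narrows them toward that value. Expanding the widest weighted gap first is one possible heuristic; other branching orders would also give valid anytime bounds.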
On the Limitations of Provenance for Queries With Difference
The annotation of the results of database transformations has been shown to be
very effective for various applications. Until recently, most work in this
area focused on positive query languages. Provenance semirings are a
particular approach that has proven effective for these languages: it was
shown that when provenance is propagated with semirings, the expected
equivalence axioms of the corresponding query languages are satisfied. There
have been several attempts to extend the framework to account for relational
algebra queries with difference. We show here that these suggestions fail to
satisfy some expected equivalence axioms (which, in particular, hold for
queries on "standard" set and bag databases). Interestingly, we show that this
is not a pitfall of these particular attempts; rather, every such attempt is
bound to fail to satisfy these axioms for some semirings. Finally, we show
particular semirings for which an extension supporting difference is possible,
and others for which it is impossible.
Comment: TAPP 201
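The semiring approach for positive queries can be illustrated with a small Python sketch. This is our own illustration, not code from the paper. Each tuple carries an annotation from a semiring: joins multiply annotations, while projection and union add them. The `Semiring` class, the helper functions, and the toy relations are hypothetical. Difference is deliberately omitted, since the abstract argues that it cannot be supported uniformly for all semirings.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    zero: Any
    one: Any
    plus: Callable[[Any, Any], Any]    # used by union and projection
    times: Callable[[Any, Any], Any]   # used by join


# Boolean semiring: ordinary set semantics.
BOOL = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)
# Counting semiring (natural numbers): bag semantics with multiplicities.
COUNT = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)

# A K-relation maps each tuple to its annotation; absent tuples are annotated zero.
Relation = Dict[Tuple, Any]


def union(k: Semiring, r: Relation, s: Relation) -> Relation:
    out = dict(r)
    for t, a in s.items():
        out[t] = k.plus(out.get(t, k.zero), a)
    return out


def project(k: Semiring, r: Relation, cols: Tuple[int, ...]) -> Relation:
    out: Relation = {}
    for t, a in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = k.plus(out.get(key, k.zero), a)   # merged tuples add up
    return out


def join(k: Semiring, r: Relation, s: Relation) -> Relation:
    """Join binary relations r(a, b) and s(b, c) on b, producing (a, b, c)."""
    out: Relation = {}
    for (a, b), x in r.items():
        for (b2, c), y in s.items():
            if b == b2:
                key = (a, b, c)
                out[key] = k.plus(out.get(key, k.zero), k.times(x, y))
    return out


# The same query Q(a, c) = project_{a,c}(R join S) under two semantics.
R = {("x", "y"): 2, ("x", "z"): 1}
S = {("y", "w"): 3, ("z", "w"): 1}
print(project(COUNT, join(COUNT, R, S), (0, 2)))       # {('x', 'w'): 7}
R_set = {t: True for t in R}
S_set = {t: True for t in S}
print(project(BOOL, join(BOOL, R_set, S_set), (0, 2)))  # {('x', 'w'): True}
```

Swapping `COUNT` for `BOOL` switches from bag to set semantics without touching the query code.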
H2O: An Autonomic, Resource-Aware Distributed Database System
This paper presents the design of an autonomic, resource-aware distributed
database which enables data to be backed up and shared without complex manual
administration. The database, H2O, is designed to make use of unused resources
on workstation machines. Creating and maintaining highly-available, replicated
database systems can be difficult for untrained users, and costly for IT
departments. H2O reduces the need for manual administration by autonomically
replicating data and load-balancing across machines in an enterprise.
Provisioning hardware to run a database system can be unnecessarily costly,
as most organizations already possess large quantities of idle resources in
workstation machines. H2O is designed to exploit this unused capacity by using
resource availability information to place data and plan queries over
workstation machines that are already being used for other tasks. This paper
discusses the requirements for such a system and presents the design and
implementation of H2O.
Comment: Presented at SICSA PhD Conference 2010 (http://www.sicsaconf.org/)
Distant Supervision for Entity Linking
Entity linking is an indispensable operation for populating knowledge
repositories for information extraction. It aligns a textual entity mention
with its corresponding disambiguated entry in a knowledge repository. In this
paper, we propose a new paradigm named distantly supervised entity linking
(DSEL), in which disambiguated entities from a huge knowledge repository
(Freebase) are automatically aligned to their corresponding descriptive
webpages (Wiki pages). In this way, a large amount of weakly labeled data can
be generated without manual annotation and fed to a classifier to link newly
discovered entities. Compared with traditional paradigms based on a single
knowledge base, DSEL benefits from jointly leveraging the respective
advantages of Freebase and Wikipedia. Specifically, the proposed paradigm
helps bridge the disambiguated labels (Freebase) of entities and their textual
descriptions (Wikipedia) for Web-scale entities. Experiments conducted on a
dataset of 140,000 items and 60,000 features achieve a baseline F1-measure of
0.517. Furthermore, we analyze feature performance and improve the F1-measure
to 0.545.