On APIs for probabilistic databases

Author: Antova Lyublena
Koch Christoph
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

MayBMS: Managing Incomplete Information with Probabilistic World-Set Decompositions

Author: Antova Lyublena
Koch Christoph
Olteanu Dan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

$10^{(10^6)}$ Worlds and Beyond: Efficient Representation and Processing of Incomplete Information

Author: Antova Lyublena
Koch Christoph
Olteanu Dan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

Harnessing the Deep Web: Present and Future

Author: Afanasiev Loredana
Antova Lyublena
Halevy Alon
Madhavan Jayant
Publication date: 01/01/2009

Over the past few years, we have built a system that has exposed large volumes of Deep-Web content to Google.com users. The content that our system exposes contributes to more than 1000 search queries per second and spans over 50 languages and hundreds of domains. The Deep Web has long been acknowledged to be a major source of structured data on the web, and hence accessing Deep-Web content has long been a problem of interest in the data management community. In this paper, we report on where we believe the Deep Web provides value and where it does not. We contrast two very different approaches to exposing Deep-Web content: the surfacing approach that we used, and the virtual integration approach that has often been pursued in the data management literature. We emphasize where the values of each of the two approaches lie and caution against potential pitfalls. We outline important areas of future research and, in particular, emphasize the value that can be derived from analyzing large collections of potentially disparate structured data on the web. Comment: CIDR 200

arXiv.org e-Print Archive

CiteSeerX

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Query language support for incomplete information in the MayBMS system

Author: Antova Lyublena
Koch Christoph
Olteanu Dan
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

World-set Decompositions: Expressiveness and Efficient Algorithms

Author: Antova Lyublena
Koch Christoph
Olteanu Dan
Publication date: 07/06/2011

Uncertain information is commonplace in real-world data management scenarios. The ability to represent large sets of possible instances (worlds) while supporting efficient storage and processing is an important challenge in this context. The recent formalism of world-set decompositions (WSDs) provides a space-efficient representation for uncertain data that also supports scalable processing. WSDs are complete for finite world-sets in that they can represent any finite set of possible worlds. For possibly infinite world-sets, we show that a natural generalization of WSDs precisely captures the expressive power of c-tables. We then show that several important decision problems are efficiently solvable on WSDs while they are NP-hard on c-tables. Finally, we give a polynomial-time algorithm for factorizing WSDs, i.e., an efficient algorithm for minimizing such representations.
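
The space saving described in the abstract above can be illustrated with a small sketch. It is not the authors' implementation. It assumes the common reading of a WSD as a product of independent components, where each component lists alternative values for a disjoint set of fields and a world is obtained by picking one alternative per component. The field names (`t1.ssn`, `t2.name`, ...), the sample values, and the helper functions are hypothetical.

```python
from itertools import product

# Hypothetical example data: each component lists the alternative
# assignments for its own (disjoint) set of fields.
components = [
    [{"t1.ssn": 185}, {"t1.ssn": 785}],
    [{"t2.ssn": 185, "t2.name": "Brown"}, {"t2.ssn": 186, "t2.name": "Brown"}],
    [{"t1.name": "Smith"}],
]

def worlds(components):
    """Enumerate every world: one alternative chosen from each component."""
    for choice in product(*components):
        world = {}
        for local_world in choice:
            world.update(local_world)
        yield world

def world_count(components):
    """Number of represented worlds: the product of the component sizes."""
    count = 1
    for component in components:
        count *= len(component)
    return count

def representation_size(components):
    """Number of stored alternatives: the sum of the component sizes."""
    return sum(len(component) for component in components)

if __name__ == "__main__":
    print(world_count(components), "worlds from",
          representation_size(components), "stored alternatives")
    for world in worlds(components):
        print(world)
```

The number of worlds grows with the product of the component sizes while the stored data grows only with their sum, which is why a decomposed representation can stay small even when the world-set is very large.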

Infoscience - École polytechnique fédérale de Lausanne

Fast and Simple Relational Processing of Uncertain Data

Author: Antova Lyublena
Jansen Thomas
Koch Christoph
Olteanu Dan
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

MayBMS: a probabilistic database management system

Author: Antova Lyublena
Huang Jiewen
Koch Christoph
Olteanu Dan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/06/2011

Infoscience - École polytechnique fédérale de Lausanne

MayBMS: A System for Managing Large Amounts of Uncertain Data

Author: Antova Lyublena
Publication date: 09/04/2010

This dissertation presents the foundations for building a scalable database management system for managing uncertain data, as it appears in different data management scenarios such as data integration, data cleaning, scientific data and web data management. The result of this work is MayBMS, a scalable open-source database management system for managing large amounts of uncertain data. MayBMS uses the so-called U-relational databases to represent uncertainty. U-relational databases store uncertainty and correlations in a purely relational way, and are a complete representation system for finite world sets. Other benefits achieved by our representation model include compact storage and efficient query evaluation. The results of our experimental evaluation clearly show that query evaluation in MayBMS scales up to large data sizes and uncertainty ratios, and that MayBMS consistently outperforms other current systems for managing uncertain data. The dissertation also discusses optimization of queries on vertically partitioned data, efficient confidence computation algorithms, and challenges and solutions when designing an application programming interface for uncertain databases.

eCommons (Cornell Univ.)