Understanding, Estimating, and Incorporating Output Quality Into Join Algorithms For Information Extraction

Authors: Doan Anhai; Gravano Luis; Ipeirotis Panagiotis G.; Jain Alpa
Publication date: 27/06/2008

Information extraction (IE) systems are trained to extract specific relations from text databases. Real-world applications often require that the output of multiple IE systems be joined to produce the data of interest. To optimize the execution of a join of multiple extracted relations, it is not sufficient to consider only execution time. In fact, the quality of the join output is of critical importance: unlike in the relational world, different join execution plans can produce join results of widely different quality whenever IE systems are involved. In this paper, we develop a principled approach to understand, estimate, and incorporate output quality into the join optimization process over extracted relations. We argue that the output quality is affected by (a) the configuration of the IE systems used to process the documents, (b) the document retrieval strategies used to retrieve documents, and (c) the actual join algorithm used. Our analysis considers a variety of join algorithms from relational query optimization, and predicts the output quality (and, of course, the execution time) of the alternative execution plans. We establish the accuracy of our analytical models, and study the effectiveness of a quality-aware join optimizer, with a large-scale experimental evaluation over real-world text collections and state-of-the-art IE systems.

New York University Faculty Digital Archive

Ranking objects by exploiting relationships: computing top-k over aggregation

Authors: Dong Xin; Jiawei Han; Kaushik Chakrabarti; Venkatesh Ganti
Publication venue: Citeseer
Publication date: 01/01/2006

CiteSeerX

Efficient querying of Linked Data by distributing workload

Author: Robas Jan
Publication date: 08/11/2016

Online data is presented in many different forms that are not mutually compatible. The same problem affects Web APIs, because each Web service usually requires a specialised client suited to the kind of data it provides. Linked Data addresses this problem, but it has weaknesses of its own: query performance and the availability of remote SPARQL endpoints. Triple Pattern Fragments let us execute SPARQL queries by shifting part of the workload to the client, at the cost of transferring more data. The existing AMF extension reduces the number of HTTP requests, and consequently the amount of transferred data, for some queries, while increasing the amount of transferred data for others. In this thesis we present our extension, which aims to lower both the number of HTTP requests and the amount of transferred data by extending the metadata with a Bloom filter containing data linked with the triples on the current page of the Triple Pattern Fragment. We have compared our extension with the AMF extension and achieved encouraging results. We have also proposed a fix for the AMF extension, which is already included in the official repository. Finally, we have developed a simple graphical user interface that enables the composition of SPARQL queries and their execution using our extension.
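The role a Bloom filter can play in such metadata can be illustrated with a minimal sketch. The abstract does not say how the thesis builds or encodes its filter, so everything below is an assumption for illustration only: the class `BloomFilter`, the helper `candidates_worth_requesting`, the salted SHA-256 hashing scheme, and the example URIs are hypothetical and are not taken from the thesis or from the AMF extension. The idea shown is only the general one: a client can test candidate terms against a filter shipped with a page and skip HTTP requests for terms that are definitely absent.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: k hash positions over an m-bit array (illustrative)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions by salting a SHA-256 hash with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def candidates_worth_requesting(page_filter: BloomFilter, candidates: list[str]) -> list[str]:
    """Keep only candidates the filter cannot rule out (hypothetical client-side step)."""
    return [c for c in candidates if page_filter.might_contain(c)]


if __name__ == "__main__":
    # Hypothetical server side: terms linked with the triples on one page.
    page_filter = BloomFilter(num_bits=1024, num_hashes=3)
    for term in ["http://example.org/alice", "http://example.org/bob"]:
        page_filter.add(term)

    # Hypothetical client side: only request terms that may be present.
    to_fetch = candidates_worth_requesting(
        page_filter, ["http://example.org/alice", "http://example.org/carol"]
    )
    print(to_fetch)
```

A Bloom filter never produces false negatives, so a term rejected by `might_contain` can be skipped safely. False positives remain possible, which means some requests will still turn out to be unnecessary; the filter size and number of hash functions control how often that happens.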

Repository of the University of Ljubljana

ePrints.FRI

Content And Multimedia Database Management Systems

Author: Vries Arjen Paul de
Publication venue: University of Twente, Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/1999

A database management system is a general-purpose software system that facilitates the processes of defining, constructing, and manipulating databases for various applications. The main characteristic of the ‘database approach’ is that it increases the value of data through its emphasis on data independence. DBMSs, and in particular those based on the relational data model, have been very successful at managing administrative data in the business domain. This thesis investigates data management in multimedia digital libraries and its implications for the design of database management systems. The main problem of multimedia data management is providing access to the stored objects. The content structure of administrative data is easily represented in alphanumeric values, so database technology has primarily focused on handling the objects’ logical structure. In the case of multimedia data, however, representing content is far from trivial and is not supported by current database management systems.

CiteSeerX

VU Research Portal

CWI's Institutional Repository

University of Twente Research Information