
5,513 research outputs found

On Region Algebras, XML Databases, and Information Retrieval

Author: Apers P.M.G.
Hiemstra D.
Mihajlovic V.
Publication venue: Institute for Logic, Language and Computation
Publication date: 01/01/2003

This paper describes some new ideas on developing a logical algebra for databases that manage textual data and support information retrieval functionality. We describe a first prototype of such a system

Radboud Repository

University of Twente Research Information

Quasi-SLCA based Keyword Query Processing over Probabilistic XML Data

Author: Li Jianxin
Liu Chengfei
Yu Jeffrey Xu
Zhou Rui
Publication date: 10/01/2013

The probabilistic threshold query is one of the most common queries in uncertain databases, where a result satisfying the query must also have a probability that meets the threshold requirement. In this paper, we investigate probabilistic threshold keyword queries (PrTKQ) over XML data, which have not been studied before. We first introduce the notion of quasi-SLCA and use it to represent results for a PrTKQ under possible world semantics. Then we design a probabilistic inverted (PI) index that can be used to quickly return the qualified answers and filter out the unqualified ones based on our proposed lower/upper bounds. After that, we propose two efficient and comparable algorithms: Baseline Algorithm and PI index-based Algorithm. To accelerate the algorithms, we also utilize the probability density function. An empirical study using real and synthetic data sets has verified the effectiveness and the efficiency of our approaches

arXiv.org e-Print Archive

Deakin Research Online

Adelaide Research & Scholarship

Swinburne Research Bank

The State-of-the-arts in Focused Search

Author: Li Rongmei
Publication venue: University of Twente, Centre for Telematics and Information Technology
Publication date: 01/01/2009

The continuous influx of various text data on the Web requires search engines to improve their retrieval abilities for more specific information. The need for results relevant to a user's topic of interest has gone beyond search for domain- or type-specific documents to more focused results (e.g. document fragments or answers to a query). The introduction of XML provides a standard format for data representation, storage, and exchange. It helps focused search to be carried out at different granularities of a structured document with XML markup. This report aims at reviewing the state of the art in focused search, particularly techniques for topic-specific document retrieval, passage retrieval, XML retrieval, and entity ranking. It concludes by highlighting open problems

University of Twente Research Information

Searching Multimedia Data using MPEG-7 Descriptions in a Broadcast Terminal

Author: Lalmas Mounia
Mory Benoit
Moutogianni Katerina
Putz Wolfgang
Rölleke Thomas
Publication date: 30/12/2013

Queen Mary Research Online

Entity Ranking on Graphs: Studies on Expert Finding

Author: Hiemstra D.
Rode H.
Serdyukov P.
Zaragoza H.
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2007

Today's web search engines try to offer services for finding various information in addition to simple web pages, like showing locations or answering simple fact queries. Understanding the association of named entities and documents is one of the key steps towards such semantic search tasks. This paper addresses the ranking of entities and models it in a graph-based relevance propagation framework. In particular we study the problem of expert finding as an example of an entity ranking task. Entity containment graphs are introduced that represent the relationship between text fragments on the one hand and their contained entities on the other hand. The paper shows how these graphs can be used to propagate relevance information from the pre-ranked text fragments to their entities. We use this propagation framework to model existing approaches to expert finding based on the entity's indegree and extend them by recursive relevance propagation based on a probabilistic random walk over the entity containment graphs. Experiments on the TREC expert search task compare the retrieval performance of the different graph and propagation models

CiteSeerX

Radboud Repository

University of Twente Research Information

IMPrECISE: Good-is-good-enough data integration

Author: Keijzer Ander de
Keulen Maurice van
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2008

IMPrECISE is an XQuery module that adds probabilistic XML functionality to an existing XML DBMS, in our case MonetDB/XQuery. We demonstrate probabilistic XML and data integration functionality of IMPrECISE. The prototype is configurable with domain knowledge such that the amount of uncertainty arising during data integration is reduced to an acceptable level, thus obtaining a "good is good enough" data integration with minimal human effort

CiteSeerX

University of Twente Research Information

A Database Approach to Content-based XML retrieval

Author: Hiemstra D.
Publication venue: European Research Consortium for Informatics and Mathematics (ERCIM)
Publication date: 01/01/2002

This paper describes a first prototype system for content-based retrieval from XML data. The system's design supports both XPath queries and complex information retrieval queries based on a language modelling approach to information retrieval. Evaluation using the INEX benchmark shows that it is beneficial if the system is biased to retrieve large XML fragments over small fragments

CiteSeerX

Radboud Repository

University of Twente Research Information

Quality Measures in Uncertain Data Management

Author: Keijzer A. de
Keulen M. van
Publication venue: Springer Verlag
Publication date: 01/01/2007

Many applications deal with data that is uncertain. Some examples are applications dealing with sensor information, data integration applications and healthcare applications. Instead of these applications having to deal with the uncertainty, it should be the responsibility of the DBMS to manage all data including uncertain data. Several projects do research on this topic. In this paper, we introduce four measures to be used to assess and compare important characteristics of data and systems

University of Twente Research Information