Search CORE

91 research outputs found

PFTijah: text search in an XML database system

Author: Flokstra J.
Hiemstra D.
Os R. van
Rode H.
Publication venue: Ecole Nationale Supérieure des Mines de Saint-Etienne
Publication date: 01/01/2006

This paper introduces the PFTijah system, a text search system that is integrated with an XML/XQuery database management system. We present examples of its use, explain some of the system internals, and discuss plans for future work. PFTijah is part of the open source release of MonetDB/XQuery.

CiteSeerX

Radboud Repository

University of Twente Research Information

Sound ranking algorithms for XML search

Author: Apers P.M.G.
Flokstra J.
Hiemstra D.
Klinger S.
Rode H.
Publication venue: University of Otago
Publication date: 01/01/2008

Ranking algorithms for XML should reflect the actual combined content and structure constraints of queries, while at the same time producing equal rankings for queries that are semantically equal. Ranking algorithms that produce different rankings for queries that are semantically equal are easily detected by tests on large databases: we call such algorithms not sound. We report the behavior of different approaches to ranking content-and-structure queries on pairs of queries for which we expect equal ranking results from the query semantics. We show that most of these approaches are not sound. Of the remaining approaches, only 3 adhere to the W3C XQuery Full-Text standard.

KOPS - The Institutional Repository of the University of Konstanz

CiteSeerX

University of Twente Research Information

Towards Business Intelligence over Unified Structured and Unstructured Data Using XML

Author: Vishu Krishnamurthy
Zhen Hua Liu
Publication venue: IntechOpen
Publication date: 01/02/2012

IntechOpen

DB&IR Integration: Report on the Dagstuhl Seminar "Ranked XML Querying"

Author: Amer-Yahia S.
Hiemstra Djoerd
Roelleke T.
Srivastava D.
Weikum G.
Publication venue: Dagstuhl
Publication date: 01/01/2008

University of Twente Research Information

Structured Text Retrieval Models

Author: C.L.A. Clarke
G. Navarro
R.A. Baeza-Yates
Publication venue: Springer Science and Business Media LLC
Publication date: 03/04/2008

Structured text retrieval models provide a formal definition or mathematical framework for querying semistructured textual databases. A textual database contains both content and structure. The content is the text itself, and the structure divides the database into separate textual parts and relates those textual parts by some criterion. Often, textual databases can be represented as marked up text, for instance as XML, where the XML elements define the structure on the text content. Retrieval models for textual databases should comprise three parts: 1) a model of the text, 2) a model of the structure, and 3) a query language [4]: The model of the text defines a tokenization into words or other semantic units, as well as stop words, stemming, synonyms, etc. The model of the structure defines parts of the text, typically a contiguous portion of the text called element, region, or segment, which is defined on top of the text model's word tokens. The query language typically defines a number of operators on content and structure such as set operators and operators like "containing" and "contained-by" to model relations between content and structure, as well as relations between the structural elements themselves. Using such a query language, the (expert) user can for instance formulate requests like "I want a paragraph discussing formal models near to a table discussing the differences between databases and information retrieval". Here, "formal models" and "differences between databases and information retrieval" should match the content that needs to be retrieved from the database, whereas "paragraph" and "table" refer to structural constraints on the units to retrieve. The features, structuring power, and the expressiveness of the query languages of several models for structured text retrieval are discussed below.
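As a rough illustration of the "containing" and "contained-by" operators mentioned in this abstract, the following minimal Python sketch treats regions as inclusive token-offset pairs. The names `Region`, `containing`, `contained_by`, and `term_regions`, along with the sample tokens and region boundaries, are assumptions made for this example and are not taken from any of the models the entry surveys.

```python
from typing import List, Tuple

# A region is an inclusive (start, end) pair of token offsets (assumed representation).
Region = Tuple[int, int]


def _covers(outer: Region, inner: Region) -> bool:
    """True if `outer` spans all of `inner`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def containing(outer: List[Region], inner: List[Region]) -> List[Region]:
    """Regions of `outer` that contain at least one region of `inner`."""
    return [o for o in outer if any(_covers(o, i) for i in inner)]


def contained_by(inner: List[Region], outer: List[Region]) -> List[Region]:
    """Regions of `inner` that lie inside at least one region of `outer`."""
    return [i for i in inner if any(_covers(o, i) for o in outer)]


def term_regions(tokens: List[str], term: str) -> List[Region]:
    """Single-token regions for every occurrence of `term`."""
    return [(pos, pos) for pos, tok in enumerate(tokens) if tok == term]


# Hypothetical text model: whitespace tokenization, no stemming or stop words.
tokens = "formal models of text and tables of results".split()

# Hypothetical structure model: one paragraph over tokens 0-3, one table over 4-7.
paragraphs: List[Region] = [(0, 3)]
tables: List[Region] = [(4, 7)]

models_hits = term_regions(tokens, "models")
paragraphs_about_models = containing(paragraphs, models_hits)
tables_about_models = containing(tables, models_hits)
of_inside_tables = contained_by(term_regions(tokens, "of"), tables)
```

Here the structural constraint ("paragraph", "table") selects the candidate regions and the content constraint (a term) filters them, which is the split between structure and content that the abstract describes. Ranking is deliberately left out of this sketch.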
HISTORICAL BACKGROUND The STAIRS system (Storage and Information Retrieval System), which was developed at IBM already in the late 1950's allowed querying both content and structure. Much like today's On-line Public Access Catalogues, it wa

CiteSeerX

Crossref

Radboud Repository

University of Twente Research Information

A survey on tree matching and XML retrieval

Author: Cyril Laitang
Hamamache Kheddouci
Karen Pinel-Sauvagnat
Lei Ning
Mohammed Amin Tahraoui
Mohand Boughanem
Publication venue: Elsevier BV
Publication date: 01/05/2013

With the increasing number of available XML documents, numerous approaches for retrieval have been proposed in the literature. They usually use the tree representation of documents and queries to process them, whether in an implicit or explicit way. Although retrieving XML documents can be considered as a tree matching problem between the query tree and the document trees, only a few approaches take advantage of the algorithms and methods proposed in graph theory. In this paper, we aim at studying the theoretical approaches proposed in the literature for tree matching and at seeing how these approaches have been adapted to XML querying and retrieval, from both an exact and an approximate matching perspective. This study will allow us to highlight theoretical aspects of graph theory that have not yet been explored in XML retrieval.

Crossref

Scientific Publications of the University of Toulouse II Le Mirail

Hal - Université Grenoble Alpes

Open Archive Toulouse Archive Ouverte

HAL

Hal-Diderot

A database approach to information retrieval: The remarkable relationship between language models and region models

Author: Hiemstra Djoerd
Mihajlovic V.
Publication venue: Centre for Telematics and Information Technology (CTIT)
Publication date: 01/01/2005

In this report, we unify two quite distinct approaches to information retrieval: region models and language models. Region models were developed for structured document retrieval. They provide a well-defined behaviour as well as a simple query language that allows application developers to rapidly develop applications. Language models are particularly useful to reason about the ranking of search results, and for developing new ranking approaches. The unified model allows application developers to define complex language modeling approaches as logical queries on a textual database. We show a remarkable one-to-one relationship between region queries and the language models they represent for a wide variety of applications: simple ad-hoc search, cross-language retrieval, video retrieval, and web search.

University of Twente Research Information