
    Content-Aware DataGuides for Indexing Large Collections of XML Documents

    XML is well-suited for modelling structured data with textual content. However, most indexing approaches perform structure and content matching independently, combining the retrieved path and keyword occurrences in a third step. This paper shows that retrieval in XML documents can be accelerated significantly by processing text and structure simultaneously during all retrieval phases. To this end, the Content-Aware DataGuide (CADG) enhances the well-known DataGuide with (1) simultaneous keyword and path matching and (2) a precomputed content/structure join. Extensive experiments prove the CADG to be 50-90% faster than the DataGuide for various sorts of queries and documents, including difficult cases such as poorly structured queries and recursive document paths. A new query classification scheme identifies precise query characteristics with a predominant influence on the performance of the individual indices. The experiments show that the CADG is applicable to many real-world applications, in particular large collections of heterogeneously structured XML documents.
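
    As an illustration of the idea only, here is a minimal sketch of a content-aware path index in the spirit of the CADG: every structural path is annotated with its keyword occurrences, so structure and content are matched in a single lookup instead of being joined afterwards. All names (PathIndex, add_occurrence, query) are hypothetical; the actual CADG precomputes a full content/structure join over a DataGuide.

    from collections import defaultdict

    class PathIndex:
        """Toy content-aware path index: path -> keyword -> document ids."""

        def __init__(self):
            # The nested mapping plays the role of a precomputed
            # content/structure join.
            self.index = defaultdict(lambda: defaultdict(set))

        def add_occurrence(self, path, keyword, doc_id):
            # Record that `keyword` occurs under structural `path` in `doc_id`.
            self.index[path][keyword].add(doc_id)

        def query(self, path, keyword):
            # Keyword and path matching happen simultaneously in one lookup.
            return set(self.index.get(path, {}).get(keyword, set()))

    idx = PathIndex()
    idx.add_occurrence("/article/title", "xml", "doc1")
    idx.add_occurrence("/article/body", "xml", "doc2")
    print(idx.query("/article/title", "xml"))   # {'doc1'}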

    High-Performance Reachability Query Processing under Index Size Restrictions

    In this paper, we propose a scalable and highly efficient index structure for the reachability problem over graphs. We build on the well-known node interval labeling scheme, where the set of vertices reachable from a particular node is compactly encoded as a collection of node identifier ranges. We impose an explicit bound on the size of the index and flexibly assign approximate reachability ranges to nodes of the graph such that the number of index probes needed to answer a query is minimized. The resulting tunable index structure generates a better range labeling if the space budget is increased, thus providing direct control over the trade-off between index size and query processing performance. By using a fast recursive querying method in conjunction with our index structure, we show that in practice, reachability queries can be answered on the order of microseconds on an off-the-shelf computer, even for massive-scale real-world graphs. Our claims are supported by an extensive set of experimental results using a multitude of benchmark and real-world web-scale graph datasets.
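
    A simplified sketch of the general idea, not the paper's bounded multi-range index: each node gets a single DFS postorder interval that is exact along tree edges, and a recursive search over the remaining edges serves as the fallback when the interval test cannot answer positively. The graph encoding and all function names are assumptions for illustration.

    def label(graph, root):
        # Assign each node a DFS postorder number and the interval
        # [enter, post] covering its DFS-tree subtree.
        # Assumes every node is reachable from `root`.
        post, enter, counter = {}, {}, [0]

        def dfs(u):
            enter[u] = counter[0]
            for v in graph.get(u, []):
                if v not in enter:
                    dfs(v)
            post[u] = counter[0]
            counter[0] += 1

        dfs(root)
        return post, enter

    def reachable(graph, post, enter, u, v, seen=None):
        # The interval test decides positively in O(1); otherwise fall
        # back to a recursive search along the remaining edges.
        if enter[u] <= post[v] <= post[u]:
            return True
        seen = set() if seen is None else seen
        seen.add(u)
        return any(reachable(graph, post, enter, w, v, seen)
                   for w in graph.get(u, []) if w not in seen)

    g = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    post, enter = label(g, "a")
    print(reachable(g, post, enter, "c", "d"),   # True (via fallback)
          reachable(g, post, enter, "b", "c"))   # False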

    Template-driven teacher modelling approach : a thesis submitted in partial fulfilment of the requirements for the degree of Master of Science in Information Science at Massey University, Palmerston North

    This thesis describes the Template-Driven Teacher Modeling Approach, the initial implementation of the template server and the formative evaluation of the prototype. The aim of template-driven teacher modeling is to integrate the template server and intelligent teacher models into Web-based education systems for course authoring. The proposed system has a number of key components: the user interface, the template server and the content repository. The Template-Driven Teacher Modeling (TDTM) architecture supports course authoring by providing a higher degree of control over the generation of presentations. The collection of templates accumulated in the template repository for a teacher or a group of teachers is selected as the input to the inference mechanism in the teacher's model, which calculates the best representation of the teaching strategy and then predicts the teacher's intention as he or she interacts with the system. Moreover, the presentation templates are kept to support the re-use of on-line content at the level of individual screens with the help of the Template Server.

    Adaptive indexing in modern database kernels

    Physical design represents one of the hardest problems for database management systems. Without proper tuning, systems cannot achieve good performance. Offline indexing creates indexes a priori, assuming good workload knowledge and idle time. More recently, online indexing monitors workload trends and creates or drops indexes online. Adaptive indexing takes another step towards completely automating the tuning process of a database system by enabling incremental and partial online indexing. The main idea is that physical design changes continuously, adaptively, partially, incrementally and on demand while processing queries, as part of the execution operators. As such, it brings a plethora of opportunities for rethinking and improving every single corner of database system design. We will analyze the indexing space between offline, online and adaptive indexing through several state-of-the-art indexing techniques, e.g., what-if analysis and soft indexes. We will discuss in detail adaptive indexing techniques such as database cracking, adaptive merging, sideways cracking and various hybrids that try to balance the online tuning overhead with the convergence speed to optimal performance. In addition, we will discuss how various aspects of modern database architectures, such as vectorization, bulk processing, column-store execution and storage, affect adaptive indexing. Finally, we will discuss several open research topics towards fully autonomous database kernels.
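
    As a toy illustration of database cracking, one of the adaptive indexing techniques mentioned above: each range query partitions exactly the piece of the column it touches, so the column becomes increasingly ordered as a side effect of query processing. The class and its internals are invented for this sketch; real kernels crack contiguous arrays in place and maintain a dedicated cracker index.

    import bisect

    class CrackedColumn:
        def __init__(self, values):
            self.values = list(values)
            # Cracker index: for each pivot, values[:pos] < pivot <= values[pos:].
            self.pivots = []
            self.positions = []

        def _crack(self, pivot):
            # Partition the unsorted piece containing `pivot` in place and
            # remember the split position; reuse it if the pivot is known.
            i = bisect.bisect_left(self.pivots, pivot)
            if i < len(self.pivots) and self.pivots[i] == pivot:
                return self.positions[i]
            lo = self.positions[i - 1] if i > 0 else 0
            hi = self.positions[i] if i < len(self.positions) else len(self.values)
            piece = self.values[lo:hi]
            left = [x for x in piece if x < pivot]
            right = [x for x in piece if x >= pivot]
            self.values[lo:hi] = left + right
            self.pivots.insert(i, pivot)
            self.positions.insert(i, lo + len(left))
            return lo + len(left)

        def range(self, low, high):
            # Answer low <= x < high and leave the column better organised.
            return self.values[self._crack(low):self._crack(high)]

    col = CrackedColumn([13, 16, 4, 9, 2, 12, 7, 1, 19, 3])
    print(sorted(col.range(5, 14)))   # [7, 9, 12, 13]
    print(col.values)                 # progressively more partitioned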

    Adaptive content mapping for internet navigation

    The Internet, the biggest human library ever assembled, keeps on growing. Although all kinds of information carriers (e.g. audio/video/hybrid file formats) are available, text-based documents dominate. It is estimated that about 80% of all information stored electronically worldwide exists in (or can be converted into) text form. More and more, documents of all kinds are generated by means of a text processing system and are therefore available electronically. Nowadays, many printed journals are also published online and may even cease to appear in print form tomorrow. This development has many convincing advantages: the documents are available faster (cf. prepress services) and more cheaply, they can be searched more easily, their physical storage needs only a fraction of the space previously necessary and the medium will not age. For most people, fast and easy access is the most interesting feature of the new age; computer-aided search for specific documents or Web pages has become the basic tool for information-oriented work. But this tool has problems. The current keyword-based search engines available on the Internet are not really appropriate for such a task: either far too many documents matching the specified keywords are presented, or none at all. The problem lies in the fact that it is often very difficult to choose appropriate terms describing the desired topic in the first place. This contribution discusses the current state-of-the-art techniques in content-based searching (along with common visualization/browsing approaches) and proposes a particular adaptive solution for intuitive Internet document navigation, which not only enables the user to provide full texts instead of manually selected keywords (if available), but also allows him/her to explore the whole database.
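
    For the flavour of content-based rather than keyword-based querying only: a minimal TF-IDF cosine-similarity sketch in which a whole text, not a handful of keywords, serves as the query. This is not the adaptive content-mapping method proposed above; the functions and the toy corpus are invented for illustration.

    import math
    from collections import Counter

    def tfidf_vectors(docs):
        # Build a sparse TF-IDF vector (term -> weight) for every document.
        tokenized = [doc.lower().split() for doc in docs]
        df = Counter()
        for tokens in tokenized:
            df.update(set(tokens))
        n = len(docs)
        return [{t: tf * math.log(n / df[t]) for t, tf in Counter(tokens).items()}
                for tokens in tokenized]

    def cosine(a, b):
        dot = sum(w * b[t] for t, w in a.items() if t in b)
        norm = math.sqrt(sum(w * w for w in a.values())) * \
               math.sqrt(sum(w * w for w in b.values()))
        return dot / norm if norm else 0.0

    docs = [
        "column store database architectures and adaptive indexing",
        "xml retrieval with path and keyword indexes",
        "navigating large document collections on the internet",
    ]
    query = "indexing and retrieval in large xml document collections"
    vectors = tfidf_vectors(docs + [query])
    scores = [(cosine(vectors[-1], v), d) for v, d in zip(vectors, docs)]
    print(max(scores))   # the document most similar to the full-text query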

    MonetDB: Two Decades of Research in Column-oriented Database Architectures

    MonetDB is a state-of-the-art open-source column-store database management system targeting applications in need of analytics over large collections of data. MonetDB is actively used nowadays in health care and telecommunications as well as in scientific databases and data management research, accumulating on average more than 10,000 downloads on a monthly basis. This paper gives a brief overview of the MonetDB technology as it has developed over the past two decades and of the main research highlights which drive the current MonetDB design and form the basis for its future evolution.

    Accelerating data retrieval steps in XML documents
