Indexing real-world data using semi-structured documents
We address the problem of deriving meaningful semantic index information for a multimedia database using a semi-structured document model. We show how our framework, called feature grammars, can be used to (1) exploit third-party interpretation modules for real-world unstructured components, and (2) use context-free grammars to convert such poorly structured or unstructured input to semi-structured output. The basic idea is to enrich context-free grammars with special symbols called detectors, which provide the necessary structure just-in-time to satisfy a parser look-ahead. A prototype implementation has been constructed in the Acoi project to demonstrate the feasibility of this approach for indexing both image and audio documents.
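The detector idea described above can be sketched in a few lines: a grammar maps nonterminals either to sequences of symbols or to callables that compute structure on demand when the parser reaches them. This is a minimal illustration, not the Acoi implementation; the grammar, the `color_detector` function, and its output are invented for the example.

```python
def color_detector(image_path):
    """Detector symbol: stands in for a third-party interpretation module
    that is invoked just-in-time during parsing. Here we fake its result
    to show the control flow only."""
    return {"dominant_color": "red"}

grammar = {
    # Image -> Location ColorFeatures
    "Image": ["Location", "ColorFeatures"],
    # ColorFeatures is a detector symbol, bound to a callable
    "ColorFeatures": color_detector,
}

def parse(symbol, source):
    """Tiny top-down expansion: detectors produce structure on demand,
    unknown symbols are treated as terminals holding the raw value."""
    rule = grammar.get(symbol)
    if callable(rule):                      # detector: compute just-in-time
        return {symbol: rule(source)}
    if rule is None:                        # terminal
        return {symbol: source}
    return {symbol: [parse(s, source) for s in rule]}

print(parse("Image", "photo.jpg"))
```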
Programming with heterogeneous structures: Manipulating XML data using bondi
Manipulating semistructured data, such as XML, does not fit well within conventional programming languages. A typical manipulation requires finding all occurrences of a structure matching a structured search pattern, whose context may be different in different places, and both aspects cause difficulty. If a special-purpose query language is used to manipulate XML, an interface to a more general programming environment is required, and this interface typically creates runtime overhead for type conversion. However, adding XML manipulation to a general-purpose programming language has proven difficult because of problems associated with expressiveness and typing. We show an alternative approach that handles many kinds of patterns within an existing strongly-typed general-purpose programming language called bondi. The key ideas are to express complex search patterns as structures of simple patterns, pass these complex patterns as parameters to generic data-processing functions and traverse heterogeneous data structures by a generalized form of pattern matching. These ideas are made possible by the language's support for pattern calculus, whose typing on structures and patterns enables path and pattern polymorphism. With this approach, adding a new kind of pattern is just a matter of programming, not language design. Copyright © 2006, Australian Computer Society, Inc
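The traversal idea above, finding all occurrences of a structure matching a pattern regardless of where it sits in heterogeneous data, can be sketched outside bondi as well. The following is a loose Python analogue, not bondi code: patterns are plain predicates passed as parameters to a generic traversal function, and the names (`find_all`, the sample document) are invented for illustration.

```python
def find_all(pattern, data):
    """Yield every subterm of `data` satisfying `pattern`, at any depth
    and in any context (dict values, list elements, leaves)."""
    if pattern(data):
        yield data
    if isinstance(data, dict):
        for value in data.values():
            yield from find_all(pattern, value)
    elif isinstance(data, list):
        for item in data:
            yield from find_all(pattern, item)

doc = {"book": {"title": "XML", "authors": [{"name": "Ada"}, {"name": "Bob"}]}}

# A simple pattern: any dict carrying a "name" key. Complex patterns can be
# built by composing simple predicates before passing them in.
names = list(find_all(lambda t: isinstance(t, dict) and "name" in t, doc))
print(names)   # [{'name': 'Ada'}, {'name': 'Bob'}]
```

Unlike bondi, this sketch has no static typing of patterns; it only illustrates how passing patterns as parameters to one generic function separates "what to match" from "how to traverse".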
YAXQL: A powerful and web-aware query language supporting query reuse and data integration
Since XML seems to be the next great wave on the web, several query languages for XML have been proposed. Unfortunately, none of these proposals comes close to meeting the requirements for such a query language. We review the requirements for a query language for XML and propose a new query language, YAXQL, which meets them.
Querying websites using compact skeletons
Several commercial applications, such as online comparison shopping and process automation, require integrating information that is scattered across multiple websites or XML documents. Much research has been devoted to this problem, resulting in several research prototypes and commercial implementations. Such systems rely on wrappers that provide relational or other structured interfaces to websites. Traditionally, wrappers have been constructed by hand on a per-website basis, constraining the scalability of the system. We introduce a website structure inference mechanism called compact skeletons that is a step in the direction of automated wrapper generation. Compact skeletons provide a transformation from websites or other hierarchical data, such as XML documents, to relational tables. We study several classes of compact skeletons and provide polynomial-time algorithms and heuristics for automated construction of compact skeletons from websites. Experimental results show that our heuristics work well in practice. We also argue that compact skeletons are a natural extension of commercially deployed techniques for wrapper construction.
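The hierarchical-to-relational transformation mentioned above can be illustrated by flattening root-to-leaf paths of a nested structure into tuples. This sketch assumes the column structure is already known; the paper's contribution is inferring that skeleton automatically, which this example does not attempt. The sample data and function name are invented.

```python
def to_tuples(node, prefix=()):
    """Flatten hierarchical (website/XML-like) data into relational rows,
    one (path..., leaf) tuple per root-to-leaf path."""
    if isinstance(node, dict):
        rows = []
        for key, child in node.items():
            rows.extend(to_tuples(child, prefix + (key,)))
        return rows
    if isinstance(node, list):
        rows = []
        for child in node:
            rows.extend(to_tuples(child, prefix))   # lists repeat the context
        return rows
    return [prefix + (node,)]                       # leaf value ends a row

site = {"laptop": {"price": ["999", "1099"]}, "phone": {"price": ["499"]}}
for row in to_tuples(site):
    print(row)
# ('laptop', 'price', '999') ... one row per leaf
```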
Optimized Seamless Integration of Biomolecular Data
Today, scientific data is inevitably digitized, stored in a wide variety of heterogeneous formats, and accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building such a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; providing decision support to select sources and capabilities using cost-based and semantic knowledge; and generating low-cost query evaluation plans.
(Also referenced as UMIACS-TR-2001-51.)
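The cost-based source selection challenge listed above can be made concrete with a small sketch: given each source's access cost and the attributes it can supply, greedily pick sources covering a query's attributes at the best coverage-per-cost ratio. The source names, costs, and attribute sets below are invented for illustration and do not come from the report, which does not fix a particular algorithm.

```python
# Hypothetical catalog: per-source access cost and covered attributes.
sources = {
    "SwissProt": {"cost": 3.0, "attrs": {"protein", "sequence"}},
    "GenBank":   {"cost": 2.0, "attrs": {"gene", "sequence"}},
    "PDB":       {"cost": 5.0, "attrs": {"protein", "structure"}},
}

def select_sources(query_attrs):
    """Greedy set cover: repeatedly pick the source with the best
    (newly covered attributes) / cost ratio until the query is covered."""
    remaining, chosen = set(query_attrs), []
    while remaining:
        best = max(
            sources,
            key=lambda s: len(sources[s]["attrs"] & remaining) / sources[s]["cost"],
        )
        covered = sources[best]["attrs"] & remaining
        if not covered:
            raise ValueError(f"no source covers {remaining}")
        chosen.append(best)
        remaining -= covered
    return chosen

print(select_sources({"gene", "sequence", "structure"}))
```

A real planner would also weigh source overlap and result quality, which this greedy sketch ignores.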
Content warehouses
Nowadays, content management systems are an established technology. Based on the experiences from several application scenarios, we discuss the points of contact between content management systems and other disciplines of information systems engineering, such as data warehouses, data mining, and data integration. We derive a system architecture called "content warehouse" that integrates these technologies and defines a more general and more sophisticated view of content management. As an example, a system for the collection, maintenance, and evaluation of biological content, such as survey data or multimedia resources, is presented as a case study.
Distribution and Integration of Information in the Transport Domain
Distribution and mobility play a major role in transport telematics. The data sources used are generally heterogeneous and of varying quality. Within the joint project OVID of the Universität Karlsruhe (TH), the Institut für Programmstrukturen und Datenorganisation (IPD) offered a seminar in the summer semester of 2004 entitled "Distribution and Integration of Information in the Transport Domain". The seminar examined questions concerning the requirements for, and existing techniques supporting, a high degree of distribution and mobility of data sources in the transport domain. The results obtained are presented in this report.