19 research outputs found

    Federating Heterogeneous Information Systems Using Web Services and Ontologies

    Indexing real-world data using semi-structured documents

    We address the problem of deriving meaningful semantic index information for a multimedia database using a semi-structured document model. We show how our framework, called feature grammars, can be used to (1) exploit third-party interpretation modules for real-world unstructured components, and (2) use context-free grammars to convert such poorly structured or unstructured input to semi-structured output. The basic idea is to enrich context-free grammars with special symbols called detectors, which provide the necessary structure just in time to satisfy the parser's look-ahead. A prototype implementation has been constructed in the Acoi project to demonstrate the feasibility of this approach for indexing both images and audio documents.

    Programming with heterogeneous structures: Manipulating XML data using bondi

    Manipulating semistructured data, such as XML, does not fit well within conventional programming languages. A typical manipulation requires finding all occurrences of a structure matching a structured search pattern, whose context may be different in different places, and both aspects cause difficulty. If a special-purpose query language is used to manipulate XML, an interface to a more general programming environment is required, and this interface typically creates runtime overhead for type conversion. However, adding XML manipulation to a general-purpose programming language has proven difficult because of problems associated with expressiveness and typing. We show an alternative approach that handles many kinds of patterns within an existing strongly-typed general-purpose programming language called bondi. The key ideas are to express complex search patterns as structures of simple patterns, pass these complex patterns as parameters to generic data-processing functions and traverse heterogeneous data structures by a generalized form of pattern matching. These ideas are made possible by the language's support for pattern calculus, whose typing on structures and patterns enables path and pattern polymorphism. With this approach, adding a new kind of pattern is just a matter of programming, not language design. Copyright © 2006, Australian Computer Society, Inc

    YAXQL : A powerful and web-aware query language supporting query reuse and data integration

    Since XML seems to be the next great wave on the web, several query languages for XML have been proposed. Unfortunately, none of these proposals comes close to meeting the requirements for such a query language. We review the requirements for a query language for XML and propose a new query language, YAXQL, which meets them.

    Querying websites using compact skeletons

    Several commercial applications, such as online comparison shopping and process automation, require integrating information that is scattered across multiple websites or XML documents. Much research has been devoted to this problem, resulting in several research prototypes and commercial implementations. Such systems rely on wrappers that provide relational or other structured interfaces to websites. Traditionally, wrappers have been constructed by hand on a per-website basis, constraining the scalability of the system. We introduce a website structure inference mechanism called compact skeletons that is a step in the direction of automated wrapper generation. Compact skeletons provide a transformation from websites or other hierarchical data, such as XML documents, to relational tables. We study several classes of compact skeletons and provide polynomial-time algorithms and heuristics for automated construction of compact skeletons from websites. Experimental results show that our heuristics work well in practice. We also argue that compact skeletons are a natural extension of commercially deployed techniques for wrapper construction.

    Optimized Seamless Integration of Biomolecular Data

    Today, scientific data is inevitably digitized, stored in a wide variety of heterogeneous formats, and accessible over the Internet. Scientists need to access an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building such a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, and data that is locally materialized in warehouses or generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; providing decision support to select sources and capabilities using cost-based and semantic knowledge; and generating low-cost query evaluation plans. (Also referenced as UMIACS-TR-2001-51)

    Content warehouses

    Nowadays, content management systems are an established technology. Based on experience from several application scenarios, we discuss the points of contact between content management systems and other disciplines of information systems engineering, such as data warehouses, data mining, and data integration. We derive a system architecture called "content warehouse" that integrates these technologies and defines a more general and more sophisticated view of content management. As an example, a system for the collection, maintenance, and evaluation of biological content, such as survey data or multimedia resources, is presented as a case study.

    Distribution and Integration of Information in the Transportation Domain (Verteilung und Integration von Informationen im Verkehrsbereich)

    Distribution and mobility play a major role in traffic telematics. The data sources used are generally heterogeneous and of varying quality. As part of the OVID joint project at the Universität Karlsruhe (TH), the Institut für Programmstrukturen und Datenorganisation (IPD) offered a seminar in the summer semester of 2004 titled "Verteilung und Integration von Informationen im Verkehrsbereich" (Distribution and Integration of Information in the Transportation Domain). The seminar examined questions concerning the requirements and existing techniques for highly distributed and mobile data sources in the transportation domain. This report presents the results obtained.