1,866 research outputs found

    The design and implementation of an infrastructure for multimedia digital libraries

    Get PDF
    We develop an infrastructure for managing, indexing and serving multimedia content in digital libraries. This infrastructure follows the model of the Web, and thereby is distributed in nature. We discuss the design of the Librarian, the component that manages meta data about the content. The management of meta data has been separated from the media servers that manage the content itself. Also, the extraction of the meta data is largely independent of the Librarian. We introduce our extensible data model and the daemon paradigm that are the core pieces of this architecture. We evaluate our initial implementation using a relational database. We conclude with a discussion of the lessons we learned in building this system, and proposals for improving the flexibility, reliability, and performance of the syste

    Middleware-based Database Replication: The Gaps between Theory and Practice

    Get PDF
    The need for high availability and performance in data management systems has been fueling a long running interest in database replication from both academia and industry. However, academic groups often attack replication problems in isolation, overlooking the need for completeness in their solutions, while commercial teams take a holistic approach that often misses opportunities for fundamental innovation. This has created over time a gap between academic research and industrial practice. This paper aims to characterize the gap along three axes: performance, availability, and administration. We build on our own experience developing and deploying replication systems in commercial and academic settings, as well as on a large body of prior related work. We sift through representative examples from the last decade of open-source, academic, and commercial database replication systems and combine this material with case studies from real systems deployed at Fortune 500 customers. We propose two agendas, one for academic research and one for industrial R&D, which we believe can bridge the gap within 5-10 years. This way, we hope to both motivate and help researchers in making the theory and practice of middleware-based database replication more relevant to each other.Comment: 14 pages. Appears in Proc. ACM SIGMOD International Conference on Management of Data, Vancouver, Canada, June 200

    Geographica: A Benchmark for Geospatial RDF Stores

    Full text link
    Geospatial extensions of SPARQL like GeoSPARQL and stSPARQL have recently been defined and corresponding geospatial RDF stores have been implemented. However, there is no widely used benchmark for evaluating geospatial RDF stores which takes into account recent advances to the state of the art in this area. In this paper, we develop a benchmark, called Geographica, which uses both real-world and synthetic data to test the offered functionality and the performance of some prominent geospatial RDF stores

    User Defined Types and Nested Tables in Object Relational Databases

    Get PDF
    Bernadette Byrne, Mary Garvey, ‘User Defined Types and Nested Tables in Object Relational Databases’, paper presented at the United Kingdom Academy for Information Systems 2006: Putting Theory into Practice, Cheltenham, UK, 5-7 June, 2006.There has been much research and work into incorporating objects into databases with a number of object databases being developed in the 1980s and 1990s. During the 1990s the concept of object relational databases became popular, with object extensions to the relational model. As a result, several relational databases have added such extensions. There has been little in the way of formal evaluation of object relational extensions to commercial database systems. In this work an airline flight logging system, a real-world database application, was taken and a database developed using a regular relational database and again using object relational extensions, allowing the evaluation of the relational extensions.Peer reviewe

    Rumble: Data Independence for Large Messy Data Sets

    Full text link
    This paper introduces Rumble, an engine that executes JSONiq queries on large, heterogeneous and nested collections of JSON objects, leveraging the parallel capabilities of Spark so as to provide a high degree of data independence. The design is based on two key insights: (i) how to map JSONiq expressions to Spark transformations on RDDs and (ii) how to map JSONiq FLWOR clauses to Spark SQL on DataFrames. We have developed a working implementation of these mappings showing that JSONiq can efficiently run on Spark to query billions of objects into, at least, the TB range. The JSONiq code is concise in comparison to Spark's host languages while seamlessly supporting the nested, heterogeneous data sets that Spark SQL does not. The ability to process this kind of input, commonly found, is paramount for data cleaning and curation. The experimental analysis indicates that there is no excessive performance loss, occasionally even a gain, over Spark SQL for structured data, and a performance gain over PySpark. This demonstrates that a language such as JSONiq is a simple and viable approach to large-scale querying of denormalized, heterogeneous, arborescent data sets, in the same way as SQL can be leveraged for structured data sets. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables.Comment: Preprint, 9 page

    Rule representation and management in ConceptBase.

    Get PDF
    • …
    corecore