117 research outputs found

    On the use of query-driven XML auto-indexing

    An application of AI methods for refining the storage strategy in multi-model database systems: A survey

    Multi-model database systems combine the advantages of traditional and NoSQL database systems. However, the management of these systems is challenging, as users have to design an appropriate storage strategy for their data. One of the most influential factors in the storage strategy is the selection of indexes. Indexes can significantly improve query performance, but they require additional storage space and maintenance overhead. The index selection problem is well studied in the context of single-model Database Management Systems (DBMSs), but there is a lack of research in the context of multi-model database systems. We address this problem by conducting a survey of current state-of-the-art index selection algorithms and evaluating their applicability to different DBMSs. The results reveal the strengths and weaknesses of existing algorithms and highlight the need for specialized algorithms for multi-model database systems. Moreover, we formulate open questions and suggest future research directions in this field. Our research provides a foundation for the development of efficient index selection algorithms for multi-model DBMSs.
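    To make the indexing trade-off concrete, the following is a minimal sketch of a greedy benefit-per-byte heuristic, one common baseline in the single-model index selection literature the survey covers; the candidate indexes, benefit estimates, and storage budget are hypothetical illustration values and do not come from the paper.

        def greedy_index_selection(candidates, storage_budget):
            """Pick indexes by estimated benefit per byte until the budget is used up.

            candidates: list of (name, estimated_benefit, size_bytes) tuples.
            """
            chosen, used = [], 0
            # Rank candidates by estimated benefit per unit of storage.
            ranked = sorted(candidates, key=lambda c: c[1] / c[2], reverse=True)
            for name, benefit, size in ranked:
                if used + size <= storage_budget:
                    chosen.append(name)
                    used += size
            return chosen

        # Hypothetical candidates spanning several data models.
        candidates = [
            ("btree_orders_date", 120.0, 40_000_000),        # relational index
            ("path_index_invoice_items", 90.0, 25_000_000),  # document path index
            ("edge_index_owns", 60.0, 30_000_000),           # graph edge index
        ]
        print(greedy_index_selection(candidates, storage_budget=70_000_000))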

    08421 Abstracts Collection -- Uncertainty Management in Information Systems

    From October 12 to 17, 2008, the Dagstuhl Seminar 08421 "Uncertainty Management in Information Systems" was held in Schloss Dagstuhl - Leibniz Center for Informatics. The abstracts of the plenary and session talks given during the seminar, as well as those of the demos shown, are collected in this paper.

    Anatomy of a Native XML Base Management System

    Several alternatives for managing large XML document collections exist, ranging from file systems over relational or other database systems to specifically tailored XML repositories. In this paper we give a tour of Natix, a database management system designed from scratch for storing and processing XML data. Contrary to the common belief that management of XML data is just another application for traditional databases such as relational systems, we illustrate how almost every component in a database system is affected in terms of adequacy and performance. We show how to design and optimize areas such as storage, transaction management (comprising recovery and multi-user synchronisation), and query processing for XML.
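    For illustration only, the sketch below evaluates the kind of simple path query a native XML store has to support, using Python's standard xml.etree library on a made-up document; it says nothing about Natix's actual storage layout or query-processing internals.

        import xml.etree.ElementTree as ET

        doc = ET.fromstring("""
        <library>
          <book year="2001"><title>Native XML Storage</title></book>
          <book year="1999"><title>Early XML Repositories</title></book>
        </library>
        """)

        # A simple descendant path query: titles of books published after 2000.
        titles = [
            book.findtext("title")
            for book in doc.findall(".//book")
            if int(book.get("year", "0")) > 2000
        ]
        print(titles)  # ['Native XML Storage']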

    Big Data Management Challenges, Approaches, Tools and their limitations

    Big Data is the buzzword everyone talks about. Independently of the application domain, there is today a consensus about the V's characterizing Big Data: Volume, Variety, and Velocity. Focusing on data management issues and past experience in the area of database systems, this chapter examines the main challenges involved in the three V's of Big Data. It then reviews the main characteristics of existing solutions for addressing each of the V's (e.g., NoSQL, parallel RDBMSs, stream data management systems, and complex event processing systems). Finally, it provides a classification of the different functions offered by NewSQL systems and discusses their benefits and limitations for processing Big Data.

    Graph databases and their application to the Italian Business Register for efficient search of relationships among companies

    We studied and tested three of the major graph databases and compared them with a relational database. We worked on a dataset representing equity participations among companies, and we found that the strong points of graph databases are their purpose-built storage techniques and their query languages. The main performance gains were obtained when heavily connected portions of the graph were queried; for simpler situations and queries, a relational database performs equally well.
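    A minimal sketch of the comparison's relational side, using SQLite's recursive common table expressions on a hypothetical participation table: finding every company reachable from one firm through chains of equity participations requires a recursive self-join, whereas a graph database expresses the same reachability as a native traversal in its query language.

        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE participation (owner TEXT, owned TEXT);
            INSERT INTO participation VALUES
                ('HoldingA', 'FirmB'), ('FirmB', 'FirmC'), ('FirmC', 'FirmD');
        """)

        # Companies reachable from HoldingA through ownership chains.
        rows = conn.execute("""
            WITH RECURSIVE reachable(company) AS (
                SELECT owned FROM participation WHERE owner = 'HoldingA'
                UNION
                SELECT p.owned FROM participation p
                JOIN reachable r ON p.owner = r.company
            )
            SELECT company FROM reachable ORDER BY company
        """).fetchall()
        print([r[0] for r in rows])  # ['FirmB', 'FirmC', 'FirmD']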

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as issues of data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in several follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that have been implemented based on the original idea of the MapReduce framework and are currently gaining a lot of momentum in both the research and industrial communities. We also cover a set of systems that provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some of the future research directions for implementing the next generation of MapReduce-like solutions.
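    As a concrete illustration of the programming model itself, the following is a single-machine word-count sketch showing the map, shuffle and reduce phases; it is not the API of Hadoop or of any particular system surveyed, and the input records are made up.

        from collections import defaultdict

        def map_fn(line):
            # Emit (word, 1) pairs for every word in one input record.
            for word in line.split():
                yield word.lower(), 1

        def reduce_fn(word, counts):
            # Aggregate all intermediate values emitted for one key.
            return word, sum(counts)

        def run_mapreduce(records):
            # Shuffle: group intermediate pairs by key, as the framework would
            # do across the cluster before invoking the reducers.
            groups = defaultdict(list)
            for record in records:
                for key, value in map_fn(record):
                    groups[key].append(value)
            return dict(reduce_fn(k, v) for k, v in groups.items())

        print(run_mapreduce(["big data on large clusters", "large scale data processing"]))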

    On the performance impact of using JSON, beyond impedance mismatch

    NoSQL database management systems adopt semi-structured data models, such as JSON, to easily accommodate schema evolution and to overcome the overhead generated by transforming internal structures into tabular data (i.e., the impedance mismatch). There exist multiple, equivalent ways to physically represent semi-structured data, but there is a lack of evidence about the potential impact on space and query performance. In this paper, we embark on the task of quantifying that impact, precisely for document stores. We empirically compare multiple ways of representing semi-structured data, which allows us to derive a set of guidelines for efficient physical database design considering both JSON and relational options in the same palette. Partly funded by the European Commission through the programme “EM IT4BI-DC”.
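    As a minimal sketch of what "multiple, equivalent physical representations" can mean, the snippet below serializes the same made-up order record once as a nested JSON document and once shredded into flat, relational-style rows; the attribute names are invented, and the comparison only shows that space and access patterns already diverge at this level.

        import json

        order = {"id": 1, "customer": "ACME",
                 "lines": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}]}

        # Representation 1: one nested document, as a document store would keep it.
        nested = json.dumps(order)

        # Representation 2: the same data shredded into flat rows with a foreign
        # key, closer to a relational layout.
        order_row = {"id": 1, "customer": "ACME"}
        line_rows = [{"order_id": 1, **line} for line in order["lines"]]
        shredded = json.dumps([order_row] + line_rows)

        # The serialized sizes differ, and so does the access pattern: the nested
        # form answers "an order with its lines" in one lookup, while the shredded
        # form needs a join-like reassembly.
        print(len(nested), len(shredded))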

    A comparison of open source object-oriented database products

    Object-oriented databases have been gaining popularity over the years. Their ease of use and the advantages they offer over relational databases have made them a popular choice amongst database administrators. Their use in previous years was restricted to business and administrative applications, but improvements in technology and the emergence of new, data-intensive applications have led to an increase in the use of object databases. This study investigates four open source object-oriented databases on their ability to carry out the standard database operations of storing, querying, updating and deleting database objects. Each of these databases will be timed in order to measure which is capable of performing a particular operation faster than the others.
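    A minimal sketch of that timing methodology, with a plain in-memory dict standing in for the object databases under test and an arbitrary workload size; the study itself times real open source OODBMS products rather than this stand-in.

        import time

        db = {}          # stand-in for an object database
        N = 100_000      # arbitrary workload size

        def timed(label, fn):
            start = time.perf_counter()
            fn()
            print(f"{label}: {time.perf_counter() - start:.4f}s")

        def store_all():
            for i in range(N):
                db[i] = {"id": i, "name": f"object{i}"}

        def query_all():
            return [db[i]["name"] for i in range(N)]

        def update_all():
            for i in range(N):
                db[i]["name"] = "renamed"

        def delete_all():
            for i in range(N):
                del db[i]

        for label, op in [("store", store_all), ("query", query_all),
                          ("update", update_all), ("delete", delete_all)]:
            timed(label, op)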