940 research outputs found

    Tupleware: Redefining Modern Analytics

    Full text link
    There is a fundamental discrepancy between the targeted and actual users of current analytics frameworks. Most systems are designed for the data and infrastructure of the Googles and Facebooks of the world---petabytes of data distributed across large cloud deployments consisting of thousands of cheap commodity machines. Yet, the vast majority of users operate clusters ranging from a few to a few dozen nodes, analyze relatively small datasets of up to a few terabytes, and perform primarily compute-intensive operations. Targeting these users fundamentally changes the way we should build analytics systems. This paper describes the design of Tupleware, a new system specifically aimed at the challenges faced by the typical user. Tupleware's architecture brings together ideas from the database, compiler, and programming languages communities to create a powerful end-to-end solution for data analysis. We propose novel techniques that consider the data, computations, and hardware together to achieve maximum performance on a case-by-case basis. Our experimental evaluation quantifies the impact of our novel techniques and shows orders of magnitude performance improvement over alternative systems

    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    No full text
    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than it is currently. Experiences from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics

    On the use of a reflective architecture to augment Database Management Systems

    Get PDF
    The Database Management System (DBMS) used to be a commodity software component, with well known standard interfaces and semantics. However, the performance and reliability expectations being placed on DBMSs have increased the demand for a variety add-ons, that augment the functionality of the database in a wide range of deployment scenarios, offering support for features such as clustering, replication, and selfmanagement, among others. The effectiveness of such extensions largely rests on closely matching the actual needs of applications, hence on a wide range of tradeoffs and configuration options out of the scope of traditional client interfaces. A well known software engineering approach to systems with such requirements is reflection. Unfortunately, standard reflective interfaces in DBMSs are very limited (for instance, they often do not support the desired range of atomicity guarantees in a distributed setting). Some of these limitations may be circumvented by implementing reflective features as a wrapper to the DBMS server. Unfortunately, this solutions comes at the expense of a large development effort and significant performance penalty. In this paper we propose a general purpose DBMS reflection architecture and interface, that supports multiple extensions while, at the same time, admitting efficient implementations. We illustrate the usefulness of our proposal with concrete examples, and evaluate its cost and performance under different implementation strategies

    Performance Evaluation for Clustering Algorithms in Object-Oriented Database Systems

    Get PDF
    International audienceIt is widely acknowledged that good object clustering is critical to the performance of object-oriented databases. However, object clustering always involves some kind of overhead for the system. The aim of this paper is to propose a modelling methodology in order to evaluate the performances of different clustering policies. This methodology has been used to compare the performances of three clustering algorithms found in the literature (Cactis, CK and ORION) that we considered representative of the current research in the field of object clustering. The actual performance evaluation was performed using simulation. Simulation experiments we performed showed that the Cactis algorithm is better than the ORION algorithm and that the CK algorithm totally outperforms both other algorithms in terms of response time and clustering overhead
    corecore