
    A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance

    This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information from the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use in the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than they currently are. Experience from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in the areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.
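
    The report's indexing discussion can be made concrete with the classic permutation-index layout used by many triple stores. Below is a minimal, illustrative Python sketch (class and method names are our own, not from the report) that keeps three orderings of each triple, SPO, POS, and OSP, so common triple patterns become dictionary lookups rather than full scans.

```python
from collections import defaultdict

class TripleStore:
    """Toy in-memory triple store with three permutation indexes."""

    def __init__(self):
        # Nested maps: index[first][second] -> set of third components.
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        # Every triple is stored once per ordering.
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def objects(self, s, p):
        return self.spo[s][p]   # answers the pattern (s, p, ?o)

    def subjects(self, p, o):
        return self.pos[p][o]   # answers the pattern (?s, p, o)

store = TripleStore()
store.add("ex:alice", "ex:knows", "ex:bob")
print(store.subjects("ex:knows", "ex:bob"))  # {'ex:alice'}
```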

    A Comparison of Query Execution Speeds for Large Amounts of Data Using Various DBMS Engines Executing on Selected RAM and CPU Configurations

    In modern economies, most important business decisions are based on detailed analysis of available data. In order to obtain a rapid response from analytical tools, data should be pre-aggregated over the dimensions that are of most interest to each business. Sometimes, however, important decisions may require analysis of business data over seemingly less important dimensions which have not been pre-aggregated during the ETL process. On these occasions, an ad-hoc "online" aggregation is performed, whose execution time depends on overall DBMS performance. This paper describes how the performance of several commercial and non-commercial DBMSs was tested by running queries designed for data analysis using ad-hoc aggregations over large volumes of data. Each DBMS was installed on a separate virtual machine and run on several computers, with two RAM allocations tested in each case. The recorded query execution times demonstrated that, as expected, column-oriented databases outperformed classical row-oriented database systems.
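
    As a rough illustration of the paper's methodology (not its actual harness or the engines it tested), the sketch below times an ad-hoc GROUP BY aggregation over a dimension that was never pre-aggregated, with sqlite3 standing in for the DBMS under test.

```python
import sqlite3
import time

# Build a toy fact table; "channel" plays the part of a dimension that
# was not pre-aggregated during ETL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, channel TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("EU", "web" if i % 2 else "store", i * 0.5) for i in range(100_000)],
)

# Time the ad-hoc "online" aggregation, as in the paper's measurements.
query = "SELECT channel, SUM(amount) FROM sales GROUP BY channel"
start = time.perf_counter()
rows = conn.execute(query).fetchall()
elapsed = time.perf_counter() - start
print(rows, f"in {elapsed:.4f}s")
```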

    Enhancing Computation Pushdown for Cloud OLAP Databases

    The network is a major bottleneck in modern cloud databases that adopt a storage-disaggregation architecture. Computation pushdown, which offloads some computation tasks to the storage layer to reduce network traffic, is a promising solution to this issue. Existing cloud OLAP systems statically decide whether to push down computation during the query optimization phase and do not consider the storage layer's computational capacity and load. Moreover, there is no general principle for determining which operators are amenable to pushdown; existing systems design and implement pushdown features empirically, each ending up with its own limited set of pushdown operators. In this paper, we first design Adaptive pushdown as a new mechanism to avoid throttling the storage-layer computation during pushdown: it pushes the request back to the computation layer at runtime if storage-layer computational resources are insufficient. Moreover, we derive a general principle for identifying pushdown-amenable computational tasks by summarizing common patterns of pushdown capabilities in existing systems. We propose two new pushdown operators, namely selection bitmap and distributed data shuffle. Evaluation results on TPC-H show that Adaptive pushdown can achieve up to 1.9x speedup over both No pushdown and Eager pushdown baselines, and that the new pushdown operators can further accelerate query execution by up to 3.0x. (Comment: 13 pages, 15 figures)
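
    The runtime push-back at the heart of Adaptive pushdown can be sketched in a few lines. The following illustrative mock-up (the threshold, load probe, and function names are assumptions, not the paper's implementation) shows the storage layer declining a pushed-down filter when it is overloaded, with the compute layer then evaluating it locally.

```python
import random

STORAGE_CPU_BUDGET = 0.8      # illustrative threshold, not from the paper
TABLE = list(range(1_000))    # stand-in for blocks held by the storage layer

def run_filter(rows, pred):
    return [r for r in rows if pred(r)]

def storage_execute(pred):
    """Storage layer: run the pushed-down filter only if there is spare
    CPU; otherwise decline and hand back the raw rows ('push back')."""
    load = random.random()             # stand-in for a real load probe
    if load > STORAGE_CPU_BUDGET:
        return "pushback", TABLE       # too busy: return unfiltered data
    return "result", run_filter(TABLE, pred)

def compute_execute(pred):
    """Compute layer: try pushdown first; fall back to filtering locally
    when the storage layer reports it is overloaded."""
    status, payload = storage_execute(pred)
    return payload if status == "result" else run_filter(payload, pred)

print(len(compute_execute(lambda r: r % 7 == 0)))
```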

    Using a Data Warehouse as Part of a General Business Process Data Analysis System

    Data analytics queries often involve aggregating over massive amounts of data in order to detect trends, make predictions about future data, and inform business decisions. As such, it is important that a database management system (DBMS) handling data analytics queries performs well when those queries involve massive amounts of data. A data warehouse is a DBMS designed specifically to handle data analytics queries. This thesis describes the data warehouse Amazon Redshift, and how it was used to design a data analysis system for Laserfiche. Laserfiche is a software company that provides each of its clients a system to store and process business process data. Through the 2015-16 Harvey Mudd College Clinic project, the Clinic team built a data analysis system that provides Laserfiche clients with near real-time reports containing analyses of their business process data. This thesis discusses the advantages of Redshift's data model and physical storage layout, as well as the Redshift features that directly benefit the data analysis system.
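
    A toy comparison makes the columnar advantage concrete. The sketch below (the field names are invented for illustration, and Python lists stand in for on-disk storage) shows why an aggregation over a single column touches far less data in a column layout, such as Redshift's, than in a row layout.

```python
# Row layout: every row carries all fields, so an aggregation over one
# field still walks complete rows.
rows = [{"id": i, "step": "review", "hours": i % 9} for i in range(100_000)]

# Column layout: each field is stored contiguously on its own.
hours_column = [r["hours"] for r in rows]

total_from_rows = sum(r["hours"] for r in rows)  # touches every field
total_from_column = sum(hours_column)            # touches one column only
assert total_from_rows == total_from_column
print(total_from_column)
```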

    Business Analytics in (a) Blink

    The Blink project's ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits the scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorporated into two IBM accelerator products.
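
    Blink's strategy of running SQL directly on compressed data can be illustrated with dictionary encoding, one common compression scheme (the sketch below is a simplification, not Blink's proprietary format): the predicate is translated once against the dictionary, after which the scan compares small integer codes instead of strings.

```python
# Dictionary-compressed column scan in miniature.
values = ["DE", "FR", "DE", "US", "FR", "DE"] * 100_000

dictionary = sorted(set(values))                 # code -> value
code_of = {v: i for i, v in enumerate(dictionary)}
encoded = [code_of[v] for v in values]           # the compressed column

target = code_of["DE"]                           # translate predicate once
count = sum(1 for c in encoded if c == target)   # pure integer-compare scan
print(count)                                     # 300000
```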

    A spatial column-store to triangulate the Netherlands on the fly

    3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data, such as cadastral data, which contains information about property boundaries, road networks, rivers, lakes, etc. Technical advances in LiDAR data acquisition systems have made possible the rapid acquisition of high-resolution topographical information for an entire country. Such data sets are now reaching the trillion-point barrier. To cope with this data deluge and provide up-to-date 3D digital city models on demand, current geospatial data management strategies should be rethought. This work presents a column-oriented Spatial Database Management System which provides in-situ data access, effective data skipping, efficient spatial operations, and interactive data visualization. Its efficiency and scalability are demonstrated using a dense LiDAR scan of the Netherlands consisting of 640 billion points together with the latest cadastral information, and compared with PostGIS.
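
    The "effective data skipping" the paper claims can be sketched with per-block bounding boxes, a spatial analogue of min/max zone maps; the block size and point data below are illustrative, not drawn from the system described. Blocks whose extent does not intersect the query window are never scanned.

```python
BLOCK = 4096  # illustrative block size

def build_zone_maps(points):
    """Per-block bounding boxes: (minx, miny, maxx, maxy, block)."""
    zones = []
    for i in range(0, len(points), BLOCK):
        blk = points[i:i + BLOCK]
        xs = [p[0] for p in blk]
        ys = [p[1] for p in blk]
        zones.append((min(xs), min(ys), max(xs), max(ys), blk))
    return zones

def range_query(zones, x0, y0, x1, y1):
    hits = []
    for minx, miny, maxx, maxy, blk in zones:
        if maxx < x0 or minx > x1 or maxy < y0 or miny > y1:
            continue  # whole block skipped without reading its points
        hits += [(x, y) for x, y in blk if x0 <= x <= x1 and y0 <= y <= y1]
    return hits

points = [(i % 1000, i // 1000) for i in range(100_000)]
print(len(range_query(build_zone_maps(points), 10, 10, 20, 20)))  # 121
```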

    Laying the Groundwork for the Development of the Data Archive of the New Robotic Telescope

    The Liverpool Telescope has been in fully autonomous operation since 2004, while the supporting data archive facility has remained largely untouched. The data provision service has not been an issue, although some modernisation of the system is desirable. This project is timely: not only does it suit the upgrade of the current LT data archive, it is also in line with the design phase of the New Robotic Telescope, which will be online in the early 2020s, and with the development of a new data archive facility for a range of telescopes at the National Astronomical Research Institute of Thailand. The Newton Fund enabled us to collaborate on designing a new, versatile, generic system intended to serve all purposes. In the end, we conclude that a single system would not meet the needs of all parties; instead, we adopt similar front-ends while the back-ends remain bespoke to our respective systems and data flows.

    A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities

    To face the tough competition and the changing markets and technologies in the automotive industry, automakers have to be highly innovative. In previous decades, innovations were electronics- and IT-driven, which exponentially increased the complexity of the vehicle's internal network. Furthermore, the growing expectations and preferences of customers oblige these manufacturers to adapt their business models and also to offer mobility-based services. On the other hand, there is increasing pressure from regulators to significantly reduce the environmental footprint of transportation and mobility, down to zero in the foreseeable future. This dissertation investigates an architecture for communication and data exchange within a complex and heterogeneous ecosystem. This communication takes place between various third-party entities on one side, and between these entities and the infrastructure on the other. The proposed solution considerably reduces the complexity of vehicle communication and of the interactions among the parties involved in the ODX life cycle. In such a heterogeneous environment, particular attention is paid to the protection of confidential and private data. Confidential data here refers to the OEM's know-how enclosed in vehicle projects, while the data delivered by a car during a vehicle communication session might contain customers' private data. Our solution ensures that every entity in this ecosystem has access only to the data it has a right to. We designed our solution to be technology-agnostic so that it can be implemented on any platform, benefiting from the environment best suited to each task. We also proposed a data model for vehicle projects which improves query time during a vehicle diagnostic session. Scalability and backwards compatibility were also taken into account during the design phase. We proposed the necessary algorithms and workflow to perform efficient vehicle diagnostics with considerably lower latency and substantially better time and space complexity than current solutions. To prove the practicality of our design, we presented a prototypical implementation and analyzed the results of a series of tests performed on several vehicle models and projects. We also evaluated the prototype against software engineering quality attributes.
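
    The per-entity access rule the dissertation describes can be illustrated with a small sketch; the entity names, field tags, and policy table below are invented for illustration and are not taken from the proposed architecture.

```python
# Each party sees only the record fields its policy entitles it to.
POLICY = {
    "oem":         {"diagnostic", "know_how"},
    "workshop":    {"diagnostic"},
    "third_party": set(),
}

RECORD = {
    "diagnostic": {"dtc": "P0420", "odometer_km": 84210},
    "know_how":   {"odx_project": "restricted"},
    "private":    {"owner": "redacted"},  # customer data: shared with no one
}

def view_for(entity):
    allowed = POLICY.get(entity, set())
    return {tag: data for tag, data in RECORD.items() if tag in allowed}

print(view_for("workshop"))  # only the 'diagnostic' fields
```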