658 research outputs found
A Nine Month Progress Report on an Investigation into Mechanisms for Improving Triple Store Performance
This report considers the requirement for fast, efficient, and scalable triple stores as part of the effort to produce the Semantic Web. It summarises relevant information in the major background field of Database Management Systems (DBMS), and provides an overview of the techniques currently in use amongst the triple store community. The report concludes that for individuals and organisations to be willing to provide large amounts of information as openly-accessible nodes on the Semantic Web, storage and querying of the data must be cheaper and faster than they are currently. Experience from the DBMS field can be used to maximise triple store performance, and suggestions are provided for lines of investigation in the areas of storage, indexing, and query optimisation. Finally, work packages are provided describing expected timetables for further study of these topics.
A Comparison of Query Execution Speeds for Large Amounts of Data Using Various DBMS Engines Executing on Selected RAM and CPU Configurations
In modern economies, most important business decisions are based on detailed analysis of available data. In order to obtain a rapid response from analytical tools, data should be pre-aggregated over the dimensions that are of most interest to each business. Sometimes, however, important decisions may require analysis of business data over seemingly less important dimensions which have not been pre-aggregated during the ETL process. On these occasions, ad-hoc "online" aggregation is performed, whose execution time depends on overall DBMS performance. This paper describes how the performance of several commercial and non-commercial DBMSs was tested by running queries designed for data analysis using ad-hoc aggregations over large volumes of data. Each DBMS was installed on a separate virtual machine and was run on several computers, with two amounts of RAM allocated for each test. The recorded query execution times demonstrated that, as expected, column-oriented databases outperformed classical row-oriented database systems.
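The row-versus-column result reported above can be illustrated with a minimal sketch (independent of any of the benchmarked DBMSs, with invented data): the same ad-hoc aggregation, SUM over one attribute, touches whole records in a row layout but only a single contiguous column in a column layout, which is why column stores tend to win on analytical scans.

```python
# Minimal, hypothetical illustration of row vs. column layout for one
# ad-hoc aggregation: SUM(amount). Not the code of any benchmarked DBMS.

# Row-oriented layout: one record object per tuple.
rows = [{"id": i, "region": i % 4, "amount": float(i)} for i in range(1000)]

# The aggregation must walk every full record to reach one field.
row_sum = sum(r["amount"] for r in rows)

# Column-oriented layout: each attribute stored contiguously on its own.
columns = {
    "id": [r["id"] for r in rows],
    "region": [r["region"] for r in rows],
    "amount": [r["amount"] for r in rows],
}

# Only the single relevant column is scanned.
col_sum = sum(columns["amount"])

assert row_sum == col_sum == 499500.0
```

The answer is identical either way; the difference in a real engine is how much data is read from memory or disk to compute it.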
Enhancing Computation Pushdown for Cloud OLAP Databases
The network is a major bottleneck in modern cloud databases that adopt a storage-disaggregation architecture. Computation pushdown is a promising solution to this issue: it offloads some computation tasks to the storage layer to reduce network traffic. Existing cloud OLAP systems statically decide whether to push down computation during the query optimization phase, and do not consider the storage layer's computational capacity and load. Moreover, there is no general principle that determines which operators are amenable to pushdown; existing systems design and implement pushdown features empirically, and each ends up supporting only a limited set of pushdown operators.
In this paper, we first design Adaptive pushdown, a new mechanism that avoids throttling storage-layer computation during pushdown: a request is pushed back to the computation layer at runtime if storage-layer computational resources are insufficient. Moreover, we derive a general principle for identifying pushdown-amenable computational tasks by summarizing common patterns of pushdown capability in existing systems, and we propose two new pushdown operators, namely selection bitmap and distributed data shuffle. Evaluation results on TPC-H show that Adaptive pushdown achieves up to 1.9x speedup over both No pushdown and Eager pushdown baselines, and that the new pushdown operators further accelerate query execution by up to 3.0x.
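The adaptive mechanism described above can be sketched in a few lines. This is a hypothetical illustration of the idea, not the paper's implementation or API: the storage layer accepts a pushed-down filter only while it has spare compute slots; otherwise it returns raw blocks and the computation layer evaluates the filter itself, so correctness never depends on where the filter runs.

```python
# Hypothetical sketch of adaptive pushdown: all names are illustrative.

class StorageNode:
    def __init__(self, capacity):
        self.capacity = capacity   # concurrent pushdown slots available
        self.in_flight = 0         # pushdown requests currently running

    def scan(self, blocks, predicate):
        if self.in_flight < self.capacity:
            self.in_flight += 1
            try:
                # Pushdown accepted: filter at the storage layer, so only
                # matching tuples cross the network.
                return ("filtered", [b for b in blocks if predicate(b)])
            finally:
                self.in_flight -= 1
        # Storage layer overloaded: push the request back by returning
        # raw blocks; the computation layer filters them instead.
        return ("raw", blocks)

def execute(node, blocks, predicate):
    kind, data = node.scan(blocks, predicate)
    if kind == "filtered":
        return data
    return [b for b in data if predicate(b)]   # fallback at compute layer

node = StorageNode(capacity=1)
result = execute(node, list(range(10)), lambda x: x % 2 == 0)
assert result == [0, 2, 4, 6, 8]
```

Either path yields the same result; the runtime check merely decides which layer pays the filtering cost, which is the essence of avoiding storage-layer throttling.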
Using a Data Warehouse as Part of a General Business Process Data Analysis System
Data analytics queries often involve aggregating over massive amounts of data, in order to detect trends in the data, make predictions about future data, and make business decisions as a result. As such, it is important that a database management system (DBMS) handling data analytics queries perform well when those queries involve massive amounts of data. A data warehouse is a DBMS which is designed specifically to handle data analytics queries.
This thesis describes the data warehouse Amazon Redshift, and how it was used to design a data analysis system for Laserfiche. Laserfiche is a software company that provides each of its clients a system to store and process business process data. Through the 2015-16 Harvey Mudd College Clinic project, the Clinic team built a data analysis system that provides Laserfiche clients with near real-time reports containing analyses of their business process data. This thesis discusses the advantages of Redshift’s data model and physical storage layout, as well as the Redshift features that directly benefit the data analysis system.
Business Analytics in (a) Blink
The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorp
A spatial column-store to triangulate the Netherlands on the fly
3D digital city models, important for urban planning, are currently constructed from massive point clouds obtained through airborne LiDAR (Light Detection and Ranging). They are semantically enriched with information obtained from auxiliary GIS data, such as Cadastral data, which contains information about the boundaries of properties, road networks, rivers, lakes, etc. Technical advances in LiDAR data acquisition systems have made possible the rapid acquisition of high-resolution topographical information for an entire country. Such data sets are now reaching the trillion-point barrier. To cope with this data deluge and provide up-to-date 3D digital city models on demand, current geospatial management strategies should be rethought. This work presents a column-oriented Spatial Database Management System which provides in-situ data access, effective data skipping, efficient spatial operations, and interactive data visualization. Its efficiency and scalability are demonstrated using a dense LiDAR scan of The Netherlands consisting of 640 billion points and the latest Cadastral information, and compared with PostGIS.
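The "effective data skipping" mentioned above is commonly realized with per-block min/max summaries (often called zone maps): a range query can discard a whole block whose summary cannot intersect the query window without scanning it. The following is a generic one-dimensional sketch of that technique, not the system's actual code or data.

```python
# Generic data-skipping sketch using per-block min/max summaries
# (zone maps) over one coordinate of a point set. Illustrative only.

def build_zone_maps(values, block_size):
    """Split values into fixed-size blocks and record each block's range."""
    blocks, zone_maps = [], []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        blocks.append(block)
        zone_maps.append((min(block), max(block)))
    return blocks, zone_maps

def range_query(blocks, zone_maps, lo, hi):
    """Return all values in [lo, hi], skipping non-overlapping blocks."""
    out = []
    for block, (bmin, bmax) in zip(blocks, zone_maps):
        if bmax < lo or bmin > hi:
            continue          # whole block skipped without reading it
        out.extend(v for v in block if lo <= v <= hi)
    return out

xs = list(range(100))          # stand-in for one point-cloud coordinate
blocks, zone_maps = build_zone_maps(xs, block_size=10)
assert range_query(blocks, zone_maps, 42, 47) == [42, 43, 44, 45, 46, 47]
```

On clustered data such as spatially sorted LiDAR points, most blocks fail the overlap test, so only a small fraction of a trillion-point table is ever touched for a localized query.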
Laying the Groundwork for the Development of the Data Archive of the New Robotic Telescope
The Liverpool Telescope has been in fully autonomous operation since 2004, while the supporting data archive facility has remained largely untouched. The data provision service has not been an issue, although some modernisation of the system is desirable. This project is timely: not only does it suit the upgrade of the current LT data archive, it is in line with the design phase of the New Robotic Telescope, which will be online in the early 2020s, and with the development of a new data archive facility for a range of telescopes at the National Astronomical Research Institute of Thailand. The Newton Fund enabled us to collaborate in designing a new, versatile, generic system intended to serve all purposes. In the end, we conclude that a single system would not meet the needs of all parties, and we instead adopt similar front-ends while the back-ends remain bespoke to our respective systems and data flows.
A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities
To face the tough competition and the changing markets and technologies in the automotive industry, automakers have to be highly innovative. In the previous decades, innovations were electronics- and IT-driven, which exponentially increased the complexity of the vehicle’s internal network. Furthermore, the growing expectations and preferences of customers oblige these manufacturers to adapt their business models and also to propose mobility-based services. On the other hand, there is also increasing pressure from regulators to significantly reduce the environmental footprint of transportation and mobility, down to zero in the foreseeable future.
This dissertation investigates an architecture for communication and data exchange within a complex and heterogeneous ecosystem. This communication takes place between various third-party entities on one side, and between these entities and the infrastructure on the other. The proposed solution considerably reduces the complexity of vehicle communication and of the interactions among the parties involved in the ODX life cycle. In such a heterogeneous environment, particular attention is paid to the protection of confidential and private data. Confidential data here refers to the OEM’s know-how enclosed in vehicle projects; the data delivered by a car during a vehicle communication session might also contain customers’ private data. Our solution ensures that every entity in this ecosystem has access only to the data it is entitled to. We designed our solution to be free of technological coupling, so that it can be implemented on any platform and benefit from the environment best suited to each task. We also proposed a data model for vehicle projects which improves query time during a vehicle diagnostic session. Scalability and backwards compatibility were also taken into account during the design phase of our solution.
We proposed the necessary algorithms and the workflow to perform efficient vehicle diagnostics with considerably lower latency and substantially better time and space complexity than current solutions. To prove the practicality of our design, we presented a prototypical implementation and then analyzed the results of a series of tests performed on several vehicle models and projects. We also evaluated the prototype against software-engineering quality attributes.