3 research outputs found

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Get PDF
    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors â CPUs and GPGPUs. We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them in a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, but at the same time experience the performance offered by a custom system that has been built on demand to serve their specific use case

    A software architecture for electro-mobility services: a milestone for sustainable remote vehicle capabilities

    Get PDF
    To face the tough competition, changing markets and technologies in automotive industry, automakers have to be highly innovative. In the previous decades, innovations were electronics and IT-driven, which increased exponentially the complexity of vehicle’s internal network. Furthermore, the growing expectations and preferences of customers oblige these manufacturers to adapt their business models and to also propose mobility-based services. One other hand, there is also an increasing pressure from regulators to significantly reduce the environmental footprint in transportation and mobility, down to zero in the foreseeable future. This dissertation investigates an architecture for communication and data exchange within a complex and heterogeneous ecosystem. This communication takes place between various third-party entities on one side, and between these entities and the infrastructure on the other. The proposed solution reduces considerably the complexity of vehicle communication and within the parties involved in the ODX life cycle. In such an heterogeneous environment, a particular attention is paid to the protection of confidential and private data. Confidential data here refers to the OEM’s know-how which is enclosed in vehicle projects. The data delivered by a car during a vehicle communication session might contain private data from customers. Our solution ensures that every entity of this ecosystem has access only to data it has the right to. We designed our solution to be non-technological-coupling so that it can be implemented in any platform to benefit from the best environment suited for each task. We also proposed a data model for vehicle projects, which improves query time during a vehicle diagnostic session. The scalability and the backwards compatibility were also taken into account during the design phase of our solution. We proposed the necessary algorithms and the workflow to perform an efficient vehicle diagnostic with considerably lower latency and substantially better complexity time and space than current solutions. To prove the practicality of our design, we presented a prototypical implementation of our design. Then, we analyzed the results of a series of tests we performed on several vehicle models and projects. We also evaluated the prototype against quality attributes in software engineering

    HTAP with reactive streaming ETL

    Get PDF
    In database management systems (DBMSs), query workloads can be classified as online transactional processing (OLTP) or online analytical processing (OLAP). These often run within separate DBMSs. In hybrid transactional and analytical processing (HTAP), both workloads may execute within the same DBMS. This article shows that it is possible to run separate OLTP and OLAP DBMSs and still support timely business decisions from analytical queries running off fresh transactional data. Several setups to manage OLTP and OLAP workloads are analysed. Then, benchmarks on two industry standard DBMSs empirically show that, under an OLTP workload, a row-store DBMS sustains a 1000 times higher throughput than a columnar DBMS, whilst OLAP queries are more than four times faster on a columnar DBMS. Finally, a reactive streaming ETL pipeline is implemented which connects these two DBMSs. Separate benchmarks show that OLTP events can be streamed to an OLAP database within a few seconds.peer-reviewe
    corecore