Search CORE


Advanced analytics through FPGA based query processing and deep reinforcement learning

Author: Malazgirt Gorker Alp
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2019
Field of study

Today, vast streams of structured and unstructured data are incorporated into databases, and analytical processes are applied to discover patterns, correlations, trends and other useful relationships that inform a broad range of decision-making processes. The amount of generated data has grown very large over the years, and conventional database processing methods from previous generations have not been sufficient to provide satisfactory results in terms of analytics performance and prediction accuracy. Thus, new methods are needed in fields ranging from computer architecture, storage systems and network design to statistics and physics. This thesis proposes two methods to address the current challenges and meet the future demands of advanced analytics. First, we present AxleDB, a Field Programmable Gate Array (FPGA) based query processing system that constitutes the frontend of an advanced analytics system. AxleDB melds highly efficient accelerators with memory and storage, and provides a unified programmable environment. AxleDB is capable of offloading complex Structured Query Language (SQL) queries from the host CPU. Experiments running a set of TPC-H queries show that AxleDB performs full queries between 1.8x and 34.2x faster, and is 2.8x to 62.1x more energy efficient, than MonetDB and PostgreSQL on a single workstation node. Second, we introduce TauRieL, a novel deep reinforcement learning (DRL) based method for combinatorial problems. The design idea behind combining DRL and combinatorial problems is to apply the prediction capabilities of deep reinforcement learning and to use the universality of combinatorial optimization problems to explore general-purpose predictive methods. TauRieL utilizes an actor-critic inspired DRL architecture that adopts ordinary feedforward networks. Furthermore, TauRieL performs online training, which unifies training and state space exploration. The experiments show that TauRieL can generate solutions two orders of magnitude faster than the state-of-the-art DRL method on the Traveling Salesman Problem, and performs within 3% of its accuracy when searching for the shortest tour. We also show that TauRieL can be adapted to the Knapsack combinatorial problem: with minimal problem-specific modification, TauRieL can outperform a Knapsack-specific greedy heuristic.

Postprint (published version)
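The abstract reports that TauRieL outperforms a Knapsack-specific greedy heuristic but does not say which one. A common baseline of that kind, assumed here purely for illustration, ranks items by value-to-weight ratio and packs them while capacity remains. The sketch below shows that assumed baseline (the function name `greedy_knapsack` is hypothetical), not TauRieL itself.

```python
from typing import List, Tuple


def greedy_knapsack(items: List[Tuple[float, float]], capacity: float) -> Tuple[float, List[int]]:
    """Value-density greedy heuristic for the 0/1 Knapsack problem (assumed baseline).

    items: (value, weight) pairs; weights are assumed to be positive.
    Returns the total packed value and the indices of the chosen items.
    """
    # Rank item indices by value per unit of weight, best first.
    order = sorted(range(len(items)), key=lambda i: items[i][0] / items[i][1], reverse=True)
    total_value, remaining, chosen = 0.0, capacity, []
    for i in order:
        value, weight = items[i]
        # 0/1 variant: take the whole item if it still fits, never a fraction of it.
        if weight <= remaining:
            chosen.append(i)
            total_value += value
            remaining -= weight
    return total_value, chosen


if __name__ == "__main__":
    items = [(60, 10), (100, 20), (120, 30)]
    print(greedy_knapsack(items, 50))  # → (160.0, [0, 1])
```

On this small instance the greedy rule packs items 0 and 1 for a value of 160, while the optimum (items 1 and 2) reaches 220, which illustrates why a greedy baseline leaves room for a better method.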

1st Workshop on Model-driven Software Adaptation : M-ADAPT'07 at ECOOP 2007 (Proceedings)

Author
Publication venue
Publication date: 01/01/2007
Field of study

Growth of relational model: Interdependence and complementary to big data

Author: Prabhu Srikanth; Rao B. Dinesh; Shetty Sucharitha
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/04/2021
Field of study

A database management system is a long-standing application of computer science that provides a platform for the creation, movement, and use of voluminous data. The area has witnessed a series of developments and technological advancements, from the conventional structured database to the recent buzzword, big data. This paper aims to provide a complete picture of the relational database model, which is still widely used because of its well-known ACID properties, namely atomicity, consistency, isolation and durability. Specifically, the objective of this paper is to highlight the adoption of relational model approaches by big data techniques. To explain the reason for this incorporation, this paper qualitatively studies the advancements made over time in the relational data model. First, variations in the data storage layout are illustrated based on the needs of the application. Second, quick data retrieval techniques such as indexing, query processing and concurrency control methods are reviewed. The paper provides vital insights for appraising the efficiency of the structured database in an unstructured environment, particularly when both consistency and scalability become an issue in hybrid transactional and analytical database management systems.
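Atomicity, the first of the ACID properties the abstract credits for the relational model's staying power, means a transaction's changes are applied entirely or not at all. A minimal sketch with Python's built-in `sqlite3` module follows; the `accounts` table and the `make_db`, `transfer` and `balances` helpers are hypothetical illustrations, not taken from the paper.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Create a hypothetical in-memory accounts table for the demo."""
    conn = sqlite3.connect(":memory:")
    with conn:  # commits the inserts on success
        conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
        conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst as a single atomic transaction."""
    # Used as a context manager, the connection commits if the block finishes
    # and rolls back if an exception escapes, so debit and credit succeed or fail together.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        (balance,) = conn.execute("SELECT balance FROM accounts WHERE name = ?", (src,)).fetchone()
        if balance < 0:
            # Aborting here undoes the debit that was already executed above.
            raise ValueError("insufficient funds")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn: sqlite3.Connection) -> dict:
    """Return {name: balance} for all accounts."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    conn = make_db()
    transfer(conn, "alice", "bob", 30)
    try:
        transfer(conn, "alice", "bob", 500)
    except ValueError as err:
        print("rolled back:", err)
    print(balances(conn))
```

The first transfer commits, leaving alice at 70 and bob at 80; the second fails after its debit has run, and the rollback restores alice's balance to 70 rather than leaving a half-applied transfer.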

ZENODO

Institute of Advanced Engineering and Science

Advanced analytics through FPGA based query processing and deep reinforcement learning

Author: Malazgirt Gorker Alp
Publication venue: Universitat Politècnica de Catalunya
Publication date: 12/02/2019
Field of study

On systems architecting : a study in shop floor control to determine architecting concepts and principles

Author: Zwegers A.J.R.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1998
Field of study

Repository TU/e

Methodology and tools for realising product service systems for consumer products.

Author: Yang Xiaoyu
Publication venue: De Montfort University
Publication date: 01/01/2006
Field of study

EThOS - Electronic Theses Online Service (GB, United Kingdom)

De Montfort University Open Research Archive

Density-Aware Linear Algebra in a Column-Oriented In-Memory Database System

Author: Kernert David
Publication venue
Publication date: 20/09/2016
Field of study

Linear algebra operations appear in nearly every application in advanced analytics, machine learning, and various scientific domains. To this day, many data analysts and scientists tend to use statistics software packages or hand-crafted solutions for their analysis. In the era of data deluge, however, external statistics packages and custom analysis programs that often run on single workstations are unable to keep up with the vast increase in data volume and size. In particular, there is an increasing demand among scientists for large-scale data manipulation, orchestration, and advanced data management capabilities. These are among the key features of a mature relational database management system (DBMS). With the rise of main memory database systems, it has now become feasible to also consider applications that build on linear algebra. This thesis presents a deep integration of linear algebra functionality into an in-memory column-oriented database system. In particular, this work shows that it has become feasible to execute linear algebra queries on large data sets directly in a DBMS-integrated engine (LAPEG), without the need to transfer data and without being restricted by hard disk latencies. From various application examples that are cited in this work, we deduce a number of requirements that are relevant for a database system that includes linear algebra functionality. Besides the deep integration of matrices and numerical algorithms, these include optimization of expressions, transparent matrix handling, scalability and data parallelism, and data manipulation capabilities. These requirements are addressed by our linear algebra engine. In particular, the core contributions of this thesis are the following: firstly, we show that the columnar storage layer of an in-memory DBMS allows easy adoption of efficient sparse matrix data types and algorithms.
Furthermore, we show that the execution of linear algebra expressions benefits significantly from different techniques inspired by database technology. In a novel way, we implemented several of these optimization strategies in LAPEG’s optimizer (SpMachO), which uses an advanced density estimation method (SpProdest) to predict the matrix density of intermediate results. Moreover, we present an adaptive matrix data type, AT Matrix, that obviates the need for scientists to select appropriate matrix representations. The tiled substructure of AT Matrix is exploited by our matrix multiplication to saturate the different sockets of a multicore main-memory platform, reaching a speed-up of up to 6x compared to alternative approaches. Finally, a major part of this thesis is devoted to the topic of data manipulation, where we propose a matrix manipulation API and present different mutable matrix types that enable fast insertions and deletions. We conclude that our linear algebra engine is well suited to process dynamic, large matrix workloads in an optimized way. In particular, the DBMS-integrated LAPEG fills the linear algebra gap and makes columnar in-memory DBMSs attractive as efficient, scalable ad-hoc analysis platforms for scientists.
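The abstract names SpProdest, a method for predicting the density of intermediate matrix products, without giving its formula. As a point of comparison only, the simplest textbook estimate assumes nonzeros are spread uniformly and independently: for C = A·B with inner dimension k, an entry of C is structurally nonzero unless all k products vanish, so rho_C ≈ 1 - (1 - rho_A * rho_B)^k. The sketch below stores each sparsity pattern as two parallel lists of row and column ids (loosely in the spirit of a columnar layout) and compares that naive estimate with the exact pattern density. All helper names are hypothetical, and this is not the thesis's SpProdest method.

```python
import random
from collections import defaultdict


def random_pattern(n_rows, n_cols, density, rng):
    """Random sparsity pattern stored column-wise as parallel lists of row and column ids."""
    rows, cols = [], []
    for i in range(n_rows):
        for j in range(n_cols):
            if rng.random() < density:
                rows.append(i)
                cols.append(j)
    return rows, cols


def exact_product_density(a, b, m, n):
    """Density of the nonzero pattern of C = A @ B for an m x n result (ignores numeric cancellation)."""
    a_rows, a_cols = a
    b_rows, b_cols = b
    # Group B's nonzeros by row so each A entry (i, l) can reach row l of B directly.
    b_by_row = defaultdict(set)
    for l, j in zip(b_rows, b_cols):
        b_by_row[l].add(j)
    c_nonzeros = set()
    for i, l in zip(a_rows, a_cols):
        for j in b_by_row.get(l, ()):
            c_nonzeros.add((i, j))
    return len(c_nonzeros) / (m * n)


def naive_product_density(rho_a, rho_b, k):
    """Uniform, independent nonzeros: P(C_ij != 0) = 1 - (1 - rho_a * rho_b) ** k."""
    return 1.0 - (1.0 - rho_a * rho_b) ** k


if __name__ == "__main__":
    rng = random.Random(0)
    m, k, n = 200, 100, 200
    a = random_pattern(m, k, 0.02, rng)
    b = random_pattern(k, n, 0.02, rng)
    rho_a = len(a[0]) / (m * k)
    rho_b = len(b[0]) / (k * n)
    print("estimated:", naive_product_density(rho_a, rho_b, k))
    print("exact:    ", exact_product_density(a, b, m, n))
```

On uniformly random inputs like these the two numbers land close together; real matrices are often skewed (a few dense rows or columns), where the uniformity assumption can be far off, which is the gap more advanced estimators target.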

Technische Universität Dresden: Qucosa

Multi-Schema-Version Data Management

Author: Herrmann Kai
Publication venue: Aalborg Universitetsforlag
Publication date: 01/01/2017
Field of study

VBN

New Concepts for Virtual Testbeds : Data Mining Algorithms for Blackbox Optimization based on Wait-Free Concurrency and Generative Simulation

Author: Draheim Patrick
Publication venue
Publication date: 01/01/2018
Field of study

Virtual testbeds have emerged as a key technology for improving and streamlining complex engineering processes by delivering long-term simulation and assessment of complex designs in virtual environments. In contrast to existing simulation technology, virtual testbeds focus on long-term, physically based simulation of the overall design in its (virtual) environment instead of focusing only on isolated, specific parts for short periods of time. This technology has the major advantage that costly testing, prototyping, and assessment in real-life environments are replaced by cost-efficient simulation in virtual worlds for comprehensive and long-term analysis of designs. For this purpose, engineering models and their requirements are abstracted into software simulation models and objectives, which are executed in virtual assessments. Simulation models are used to predict complex, real systems, which can additionally be subject to random influences. These predictions are used to examine the effects of individual configuration alternatives without actually realizing them and causing possible negative effects on the real system. Virtual testbeds further offer engineers the opportunity to interact immersively and naturally with their simulation model in these virtual assessments. This gives engineers a greater and more comprehensive understanding of possible design flaws early in the design process, because they can directly assess their design in the virtual environment based on the simulation objectives. The fact that virtual testbeds enable these real-time interactive virtual assessments makes their underlying software infrastructure very complex. One major challenge is to minimize the development time of virtual testbeds in order to integrate them efficiently into the overall engineering process. Usually, this can be achieved by minimizing the underlying concurrency of the testbed and by simplifying its software architecture.
However, this may degrade their highly concurrent and asynchronous behavior, which is usually required for immersive and natural virtual interaction. A major goal of virtual testbeds in the engineering process is to find a set of optimal configurations of the simulation model that maximizes all simulation objectives for the specified virtual assessments. Once such a set has been computed, engineers can interactively explore it in the virtual environment. The main challenge is that sophisticated simulation models and their configuration pose a multiobjective optimization problem, which usually cannot be solved manually by engineers or simulation analysts in feasible time. This is further aggravated because the relationships between simulation model configurations and simulation objectives are mostly unknown, leading to what are known as blackbox simulations. In this thesis, I propose novel data mining algorithms for computing Pareto optimal simulation model configurations, based on an approximation of the feasible design space, for deterministic and stochastic blackbox simulations in virtual testbeds, to achieve the goal stated above. These algorithms lead to an automatic knowledge discovery process that needs no supervision for its data analysis and assessment of multiobjective optimization problems over simulation model configurations. This achieves the previously stated goal of computing optimal configurations of simulation models for long-term simulations and assessments. Furthermore, I propose two complementary solutions for efficiently integrating massively parallel virtual testbeds into engineering processes. First, I propose a novel multiversion wait-free data and concurrency management based on hash maps. These wait-free hash maps do not require any standard locking mechanisms and enable low-latency data generation, management and distribution for massively parallel applications.
Second, I propose novel concepts for efficiently generating code for the above wait-free data and concurrency management for arbitrary massively parallel simulation applications of virtual testbeds. My generative simulation concept combines a state-of-the-art real-time interactive system design pattern for high maintainability with template code generation based on domain-specific modelling. This concept is able to generate massively parallel simulations and, at the same time, model-check its internal dataflow for possible interface errors. This generative concept overcomes the challenge of efficiently integrating virtual testbeds into engineering processes. These contributions enable, for the first time, a powerful collaboration between simulation, optimization, visualization and data analysis for novel virtual testbed applications, and they address the challenges and goals presented above.
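Identifying Pareto optimal configurations rests on the notion of dominance: with every objective maximized, configuration a dominates b if it is at least as good in all objectives and strictly better in at least one. The brute-force Python sketch below assumes configurations have already been evaluated into objective vectors; it does not model the blackbox design-space approximation or the stochastic case described in the abstract, and the names `dominates` and `pareto_front` are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b in every objective and strictly better in one (maximization)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Sequence[float]]) -> List[int]:
    """Indices of non-dominated objective vectors, using an O(n^2) pairwise check."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


if __name__ == "__main__":
    # Hypothetical objective vectors for four evaluated configurations (both objectives maximized).
    evaluated = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (1.5, 1.5)]
    print(pareto_front(evaluated))  # → [0, 1, 2]
```

Configuration 3 is dropped because configuration 1 beats it in both objectives; the remaining three represent trade-offs that an engineer would then explore interactively. The quadratic check is fine for small candidate sets, while larger sets usually call for sorting-based non-dominated filtering.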

Computer support for conceptual process design

Author: Marsh Elizabeth Caroline
Publication venue: The University of Edinburgh
Publication date: 01/01/1997
Field of study