179 research outputs found

    Big Data meets High Performance Computing: Genomics and Natural Language Processing as case studies

    Get PDF
    The main objective of this thesis is to clarify a way to the convergence between the Big Data and the High Performance Computing world. In order to do this, a study of the application of this kind of technologies to two real world scientific problems is performed. These two problems are the sequence alignment in genomics and the natural language processing. These problems have a very big input and output size, and are computationally intensive, requiring a very high execution time. By facing these problems, also new tools that can be used by professionals in the areas are developed. Conclusions about convergence between these two worlds are presented, taking into account results from this study

    PiCo: A Domain-Specific Language for Data Analytics Pipelines

    Get PDF
    In the world of Big Data analytics, there is a series of tools aiming at simplifying programming applications to be executed on clusters. Although each tool claims to provide better programming, data and execution models—for which only informal (and often confusing) semantics is generally provided—all share a common under- lying model, namely, the Dataflow model. Using this model as a starting point, it is possible to categorize and analyze almost all aspects about Big Data analytics tools from a high level perspective. This analysis can be considered as a first step toward a formal model to be exploited in the design of a (new) framework for Big Data analytics. By putting clear separations between all levels of abstraction (i.e., from the runtime to the user API), it is easier for a programmer or software designer to avoid mixing low level with high level aspects, as we are often used to see in state-of-the-art Big Data analytics frameworks. From the user-level perspective, we think that a clearer and simple semantics is preferable, together with a strong separation of concerns. For this reason, we use the Dataflow model as a starting point to build a programming environment with a simplified programming model implemented as a Domain-Specific Language, that is on top of a stack of layers that build a prototypical framework for Big Data analytics. The contribution of this thesis is twofold: first, we show that the proposed model is (at least) as general as existing batch and streaming frameworks (e.g., Spark, Flink, Storm, Google Dataflow), thus making it easier to understand high-level data-processing applications written in such frameworks. As result of this analysis, we provide a layered model that can represent tools and applications following the Dataflow paradigm and we show how the analyzed tools fit in each level. Second, we propose a programming environment based on such layered model in the form of a Domain-Specific Language (DSL) for processing data collections, called PiCo (Pipeline Composition). The main entity of this programming model is the Pipeline, basically a DAG-composition of processing elements. This model is intended to give the user an unique interface for both stream and batch processing, hiding completely data management and focusing only on operations, which are represented by Pipeline stages. Our DSL will be built on top of the FastFlow library, exploiting both shared and distributed parallelism, and implemented in C++11/14 with the aim of porting C++ into the Big Data world

    Crowd simulation and visualization

    Get PDF
    Large-scale simulation and visualization are essential topics in areas as different as sociology, physics, urbanism, training, entertainment among others. This kind of systems requires a vast computational power and memory resources commonly available in High Performance Computing HPC platforms. Currently, the most potent clusters have heterogeneous architectures with hundreds of thousands and even millions of cores. The industry trends inferred that exascale clusters would have thousands of millions. The technical challenges for simulation and visualization process in the exascale era are intertwined with difficulties in other areas of research, including storage, communication, programming models and hardware. For this reason, it is necessary prototyping, testing, and deployment a variety of approaches to address the technical challenges identified and evaluate the advantages and disadvantages of each proposed solution. The focus of this research is interactive large-scale crowd simulation and visualization. To exploit to the maximum the capacity of the current HPC infrastructure and be prepared to take advantage of the next generation. The project develops a new approach to scale crowd simulation and visualization on heterogeneous computing cluster using a task-based technique. Its main characteristic is hardware agnostic. It abstracts the difficulties that imply the use of heterogeneous architectures like memory management, scheduling, communications, and synchronization — facilitating development, maintenance, and scalability. With the goal of flexibility and take advantage of computing resources as best as possible, the project explores different configurations to connect the simulation with the visualization engine. This kind of system has an essential use in emergencies. Therefore, urban scenes were implemented as realistic as possible; in this way, users will be ready to face real events. Path planning for large-scale crowds is a challenge to solve, due to the inherent dynamism in the scenes and vast search space. A new path-finding algorithm was developed. It has a hierarchical approach which offers different advantages: it divides the search space reducing the problem complexity, it can obtain a partial path instead of wait for the complete one, which allows a character to start moving and compute the rest asynchronously. It can reprocess only a part if necessary with different levels of abstraction. A case study is presented for a crowd simulation in urban scenarios. Geolocated data are used, they were produced by mobile devices to predict individual and crowd behavior and detect abnormal situations in the presence of specific events. It was also address the challenge of combining all these individual’s location with a 3D rendering of the urban environment. The data processing and simulation approach are computationally expensive and time-critical, it relies thus on a hybrid Cloud-HPC architecture to produce an efficient solution. Within the project, new models of behavior based on data analytics were developed. It was developed the infrastructure to be able to consult various data sources such as social networks, government agencies or transport companies such as Uber. Every time there is more geolocation data available and better computation resources which allow performing analysis of greater depth, this lays the foundations to improve the simulation models of current crowds. The use of simulations and their visualization allows to observe and organize the crowds in real time. The analysis before, during and after daily mass events can reduce the risks and associated logistics costs.La simulación y visualización a gran escala son temas esenciales en áreas tan diferentes como la sociología, la física, el urbanismo, la capacitación, el entretenimiento, entre otros. Este tipo de sistemas requiere una gran capacidad de cómputo y recursos de memoria comúnmente disponibles en las plataformas de computo de alto rendimiento. Actualmente, los equipos más potentes tienen arquitecturas heterogéneas con cientos de miles e incluso millones de núcleos. Las tendencias de la industria infieren que los equipos en la era exascale tendran miles de millones. Los desafíos técnicos en el proceso de simulación y visualización en la era exascale se entrelazan con dificultades en otras áreas de investigación, incluidos almacenamiento, comunicación, modelos de programación y hardware. Por esta razón, es necesario crear prototipos, probar y desplegar una variedad de enfoques para abordar los desafíos técnicos identificados y evaluar las ventajas y desventajas de cada solución propuesta. El foco de esta investigación es la visualización y simulación interactiva de multitudes a gran escala. Aprovechar al máximo la capacidad de la infraestructura actual y estar preparado para aprovechar la próxima generación. El proyecto desarrolla un nuevo enfoque para escalar la simulación y visualización de multitudes en un clúster de computo heterogéneo utilizando una técnica basada en tareas. Su principal característica es que es hardware agnóstico. Abstrae las dificultades que implican el uso de arquitecturas heterogéneas como la administración de memoria, las comunicaciones y la sincronización, lo que facilita el desarrollo, el mantenimiento y la escalabilidad. Con el objetivo de flexibilizar y aprovechar los recursos informáticos lo mejor posible, el proyecto explora diferentes configuraciones para conectar la simulación con el motor de visualización. Este tipo de sistemas tienen un uso esencial en emergencias. Por lo tanto, se implementaron escenas urbanas lo más realistas posible, de esta manera los usuarios estarán listos para enfrentar eventos reales. La planificación de caminos para multitudes a gran escala es un desafío a resolver, debido al dinamismo inherente en las escenas y el vasto espacio de búsqueda. Se desarrolló un nuevo algoritmo de búsqueda de caminos. Tiene un enfoque jerárquico que ofrece diferentes ventajas: divide el espacio de búsqueda reduciendo la complejidad del problema, puede obtener una ruta parcial en lugar de esperar a la completa, lo que permite que un personaje comience a moverse y calcule el resto de forma asíncrona, puede reprocesar solo una parte si es necesario con diferentes niveles de abstracción. Se presenta un caso de estudio para una simulación de multitud en escenarios urbanos. Se utilizan datos geolocalizados producidos por dispositivos móviles para predecir el comportamiento individual y público y detectar situaciones anormales en presencia de eventos específicos. También se aborda el desafío de combinar la ubicación de todos estos individuos con una representación 3D del entorno urbano. Dentro del proyecto, se desarrollaron nuevos modelos de comportamiento basados ¿¿en el análisis de datos. Se creo la infraestructura para poder consultar varias fuentes de datos como redes sociales, agencias gubernamentales o empresas de transporte como Uber. Cada vez hay más datos de geolocalización disponibles y mejores recursos de cómputo que permiten realizar un análisis de mayor profundidad, esto sienta las bases para mejorar los modelos de simulación de las multitudes actuales. El uso de simulaciones y su visualización permite observar y organizar las multitudes en tiempo real. El análisis antes, durante y después de eventos multitudinarios diarios puede reducir los riesgos y los costos logísticos asociadosPostprint (published version

    QoE on media deliveriy in 5G environments

    Get PDF
    231 p.5G expandirá las redes móviles con un mayor ancho de banda, menor latencia y la capacidad de proveer conectividad de forma masiva y sin fallos. Los usuarios de servicios multimedia esperan una experiencia de reproducción multimedia fluida que se adapte de forma dinámica a los intereses del usuario y a su contexto de movilidad. Sin embargo, la red, adoptando una posición neutral, no ayuda a fortalecer los parámetros que inciden en la calidad de experiencia. En consecuencia, las soluciones diseñadas para realizar un envío de tráfico multimedia de forma dinámica y eficiente cobran un especial interés. Para mejorar la calidad de la experiencia de servicios multimedia en entornos 5G la investigación llevada a cabo en esta tesis ha diseñado un sistema múltiple, basado en cuatro contribuciones.El primer mecanismo, SaW, crea una granja elástica de recursos de computación que ejecutan tareas de análisis multimedia. Los resultados confirman la competitividad de este enfoque respecto a granjas de servidores. El segundo mecanismo, LAMB-DASH, elige la calidad en el reproductor multimedia con un diseño que requiere una baja complejidad de procesamiento. Las pruebas concluyen su habilidad para mejorar la estabilidad, consistencia y uniformidad de la calidad de experiencia entre los clientes que comparten una celda de red. El tercer mecanismo, MEC4FAIR, explota las capacidades 5G de analizar métricas del envío de los diferentes flujos. Los resultados muestran cómo habilita al servicio a coordinar a los diferentes clientes en la celda para mejorar la calidad del servicio. El cuarto mecanismo, CogNet, sirve para provisionar recursos de red y configurar una topología capaz de conmutar una demanda estimada y garantizar unas cotas de calidad del servicio. En este caso, los resultados arrojan una mayor precisión cuando la demanda de un servicio es mayor

    Constructive Synthesis of Memory-Intensive Accelerators for FPGA From Nested Loop Kernels

    Get PDF

    Database System Acceleration on FPGAs

    Get PDF
    Relational database systems provide various services and applications with an efficient means for storing, processing, and retrieving their data. The performance of these systems has a direct impact on the quality of service of the applications that rely on them. Therefore, it is crucial that database systems are able to adapt and grow in tandem with the demands of these applications, ensuring that their performance scales accordingly. In the past, Moore's law and algorithmic advancements have been sufficient to meet these demands. However, with the slowdown of Moore's law, researchers have begun exploring alternative methods, such as application-specific technologies, to satisfy the more challenging performance requirements. One such technology is field-programmable gate arrays (FPGAs), which provide ideal platforms for developing and running custom architectures for accelerating database systems. The goal of this thesis is to develop a domain-specific architecture that can enhance the performance of in-memory database systems when executing analytical queries. Our research is guided by a combination of academic and industrial requirements that seek to strike a balance between generality and performance. The former ensures that our platform can be used to process a diverse range of workloads, while the latter makes it an attractive solution for high-performance use cases. Throughout this thesis, we present the development of a system-on-chip for database system acceleration that meets our requirements. The resulting architecture, called CbMSMK, is capable of processing the projection, sort, aggregation, and equi-join database operators and can also run some complex TPC-H queries. CbMSMK employs a shared sort-merge pipeline for executing all these operators, which results in an efficient use of FPGA resources. This approach enables the instantiation of multiple acceleration cores on the FPGA, allowing it to serve multiple clients simultaneously. CbMSMK can process both arbitrarily deep and wide tables efficiently. The former is achieved through the use of the sort-merge algorithm which utilizes the FPGA RAM for buffering intermediate sort results. The latter is achieved through the use of KeRRaS, a novel variant of the forward radix sort algorithm introduced in this thesis. KeRRaS allows CbMSMK to process a table a few columns at a time, incrementally generating the final result through multiple iterations. Given that acceleration is a key objective of our work, CbMSMK benefits from many performance optimizations. For instance, multi-way merging is employed to reduce the number of merge passes required for the execution of the sort-merge algorithm, thus improving the performance of all our pipeline-breaking operators. Another example is our in-depth analysis of early aggregation, which led to the development of a novel cache-based algorithm that significantly enhances aggregation performance. Our experiments demonstrate that CbMSMK performs on average 5 times faster than the state-of-the-art CPU-based database management system MonetDB.:I Database Systems & FPGAs 1 INTRODUCTION 1.1 Databases & the Importance of Performance 1.2 Accelerators & FPGAs 1.3 Requirements 1.4 Outline & Summary of Contributions 2 BACKGROUND ON DATABASE SYSTEMS 2.1 Databases 2.1.1 Storage Model 2.1.2 Storage Medium 2.2 Database Operators 2.2.1 Projection 2.2.2 Filter 2.2.3 Sort 2.2.4 Aggregation 2.2.5 Join 2.2.6 Operator Classification 2.3 Database Queries 2.4 Impact of Acceleration 3 BACKGROUND ON FPGAS 3.1 FPGA 3.1.1 Logic Element 3.1.2 Block RAM (BRAM) 3.1.3 Digital Signal Processor (DSP) 3.1.4 IO Element 3.1.5 Programmable Interconnect 3.2 FPGADesignFlow 3.2.1 Specifications 3.2.2 RTL Description 3.2.3 Verification 3.2.4 Synthesis, Mapping, Placement, and Routing 3.2.5 TimingAnalysis 3.2.6 Bitstream Generation and FPGA Programming 3.3 Implementation Quality Metrics 3.4 FPGA Cards 3.5 Benefits of Using FPGAs 3.6 Challenges of Using FPGAs 4 RELATED WORK 4.1 Summary of Related Work 4.2 Platform Type 4.2.1 Accelerator Card 4.2.2 Coprocessor 4.2.3 Smart Storage 4.2.4 Network Processor 4.3 Implementation 4.3.1 Loop-based implementation 4.3.2 Sort-based Implementation 4.3.3 Hash-based Implementation 4.3.4 Mixed Implementation 4.4 A Note on Quantitative Performance Comparisons II Cache-Based Morphing Sort-Merge with KeRRaS (CbMSMK) 5 OBJECTIVES AND ARCHITECTURE OVERVIEW 5.1 From Requirements to Objectives 5.2 Architecture Overview 5.3 Outlineof Part II 6 COMPARATIVE ANALYSIS OF OPENCL AND RTL FOR SORT-MERGE PRIMITIVES ON FPGAS 6.1 Programming FPGAs 6.2 RelatedWork 6.3 Architecture 6.3.1 Global Architecture 6.3.2 Sorter Architecture 6.3.3 Merger Architecture 6.3.4 Scalability and Resource Adaptability 6.4 Experiments 6.4.1 OpenCL Sort-Merge Implementation 6.4.2 RTLSorters 6.4.3 RTLMergers 6.4.4 Hybrid OpenCL-RTL Sort-Merge Implementation 6.5 Summary & Discussion 7 RESOURCE-EFFICIENT ACCELERATION OF PIPELINE-BREAKING DATABASE OPERATORS ON FPGAS 7.1 The Case for Resource Efficiency 7.2 Related Work 7.3 Architecture 7.3.1 Sorters 7.3.2 Sort-Network 7.3.3 X:Y Mergers 7.3.4 Merge-Network 7.3.5 Join Materialiser (JoinMat) 7.4 Experiments 7.4.1 Experimental Setup 7.4.2 Implementation Description & Tuning 7.4.3 Sort Benchmarks 7.4.4 Aggregation Benchmarks 7.4.5 Join Benchmarks 7. Summary 8 KERRAS: COLUMN-ORIENTED WIDE TABLE PROCESSING ON FPGAS 8.1 The Scope of Database System Accelerators 8.2 Related Work 8.3 Key-Reduce Radix Sort(KeRRaS) 8.3.1 Time Complexity 8.3.2 Space Complexity (Memory Utilization) 8.3.3 Discussion and Optimizations 8.4 Architecture 8.4.1 MSM 8.4.2 MSMK: Extending MSM with KeRRaS 8.4.3 Payload, Aggregation and Join Processing 8.4.4 Limitations 8.5 Experiments 8.5.1 Experimental Setup 8.5.2 Datasets 8.5.3 MSMK vs. MSM 8.5.4 Payload-Less Benchmarks 8.5.5 Payload-Based Benchmarks 8.5.6 Flexibility 8.6 Summary 9 A STUDY OF EARLY AGGREGATION IN DATABASE QUERY PROCESSING ON FPGAS 9.1 Early Aggregation 9.2 Background & Related Work 9.2.1 Sort-Based Early Aggregation 9.2.2 Cache-Based Early Aggregation 9.3 Simulations 9.3.1 Datasets 9.3.2 Metrics 9.3.3 Sort-Based Versus Cache-Based Early Aggregation 9.3.4 Comparison of Set-Associative Caches 9.3.5 Comparison of Cache Structures 9.3.6 Comparison of Replacement Policies 9.3.7 Cache Selection Methodology 9.4 Cache System Architecture 9.4.1 Window Aggregator 9.4.2 Compressor & Hasher 9.4.3 Collision Detector 9.4.4 Collision Resolver 9.4.5 Cache 9.5 Experiments 9.5.1 Experimental Setup 9.5.2 Resource Utilization and Parameter Tuning 9.5.3 Datasets 9.5.4 Benchmarks on Synthetic Data 9.5.5 Benchmarks on Real Data 9.6 Summary 10 THE FULL PICTURE 10.1 System Architecture 10.2 Benchmarks 10.3 Meeting the Objectives III Conclusion 11 SUMMARY AND OUTLOOK ON FUTURE RESEARCH 11.1 Summary 11.2 Future Work BIBLIOGRAPHY LIST OF FIGURES LIST OF TABLE

    Accelerating Event Stream Processing in On- and Offline Systems

    Get PDF
    Due to a growing number of data producers and their ever-increasing data volume, the ability to ingest, analyze, and store potentially never-ending streams of data is a mission-critical task in today's data processing landscape. A widespread form of data streams are event streams, which consist of continuously arriving notifications about some real-world phenomena. For example, a temperature sensor naturally generates an event stream by periodically measuring the temperature and reporting it with measurement time in case of a substantial change to the previous measurement. In this thesis, we consider two kinds of event stream processing: online and offline. Online refers to processing events solely in main memory as soon as they arrive, while offline means processing event data previously persisted to non-volatile storage. Both modes are supported by widely used scale-out general-purpose stream processing engines (SPEs) like Apache Flink or Spark Streaming. However, such engines suffer from two significant deficiencies that severely limit their processing performance. First, for offline processing, they load the entire stream from non-volatile secondary storage and replay all data items into the associated online engine in order of their original arrival. While this naturally ensures unified query semantics for on- and offline processing, the costs for reading the entire stream from non-volatile storage quickly dominate the overall processing costs. Second, modern SPEs focus on scaling out computations across the nodes of a cluster, but use only a fraction of the available resources of individual nodes. This thesis tackles those problems with three different approaches. First, we present novel techniques for the offline processing of two important query types (windowed aggregation and sequential pattern matching). Our methods utilize well-understood indexing techniques to reduce the total amount of data to read from non-volatile storage. We show that this improves the overall query runtime significantly. In particular, this thesis develops the first index-based algorithms for pattern queries expressed with the Match_Recognize clause, a new and powerful language feature of SQL that has received little attention so far. Second, we show how to maximize resource utilization of single nodes by exploiting the capabilities of modern hardware. Therefore, we develop a prototypical shared-memory CPU-GPU-enabled event processing system. The system provides implementations of all major event processing operators (filtering, windowed aggregation, windowed join, and sequential pattern matching). Our experiments reveal that regarding resource utilization and processing throughput, such a hardware-enabled system is superior to hardware-agnostic general-purpose engines. Finally, we present TPStream, a new operator for pattern matching over temporal intervals. TPStream achieves low processing latency and, in contrast to sequential pattern matching, is easily parallelizable even for unpartitioned input streams. This results in maximized resource utilization, especially for modern CPUs with multiple cores
    corecore