    FPGA acceleration of sequence analysis tools in bioinformatics

    Thesis (Ph.D.)--Boston University. With advances in biotechnology and computing power, biological data are being produced at an exceptional rate. The purpose of this study is to analyze the application of FPGAs to accelerate high-impact production biosequence analysis tools. Compared with the alternatives, FPGAs offer large compute power, lower power consumption, and reasonable flexibility. BLAST has become the de facto standard in bioinformatic approximate string matching, so its acceleration is of fundamental importance. It is a complex, highly optimized system consisting of tens of thousands of lines of code and a large number of heuristics. Our idea is to emulate the main phases of its algorithm on the FPGA. Using our FPGA engine, we quickly reduce the database to a small fraction of its size, and then use the original code to process the query. On a standard FPGA-based system, we achieved a 12x speedup over a highly optimized multithreaded reference code. Multiple Sequence Alignment (MSA), the extension of pairwise sequence alignment to multiple sequences, is critical for solving many biological problems. Previous attempts to accelerate Clustal-W, the most commonly used MSA code, have directly mapped a portion of the code to the FPGA. We use a new approach: we apply prefiltering of the kind commonly used in BLAST to perform the initial all-pairs alignments. This results in a speedup of 80x to 190x over the CPU code (8 cores). The quality is comparable to the original according to a commonly used benchmark suite evaluated with respect to multiple distance metrics. The challenge in FPGA-based acceleration is finding a suitable application mapping. Unfortunately, many software heuristics do not map well to hardware, so other methods must be applied. One is restructuring, in which an entirely new algorithm is applied. Another is to analyze application utilization and develop accuracy/performance tradeoffs. Using our prefiltering approach and novel FPGA programming models, we have achieved significant speedups over the reference programs. We have applied approximation, seeding, and filtering to this end. The bulk of this study introduces the pros and cons of these acceleration models for biosequence analysis tools.
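
    To make the prefiltering idea concrete, below is a minimal software sketch of seed-based database reduction in the spirit of BLAST-style seeding. It is a hypothetical illustration only: the thesis implements this stage as an FPGA engine, and the function names, k-mer length, and one-hit criterion here are assumptions, not details taken from the thesis.

    # Seed-based prefiltering sketch: keep only database sequences that share
    # at least one exact k-mer with the query; the full, unmodified aligner
    # then runs on this much smaller subset.

    def kmers(seq: str, k: int):
        """Yield every overlapping k-mer of seq."""
        for i in range(len(seq) - k + 1):
            yield seq[i:i + k]

    def prefilter(query: str, database: list[str], k: int = 11) -> list[str]:
        """Return the database sequences sharing at least one seed with the query."""
        seeds = set(kmers(query, k))
        return [s for s in database if any(w in seeds for w in kmers(s, k))]

    db = ["ACGTACGTACGTAGG", "TTTTTTTTTTTTTTT", "GGGACGTACGTACGT"]
    print(prefilter("ACGTACGTACGT", db))  # two of the three sequences survive

    In a hardware setting, the inner membership test is the natural candidate for parallelization, a design choice consistent with the abstract's description of reducing the database to a small fraction before running the original code.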

    A survey of real-time crowd rendering

    In this survey we review, classify, and compare existing approaches for real-time crowd rendering. We first give an overview of character animation techniques, as they are highly tied to crowd rendering performance, and then analyze the state of the art in crowd rendering. We discuss different representations for level-of-detail (LoD) rendering of animated characters, including polygon-based, point-based, and image-based techniques, and review different criteria for runtime LoD selection. Besides LoD approaches, we review classic acceleration schemes, such as frustum culling and occlusion culling, and describe how they can be adapted to handle crowds of animated characters. We also discuss acceleration techniques specific to crowd rendering, such as primitive pseudo-instancing, palette skinning, and dynamic key-pose caching, which benefit from current graphics hardware. We also address other factors affecting the performance and realism of crowds, such as lighting, shadowing, clothing, and variability. Finally, we provide an exhaustive comparison of the most relevant approaches in the field.
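
    As an illustration of the runtime LoD selection criteria discussed above, the following is a minimal, hypothetical sketch of distance-based selection for crowd characters; the level names, thresholds, and camera model are invented for the example, not drawn from any specific system in the survey.

    import math

    # (max distance, representation) pairs, ordered nearest first; impostors
    # (image-based billboards) serve the far field, full meshes the near field.
    LOD_LEVELS = [
        (15.0, "full polygonal mesh with skeletal animation"),
        (40.0, "simplified mesh with palette skinning"),
        (120.0, "impostor (image-based billboard)"),
    ]

    def select_lod(camera_pos, character_pos):
        """Pick the first (most detailed) level whose distance budget covers
        the character; beyond the last threshold, keep the coarsest level
        (a real renderer might cull the character entirely instead)."""
        d = math.dist(camera_pos, character_pos)
        for max_d, representation in LOD_LEVELS:
            if d <= max_d:
                return representation
        return LOD_LEVELS[-1][1]

    print(select_lod((0, 0, 0), (0, 30, 0)))  # simplified mesh with palette skinning

    Real systems typically combine distance with other criteria, such as the character's projected screen-space size.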

    Scalable techniques for scheduling and mapping DSP applications onto embedded multiprocessor platforms

    A variety of multiprocessor architectures have proliferated, even for off-the-shelf computing platforms. To make use of these platforms, traditional implementation frameworks focus on implementing Digital Signal Processing (DSP) applications using special platform features to achieve high performance. However, due to the fast evolution of the underlying architectures, solution redevelopment is error prone and the re-usability of existing solutions and libraries is limited. In this thesis, we facilitate an efficient migration of DSP systems to multiprocessor platforms while systematically leveraging previous investment in optimized library kernels using dataflow design frameworks. We make these library elements, which are typically tailored to specialized architectures, more amenable to extensive analysis and optimization through an efficient and systematic process. We enable such migration through four basic contributions:
    1. We propose and develop a framework to explore efficient utilization of Single Instruction Multiple Data (SIMD) cores and accelerators available in heterogeneous multiprocessor platforms consisting of General Purpose Processors (GPPs) and Graphics Processing Units (GPUs). We also propose new scheduling techniques that apply extensive block processing in conjunction with task mapping and task ordering methods matched to the underlying architecture. The approach gives the developer the ability to prototype a GPU-accelerated application and explore its design space efficiently and effectively.
    2. We introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and an associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate the derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real-time operating systems, and how to re-use pre-optimized libraries of DSP components within such implementations.
    3. We develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm based on graph analysis distributes data- and task-parallel instances over different cores while balancing the load of all processing units to exploit pipeline parallelism (a minimal sketch of this load-balancing step follows the abstract). Finally, we use a novel technique for performance evaluation: the scheduler and a customizable solution are implemented on the programmable platform, which allows accurate fitness functions to be measured and used to drive runtime adaptation of schedules.
    4. Beyond providing scheduling techniques for the mentioned applications and platforms, we show how to integrate the resulting solution into the underlying environment. This is achieved by leveraging existing libraries and applying the GPP-GPU scheduling framework to augment a popular existing Software Defined Radio (SDR) development environment, GNU Radio, with a dataflow foundation and a stand-alone GPU-accelerated library.
    We also show how to realize the PEG model on real-time operating system libraries, such as the Texas Instruments DSP/BIOS. A code generator that accepts both manually designed solutions and automatically configured solutions is provided to complete the design flow from application model to running system.
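
    As a concrete illustration of the load-balancing step in contribution 3, here is a minimal sketch that assigns actor instances to cores with a greedy longest-processing-time heuristic. It is an assumed stand-in for the thesis's graph-analysis-based mapper: the actor names, costs, and the LPT policy itself are illustrative, not taken from the thesis.

    import heapq

    def map_actors(actor_costs: dict[str, float], num_cores: int):
        """Assign each actor instance to the currently least-loaded core,
        heaviest actors first (classic LPT list scheduling)."""
        cores = [(0.0, cid, []) for cid in range(num_cores)]  # (load, id, actors)
        heapq.heapify(cores)
        for actor, cost in sorted(actor_costs.items(), key=lambda kv: -kv[1]):
            load, cid, assigned = heapq.heappop(cores)
            assigned.append(actor)
            heapq.heappush(cores, (load + cost, cid, assigned))
        return sorted(cores, key=lambda core: core[1])

    # Hypothetical SDR-style actors with profiled execution costs:
    for load, cid, actors in map_actors(
            {"fir": 4.0, "fft": 9.0, "demod": 6.0, "sink": 1.0}, num_cores=2):
        print(f"core {cid}: load={load} actors={actors}")

    Per the abstract, the thesis's mapper additionally accounts for dataflow structure (to exploit pipeline parallelism) and feeds measured fitness back into the schedule, which a cost-only heuristic like this one ignores.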

    Large scale geostatistics with locally varying anisotropy

    Classical geostatistical methods are based on the hypothesis of stationarity, which allows repetitive sampling to be applied at different locations of the spatial domain in order to obtain enough information to infer cumulative distributions. In the case of non-stationarity, anisotropy is observed in the underlying physical phenomena. This feature manifests itself as preferential directions of continuity in the phenomena, i.e. properties are more continuous in one orientation than in another. In the case of local anisotropy, each location of the domain under study presents different preferential directions of continuity. The locally varying anisotropy (LVA) approach in geostatistics incorporates a field of local anisotropy parameters defined for each domain point. With this additional input, more realistic spatial simulations can be generated, incorporating geological features such as folds, veins, and faults into the computational model. Since the seminal article published by Boisvert and Deutsch (2011), to the best of the author's knowledge, no further analysis or public code improvements have been developed. This is in part because acceleration and parallelization techniques must be applied to the inner kernels of the baseline LVA codes: large execution times are needed to generate even small-scale domain simulations, making large-scale domain simulations a prohibitive task. The contributions of this thesis are accelerating and parallelizing classical and LVA-based geostatistical simulation methods, particularly sequential simulation, which is one of the most common and computationally intensive methods in the field. This was recently remarked upon by some of the main authors in the field, Gómez-Hernández and Srivastava (2021), which shows the relevance of this work today. Two main parallel algorithms and an optimized version of a kd-tree search implementation are presented, all of them applied to both classical and LVA-based sequential simulation implementations. The first parallel algorithm simulates different domain points in parallel after rearranging the order of simulation, while preserving the exact results of a single-thread execution. The second parallel algorithm performs the parallel search of neighbouring points in the domain, which is used to build data dependencies for the parallel simulation of points. The optimized kd-tree search was used in each test case to reduce the computational complexity of neighbour-search tasks; its modified implementation reduces the number of branching instructions and introduces specialized code sections to accelerate execution. The main focus is on multi-core architectures using OpenMP and on optimization techniques applied to Fortran and C++ codes. Additionally, acceleration and parallelization techniques were applied to auxiliary applications, such as shortest-path and variogram calculation, on hybrid CPU/GPU architectures using Fortran, C++, and CUDA codes. In the last application, an analytical and heuristic model was developed to estimate the optimal workload distribution between CPU and GPU in the hybrid context. The overall results of this work are a set of applications that will allow researchers and practitioners to dramatically accelerate the execution of their experiments and simulations, with sgsim, sisim, sgs-lva, and sisim-lva being the accelerated codes presented.
    Final speedup results of 11x and 50x are obtained for the non-LVA codes using 16 threads, and of 56x and 1822x for the LVA codes using 20 threads. These tools can be combined with other geostatistical tools in order to improve the existing landscape of open source codes that can be used in practical scenarios.
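
    To make the hybrid workload-distribution idea concrete, here is a minimal sketch of a static split driven by measured per-device throughput, assuming both devices should finish at the same time. The thesis's analytical and heuristic model is more elaborate; the function, rates, and item counts below are invented for illustration.

    def split_work(n_items: int, cpu_rate: float, gpu_rate: float):
        """Split n_items so that n_gpu / gpu_rate == n_cpu / cpu_rate, i.e.
        each device gets a share proportional to its measured throughput."""
        gpu_share = gpu_rate / (cpu_rate + gpu_rate)
        n_gpu = round(n_items * gpu_share)
        return n_items - n_gpu, n_gpu

    # e.g. variogram lag pairs: CPU at 2 Mpairs/s, GPU at 14 Mpairs/s
    n_cpu, n_gpu = split_work(1_000_000, cpu_rate=2e6, gpu_rate=14e6)
    print(n_cpu, n_gpu)  # 125000 875000

    Equalizing finish times this way minimizes the makespan of a one-shot split; a heuristic component, such as the one the abstract mentions, can then absorb transfer overheads and rate nonlinearities that the simple model ignores.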

    Real-time multi-domain optimization controller for multi-motor electric vehicles using automotive-suitable methods and heterogeneous embedded platforms

    Chapters 2, 3, and 7 are subject to confidentiality by the author. 145 p. In this Thesis, an elaborate control solution combining Machine Learning and Soft Computing techniques has been developed, targeting a challenging vehicle dynamics application: optimizing the torque distribution across the wheels of a vehicle with four independent electric motors. The technological context that has motivated this research brings together potential, and challenges, from multiple domains: new automotive powertrain topologies with increased degrees of freedom and controllability, which can be approached with innovative Machine Learning algorithm concepts and implemented by exploiting the computational capacity of modern heterogeneous embedded platforms and automated toolchains. The complex relations among these three domains, which enable the potential for great enhancements, contrast with the fourth domain in this context: the challenging constraints imposed by industrial aspects and safety regulations. The innovative control architecture that has been conceived combines Neural Networks, serving as virtual sensors for unmeasurable forces, with a multi-objective optimization function driven by Fuzzy Logic, which defines priorities based on the real-time driving situation. The fundamental principle is to enhance vehicle dynamics by implementing a Torque Vectoring controller that prevents wheel slip using the inputs provided by the Neural Network. Complementary optimization objectives are efficiency, thermal stress, and smoothness. Safety-critical concerns are addressed through architectural and functional measures. Two main phases can be identified across the activities and milestones achieved in this work. In the first phase, a baseline Torque Vectoring controller was implemented on an embedded platform and, benefiting from a seamless transition using Hardware-in-the-Loop, integrated into a real Motor-in-Wheel vehicle for race track tests. Having validated the concept, framework, methodology, and models, a second, simulation-based phase developed the more sophisticated controller, targeting a more capable vehicle and leading to the final solution of this work. This concept was further evolved to support a joint research effort that led to outstanding FPGA- and GPU-based embedded implementations of Neural Networks. Ultimately, the different building blocks that compose this work have shown results that met or exceeded expectations, on both the technical and the conceptual level. The highly non-linear, multi-variable (and multi-objective) control problem was tackled: Neural Network estimations are accurate; performance metrics in general, and vehicle dynamics and efficiency in particular, are clearly improved; Fuzzy Logic and optimization behave as expected; and efficient embedded implementation is shown to be viable. Consequently, the proposed control concept, together with the surrounding solutions and enablers, has proven its qualities with respect to functionality, performance, implementability, and industry suitability. The most relevant contributions are, firstly, each of the algorithms and functions implemented in the controller solutions and, ultimately, the whole control concept itself with the architectural approaches it involves. In addition, multiple enablers exploitable in future work have been provided, as well as an illustrative insight into the intricacies of a vivid technological context, showcasing how these domains can be harmonized. Furthermore, multiple international activities in both academic and professional contexts, which have provided both enrichment and acknowledgement for this work, have led to several publications, among them two high-impact journal papers, and to collateral work products of a diverse nature.
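
    As a minimal, hypothetical illustration of the torque-vectoring principle described above, the sketch below clamps per-wheel torque to a traction limit derived from an estimated normal force, standing in for the Neural Network virtual sensor; the friction model, constants, and proportional split are assumptions, not the thesis's algorithm.

    WHEEL_RADIUS_M = 0.3  # assumed wheel radius

    def traction_limit_nm(normal_force_n: float, mu: float) -> float:
        """Maximum transmissible torque ~ mu * Fz * r (simplified friction model)."""
        return mu * normal_force_n * WHEEL_RADIUS_M

    def distribute_torque(request_nm: float, normal_forces_n: list[float],
                          mu: float = 0.9) -> list[float]:
        """Split a total torque request across wheels in proportion to each
        wheel's traction limit, then clamp each wheel to its own limit so
        no wheel is driven into slip."""
        limits = [traction_limit_nm(fz, mu) for fz in normal_forces_n]
        total = sum(limits)
        return [min(request_nm * lim / total, lim) for lim in limits]

    # Four wheels, with weight shifted to the rear axle under acceleration:
    print(distribute_torque(800.0, [3200.0, 3200.0, 4300.0, 4300.0]))

    In the thesis itself, the force estimates come from the Neural Network virtual sensor, and the split is chosen by the Fuzzy-Logic-driven multi-objective optimization rather than by a fixed proportional rule.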
