
    Analysis and performance of a UPC implementation of a parallel longest common subsequence algorithm

    An important problem in computational biology is finding the longest common subsequence (LCS) of two nucleotide sequences. This paper examines the correctness and performance of a recently proposed parallel LCS algorithm that uses successor tables and pruning rules to construct a list of sets from which an LCS can be easily reconstructed. Counterexamples are given for two pruning rules that were given with the original algorithm. Because of these errors, the performance measurements originally reported cannot be validated. The work presented here shows that speedup can be reliably achieved by an implementation in Unified Parallel C that runs on an InfiniBand cluster. This performance is partly facilitated by exploiting the software cache of the MuPC runtime system. In addition, this implementation achieved speedup without bulk memory copy operations and the associated programming complexity of message passing.
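    The successor-table construction itself is not detailed in the abstract; as grounding, the following is a minimal Python sketch (not the paper's parallel UPC formulation) of the classic quadratic dynamic-programming LCS that any such algorithm must agree with.

    def lcs(a: str, b: str) -> str:
        """Classic O(n*m) dynamic-programming LCS with traceback."""
        n, m = len(a), len(b)
        # dp[i][j] = length of an LCS of a[:i] and b[:j]
        dp = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                if a[i - 1] == b[j - 1]:
                    dp[i][j] = dp[i - 1][j - 1] + 1
                else:
                    dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
        # Walk back from dp[n][m] to reconstruct one LCS.
        out, i, j = [], n, m
        while i > 0 and j > 0:
            if a[i - 1] == b[j - 1]:
                out.append(a[i - 1])
                i, j = i - 1, j - 1
            elif dp[i - 1][j] >= dp[i][j - 1]:
                i -= 1
            else:
                j -= 1
        return "".join(reversed(out))

    For example, lcs("GATTACA", "GCATGCU") returns one maximal common subsequence of the two inputs; the paper's successor-table method reaches the same answer with a different, more parallelizable intermediate representation.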

    Accelerating edit-distance sequence alignment on GPU using the wavefront algorithm

    Sequence alignment remains a fundamental problem with practical applications ranging from pattern recognition to computational biology. Traditional algorithms based on dynamic programming are hard to parallelize, require significant amounts of memory, and fail to scale for large inputs. This work presents eWFA-GPU, a GPU (graphics processing unit)-accelerated tool to compute the exact edit-distance sequence alignment based on the wavefront alignment algorithm (WFA). This approach exploits the similarities between the input sequences to accelerate the alignment process while requiring less memory than other algorithms. Our implementation takes full advantage of the massive parallel capabilities of modern GPUs to accelerate the alignment process. In addition, we propose a succinct representation of the alignment data that successfully reduces the overall amount of memory required, allowing the exploitation of the fast shared memory of a GPU. Our results show that our GPU implementation outperforms the baseline edit-distance WFA implementation running on a 20-core machine by 3-9×. As a result, eWFA-GPU is up to 265× faster than the state-of-the-art CPU implementation and up to 56× faster than state-of-the-art GPU implementations. This work was supported in part by the European Union's Horizon 2020 Framework Program through the DeepHealth Project under Grant 825111; in part by the European Regional Development Fund (ERDF) Operational Program of Catalonia 2014-2020, with a grant of 50% of the total eligible cost, through the Designing RISC-V-based Accelerators for next-generation Computers project under Grant 001-P-001723; in part by the Ministerio de Ciencia e Innovación (MCIN)/Agencia Estatal de Investigación (AEI)/10.13039/501100011033 under Contracts PID2020-113614RB-C21 and TIN2015-65316-P; and in part by the Generalitat de Catalunya (GenCat), Departament de Recerca i Universitats (DIUiE) (GRR), under Contracts 2017-SGR-313, 2017-SGR-1328, and 2017-SGR-1414. The work of Miquel Moreto was supported in part by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal Fellowship Grant RYC-2016-21104.
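    As a point of reference for the recurrence being accelerated, below is a minimal sequential Python sketch of edit-distance wavefront computation: matches along a diagonal are free, and each unit of score advances the wavefront by a mismatch, deletion, or insertion. It deliberately omits what makes eWFA-GPU fast (GPU parallelism and the succinct alignment-data representation).

    def wfa_edit_distance(a: str, b: str) -> int:
        """Sequential edit distance via the wavefront recurrence."""
        n, m = len(a), len(b)
        target = n - m                      # diagonal containing the cell (n, m)

        def extend(k: int, i: int) -> int:
            # Matches along a diagonal are free: skip past the matching run.
            while i < n and i - k < m and a[i] == b[i - k]:
                i += 1
            return i

        wf = {0: extend(0, 0)}              # diagonal k -> furthest offset i in a
        score = 0
        while wf.get(target, -1) < n:
            score += 1
            nxt = {}
            for k, i in wf.items():
                # (diagonal shift, offset shift): mismatch, deletion, insertion
                for dk, di in ((0, 1), (1, 1), (-1, 0)):
                    nk, ni = k + dk, i + di
                    if ni <= n and ni - nk <= m and ni > nxt.get(nk, -1):
                        nxt[nk] = ni
            wf = {k: extend(k, i) for k, i in nxt.items()}
        return score

    For similar sequences the wavefronts stay small, which is exactly the property the GPU implementation exploits: for example, wfa_edit_distance("kitten", "sitting") returns 3 after only three wavefront steps.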

    Constant-time sliding window framework with reduced memory footprint and efficient bulk evictions

    The fast evolution of data analytics platforms has resulted in an increasing demand for real-time data stream processing. From Internet of Things applications to the monitoring of telemetry generated in large data centers, a common demand in currently emerging scenarios is the need to process vast amounts of data with low latencies, generally performing the analysis as close to the data source as possible. Stream processing platforms are required to be malleable and absorb spikes generated by fluctuations of data generation rates. Data is usually produced as time series that have to be aggregated using multiple operators, with sliding windows being one of the most common abstractions used to process data in real time. To satisfy these demands, efficient stream processing techniques that aggregate data with minimal computational cost need to be developed. In this paper we present the Monoid Tree Aggregator general sliding window aggregation framework, which seamlessly combines the following features: amortized O(1) time complexity and a worst case of O(log n) between insertions; a window aggregation mechanism and a window slide policy that are both user-programmable; enforcement of the window sliding policy with amortized O(1) computational cost for single evictions and support for bulk evictions with cost O(log n); and a local memory footprint of O(log n). The framework can compute aggregations over multiple data dimensions, and has been designed to decouple computation and data storage through the use of distributed key-value stores to keep window elements and partial aggregations. This project is partially supported by the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No 639595), by the Ministry of Economy of Spain under contract TIN2015-65316-P, by the Generalitat de Catalunya under contract 2014SGR1051, by the ICREA Academia program, and by the BSC-CNS Severo Ochoa program (SEV-2015-0493).
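    The Monoid Tree Aggregator itself is not reproduced here. As a grounding sketch of the core idea, maintaining window aggregates under an associative operator with amortized O(1) work per operation, the following Python shows the classic two-stack technique; it supports single evictions only, whereas the paper's contribution adds O(log n) bulk evictions, programmable slide policies, and O(log n) memory.

    class TwoStackWindow:
        """Sliding-window aggregation for any associative op with identity."""
        def __init__(self, op, identity):
            self.op, self.identity = op, identity
            self.front = []  # older elements: (value, aggregate of value..oldest side)
            self.back = []   # newer elements: (value, aggregate of newest side..value)

        def push(self, v):
            agg = self.op(self.back[-1][1], v) if self.back else v
            self.back.append((v, agg))

        def evict(self):
            # Evict the oldest element; the back-to-front flip is amortized O(1).
            if not self.front:
                while self.back:
                    v, _ = self.back.pop()
                    agg = self.op(v, self.front[-1][1]) if self.front else v
                    self.front.append((v, agg))
            self.front.pop()

        def query(self):
            f = self.front[-1][1] if self.front else self.identity
            b = self.back[-1][1] if self.back else self.identity
            return self.op(f, b)

    Usage, with min as the monoid: after pushing 5, 2, 7, query() returns 2; after two evictions the window holds only 7 and query() returns 7. Because only associativity is required, the same structure serves sums, maxima, counts, or any user-defined monoid.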

    A Theory of Partitioned Global Address Spaces

    Partitioned global address space (PGAS) is a parallel programming model for the development of applications on clusters. It provides a global address space partitioned among the cluster nodes, and is supported in programming languages like C, C++, and Fortran by means of APIs. In this paper we provide a formal model for the semantics of single instruction, multiple data programs using PGAS APIs. Our model reflects the main features of popular real-world APIs such as SHMEM, ARMCI, GASNet, GPI, and GASPI. A key feature of PGAS is the support for one-sided communication: a node may directly read and write the memory located at a remote node, without explicit synchronization with the processes running on the remote side. One-sided communication increases performance by decoupling process synchronization from data transfer, but requires the programmer to reason about appropriate synchronization between reads and writes. As a second contribution, we propose and investigate robustness, a criterion for correct synchronization of PGAS programs. Robustness corresponds to acyclicity of a suitable happens-before relation defined on PGAS computations. The requirement is finer than classical data race freedom and rules out most false error reports. Our main result is an algorithm for checking robustness of PGAS programs. The algorithm makes use of two insights. Using combinatorial arguments, we first show that if a PGAS program is not robust, then there are computations in a certain normal form that violate happens-before acyclicity. Intuitively, normal-form computations delay remote accesses in an ordered way. We then devise an algorithm that checks for cyclic normal-form computations. Essentially, the algorithm is an emptiness check for a novel automaton model that accepts normal-form computations in streaming fashion. Altogether, we prove that the robustness problem is PSPACE-complete.
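    For a single, already-recorded computation, robustness reduces to acyclicity of its happens-before relation, which the following Python sketch checks by depth-first search over a given event graph (the event set and edge map are assumed inputs; the paper's actual PSPACE algorithm reasons over all computations of a program via a normal-form automaton, which this sketch does not capture).

    def happens_before_acyclic(events, succ):
        """DFS cycle check: succ maps each event to its happens-before
        successors (program-order edges plus communication/conflict edges)."""
        WHITE, GRAY, BLACK = 0, 1, 2
        color = {e: WHITE for e in events}

        def dfs(e):
            color[e] = GRAY
            for f in succ.get(e, ()):
                if color[f] == GRAY:        # back edge: a happens-before cycle
                    return False
                if color[f] == WHITE and not dfs(f):
                    return False
            color[e] = BLACK
            return True

        return all(color[e] != WHITE or dfs(e) for e in events)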

    Accelerating pairwise sequence alignment on GPUs using the Wavefront Algorithm

    Advances in genomics and sequencing technologies demand faster and more scalable analysis methods that can process longer sequences with higher accuracy. However, classical pairwise alignment methods, based on dynamic programming (DP), impose impractical computational requirements to align long and noisy sequences like those produced by PacBio and Nanopore technologies. The recently proposed Wavefront Alignment (WFA) algorithm paves the way for more efficient alignment tools, improving time and memory complexity over previous methods. Notwithstanding the advantages of the WFA algorithm, modern high performance computing (HPC) platforms rely on accelerator-based architectures that exploit parallel computing resources to improve over classical CPUs. Hence, a GPU-enabled implementation of the WFA could exploit the hardware resources of modern GPUs and further accelerate sequence alignment in current genome analysis pipelines. This thesis presents two GPU-accelerated implementations based on the WFA for fast pairwise DNA sequence alignment: eWFA-GPU and WFA-GPU. Our first proposal, eWFA-GPU, computes the exact edit-distance alignment between two short sequences (up to a few thousand bases), taking full advantage of the massive parallel capabilities of modern GPUs. We propose a succinct representation of the alignment data that successfully reduces the overall amount of memory required, allowing the exploitation of the fast on-chip memory of a GPU. Our results show that eWFA-GPU outperforms the edit-distance WFA implementation running on a 20-core machine by 3-9X. Compared to other state-of-the-art tools computing the edit distance, eWFA-GPU is up to 265X faster than CPU tools and up to 56X faster than other GPU-enabled implementations. Our second contribution, the WFA-GPU tool, extends the work of eWFA-GPU to compute the exact gap-affine distance (i.e., a more general alignment problem) between arbitrarily long sequences. In this work, we propose a CPU-GPU co-design capable of performing inter- and intra-sequence parallel alignment of multiple sequences, combining a succinct WFA data representation with an efficient GPU implementation. As a result, we demonstrate that our implementation outperforms the original WFA implementation by 1.5-7.7X when computing the alignment path, and by 2.6-16X when computing only the alignment score. Moreover, compared to other state-of-the-art tools, WFA-GPU is up to 26.7X faster than other GPU implementations and up to four orders of magnitude faster than other CPU implementations.
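    For reference, the gap-affine wavefront recurrences that a WFA-based tool evaluates (in the shape given by the original WFA formulation, quoted here from memory: mismatch penalty x, gap-open o, gap-extend e; M, I, D hold furthest-reaching offsets per score s and diagonal k) are:

    \begin{align*}
    \tilde{I}_{s,k} &= \max\bigl(M_{s-o-e,\,k-1},\; I_{s-e,\,k-1}\bigr) + 1\\
    \tilde{D}_{s,k} &= \max\bigl(M_{s-o-e,\,k+1},\; D_{s-e,\,k+1}\bigr)\\
    \tilde{M}_{s,k} &= \max\bigl(M_{s-x,\,k} + 1,\; \tilde{I}_{s,k},\; \tilde{D}_{s,k}\bigr)
    \end{align*}

    after which M_{s,k} is obtained by extending \tilde{M}_{s,k} along the run of matching characters on diagonal k, just as in the edit-distance case.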

    Automatic learning of 3D pose variability in walking performances for gait analysis

    This paper proposes an action specific model which automatically learns the variability of 3D human postures observed in a set of training sequences. First, a Dynamic Programing synchronization algorithm is presented in order to establish a mapping between postures from different walking cycles, so the whole training set can be synchronized to a common time pattern. Then, the model is trained using the public CMU motion capture dataset for the walking action, and a mean walking performance is automatically learnt. Additionally statistics about the observed variability of the postures and motion direction are also computed at each time step. As a result, in this work we have extended a similar action model successfully used for tracking, by providing facilities for gait analysis and gait recognition applications.Peer ReviewedPreprin
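    The abstract does not spell out the synchronization algorithm; as a grounding sketch, the following Python implements classic dynamic time warping over pose vectors, which is the standard DP way to map postures of one cycle onto another (the authors' exact formulation may differ in its cost function and constraints).

    import numpy as np

    def dtw_align(seq_a, seq_b):
        """Dynamic time warping between two pose sequences.

        seq_a: (n, d) array and seq_b: (m, d) array of d-dimensional poses.
        Returns the warping path as a list of (i, j) posture pairs.
        """
        n, m = len(seq_a), len(seq_b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(seq_a[i - 1] - seq_b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j - 1],
                                     cost[i - 1, j],
                                     cost[i, j - 1])
        # Backtrack from (n, m) to recover the posture-to-posture mapping.
        path, i, j = [], n, m
        while i > 0 and j > 0:
            path.append((i - 1, j - 1))
            step = int(np.argmin((cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1])))
            if step == 0:
                i, j = i - 1, j - 1
            elif step == 1:
                i -= 1
            else:
                j -= 1
        return path[::-1]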

    FPGA Acceleration of Pre-Alignment Filters for Short Read Mapping With HLS

    Pre-alignment filters are useful for reducing the computational requirements of genomic sequence mappers. Most of them are based on estimating or computing the edit distance between sequences and their candidate locations in a reference genome, using a subset of the dynamic programming table employed to compute the Levenshtein distance. Some existing FPGA implementations use classic HDL toolchains, thus limiting their portability. Currently, most FPGA accelerators offered by heterogeneous cloud providers support C/C++ HLS. In this work, we implement and optimize several state-of-the-art pre-alignment filters using C/C++-based HLS to expand their portability to the wide range of systems supporting the OpenCL runtime. Moreover, we perform a complete analysis of the performance and accuracy of the filters and analyze the implications of the results. The maximum throughput obtained by an exact filter is 95.1 MPairs/s, including memory transfers, using 100 bp sequences, which is the highest ever reported for a comparable system and more than two times faster than previous HDL-based results. The best energy efficiency obtained from the accelerator (not considering the host CPU) is 2.1 MPairs/J, more than one order of magnitude higher than other comparable accelerator-based approaches from the state of the art. This work was supported by the European Union Regional Development Fund (ERDF) within the framework of the ERDF Operational Program of Catalonia 2014-2020, with a grant of 50% of the total eligible cost, under the Designing RISC-V based Accelerators for next generation computers (DRAC) project (Grant 001-P-001723); by the Catalan Government (Grants 2017-SGR-313 and 2017-SGR-1624); and by the Spanish Ministry of Science, Innovation and Universities (Grants PID2020-113614RB-C21 and RTI2018-095209-B-C22).
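    The individual filters are not specified in the abstract beyond computing a subset of the Levenshtein DP table; the Python sketch below shows the common banded scheme such exact filters build on: only the 2t+1 central diagonals are evaluated, and a candidate location is rejected when the distance exceeds the error budget t.

    def within_edit_distance(read: str, ref: str, t: int) -> bool:
        """Banded Levenshtein DP: True iff edit_distance(read, ref) <= t.

        Cells outside the band around the main diagonal are treated as
        "already over budget", so only O(len(read) * t) cells are computed.
        """
        n, m = len(read), len(ref)
        if abs(n - m) > t:
            return False                    # cheap length filter
        INF = t + 1
        prev = [j if j <= t else INF for j in range(m + 1)]
        for i in range(1, n + 1):
            curr = [INF] * (m + 1)
            if i <= t:
                curr[0] = i
            for j in range(max(1, i - t), min(m, i + t) + 1):
                curr[j] = min(prev[j - 1] + (read[i - 1] != ref[j - 1]),
                              prev[j] + 1,
                              curr[j - 1] + 1)
            prev = curr
        return prev[m] <= t

    The same dataflow — a narrow band of cells per row, each depending only on three neighbors — is what makes these filters map so well onto FPGA pipelines.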

    An FPGA accelerator of the wavefront algorithm for genomics pairwise alignment

    In the last years, advances in next-generation sequencing technologies have enabled the proliferation of genomic applications that guide personalized medicine. These applications have an enormous computational cost due to the large amount of genomic data they process. The first step in many of these applications consists in aligning reads against a reference genome. Very recently, the wavefront alignment algorithm has been introduced, significantly reducing the execution time of the read alignment process. This paper presents the first FPGA-based hardware/software co-designed accelerator of this algorithm. Compared to the reference WFA CPU-only implementation, the proposed FPGA accelerator achieves performance speedups of up to 13.5× while consuming up to 14.6× less energy. This work has been supported by the European HiPEAC Network of Excellence; by the Spanish Ministry of Science and Innovation (contract PID2019-107255GB-C21/AEI/10.13039/501100011033); by the Generalitat de Catalunya (contracts 2017-SGR-1414 and 2017-SGR-1328); by the IBM/BSC Deep Learning Center initiative; and by the DRAC project, which is co-financed by the European Union Regional Development Fund within the framework of the ERDF Operational Program of Catalonia 2014-2020 with a grant of 50% of the total eligible cost. Ll. Alvarez has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Juan de la Cierva Formación fellowship No. FJCI-2016-30984. M. Moreto has been partially supported by the Spanish Ministry of Economy, Industry and Competitiveness under Ramón y Cajal fellowship No. RYC-2016-21104.

    Time series forecasting using SARIMA and SANN models

    Information and communications technology has evolved to the point of being present in most things in our daily lives. Even the simplest objects that everyone has at home are getting smarter, like toothbrushes, cars, and phones. All those devices are connected to the Internet to make our lives easier. The question is: how is all that data processed? This is where Artificial Intelligence appears. AI is the part of ICT dedicated to the development of algorithms that allow a machine to make intelligent decisions or, at least, behave as if it had human-like intelligence. AI is used in many sectors such as finance, health, transport, and even agriculture. Machine Learning is a branch of AI based on the idea that computer systems can learn on their own from data. Data science has applied Machine Learning algorithms such as Artificial Neural Networks, together with statistics and linear regression, for data processing. An ANN is the part of a computing system designed to simulate the way the human brain analyses and processes information; it solves problems that would prove impossible or difficult by human or statistical standards. But is this resource always the best solution? This paper compares a Seasonal Artificial Neural Network (SANN) with classic models such as the Seasonal Autoregressive Integrated Moving Average (SARIMA) for rainfall forecasting. The project starts with an introduction to Deep Learning and Machine Learning. Afterwards, an adequate amount of data was gathered to build a proper dataset, using measurements from pluviometers distributed over the Hauts-de-Seine territory, published by the French government. With data from 19 sensors covering 2009 to 2020, the dataset was used to experiment with different algorithms and configurations to obtain different predictions. The forecasting performance of the SARIMA model and that of the SANN were compared with four forecast performance measures: Mean Forecast Error, Mean Absolute Error, Mean Squared Error, and Root Mean Squared Error. Not only is the accuracy of each model taken into account, but the runtime and implementation requirements are also used as benchmarks. Finally, all models were tested in the same work environment and a conclusion was reached from the results obtained at the different reference points.
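    A minimal sketch of the SARIMA side of such a comparison, using the statsmodels SARIMAX class on synthetic data (the seasonal orders and the series are illustrative assumptions, not the thesis's pluviometer dataset or chosen model), together with the four error measures named above:

    import numpy as np
    from statsmodels.tsa.statespace.sarimax import SARIMAX

    # Synthetic series with a seasonal period of 12, standing in for rainfall data.
    rng = np.random.default_rng(0)
    t = np.arange(144)
    series = 10 + 5 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, t.size)
    train, test = series[:-12], series[-12:]

    # Assumed (p,d,q)(P,D,Q,s) orders for illustration; in practice they are
    # chosen from ACF/PACF plots or by information criteria.
    fitted = SARIMAX(train, order=(1, 0, 1), seasonal_order=(1, 1, 1, 12)).fit(disp=False)
    forecast = fitted.forecast(steps=12)

    err = test - forecast
    print("MFE :", err.mean())                  # Mean Forecast Error (bias)
    print("MAE :", np.abs(err).mean())          # Mean Absolute Error
    print("MSE :", (err ** 2).mean())           # Mean Squared Error
    print("RMSE:", np.sqrt((err ** 2).mean()))  # Root Mean Squared Error

    The SANN competitor would be scored on the same held-out window with the same four measures, which is what makes the accuracy comparison direct.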