

    eScience fields, which include areas such as spatial data, electromagnetics, bioinformatics, energy, social sciences, simulation and the physical sciences, have seen a significant increase in recent years in the complexity of their algorithms and data-analysis applications. Data has also changed, with an explosion in data volume and in the number of datasets available to the scientific community. This has led researchers to identify new requirements for analysis tools and applications, driving a profound change in how computing infrastructures are used. The field of eScience is constantly evolving as an ever-growing scientific community emerges with real needs for access to ever more powerful computational resources. Another important issue is the ability to share results, which is why cloud technology, through virtualization, can greatly help the scientific community by providing a flexible IT infrastructure that scales with demand. Indeed, cloud computing makes it possible to provision computing and storage resources in an easily configurable way, adapted to actual needs. Researchers often lack the computing capacity to meet their needs, so cloud technology and the private, public and hybrid cloud models are enabling technologies that guarantee service availability, scalability and flexibility. The transition from traditional infrastructure to new virtualized, distributed models gives researchers access to an extremely flexible environment and optimizes hardware usage so that more resources are available. However, the computational needs of eScience directly affect the way applications are developed. The approach to writing algorithms and applications is still too tied to, for example, a workstation-centered model.
The vast majority of researchers write their applications on a laptop or workstation, with limited computing power and storage, and in a non-distributed way.


    Research And Application Of Parallel Computing Algorithms For Statistical Phylogenetic Inference

    Estimating the evolutionary history of organisms (phylogenetic inference) is a critical step in many analyses involving biological sequence data such as DNA. The likelihood calculations at the heart of the most effective methods for statistical phylogenetic analysis are extremely computationally intensive, so these analyses become a bottleneck in many studies. Recent progress in computer hardware, specifically the increasing pervasiveness of highly parallel, many-core processors, has created opportunities for new approaches to computationally intensive methods such as those in phylogenetic inference. We have developed an open source library, BEAGLE, which uses parallel computing methods to greatly accelerate statistical phylogenetic inference for both maximum likelihood and Bayesian approaches. BEAGLE defines a uniform application programming interface and includes a collection of efficient implementations that use the NVIDIA CUDA, OpenCL and C++ threading frameworks to evaluate likelihoods under a wide variety of evolutionary models, on GPUs as well as on multi-core CPUs. BEAGLE employs several parallelization techniques for phylogenetic inference, at different levels of granularity and for distinct processor architectures. On CUDA and OpenCL devices, the library enables concurrent computation of site likelihoods, data subsets and independent subtrees. The general design of the library also provides a model for software development with parallel computing frameworks that is applicable to other domains. BEAGLE has been integrated with some of the leading programs in the field, such as MrBayes and BEAST, and is used in a diverse range of evolutionary studies, including studies of disease-causing viruses. The library can provide significant performance gains, with the exact speedup depending on the specific properties of the data set, evolutionary model and hardware.
In general, nucleotide analyses are accelerated on the order of 10-fold and codon analyses on the order of 100-fold.
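Site-level parallelism of the kind described above rests on the standard assumption that alignment columns evolve independently, so each site's likelihood can be computed separately with Felsenstein's pruning algorithm and the log-likelihoods summed. The following is a minimal pure-Python sketch under assumptions that are not in the abstract: the Jukes-Cantor (JC69) substitution model, a fixed hypothetical three-taxon tree ((A:0.1,B:0.2):0.05,C:0.3), and hypothetical helper names. It does not use or reproduce BEAGLE's API.

```python
import math

BASES = "ACGT"


def jc69_prob(i: int, j: int, t: float) -> float:
    """JC69 probability of state i changing to state j along a branch of
    length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def leaf_partials(base: str) -> list[float]:
    # Conditional likelihoods at a tip: 1 for the observed base, 0 otherwise.
    return [1.0 if b == base else 0.0 for b in BASES]


def combine(children: list[tuple[list[float], float]]) -> list[float]:
    """One pruning step: each child is (partials, branch_length)."""
    parent = []
    for i in range(4):  # candidate state at the parent node
        prod = 1.0
        for partials, t in children:
            prod *= sum(jc69_prob(i, j, t) * partials[j] for j in range(4))
        parent.append(prod)
    return parent


def site_likelihood(a: str, b: str, c: str) -> float:
    """Likelihood of one alignment column on the hypothetical tree
    ((A:0.1,B:0.2):0.05,C:0.3)."""
    inner = combine([(leaf_partials(a), 0.1), (leaf_partials(b), 0.2)])
    root = combine([(inner, 0.05), (leaf_partials(c), 0.3)])
    # JC69 has uniform stationary frequencies (0.25 per base).
    return sum(0.25 * x for x in root)


def log_likelihood(alignment: list[str]) -> float:
    # Columns are independent: this per-site loop is the work that
    # site-parallel libraries spread across GPU threads or CPU cores.
    return sum(math.log(site_likelihood(*col)) for col in zip(*alignment))
```

For example, `log_likelihood(["ACGT", "ACGA", "ACTT"])` scores a four-column alignment of taxa A, B and C. Because no site depends on another, the outer loop can be split across workers without changing the result.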

    Inexact Mapping of Short Biological Sequences in High Performance Computational Environments

    Bioinformatics is the application of computer science to the management and analysis of biological data. Since 2005, the appearance of next-generation DNA sequencers has given rise to what is known as Next Generation Sequencing, or NGS. A single biological experiment run on an NGS sequencing machine can easily produce hundreds of gigabytes or even terabytes of data. Depending on the technique chosen, this process can take from a few hours to a few days. The availability of affordable local resources, such as multicore processors or the new graphics cards designed for general-purpose computation, GPGPU (General Purpose Graphic Processing Unit), is a great opportunity to tackle these problems. A frequently addressed topic today is the alignment of DNA sequences. In bioinformatics, alignment makes it possible to compare two or more sequences of DNA, RNA or primary protein structures, highlighting their regions of similarity. These similarities may indicate functional or evolutionary relationships between the queried genes or proteins. Furthermore, similarities between the sequences of a patient and those of another individual with a detected genetic disease could be used effectively in diagnostic medicine. The problem at the core of this doctoral thesis is locating short sequence fragments within DNA, known as sequence mapping. The mapping must tolerate errors, so that sequences can be mapped even in the presence of genetic variability or read errors. Several techniques exist for mapping, but since the advent of NGS, search over prefixes indexed and grouped by means of the Burrows-Wheeler transform [28] (hereafter BWT) has stood out.
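BWT-indexed prefix search of the kind referred to above can be illustrated with the classic FM-index backward search. The following is a minimal, naive sketch, not the thesis's implementation: the index is built by sorting suffixes (quadratic and far too slow for real genomes), a `$` sentinel is assumed to be absent from the text, and the helper names `bwt_index` and `find` are hypothetical. Each pattern character narrows a suffix-array interval with two table lookups, so the search cost grows with the pattern length, not with the size of the reference.

```python
def bwt_index(text: str):
    """Naive BWT/FM-index construction; assumes '$' does not occur in text."""
    text += "$"
    # Suffix array; with a unique smallest sentinel this equals rotation order.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    # C[ch]: number of characters in text strictly smaller than ch.
    C, total = {}, 0
    for ch in sorted(set(text)):
        C[ch] = total
        total += text.count(ch)
    # occ[ch][k]: occurrences of ch in bwt[:k].
    occ = {ch: [0] * (len(bwt) + 1) for ch in C}
    for k, b in enumerate(bwt):
        for ch in C:
            occ[ch][k + 1] = occ[ch][k] + (b == ch)
    return sa, bwt, C, occ


def find(pattern: str, index) -> list[int]:
    """Exact backward search: process the pattern right to left, keeping the
    half-open suffix-array interval [lo, hi) of suffixes that start with the
    suffix of the pattern matched so far."""
    sa, bwt, C, occ = index
    lo, hi = 0, len(bwt)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[lo:hi])
```

For example, with `index = bwt_index("ACGTACGA")`, `find("ACG", index)` returns `[0, 4]`. Production mappers replace the full `occ` tables with sampled checkpoints and build the suffix array with linear-time algorithms.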
The BWT was originally used in data compression techniques, as in the bzip2 algorithm. Its use as a tool for indexing and subsequently searching information is more recent [22]. Its advantage is that its computational complexity depends only on the length of the sequence being mapped. On the other hand, many alignment techniques are based on dynamic programming algorithms, such as Smith-Waterman or hidden Markov models. These provide greater sensitivity, tolerating more errors, but their computational cost is higher: it depends on the length of the sequence multiplied by the length of the reference string. Many tools combine a first phase that uses the BWT to find candidate regions for alignment with a second, local-alignment phase in which strings are mapped with Smith-Waterman or HMMs. When mapping with only a few errors allowed, a second phase based on dynamic programming is too costly, so an inexact search based on the BWT can be more efficient. The main motivation of this doctoral thesis is the implementation of an inexact search algorithm based solely on the BWT, adapted to modern parallel architectures, on both CPU and GPGPU. The algorithm will constitute a new branch-and-bound method adapted to genomic data. During the research stay, hidden Markov models will be studied and an implementation will be developed on GTA (Aggregate ∘ Test ∘ Generate) functional computation models, along with the shared- and distributed-memory parallelization of that functional programming platform.
Salavert Torres, J. (2014). Inexact Mapping of Short Biological Sequences in High Performance Computational Environments [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/43721TESI
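The idea of an inexact search driven only by the BWT can be sketched as a depth-first branch-and-bound over backward-search intervals: at each pattern position every base is tried, a mismatch spends one unit of an error budget `k`, and a branch is pruned as soon as the budget is exceeded or its suffix-array interval becomes empty. This is a generic textbook backtracking sketch, not the algorithm developed in the thesis; it assumes a DNA alphabet `ACGT`, counts substitutions only (no insertions or deletions), and uses hypothetical helper names.

```python
def build_fm(text: str):
    """Naive FM-index: suffix array, C table and occurrence counts."""
    text += "$"  # assumes '$' does not occur in text
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = [text[i - 1] for i in sa]
    alphabet = sorted(set(text))
    C, total = {}, 0
    for ch in alphabet:
        C[ch] = total
        total += text.count(ch)
    occ = {ch: [0] for ch in alphabet}
    for b in bwt:
        for ch in alphabet:
            occ[ch].append(occ[ch][-1] + (b == ch))
    return sa, C, occ


def inexact_search(pattern: str, k: int, fm, alphabet: str = "ACGT") -> list[int]:
    """Start positions where pattern occurs with at most k substitutions."""
    sa, C, occ = fm
    hits = set()

    def extend(i: int, lo: int, hi: int, errors: int) -> None:
        # i: next pattern position to match, moving right to left.
        if i < 0:
            hits.update(sa[lo:hi])
            return
        for ch in alphabet:
            if ch not in C:
                continue
            cost = errors + (ch != pattern[i])
            if cost > k:
                continue  # bound: this branch exceeds the error budget
            nlo, nhi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
            if nlo < nhi:  # branch only while some suffix still matches
                extend(i - 1, nlo, nhi, cost)

    extend(len(pattern) - 1, 0, len(sa), 0)
    return sorted(hits)
```

For example, on `build_fm("ACGTACGA")`, searching `"GA"` with `k=1` reports positions 2, 3 and 6 (`GT`, `TA` and an exact `GA`). The branching factor grows quickly with `k`, which is why tight pruning matters when the search is parallelized.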