5 research outputs found

    Efficient Storage of Genomic Sequences in High Performance Computing Systems

    ABSTRACT: In this dissertation, we address the challenges of genomic data storage in high performance computing systems. In particular, we focus on developing a referential compression approach for Next Generation Sequence data stored in FASTQ format files. The amount of genomic data available for researchers to process has increased exponentially, bringing enormous challenges for its efficient storage and transmission. General-purpose compressors offer only limited performance for genomic data, hence the need for specialized compression solutions. Two trends have emerged as alternatives to harness the particular properties of genomic data: non-referential and referential compression. Non-referential compressors offer higher compression ratios than general-purpose compressors, but still below what a referential compressor could theoretically achieve. However, the effectiveness of referential compression depends on selecting a good reference and on having enough computing resources available. This thesis presents one of the first referential compressors for FASTQ files. We first present a comprehensive analytical and experimental evaluation of the most relevant tools for genomic raw data compression, which led us to identify the main needs and opportunities in this field. As a consequence, we propose a novel compression workflow that aims at improving the usability of referential compressors. Subsequently, we discuss the implementation and performance evaluation of the core of the proposed workflow: a referential compressor for reads in FASTQ format that combines local read-to-reference alignments with a specialized binary-encoding strategy. The compression algorithm, named UdeACompress, achieved very competitive compression ratios when compared to the best compressors in the current state of the art, while showing reasonable execution times and memory use.
In particular, UdeACompress outperformed all competitors when compressing long reads, typical of the newest sequencing technologies. Finally, we study the main aspects of data-level parallelism in the Intel AVX-512 architecture, in order to develop a parallel version of the UdeACompress algorithms that reduces the runtime. Through the use of SIMD programming, we managed to significantly accelerate the main bottleneck found in UdeACompress, the Suffix Array Construction.

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    One of the significant shifts in next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, has evolved into a widely deployed BD operating system. Its new features include a federation structure and many associated frameworks, which give Hadoop 3.x the maturity to serve different markets. This dissertation addresses two leading issues involved in exploiting the BD and large-scale data analytics realm using the Hadoop platform: (i) scalability, which directly affects system performance and overall throughput, using portable Docker containers; and (ii) security, which spreads the adoption of data protection practices among practitioners, using access controls. The main contributions of this thesis are an Enhanced MapReduce Environment (EME), an OPportunistic and Elastic Resource Allocation (OPERA) scheduler, a BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) with a multi-tier architecture for data streaming to the cloud.

    Modélisation des problèmes de grandes déformations multi-domaines par une approche Eulérienne monolithique massivement parallèle

    Modeling of multi-domain problems is addressed in a purely Eulerian framework. A single mesh, which no longer represents the material, is used over the whole domain. The boundaries of the different interacting bodies and their evolution are described using numerical tools such as the Level Set method. The characteristics of the subdomains, considered as heterogeneities in the mesh, are determined using mixture laws. This work is one of the first attempts to apply a fully Eulerian approach to model large deformation problems. Therefore, the capacity of this approach is first tested to determine the necessary developments. The friction between the different objects is managed by adding a boundary layer that implies the presence of a lubricant. Combined with an identification technique, a new quadratic mixture law is introduced to determine the lubricant viscosity. Comparisons were performed with Forge® and the results were found satisfactory. To treat the contact problem between the different objects, a directional solver was developed. Despite the interesting results, it still requires further improvement. The scalability of the approach in a massively parallel environment is tested as well, and several recommendations are proposed to ensure optimal performance. The single-mesh technique guarantees very good scalability, since the efficiency of parallelism depends only on the partition of a single mesh (unlike Lagrangian methods). The proposed method shows undeniable capabilities but remains far from complete. Ideas for future improvements are proposed accordingly.

    XXIII Edición del Workshop de Investigadores en Ciencias de la Computación : Libro de actas

    Compilation of the papers presented at the XXIII Workshop de Investigadores en Ciencias de la Computación (WICC), held in Chilecito (La Rioja) in April 2021. Red de Universidades con Carreras en Informática

    Safety and Reliability - Safe Societies in a Changing World

    The contributions cover a wide range of methodologies and application areas for safety and reliability that contribute to safe societies in a changing world. These methodologies and applications include: foundations of risk and reliability assessment and management; mathematical methods in reliability and safety; risk assessment; risk management; system reliability; uncertainty analysis; digitalization and big data; prognostics and system health management; occupational safety; accident and incident modeling; maintenance modeling and applications; simulation for safety and reliability analysis; dynamic risk and barrier management; organizational factors and safety culture; human factors and human reliability; resilience engineering; structural reliability; natural hazards; security; economic analysis in risk management.