199 research outputs found

    Fortran FLUSH statement SYNC= specifier proposal

    This paper describes a proposed change to the Fortran standard that would give Fortran programs a mechanism to ensure that the transfer of data to an external file by preceding WRITE statements has completed.
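
    The abstract does not give the proposed SYNC= syntax or semantics, so no Fortran is attempted here. As a loose analogy only, written in Python rather than Fortran, the POSIX-style distinction between handing buffered output to the operating system (`flush()`) and blocking until the OS reports it on stable storage (`os.fsync()`) illustrates the kind of completion guarantee such a specifier targets. The helper name `write_durably` is hypothetical.

```python
import os
import tempfile


def write_durably(path: str, text: str) -> None:
    """Hypothetical helper: write text, then wait until the OS reports it is on stable storage."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)         # data may still sit in Python's user-space buffer
        f.flush()             # hand the buffer to the operating system
        os.fsync(f.fileno())  # block until the OS reports the data reached the device


with tempfile.TemporaryDirectory() as tmp:
    out = os.path.join(tmp, "results.txt")
    write_durably(out, "step 42 complete\n")
    with open(out, encoding="utf-8") as f:
        contents = f.read()

print(contents, end="")
```

    Whether the proposed Fortran specifier maps onto fsync-like semantics is not stated in the abstract; the analogy is only meant to separate "buffer handed off" from "transfer completed".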

    Optimization of SAMtools sorting using OpenMP tasks

    SAMtools is a widely used genomics application for post-processing high-throughput sequence alignment data. Such sequence alignment data are commonly sorted to make downstream analysis more efficient. However, this sorting process itself can be computationally and I/O intensive: high-throughput sequence alignment files in the de facto standard binary alignment/map (BAM) format can be many gigabytes in size, and may need to be decompressed before sorting and compressed afterwards. As a result, BAM-file sorting can be a bottleneck in genomics workflows. This paper describes a case study on the performance analysis and optimization of SAMtools for sorting large BAM files. OpenMP task parallelism and memory-optimization techniques yielded a 5.9X speedup over upstream SAMtools 1.3.1 for an internal (in-memory) sort of 24.6 GiB of compressed BAM data (102.6 GiB uncompressed) on 32 processor cores, and a 1.98X speedup for an external (out-of-core) sort of a 271.4 GiB BAM file.
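
    To make the internal/external distinction concrete: an out-of-core sort typically splits the input into chunks that fit in memory, sorts each chunk, spills the sorted "runs" to disk, and then merges the runs. The sketch below is a generic external merge sort on integer keys, assumed here for illustration; it is not SAMtools' implementation and omits BAM decompression, compression, and the OpenMP task parallelism the paper describes. The function names and `chunk_size` parameter are hypothetical.

```python
import heapq
import os
import pickle
import tempfile
from typing import Iterable, Iterator, List


def _write_run(sorted_chunk: List[int], tmpdir: str, index: int) -> str:
    """Spill one sorted chunk ("run") to a temporary file and return its path."""
    path = os.path.join(tmpdir, f"run_{index}.pkl")
    with open(path, "wb") as f:
        for key in sorted_chunk:
            pickle.dump(key, f)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream keys back from a run file one at a time."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


def external_sort(keys: Iterable[int], chunk_size: int) -> List[int]:
    """Sort fixed-size chunks in memory, spill them as runs, then k-way merge the runs."""
    with tempfile.TemporaryDirectory() as tmpdir:
        run_paths, chunk = [], []
        for key in keys:
            chunk.append(key)
            if len(chunk) == chunk_size:
                run_paths.append(_write_run(sorted(chunk), tmpdir, len(run_paths)))
                chunk = []
        if chunk:
            run_paths.append(_write_run(sorted(chunk), tmpdir, len(run_paths)))
        # heapq.merge lazily merges already-sorted iterators; the result is
        # collected into a list here only so the demo can return it.
        return list(heapq.merge(*(_read_run(p) for p in run_paths)))


print(external_sort([5, 3, 9, 1, 7, 2, 8], chunk_size=3))  # [1, 2, 3, 5, 7, 8, 9]
```

    For coordinate-sorted BAM output the keys would more plausibly be tuples such as (reference ID, position); sorting and compressing independent chunks is also a natural place for task parallelism, though how the paper structures its tasks is not stated in the abstract.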

    Recovery from Fail-Stop Failures in Parallel Fortran Applications

    The Fortran 2018 standard defines syntax and semantics that allow a parallel application to recover from failed images (processes) during execution. This poster presents work to extend the GFortran compiler front end and the OpenCoarrays library to support fault-tolerant teams of images, enabling the use of collective routines after an image failure.
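
    The recovery pattern can be illustrated with a toy, single-process Python simulation: exclude the failed images, renumber the survivors into a new team, and run a collective reduction over that team only. In Fortran 2018 terms this loosely corresponds to querying `FAILED_IMAGES`, using `FORM TEAM`, and calling `CO_SUM` inside `CHANGE TEAM`; that mapping is an assumption, not the GFortran/OpenCoarrays implementation, and the ascending renumbering below is just one possible choice. Both function names are hypothetical.

```python
from typing import Dict, Set


def form_surviving_team(num_images: int, failed: Set[int]) -> Dict[int, int]:
    """Map each surviving image's original index (1-based) to a new team index (1..n)."""
    survivors = [i for i in range(1, num_images + 1) if i not in failed]
    return {old: new for new, old in enumerate(survivors, start=1)}


def team_co_sum(values: Dict[int, float], team: Dict[int, int]) -> float:
    """Sum contributions only from images that belong to the surviving team."""
    return sum(values[img] for img in team)


# Four images; image 3 has failed, so its contribution is excluded.
values = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}
team = form_surviving_team(num_images=4, failed={3})
print(team)                       # {1: 1, 2: 2, 4: 3}
print(team_co_sum(values, team))  # 7.0
```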

    High-performance epistasis detection in quantitative trait GWAS

    epiSNP is a program for identifying pairwise single nucleotide polymorphism (SNP) interactions (epistasis) in quantitative-trait genome-wide association studies (GWAS). A parallel MPI version (EPISNPmpi) was created in 2008 to handle this computationally expensive analysis on large data sets with many quantitative traits and SNP markers. However, the falling cost of genotyping has led to an explosion of large-scale GWAS data sets that challenge EPISNPmpi’s ability to compute results in a reasonable amount of time. Therefore, we optimized epiSNP for modern multi-core and highly parallel many-core processors to handle these large data sets efficiently. This paper describes the serial optimizations, dynamic load balancing using MPI-3 RMA operations, and shared-memory parallelization with OpenMP that further enhances load balancing and allows execution on the Intel Xeon Phi coprocessor (MIC). For a large GWAS data set, our optimizations provided a 38.43× speedup over EPISNPmpi on 126 nodes using 2 MICs on TACC’s Stampede supercomputer. We also describe a Coarray Fortran (CAF) version that demonstrates the suitability of PGAS languages for problems with this computational pattern, and show that it performs competitively with the MPI version on the NERSC Edison Cray XC30 supercomputer. Finally, we demonstrate the performance benefit of hyper-threading for this application on Edison (an average 1.35× speedup).
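
    The abstract names MPI-3 RMA for dynamic load balancing but not the exact scheme. A common pattern, assumed here, is a shared task counter that each process advances with an atomic fetch-and-add (`MPI_Fetch_and_op` in MPI-3) to claim its next block of SNP pairs. The Python sketch below imitates that with threads and a lock; `SharedCounter`, `balanced_pairwise_scan`, and the block size are hypothetical, and the epistasis test itself is omitted.

```python
import itertools
import threading
from typing import List, Tuple


class SharedCounter:
    """Lock-protected counter standing in for an atomic fetch-and-add on a shared window."""

    def __init__(self) -> None:
        self._value = 0
        self._lock = threading.Lock()

    def fetch_and_add(self, increment: int) -> int:
        with self._lock:
            old = self._value
            self._value += increment
            return old


def balanced_pairwise_scan(num_snps: int, num_workers: int, block: int) -> List[List[Tuple[int, int]]]:
    """Hand out all SNP pairs (i < j) to workers in blocks claimed on demand."""
    # Pairs are materialized only for brevity; large codes would index them arithmetically.
    pairs = list(itertools.combinations(range(num_snps), 2))
    counter = SharedCounter()
    done: List[List[Tuple[int, int]]] = [[] for _ in range(num_workers)]

    def worker(rank: int) -> None:
        while True:
            start = counter.fetch_and_add(block)  # claim the next block of pairs
            if start >= len(pairs):
                return
            for pair in pairs[start:start + block]:
                # A real code would run the interaction test on `pair` here.
                done[rank].append(pair)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done


per_worker = balanced_pairwise_scan(num_snps=6, num_workers=3, block=2)
print(sum(len(w) for w in per_worker))  # 15 pairs for 6 SNPs
```

    Faster workers simply claim more blocks, which is the point of dynamic rather than static partitioning when per-pair cost varies.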

    Fatty Acid SNP Interaction Analysis in Angus Sired Beef Cattle

    The triacylglyceride (TAG) fatty acid content in meat from Angus-sired cattle was analyzed for non-additive genetic effects. A total of 11,482 significant DNA marker interactions (false discovery rate [FDR] < 0.05) were detected across thirty-seven different TAG fatty acids. Interactions were not evenly distributed among the fatty acids analyzed, and the types of interactions (additive-by-additive, additive-by-dominance, and dominance-by-dominance) varied within each individual fatty acid. These results indicate that it may be possible to account for additional genetic variance among TAG fatty acids over and above individual markers.
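
    The abstract reports interactions significant at FDR < 0.05 but does not name the correction method. The Benjamini-Hochberg step-up procedure is a widely used way to control the FDR and is assumed here purely for illustration: sort the m p-values, find the largest rank k with p_(k) ≤ (k/m)·q, and call the k smallest significant. The function name is hypothetical.

```python
from typing import List


def benjamini_hochberg(p_values: List[float], fdr: float = 0.05) -> List[bool]:
    """Flag each test as significant (True) at the given FDR using the BH step-up rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Largest rank k (1-based) whose p-value is at or below (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant


print(benjamini_hochberg([0.001, 0.015, 0.025, 0.035, 0.30]))  # [True, True, True, True, False]
# Step-up behavior: 0.03 misses its own threshold (0.025) but is kept because 0.032 passes at rank 3.
print(benjamini_hochberg([0.001, 0.03, 0.032, 0.2]))  # [True, True, True, False]
```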

    Gene expression patterns are correlated with genomic and genic structure in soybean

    Studies have indicated that exon size, intron size, and intergenic distance are correlated with gene expression level and expression breadth, but previous reports on these correlations in plants and animals have been conflicting. In this study, next-generation sequencing data, which have been shown to be more sensitive than earlier expression-profiling technologies, were generated from 14 tissues and analyzed. Our results revealed a novel dichotomy. Among genes with low expression levels, greater expression breadth correlated with longer transcripts, owing to an increase in the number of exons and introns; no significant changes in intron or exon sizes were noted. Conversely, genes expressed at intermediate to high levels displayed shorter transcripts as their expression breadth increased. This was due to smaller exons, with no significant change in the number of exons. Taking advantage of the known gene space of soybean, we evaluated the positioning of genes and found significant clustering of similarly expressed genes. Identifying correlations among the physical parameters of individual genes could help uncover the role of regulation owing to nucleotide composition, which may in turn help discern the role of noncoding regions.

    The genetic code as expressed through relationships between mRNA structure and protein function

    Structured RNA elements within messenger RNA often direct or modulate the cellular production of active proteins. As reviewed here, RNA structures have been discovered that govern nearly every step in protein production: mRNA production and stability; translation initiation, elongation, and termination; protein folding; and cellular localization. Regulatory RNA elements are common within RNAs from every domain of life, and the growing catalog of RNA-mediated mechanisms continues to reveal new ways in which mRNA structure regulates translation. We integrate examples from several different classes of RNA structure-mediated regulation to present a global perspective suggesting that the secondary and tertiary structure of RNA ultimately constitutes an additional level of the genetic code that both guides and regulates protein biosynthesis.

    Selective 2′-hydroxyl acylation analyzed by protection from exoribonuclease (RNase-detected SHAPE) for direct analysis of covalent adducts and of nucleotide flexibility in RNA

    RNA SHAPE chemistry yields quantitative, single-nucleotide-resolution structural information based on the reaction of the 2′-hydroxyl group of conformationally flexible nucleotides with electrophilic SHAPE reagents. However, SHAPE technology has been limited by the requirement that sites of RNA modification be detected by primer extension, which loses information at both the 5′ and 3′ ends of an RNA and requires multiple experimental steps. Here we describe RNase-detected SHAPE (Selective 2′-Hydroxyl Acylation analyzed by Protection from Exoribonuclease), which uses a processive 3′→5′ exoribonuclease, RNase R, to detect covalent adducts in 5′-end-labeled RNA in a one-tube experiment. RNase R degrades RNA but stops quantitatively three and four nucleotides 3′ of a nucleotide containing a covalent adduct at the ribose 2′-hydroxyl or at the pairing face of a nucleobase, respectively. We illustrate this technology by characterizing ligand-induced folding of the E. coli thiamine pyrophosphate riboswitch RNA. RNase-detected SHAPE is a facile, two-day approach that can be used to analyze diverse covalent adducts in any RNA molecule, including short RNAs not amenable to analysis by primer extension and RNAs with functionally important structures at their 5′ or 3′ ends.
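
    Because the RNA is labeled at its 5′ end and RNase R degrades from the 3′ end, the length of each protected, labeled fragment records where the enzyme stopped. The sketch below converts a fragment length into an adduct position using the three- and four-nucleotide offsets stated above. The counting convention (the fragment's 3′-most nucleotide lies exactly that many nucleotides 3′ of the adduct) is an assumption, as are the function name and dictionary keys.

```python
from typing import Literal

# Stop offsets from the abstract: RNase R halts 3 nt 3' of a 2'-hydroxyl adduct
# and 4 nt 3' of an adduct on the nucleobase pairing face.
STOP_OFFSET = {"ribose_2oh": 3, "nucleobase": 4}


def adduct_position(fragment_length: int, adduct_type: Literal["ribose_2oh", "nucleobase"]) -> int:
    """Infer the 1-based position (from the labeled 5' end) of the modified nucleotide.

    Assumes the protected fragment spans nucleotide 1 through the RNase R stop site,
    so its 3'-most nucleotide lies `offset` nucleotides 3' of the adduct.
    """
    pos = fragment_length - STOP_OFFSET[adduct_type]
    if pos < 1:
        raise ValueError("fragment too short to contain an adduct")
    return pos


print(adduct_position(57, "ribose_2oh"))  # 54
print(adduct_position(57, "nucleobase"))  # 53
```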

    Selective 2′-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis

    SHAPE chemistries exploit small electrophilic reagents that react with the 2′-hydroxyl group to interrogate RNA structure at single-nucleotide resolution. Mutational profiling (MaP) identifies modified residues by exploiting the ability of reverse transcriptase to misread a SHAPE-modified nucleotide and then counting the resulting mutations by massively parallel sequencing. The SHAPE-MaP approach measures the structures of large and transcriptome-wide systems as accurately as those of simple model RNAs. This protocol describes the experimental steps, implemented over three days, required to perform SHAPE probing and construct multiplexed SHAPE-MaP libraries suitable for deep sequencing. These steps include RNA folding and SHAPE structure probing, mutational profiling by reverse transcription, library construction, and sequencing. Automated processing of MaP sequencing data is accomplished using two software packages. ShapeMapper converts raw sequencing files into mutational profiles, creates SHAPE reactivity plots, and provides useful troubleshooting information, often within an hour. SuperFold uses these data to model RNA secondary structures, identify regions with well-defined structures, and visualize probable and alternative helices, often in under a day. We illustrate these algorithms with the E. coli thiamine pyrophosphate riboswitch, E. coli 16S rRNA, and HIV-1 genomic RNAs. SHAPE-MaP can be used to make nucleotide-resolution biophysical measurements of individual RNA motifs, rare components of complex RNA ensembles, and entire transcriptomes. The straightforward MaP strategy greatly expands the number, length, and complexity of analyzable RNA structures.
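
    The abstract does not spell out how ShapeMapper turns mutation counts into reactivities. A commonly used SHAPE-MaP formulation, assumed here, computes a per-nucleotide mutation rate for each sample, subtracts the rate of an untreated (no-reagent) control from the reagent-treated rate, and divides by the rate from a denatured control. Further normalization and quality filtering are omitted, and the function names are hypothetical.

```python
from typing import List, Optional


def mutation_rates(mutations: List[int], depths: List[int]) -> List[Optional[float]]:
    """Per-nucleotide mutation rate = mutations / read depth (None where depth is 0)."""
    return [m / d if d > 0 else None for m, d in zip(mutations, depths)]


def shape_reactivity(
    modified: List[Optional[float]],
    untreated: List[Optional[float]],
    denatured: List[Optional[float]],
) -> List[Optional[float]]:
    """Background-subtracted mutation rate, scaled by the denatured-control rate."""
    out: List[Optional[float]] = []
    for m, u, d in zip(modified, untreated, denatured):
        if m is None or u is None or d is None or d == 0:
            out.append(None)  # no usable data at this nucleotide
        else:
            out.append((m - u) / d)
    return out


depth = [1000, 1000, 1000]
react = shape_reactivity(
    mutation_rates([10, 2, 30], depth),  # reagent-treated sample
    mutation_rates([2, 2, 3], depth),    # untreated control
    mutation_rates([20, 10, 30], depth), # denatured control
)
print([round(r, 3) for r in react])  # [0.4, 0.0, 0.9]
```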