14 research outputs found

    Protein alignment algorithms with an efficient backtracking routine on multiple GPUs

    Background: Pairwise sequence alignment methods are widely used in biological research. The growing number of sequences is seen as one of the main challenges facing sequence alignment methods in the near future. To address it, several GPU (Graphics Processing Unit) computing approaches have been proposed recently. These solutions show the great potential of the GPU platform, but in most cases they address only sequence database scanning and compute only the alignment score, omitting the alignment itself. Thus, the need arose to implement the global and semiglobal Needleman-Wunsch and Smith-Waterman algorithms with the backtracking procedure needed to construct the alignment.
    Results: In this paper we present a solution that performs the alignment of every given sequence pair, which is a required step for progressive multiple sequence alignment methods, as well as for DNA recognition at the DNA assembly stage. Tests show that the implementation, with performance of up to 6.3 GCUPS on a single GPU for affine gap penalties, is very efficient in comparison to other CPU- and GPU-based solutions. Moreover, multi-GPU support with load balancing makes the application very scalable.
    Conclusions: The article shows that the backtracking procedure of sequence alignment algorithms can be designed to fit the GPU architecture. Therefore, our algorithm is able to compute pairwise alignments in addition to scores. This opens a wide range of new possibilities, allowing other methods from the area of molecular biology to take advantage of the new computational architecture. Tests show that the efficiency of the implementation is excellent. Moreover, the speed of our GPU-based algorithms increases almost linearly when more than one graphics card is used.
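To make concrete the difference between computing only a score and also recovering the alignment, here is a minimal CPU sketch of global Needleman-Wunsch alignment with a backtracking pass. It is not the paper's GPU implementation: it assumes a single linear gap penalty rather than the affine penalties mentioned in the abstract, and the function name and scoring values are illustrative choices.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a linear gap penalty (illustrative scoring values).

    Returns (score, aligned_a, aligned_b). The `move` matrix records which
    neighbour produced each cell, and that record is what the backtracking
    pass needs in addition to the scores.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    move = [[""] * (m + 1) for _ in range(n + 1)]

    # First column and row: a prefix aligned entirely against gaps.
    for i in range(1, n + 1):
        score[i][0] = i * gap
        move[i][0] = "U"
    for j in range(1, m + 1):
        score[0][j] = j * gap
        move[0][j] = "L"

    # Fill phase: each cell takes the best of diagonal, up and left.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            best = max(diag, up, left)
            score[i][j] = best
            move[i][j] = "D" if best == diag else ("U" if best == up else "L")

    # Backtracking phase: walk from the bottom-right corner to the origin.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "D":
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i -= 1
            j -= 1
        elif step == "U":
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1

    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

Calling `needleman_wunsch("ACGT", "AGT")` returns the score together with two gapped strings of equal length. A score-only scan could keep just one row of the matrix, whereas backtracking needs the per-cell move record, which is presumably what makes it harder to fit onto a GPU.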

    Measures for interoperability of phenotypic data: minimum information requirements and formatting

    Background: Plant phenotypic data shrouds a wealth of information which, when accurately analysed and linked to other data types, brings to light knowledge about the mechanisms of life. As phenotyping is a field of research comprising manifold, diverse and time-consuming experiments, the findings can be fostered by reusing and combining existing datasets. Their correct interpretation, and thus replicability, comparability and interoperability, is possible provided that the collected observations are equipped with an adequate set of metadata. So far there have been no common standards governing phenotypic data description, which has hampered data exchange and reuse.
    Results: In this paper we propose guidelines for the proper handling of information about plant phenotyping experiments, in terms of both the recommended content of the description and its formatting. We provide a document called “Minimum Information About a Plant Phenotyping Experiment”, which specifies what information about each experiment should be given, and a Phenotyping Configuration for the ISA-Tab format, which allows this information to be organised in practice within a dataset. We provide examples of ISA-Tab-formatted phenotypic data, and a general description of a few systems where the recommendations have been implemented.
    Conclusions: Acceptance of the rules described in this paper by the plant phenotyping community will help to achieve findable, accessible, interoperable and reusable data.

    G-MAPSEQ – a new method for mapping reads to a reference genome

    Mapping reads to a reference genome is one of the most essential problems in modern computational biology. The most popular algorithms for this problem are based on the Burrows-Wheeler transform and the FM-index. However, because these approaches allow only a limited number of mutations, they have difficulty with highly mutated sequences. G-MAPSEQ is a novel, hybrid algorithm combining two methods: alignment-free sequence comparison and ultrafast sequence alignment. The former is a fast heuristic algorithm which uses k-mer characteristics of nucleotide sequences to find potential mapping positions. The latter is a very fast GPU implementation of sequence alignment used to verify the correctness of these mapping positions. The source code of G-MAPSEQ along with other bioinformatic software is available at: http://gpualign.cs.put.poznan.pl
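The two-stage idea described above (a cheap k-mer heuristic proposes mapping positions, and an alignment step verifies them) can be illustrated with a small sketch of the first stage. This is not G-MAPSEQ's actual heuristic, which the abstract does not specify. It is a generic k-mer voting scheme, and the function names, the `min_votes` threshold and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer of the reference to the list of positions where it starts."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def candidate_positions(read, index, k, min_votes=2):
    """Propose reference start positions for a read by k-mer voting.

    Each k-mer shared by the read and the reference votes for the start
    position it implies (reference position minus offset in the read).
    Positions are returned with their vote counts, most-voted first.
    A start may be negative if the read overhangs the reference start.
    """
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for ref_pos in index.get(read[offset:offset + k], ()):
            votes[ref_pos - offset] += 1
    return [(start, count) for start, count in votes.most_common() if count >= min_votes]


# Toy example: the read occurs exactly once, at position 4 of the reference.
reference = "GGGGACGTTCAGCCCC"
index = build_kmer_index(reference, k=4)
hits = candidate_positions("ACGTTCAG", index, k=4)
```

In a full pipeline, each candidate in `hits` would then be passed to an alignment routine for verification. A mutated read still collects votes from its unmutated k-mers, which is why this style of seeding tolerates more differences than exact-match lookups.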

    GRASShopPER – an algorithm for de novo assembly based on GPU alignments

    Next-generation sequencers produce billions of short DNA sequences in a massively parallel manner, which makes accurately reconstructing a genome sequence de novo from these short sequences a great computational challenge. Here, we propose the GRASShopPER assembler, which follows the overlap-layout-consensus approach. It uses an efficient GPU implementation for sequence alignment during the graph construction stage and a greedy hyper-heuristic algorithm at the fork detection stage. A two-part fork detection method allows us to identify repeated fragments of a genome and to reconstruct them without misassemblies. The assemblies of data sets of the bacterium Candidatus Microthrix, the nematode Caenorhabditis elegans, and human chromosome 14 were evaluated with the gold standard tool QUAST. In comparison with other assemblers, GRASShopPER provided contigs that covered the largest part of the genomes and, at the same time, kept good values of other metrics, e.g., NG50 and misassembly rate.
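For readers unfamiliar with the NG50 metric mentioned above, the following sketch shows how it is commonly computed: contigs are sorted from longest to shortest, and NG50 is the length of the contig at which the running total first reaches half of the reference genome length. The function name and the example numbers are illustrative and are not taken from the paper.

```python
def ng50(contig_lengths, genome_length):
    """Length of the contig at which the cumulative length of contigs,
    taken longest first, reaches half of the reference genome length.

    Returns None when the contigs together cover less than half the genome.
    """
    target = genome_length / 2
    total = 0
    for length in sorted(contig_lengths, reverse=True):
        total += length
        if total >= target:
            return length
    return None


# Hypothetical assembly: six contigs for a genome of length 300.
example = ng50([80, 70, 50, 40, 30, 20], genome_length=300)
```

Unlike N50, which uses the total assembly length as its denominator, NG50 uses the genome length, so assemblies of the same genome produced by different tools remain directly comparable.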

    Quantitative Trait Loci for Yield and Yield-Related Traits in Spring Barley Populations Derived from Crosses between European and Syrian Cultivars

    In response to climatic changes, breeding programmes should be aimed at creating new cultivars with improved resistance to water scarcity. The objective of this study was to examine the yield potential of barley recombinant inbred lines (RILs) derived from three cross-combinations of European and Syrian spring cultivars, and to identify quantitative trait loci (QTLs) for yield-related traits in these populations. RILs were evaluated in field experiments over a period of three years (2011 to 2013) and genotyped with simple sequence repeat (SSR) and single nucleotide polymorphism (SNP) markers; a genetic map for each population was constructed and then one consensus map was developed. Biological interpretation of identified QTLs was achieved by reference to the Ensembl Plants barley gene space. Twelve regions in the genomes of the studied RILs were distinguished after QTL analysis. Most of the QTLs were identified on the 2H chromosome, which was the hotspot region in all three populations. Syrian parental cultivars contributed alleles decreasing trait values at the majority of QTLs for grain weight, grain number, spike length and time to heading, and numerous alleles increasing stem length. The phenomic and molecular approaches distinguished the lines with an acceptable grain yield potential combining desirable features or alleles from their parents, that is, early heading from the Syrian breeding line (Cam/B1/CI08887//CI05761) and short plant stature from the European semidwarf cultivar (Maresi).