1,796 research outputs found

    CLARK: fast and accurate classification of metagenomic and genomic sequences using discriminative k-mers.

    Background: The problem of supervised DNA sequence classification arises in several fields of computational molecular biology. Although this problem has been extensively studied, it is still computationally challenging due to the size of the datasets that modern sequencing technologies can produce. Results: We introduce CLARK, a novel approach to classify metagenomic reads at the species or genus level with high accuracy and high speed. Extensive experimental results on various metagenomic samples show that the classification accuracy of CLARK is better than or comparable to the best state-of-the-art tools, and that it is significantly faster than any of its competitors. In its fastest single-threaded mode, CLARK classifies, with high accuracy, about 32 million metagenomic short reads per minute. CLARK can also classify BAC clones or transcripts to chromosome arms and centromeric regions. Conclusions: CLARK is a versatile, fast and accurate sequence classification method, especially useful for metagenomics and genomics applications. It is freely available at http://clark.cs.ucr.edu/
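The title refers to discriminative k-mers, but the abstract does not spell out how they are built or used. The following is a minimal sketch, assuming a k-mer is "discriminative" when it occurs in exactly one target and that a read is assigned to the target with the most matching discriminative k-mers. The function names and the toy sequences are hypothetical and are not taken from the CLARK software.

```python
from collections import Counter, defaultdict


def kmers(seq, k):
    """Yield every length-k substring of seq, left to right."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def build_discriminative_index(targets, k):
    """Map each k-mer found in exactly one target to that target's label.

    Assumption: k-mers shared by two or more targets are discarded.
    """
    owners = defaultdict(set)
    for label, seq in targets.items():
        for km in kmers(seq, k):
            owners[km].add(label)
    return {km: next(iter(labels))
            for km, labels in owners.items() if len(labels) == 1}


def classify(read, index, k):
    """Return the label with the most discriminative k-mer hits, or None."""
    hits = Counter(index[km] for km in kmers(read, k) if km in index)
    if not hits:
        return None
    return hits.most_common(1)[0][0]


# Toy targets that share the suffix "TGGCC" (hypothetical data).
targets = {"A": "ACGTACGTGGCC", "B": "TTGACCATGGCC"}
index = build_discriminative_index(targets, k=4)
print(classify("GTACGT", index, k=4))
print(classify("TGGCC", index, k=4))
```

In this toy example the read drawn from the shared suffix stays unclassified, because none of its k-mers is unique to one target.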

    Reading of "Valmorbia..."


    PuFFIN--a parameter-free method to build nucleosome maps from paired-end reads.

    Background: We introduce a novel method, called PuFFIN, that takes advantage of paired-end short reads to build genome-wide nucleosome maps with larger numbers of detected nucleosomes and higher accuracy than existing tools. In contrast to other approaches that require users to optimize several parameters according to their data (e.g., the maximum allowed nucleosome overlap or legal ranges for the fragment sizes), our algorithm can accurately determine a genome-wide set of non-overlapping nucleosomes without any user-defined parameter. This feature makes PuFFIN significantly easier to use and prevents users from choosing the "wrong" parameters and obtaining sub-optimal nucleosome maps. Results: PuFFIN builds genome-wide nucleosome maps using a multi-scale (or multi-resolution) approach. Our algorithm relies on a set of nucleosome "landscape" functions at different resolution levels: each function represents the likelihood of each genomic location being occupied by a nucleosome for a particular value of the smoothing parameter. After a set of candidate nucleosomes is computed for each function, PuFFIN produces a consensus set that satisfies non-overlapping constraints and maximizes the number of nucleosomes. Conclusions: We report comprehensive experimental results that compare PuFFIN with recently published tools (NOrMAL, TEMPLATE FILTERING, and NucPosSimulator) on several synthetic datasets as well as real data for S. cerevisiae and P. falciparum. Experimental results show that our approach produces more accurate nucleosome maps with a higher number of non-overlapping nucleosomes than other tools.
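The abstract states the goal of the consensus step (a non-overlapping set with as many nucleosomes as possible) but not how PuFFIN reaches it. As an illustration of that objective only, the sketch below uses the classic earliest-end-first greedy for interval scheduling, which returns a largest set of pairwise non-overlapping intervals. It is a stand-in technique, not PuFFIN's procedure; the function name, the half-open interval convention, and the 147 bp toy candidates are assumptions.

```python
def max_non_overlapping(candidates):
    """Return a largest subset of pairwise non-overlapping intervals.

    candidates: iterable of (start, end) pairs, treated as half-open
    [start, end), so an interval may begin where the previous one ends.
    """
    chosen = []
    last_end = float("-inf")
    # Greedy rule: always keep the compatible interval that ends first.
    for start, end in sorted(candidates, key=lambda iv: iv[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen


# Hypothetical candidates pooled from several resolution levels.
candidates = [(0, 147), (100, 247), (147, 294), (280, 427), (300, 447)]
print(max_non_overlapping(candidates))
```

A real consensus step would presumably also weigh how well each candidate is supported by the reads; this sketch only counts intervals.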

    RasBhari: optimizing spaced seeds for database searching, read mapping and alignment-free sequence comparison

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de
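To make the idea of match and don't-care positions concrete, here is a minimal sketch, assuming a pattern is written as a string of `1` (match) and `0` (don't-care) characters and that two sequences are compared without gaps. The function names and toy sequences are hypothetical; this does not reproduce rasbhari's optimization, only the filtering that a single pattern performs.

```python
def spaced_words(seq, pattern):
    """Return the spaced word at every offset: only the characters
    under '1' positions of the pattern are kept."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return ["".join(seq[i + j] for j in care)
            for i in range(len(seq) - span + 1)]


def seed_hits(a, b, pattern):
    """Return offsets where two gap-free aligned sequences agree at all
    match positions of the pattern; don't-care positions are ignored."""
    care = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    length = min(len(a), len(b))
    return [i for i in range(length - span + 1)
            if all(a[i + j] == b[i + j] for j in care)]


a = "ACGTAC"
b = "ACTTAC"  # differs from a at index 2 only
print(spaced_words(a, "1101"))
print(seed_hits(a, b, "1101"))  # spaced pattern tolerates the mismatch at offset 0
print(seed_hits(a, b, "1111"))  # contiguous word of the same span finds no hit
```

The single substitution destroys every contiguous 4-mer match here, while the spaced pattern still yields one hit, which is the effect the abstract attributes to using such patterns as a filter.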

    Compression of Biological Sequences by Greedy Off-Line Textual Substitution


    Identification of candidate genes and molecular markers for heat-induced brown discoloration of seed coats in cowpea [Vigna unguiculata (L.) Walp].

    Background: Heat-induced browning (Hbs) of seed coats is caused by high temperatures, which discolor the seed coats of many legumes, affecting the visual appearance and quality of seeds. The genetic determinants underlying Hbs in cowpea are unknown. Results: We identified three QTL associated with the heat-induced browning of seed coats trait, Hbs-1, Hbs-2 and Hbs-3, using cowpea RIL populations IT93K-503-1 (Hbs positive) x CB46 (hbs negative) and IT84S-2246 (Hbs positive) x TVu14676 (hbs negative). Hbs-1 was identified in both populations, accounting for 28.3% to 77.3% of the phenotypic variation. SNP markers 1_0032 and 1_1128 co-segregated with the trait. Within the syntenic regions of Hbs-1 in soybean, Medicago and common bean, several ethylene forming enzymes, ethylene responsive element binding factors and an ACC oxidase 2 were observed. Hbs-1 was identified in a BAC clone in contig 217 of the cowpea physical map, where ethylene forming enzymes were present. Hbs-2 was identified in the IT93K-503-1 x CB46 population and accounted for 9.5 to 12.3% of the phenotypic variance. Hbs-3 was identified in the IT84S-2246 x TVu14676 population and accounted for 6.2 to 6.8% of the phenotypic variance. SNP marker 1_0640 co-segregated with the heat-induced browning phenotype. Hbs-3 was positioned on BAC clones in contig 512 of the cowpea physical map, where several ACC synthase 1 genes were present. Conclusion: The identification of loci determining heat-induced browning of seed coats and co-segregating molecular markers will enable transfer of hbs alleles into cowpea varieties, contributing to higher quality seeds.

    Efficient and Accurate Detection of Topologically Associating Domains from Contact Maps

    Continuous improvements to high-throughput conformation capture (Hi-C) are revealing richer information about the spatial organization of the chromatin and its role in cellular functions. Several studies have confirmed the existence of structural features of the genome 3D organization that are stable across cell types and conserved across species, called topological associating domains (TADs). The detection of TADs has become a critical step in the analysis of Hi-C data, e.g., to identify enhancer-promoter associations. Here we present East, a novel TAD identification algorithm based on fast 2D convolution of Haar-like features, that is as accurate as the state-of-the-art method based on the directionality index, but 75-80x faster. East is available in the public domain at https://github.com/ucrbioinfo/EAST
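The abstract names Haar-like features on contact maps but gives no details of the features East uses. As a hedged illustration of the general idea, the sketch below scores a candidate domain with one hypothetical Haar-like feature (contacts inside the diagonal block minus contacts between the block and the rest of the map), evaluated in constant time per candidate from a summed-area table. The feature definition, the function names, and the toy matrix are all assumptions, not East's implementation.

```python
def summed_area_table(m):
    """Return a table sat where sat[r][c] is the sum of m[0:r][0:c]."""
    n_rows, n_cols = len(m), len(m[0])
    sat = [[0] * (n_cols + 1) for _ in range(n_rows + 1)]
    for r in range(n_rows):
        for c in range(n_cols):
            sat[r + 1][c + 1] = (m[r][c] + sat[r][c + 1]
                                 + sat[r + 1][c] - sat[r][c])
    return sat


def block_sum(sat, r0, r1, c0, c1):
    """Sum of the rectangle rows [r0, r1) x columns [c0, c1)."""
    return sat[r1][c1] - sat[r0][c1] - sat[r1][c0] + sat[r0][c0]


def haar_like_score(sat, start, end):
    """Hypothetical feature for the candidate domain [start, end):
    contacts within the domain minus contacts leaving it."""
    n_cols = len(sat[0]) - 1
    inside = block_sum(sat, start, end, start, end)
    cross = block_sum(sat, start, end, 0, n_cols) - inside
    return inside - cross


# Toy symmetric contact map with two dense 2x2 blocks on the diagonal.
contacts = [
    [5, 4, 1, 0],
    [4, 5, 0, 1],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
]
sat = summed_area_table(contacts)
print(haar_like_score(sat, 0, 2))  # candidate aligned with a dense block
print(haar_like_score(sat, 1, 3))  # candidate straddling two blocks
```

The candidate that coincides with a dense block scores higher than the one straddling the boundary, which is the kind of contrast a Haar-like feature is meant to expose.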