
    True positives versus total number of predicted candidates for the S. cerevisiae genome.

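A curve of true positives versus the total number of predicted candidates can be sketched as below. This is a minimal illustration, not the authors' evaluation code. It assumes each candidate alignment window carries a classifier score and the name of the annotated ncRNA it overlaps, or `None` if it overlaps none. The function name `tp_vs_candidates`, the input layout, and the rule that a known ncRNA is counted only once even when several windows overlap it are all hypothetical choices.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical input: (classifier score, name of the known ncRNA the window
# overlaps, or None if it overlaps no annotated ncRNA).
Candidate = Tuple[float, Optional[str]]


def tp_vs_candidates(candidates: Iterable[Candidate]) -> List[Tuple[int, int]]:
    """Return (candidates kept, distinct known ncRNAs found) for each cutoff."""
    # Rank candidates from most to least confident.
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)
    found = set()
    curve = []
    for rank, (_score, hit) in enumerate(ranked, start=1):
        if hit is not None:
            # Assumption: each known ncRNA counts once, even if several windows overlap it.
            found.add(hit)
        curve.append((rank, len(found)))
    return curve


windows = [(0.9, "tRNA-1"), (0.8, None), (0.7, "snoRNA-2"), (0.6, "tRNA-1")]
print(tp_vs_candidates(windows))  # [(1, 1), (2, 1), (3, 2), (4, 2)]
```

Plotting the second element against the first gives the curve. A method that ranks real ncRNAs higher produces a curve that rises earlier.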

    True positives versus total number of predicted candidates for the S. coelicolor genome.


    ROC curves for benchmarks on the high- and low-entropy ranges of the fourth Rfam testing set.

    ROC curves for Multifind, Multifind trained without the ensemble defect Z score, RNAz, LocARNATE+RNAz and Dynalign/SVM on the fourth testing set. (A) Full ROC curves on the low-entropy range (<0.3). (B) The high-specificity range of the ROC curves on the low-entropy range (<0.3). (C) Full ROC curves on the high-entropy range (>0.3). (D) The high-specificity range of the ROC curves on the high-entropy range (>0.3).
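An ROC curve and its high-specificity range can be derived from classifier scores and binary labels (1 for a true ncRNA, 0 for a negative control). The sketch below is a generic illustration, not the benchmark code behind these figures. The function names are hypothetical. The 0.9 specificity cutoff is an assumed value, because the caption does not say where the high-specificity range begins.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, true positive rate) points, one per distinct threshold."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples tied at the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def high_specificity_range(points, min_specificity=0.9):
    """Keep only points whose specificity (1 - FPR) is at least min_specificity (assumed cutoff)."""
    return [(fpr, tpr) for fpr, tpr in points if 1.0 - fpr >= min_specificity]


pts = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
print(pts)                          # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(high_specificity_range(pts))  # [(0.0, 0.0), (0.0, 0.5)]
```

The high-specificity view is the part that matters for genome scans. There, negatives vastly outnumber true ncRNAs, so even a small false positive rate yields many spurious candidates.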

    Overlap of known ncRNAs discovered by three methods.

    Venn diagrams of the known ncRNAs predicted by each method on (A) the S. cerevisiae genome, among the top 500 candidates; (B) the E. coli genome, among the top 500 candidates; and (C) the S. coelicolor genome, among the top 100 candidates.
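The overlaps in these Venn diagrams reduce to set operations on the known ncRNAs each method recovers within its top-N candidates. The sketch below is a hedged illustration under the same hypothetical input layout of (score, overlapped known ncRNA or `None`). The function names and the A/B/C labels standing for the three methods are illustrative only.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

Candidate = Tuple[float, Optional[str]]  # (score, overlapped known ncRNA or None)


def top_n_known(candidates: Iterable[Candidate], n: int) -> Set[str]:
    """Known ncRNAs hit by the n highest-scoring candidates of one method."""
    ranked = sorted(candidates, key=lambda c: c[0], reverse=True)[:n]
    return {hit for _score, hit in ranked if hit is not None}


def venn3_counts(a: Set[str], b: Set[str], c: Set[str]) -> Dict[str, int]:
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    return {
        "A only": len(a - b - c),
        "B only": len(b - a - c),
        "C only": len(c - a - b),
        "A and B only": len((a & b) - c),
        "A and C only": len((a & c) - b),
        "B and C only": len((b & c) - a),
        "all three": len(a & b & c),
    }


print(venn3_counts({"x", "y", "z"}, {"y", "z", "w"}, {"z", "v"}))
```

With real data, `a`, `b` and `c` would each come from `top_n_known(method_candidates, 500)` (or 100 for S. coelicolor).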

    ROC curves for benchmarks on the third Rfam testing set.

    ROC curves for Multifind, Multifind trained without the ensemble defect Z score, RNAz, LocARNATE+RNAz and Dynalign/SVM on the third testing set. (A) Full ROC curves. (B) The high-specificity range of the ROC curves.

    Discovery of Novel ncRNA Sequences in Multiple Genome Alignments on the Basis of Conserved and Stable Secondary Structures

    Non-coding RNAs (ncRNAs) with novel functions have recently been discovered, and it is now appreciated that genomes are pervasively transcribed. Moreover, many novel ncRNAs are not conserved at the primary sequence level. Accurate and efficient de novo computational ncRNA detection is therefore desirable. The purpose of this study is to develop an ncRNA detection method based on conservation of structure across more than two genomes. A new method, Multifind, was developed on top of Multilign, which predicts the common secondary structure of multiple input sequences. Multifind then passes measures of structure conservation to a classification support vector machine, which estimates the probability that the input sequences are a conserved ncRNA. Multilign is based on Dynalign, which folds and aligns two sequences simultaneously using a scoring scheme that does not include sequence identity; its structure prediction quality is therefore not affected by input sequence diversity. The ensemble defect was also introduced into Multifind as an additional discriminating feature that quantifies the compactness of the folding space of a sequence. Benchmarks on testing sequences extracted from the Rfam database showed that Multifind performs better than RNAz and than LocARNATE+RNAz, a method that applies RNAz to structure alignments generated by LocARNATE. For de novo ncRNA discovery in three genomes, Multifind and LocARNATE+RNAz had an advantage over RNAz in low-similarity regions of genome alignments. Additionally, Multifind and LocARNATE+RNAz found different subsets of known ncRNA sequences, suggesting that the two approaches are complementary.
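A feature like the ensemble defect and its Z score can be sketched under stated assumptions. The sketch uses the standard definition of the ensemble defect: the expected number of nucleotides whose pairing state differs from a reference structure, given base-pair probabilities. Dividing by sequence length gives a value between 0 and 1. A low value means the folding ensemble is concentrated near one structure, i.e. a compact folding space. How Multifind obtains the probabilities, picks the reference structure, and builds the background for its Z score is not stated here. Scoring against values from shuffled inputs is an assumption, and all function names are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def pair_map(structure: str) -> Dict[int, int]:
    """Map each paired position of a dot-bracket structure to its partner."""
    stack: List[int] = []
    partner: Dict[int, int] = {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def normalized_ensemble_defect(bp_prob: Sequence[Sequence[float]], structure: str) -> float:
    """Expected fraction of nucleotides whose pairing state differs from `structure`.

    bp_prob[i][j] is the (symmetric) probability that nucleotides i and j pair.
    """
    n = len(structure)
    partner = pair_map(structure)
    expected_correct = 0.0
    for i in range(n):
        if i in partner:
            # Probability that i pairs with its partner in the reference structure.
            expected_correct += bp_prob[i][partner[i]]
        else:
            # Probability that i is unpaired in the ensemble.
            expected_correct += 1.0 - sum(bp_prob[i])
    return (n - expected_correct) / n


def z_score(value: float, background: Sequence[float]) -> float:
    """Standard score of `value` against background values (assumed: shuffled inputs)."""
    return (value - mean(background)) / stdev(background)


probs = [[0.0, 0.0, 0.8], [0.0, 0.0, 0.0], [0.8, 0.0, 0.0]]
print(normalized_ensemble_defect(probs, "(.)"))  # about 0.133
```

A strongly negative Z score would mean the native sequences have a more compact folding space than the background. That is the kind of signal the abstract attributes to this feature.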

    True positives versus total number of predicted candidates for the E. coli genome.


    Running time of Multifind, LocARNATE+RNAz and RNAz on 100 Rfam alignments and 100 yeast alignments, measured on a single core of an Intel Xeon E5450 CPU at 3.00 GHz.


    Benchmarks for ncRNA discovery in yeast.

    (A) True positives versus total number of predicted candidates for the S. cerevisiae genome on low-similarity (S>0.3) alignment windows. (B) True positives versus total number of predicted candidates for the S. cerevisiae genome on high-similarity (S<0.3) alignment windows.

    Unweighted UniFrac distances

    The unweighted UniFrac distance matrix (Lozupone and Knight, AEM 2005) of the 9511 fecal samples used in the American Gut paper. UniFrac was computed with Striped UniFrac (https://github.com/biocore/unifrac). Before UniFrac was computed, Deblur (Amir et al., mSystems 2017) was run on the samples, all bloom sOTUs were removed (Amir et al., mSystems 2017), and the samples were rarefied to a depth of 1250 reads (Weiss et al., Microbiome 2017). For the phylogeny, fragments were inserted with SEPP (Mirarab et al., Pac Symp Biocomput 2012) into the Greengenes 13_5 99% OTU tree (McDonald et al., ISME 2012).
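The unweighted UniFrac distance between two samples is the branch length covered by only one of them, divided by the branch length covered by either, on a shared phylogeny. The sketch below applies that textbook definition to a toy tree. It is not Striped UniFrac or its API, and the dictionary tree encoding and function names are hypothetical. It also omits the Deblur, bloom-removal, rarefaction and SEPP steps described above.

```python
from typing import Dict, Set, Tuple

# Hypothetical tree encoding: child node -> (parent node, length of the branch
# above the child). The root never appears as a key.
Tree = Dict[str, Tuple[str, float]]


def covered_branches(tree: Tree, tips: Set[str]) -> Set[str]:
    """Nodes whose parent branch lies on a path from an observed tip to the root."""
    covered: Set[str] = set()
    for tip in tips:
        node = tip
        # Walk toward the root; stop early once an already-covered path is reached.
        while node in tree and node not in covered:
            covered.add(node)
            node = tree[node][0]
    return covered


def unweighted_unifrac(tree: Tree, sample_a: Set[str], sample_b: Set[str]) -> float:
    """Unique branch length divided by branch length covered by either sample."""
    a = covered_branches(tree, sample_a)
    b = covered_branches(tree, sample_b)
    total = sum(tree[node][1] for node in a | b)
    if total == 0:
        raise ValueError("samples cover no branches")
    unique = sum(tree[node][1] for node in a ^ b)
    return unique / total


tree = {"A": ("root", 1.0), "X": ("root", 1.0), "B": ("X", 1.0), "C": ("X", 1.0)}
print(unweighted_unifrac(tree, {"A", "B"}, {"B", "C"}))  # 0.5
```

Because only presence or absence of each tip matters, unweighted UniFrac is sensitive to sequencing depth. That is one reason samples are rarefied to a common depth beforehand.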