91 research outputs found

    Four assessment metrics of assemblies on S. cerevisiae datasets with different lengths.

    No full text

    L'Écho : grand quotidien d'information du Centre Ouest

    No full text
    23 September 1934 (1934/09/23) (A63). Part of the document collection: PoitouCh

    The Impacts of Read Length and Transcriptome Complexity for De Novo Assembly: A Simulation Study

    No full text
    Transcriptome assembly from RNA-seq data, particularly in non-model organisms, has improved dramatically, but only recently have pre-assembly factors such as sequencing depth and error correction been studied. Increasing read length is viewed as crucial to further improving transcriptome assembly, but it is unknown whether read length really matters. In addition, although many assembly tools are now available, it is unclear whether existing assemblers perform well enough on data with different transcriptome complexities. In this paper, we studied these two open problems using two high-performing assemblers, Velvet/Oases and Trinity, on several simulated datasets of human, mouse and S. cerevisiae. The results suggest that (1) the read length of paired reads does not matter once it exceeds a certain threshold, and, interestingly, the threshold differs between organisms; and (2) the quality of de novo assembly decreases sharply as transcriptome complexity increases: all existing de novo assemblers tend to break down whenever genes contain a large number of alternative splicing events.

    Comparison of de novo assemblies on five human datasets with different numbers of spliced isoforms.

    No full text

    Analyses of five human datasets with different numbers of spliced isoforms.

    No full text
    (A) Boxplot of the number of exons per gene. (B) Boxplot of spliced isoform length.

    Elucidation of Operon Structures across Closely Related Bacterial Genomes

    No full text
    About half of the protein-coding genes in prokaryotic genomes are organized into operons to facilitate co-regulation during transcription. As genomes evolve, operon structures change, which could coordinate diverse gene expression patterns in response to various stimuli during the life cycle of a bacterial cell. Here we developed a graph-based model to elucidate the diversity of operon structures across a set of closely related bacterial genomes. In the constructed graph, each node represents one orthologous gene group (OGG), and two nodes are connected if any two genes, one from each of the corresponding OGGs, are immediate neighbors in the same operon in any of the considered genomes. By identifying the connected components of this graph, we found that genes in a connected component are likely to be functionally related and that the components tend to form tree-like topologies, such as paths and stars, corresponding to different biological mechanisms in transcriptional regulation. Specifically, (i) a path-structure component integrates genes encoding a protein complex, such as the ribosome; and (ii) a star-structure component not only groups related genes together but also reflects the key functional role of its central node, such as an ABC transporter with a transporter permease surrounded by substrate-binding proteins. Most interestingly, the genes from organisms with highly diverse living environments, i.e., the biomass degraders and animal pathogens among the clostridia in our study, can be clearly classified into different topological groups on some connected components.
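    The graph construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy gene and OGG identifiers, and the simple degree-based rule for labelling a component as a path or a star are all assumptions made for illustration, and the sketch ignores edge weights.

```python
from collections import defaultdict


def build_ogg_graph(operons, gene_to_ogg):
    """Connect two OGGs when their genes are immediate neighbors in an operon.

    operons: list of operons, each an ordered list of gene IDs (hypothetical input).
    gene_to_ogg: dict mapping a gene ID to its orthologous gene group ID.
    """
    graph = defaultdict(set)
    for operon in operons:
        for left, right in zip(operon, operon[1:]):
            a, b = gene_to_ogg.get(left), gene_to_ogg.get(right)
            # Assumption: skip unmapped genes and neighbors from the same OGG.
            if a is None or b is None or a == b:
                continue
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def connected_components(graph):
    """Return the connected components as a list of node sets (iterative DFS)."""
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def classify_component(graph, component):
    """Label a component as 'path', 'star', 'tree' or 'other' (illustrative rule)."""
    degrees = [len(graph[node]) for node in component]
    n_nodes, n_edges = len(component), sum(degrees) // 2
    if n_edges != n_nodes - 1:
        return "other"  # a connected graph is a tree only if edges == nodes - 1
    if max(degrees) <= 2:
        return "path"
    if max(degrees) == n_nodes - 1:
        return "star"
    return "tree"


# Toy data: genes prefixed a_/b_ come from two hypothetical genomes.
gene_to_ogg = {
    "a_s1": "S1", "a_p": "P", "a_s2": "S2", "b_p": "P", "b_s3": "S3",
    "a_r1": "R1", "a_r2": "R2", "a_r3": "R3", "b_r3": "R3", "b_r4": "R4",
}
operons = [
    ["a_s1", "a_p", "a_s2"], ["b_p", "b_s3"],
    ["a_r1", "a_r2", "a_r3"], ["b_r3", "b_r4"],
]

graph = build_ogg_graph(operons, gene_to_ogg)
components = connected_components(graph)
for component in components:
    print(sorted(component), classify_component(graph, component))
```

    In the toy data, the OGG "P" plays the role of a central permease-like node adjacent to three other groups across the two genomes, while "R1" to "R4" chain together in the way the abstract describes for a protein complex.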

    Motifs related to genes in COC 27.

    No full text
    The first motif was identified from the promoters of genes in biomass degraders; the other from those in pathogens.

    Six typical connected operon components.

    No full text
    The size of each node is proportional to the number of genes in the corresponding orthologous gene group (the larger the node, the more genes). The color indicates the proportion of genes from biomass degraders or pathogens in the group: red means more biomass-degrader genes, blue more pathogen genes. Edge weights are shown as numbers on the components. COC #1 in (A) is the largest COC; it contains 58 nodes, and most of its genes are related to porphyrin metabolism. COC #6 in (B) contains a long path structure and consists mainly of ribosomal proteins. COC #13, #54 and #29 in (C), (D) and (E), respectively, form star structures. COC #27 in (F) shows biomass-degrader genes and pathogen genes in different topological parts.

    Gene count and in-operon ratio for each organism.

    No full text
    Genome IDs are listed in Table 1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0100999#pone-0100999-t001), and the operons were retrieved from the DOOR2.0 database.