518 research outputs found

    Unbiased taxonomic annotation of metagenomic samples

    The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at a node under the lowest common ancestor of the candidate sequences in the reference taxonomy with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this paper, we show that the Rand index is a better indicator of classification error than the often used area under the ROC curve and F-measure for both balanced and imbalanced reference taxonomies, and we also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time. Peer reviewed. Postprint (author's final draft).
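
The reduction to set cover mentioned in this abstract can be illustrated with the textbook greedy heuristic, which achieves a logarithmic approximation factor. This is a minimal sketch, not the authors' implementation: the node names, read identifiers, and the `node_to_reads` mapping are hypothetical, and this simple version rescans all candidates on every round rather than running in the linear time the abstract refers to.

```python
def greedy_set_cover(universe, candidates):
    """Greedy set cover: repeatedly pick the candidate covering the most uncovered items.

    universe   -- iterable of items to cover (here: read identifiers)
    candidates -- dict mapping a candidate name (here: a taxonomy node) to a set of items
    Returns the list of chosen candidate names, in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Sort names first so that ties are broken deterministically.
        best = max(sorted(candidates), key=lambda name: len(candidates[name] & uncovered))
        gained = candidates[best] & uncovered
        if not gained:
            raise ValueError("some items cannot be covered by any candidate")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical example: which reads each taxonomy node could annotate
# (node and read names are made up for illustration).
reads = {"r1", "r2", "r3", "r4", "r5"}
node_to_reads = {
    "node_A": {"r1", "r2", "r3"},
    "node_B": {"r3", "r4"},
    "node_C": {"r4", "r5"},
    "node_D": {"r1"},
}

annotation = greedy_set_cover(reads, node_to_reads)
print(annotation)
```

Picking the node that explains the most still-unannotated reads at each step favors a small set of nodes for the whole sample, which is the sample-level view the abstract describes.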

    ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

    BACKGROUND: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems – hence the need to develop novel strategies. RESULTS: We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid the limitations of splice site prediction (mainly over-prediction) that arise from independent single EST alignments to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment, and it adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. CONCLUSION: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires a lower computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at

    Single cell transcriptomics reveals specific RNA editing signatures in the human brain

    While RNA editing by A-to-I deamination is a requisite for neuronal function in humans, it is underinvestigated in single cells. Here we fill this gap by analysing RNA editing profiles of single cells from the brain cortex of living human subjects. We show that RNA editing levels per cell are bimodally distributed and distinguish between major brain cell types, thus providing new insights into neuronal dynamics.

    Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Background: The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it is desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences that are erroneously annotated as coding. Results: Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine, which designates the alignment coding or non-coding with an associated probability score. Conclusion: We show that our approach is both sensitive and accurate with respect to comparable methods and illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.

    Glutamine synthetase gene evolution in bacteria.

    The evolution of the prokaryotic glutamine synthetase (GS) genes, namely the GSI and GSII isoforms, has been investigated using the second codon positions, which have previously proven to behave as a good molecular clock. Our data confirm the early divergence between prokaryotic and eukaryotic GSII before the split between plants and animals. The phylogenetic tree of the GSI isoforms shows Archaebacteria to be more closely related to Eubacteria than to Eukaryotes. This finding is confirmed by the phylogenetic analysis carried out on both large and small subunits of rRNA. However, unlike in the rRNA analyses, Crenarchaeota and Euryarchaeota Archaebacteria, as well as high- and low-GC gram-positive bacteria, appear to be polyphyletic. We provide evidence that the observed polyphyly of Archaebacteria might be only apparent, resulting from a gene duplication event preceding the split between Archaebacteria and Eubacteria and followed by the retention of only one isoform in the extant lineages. Both gram-negative bacteria and high-GC gram-positive bacteria, which appear closely related, have GS activity regulated by an adenylylation/deadenylylation mechanism. A lateral gene transfer from Archaebacteria to low-GC eubacteria is invoked to explain the observed polyphyly of gram-positive bacteria.
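
Restricting an analysis to second codon positions, as this abstract describes, amounts to sampling every third base of an in-frame coding alignment. The sketch below shows that step together with a plain proportion-of-differences distance. It is an illustration only: the two sequences are invented, the function names are hypothetical, and the uncorrected p-distance stands in for whatever distance measure the authors actually used.

```python
def second_codon_positions(cds):
    """Return the second base of every complete codon of an in-frame coding sequence."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    return cds[1:usable:3]


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


# Invented, codon-aligned toy sequences (not real GS genes).
gene_1 = "ATGGCTAAAGGT"  # codons: ATG GCT AAA GGT
gene_2 = "ATGGATAAAGCT"  # codons: ATG GAT AAA GCT

pos2_1 = second_codon_positions(gene_1)
pos2_2 = second_codon_positions(gene_2)
distance = p_distance(pos2_1, pos2_2)
print(pos2_1, pos2_2, distance)
```

Every change at a second codon position alters the encoded amino acid, so these sites tend to evolve slowly, which is a common motivation for using them when comparing anciently diverged genes.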

    The mitochondrial genome of Phallusia mammillata and Phallusia fumigata (Tunicata, Ascidiacea): high genome plasticity at intra-genus level

    Background: Within Chordata, the subphyla Vertebrata and Cephalochordata (lancelets) are characterized by a remarkable stability of the mitochondrial (mt) genome, with constancy of gene content and almost invariant gene order, whereas the limited mitochondrial data on the subphylum Tunicata suggest frequent and extensive gene rearrangements, observed also within ascidians of the same genus. Results: To confirm this evolutionary trend and to better understand the evolutionary dynamics of the mitochondrial genome in Tunicata Ascidiacea, we have sequenced and characterized the complete mt genome of two congeneric ascidian species, Phallusia mammillata and Phallusia fumigata (Phlebobranchiata, Ascidiidae). The two mtDNAs are surprisingly rearranged, both with respect to one another and relative to those of other tunicates and chordates, with gene rearrangements affecting both protein-coding and tRNA genes. The new data highlight the extraordinary variability of the ascidian mt genome in base composition, tRNA secondary structure, tRNA gene content, and non-coding regions (number, size, sequence and location). Indeed, both Phallusia genomes lack the trnD gene, show loss/acquisition of the DHU-arm in two tRNAs, and have a G+C content two-fold higher than other ascidians. Moreover, the mt genome of P. fumigata presents two identical copies of trnI, an extra tRNA gene with uncertain amino acid specificity, and four almost identical sequence regions. In addition, a truncated cytochrome b, lacking a C-terminal tail that commonly protrudes into the mt matrix, has been identified as a new mt feature probably shared by all tunicates. Conclusion: The frequent occurrence of major gene order rearrangements in ascidians both at high taxonomic level and within the same genus makes this taxon an excellent model to study the mechanisms of gene rearrangement, and renders the mt genome an invaluable phylogenetic marker to investigate molecular biodiversity and speciation events in this largely unexplored group of basal chordates.

    Huntingtin gene evolution in Chordata and its peculiar features in the ascidian Ciona genus

    BACKGROUND: To gain insight into the evolutionary features of the huntingtin (htt) gene in Chordata, we have sequenced and characterized the full-length htt mRNA in the ascidian Ciona intestinalis, a basal chordate emerging as a new invertebrate model organism. Moreover, taking advantage of the availability of genomic and EST sequences, the htt gene structure of a number of chordate species, including the congeneric ascidian Ciona savignyi and the vertebrates Xenopus and Gallus, was reconstructed. RESULTS: The C. intestinalis htt transcript exhibits some peculiar features, such as spliced leader trans-splicing in the 98 nt-long 5' untranslated region (UTR), an alternative splicing in the coding region, eight alternative polyadenylation sites, and no similarity of either the 5' or the 3' UTR to the homologs of the congeneric C. savignyi. The predicted protein is 2946 amino acids long, shorter than its vertebrate homologs, and lacks the polyQ and the polyP stretches found in the N-terminal regions of mammalian homologs. The exon-intron organization of the htt gene is almost identical among vertebrates, and significantly conserved between Ciona and vertebrates, allowing us to hypothesize an ancestral chordate gene consisting of at least 40 coding exons. CONCLUSION: During chordate diversification, events of gain/loss, sliding, phase changes, and expansion of introns occurred in both vertebrate and ascidian lineages predominantly in the 5'-half of the htt gene, where there is also evidence of lineage-specific evolutionary dynamics in vertebrates. On the contrary, the 3'-half of the gene is highly conserved in all chordates at the level of both gene structure and protein sequence. Between the two Ciona species, a fast evolutionary rate and/or an early divergence time is suggested by the absence of significant similarity between UTRs, protein divergence comparable to that observed between mammals and fishes, and different distribution of repetitive elements.

    The evolution of the mitochondrial D-loop region and the origin of modern man.

    The origin of modern man is a highly debated issue that has recently been tackled by using mitochondrial DNA sequences. The limited genetic variability of human mtDNA has been explained in terms of a recent common genetic ancestry, thus implying that all modern-population mtDNAs originated from a single woman who lived in Africa less than 0.2 Mya. This divergence time is based on both the estimation of the rate of mtDNA change and its calibration date. Because different estimates of the rate of mtDNA evolution can completely change the scenario of the origin of modern man, we have reanalyzed the available mitochondrial sequence data by using an improved version of the statistical model, the "Markov clock," devised in our laboratory. Our analysis supports the African origin of modern man, but we found that the ancestral female from which all extant human mtDNAs originated lived in a time span of 0.3-0.8 Mya. Pushing back the date of the deepest root of the human mtDNA tree implies that the earliest divergence would have been in the Homo erectus population.
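
The dependence of a divergence date on the assumed rate and calibration point, which this abstract emphasizes, can be seen in the simplest strict-clock calculation. The sketch below is not the authors' "Markov clock" model: it assumes a constant rate and uses the elementary relation T = d / (2r), where d is the pairwise distance in substitutions per site and r is the rate per site per million years per lineage. All numbers are arbitrary illustrative values, not data from the paper.

```python
def rate_from_calibration(calib_distance, calib_time_mya):
    """Substitution rate per site per million years per lineage, from one calibration point.

    calib_distance -- pairwise distance (substitutions per site) between two calibration taxa
    calib_time_mya -- their assumed divergence time, in millions of years
    """
    return calib_distance / (2.0 * calib_time_mya)


def divergence_time(distance, rate):
    """Strict-clock divergence time in millions of years: T = d / (2 r)."""
    return distance / (2.0 * rate)


# Arbitrary illustrative values (not taken from the paper).
rate = rate_from_calibration(calib_distance=0.08, calib_time_mya=4.0)
t_base = divergence_time(distance=0.01, rate=rate)

# Halving the assumed rate doubles the inferred divergence time.
t_slow = divergence_time(distance=0.01, rate=rate / 2.0)
print(rate, t_base, t_slow)
```

Because the inferred time scales inversely with the rate, moving the calibration date or revising the rate estimate shifts the date of the common ancestor proportionally, which is why the two datings quoted in the abstract can differ so much.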