    Unbiased taxonomic annotation of metagenomic samples

    The classification of reads from a metagenomic sample using a reference taxonomy is usually based on first mapping the reads to the reference sequences and then classifying each read at the node, under the lowest common ancestor of the candidate sequences in the reference taxonomy, with the least classification error. However, this taxonomic annotation can be biased by an imbalanced taxonomy and also by the presence of multiple nodes in the taxonomy with the least classification error for a given read. In this paper, we show that the Rand index is a better indicator of classification error than the often-used area under the ROC curve and F-measure for both balanced and imbalanced reference taxonomies. We also address the second source of bias by reducing the taxonomic annotation problem for a whole metagenomic sample to a set cover problem, for which a logarithmic approximation can be obtained in linear time.
    Peer reviewed. Postprint (author's final draft).
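    To make the pair-counting idea behind the Rand index concrete, here is a minimal Python sketch of the standard (unadjusted) Rand index between two labelings of the same reads, for example their true taxa versus the taxa assigned by a classifier. This is an illustration under that assumption only: how the paper applies the index to nodes of a reference taxonomy is not described in the abstract, and the function name `rand_index` and the example taxa are hypothetical.

```python
from itertools import combinations


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two labelings agree (standard Rand index).

    A pair agrees if both labelings put the two items in the same group,
    or both put them in different groups. Runs in O(n^2) over all pairs.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        return 1.0  # zero or one item: trivially perfect agreement
    agree = 0
    for i, j in pairs:
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            agree += 1
    return agree / len(pairs)


# Hypothetical example: true taxa of five reads vs. taxa assigned to them.
true_taxa = ["Ecoli", "Ecoli", "Bsub", "Bsub", "Paer"]
assigned = ["Ecoli", "Ecoli", "Bsub", "Paer", "Paer"]
print(rand_index(true_taxa, assigned))  # 0.8 (8 of 10 pairs agree)
```

    Unlike a per-read accuracy, the score only depends on which reads are grouped together, so it does not reward a classifier for the arbitrary names it gives to groups.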

    Productivity and innovation in UK financial services: an intangible assets approach

    Working Paper

    A Support Vector Machine for the Discrimination of MicroRNA Precursors from Other Genomic Hairpin Structures

    Motivation: MicroRNAs (miRNAs) are endogenous, small (~20 nt), single-stranded, non-coding RNAs (ncRNAs) that result from the nuclear and cytoplasmic processing of transcribed precursor hairpin structures. They are increasingly recognized as playing crucial roles as post-transcriptional antisense regulators of gene expression through regulation of mRNA stability or translational efficiency. miRNAs, first reported in Caenorhabditis elegans, have been identified in the genomes of most higher organisms, including worms, flies, plants, mammals and, recently, viruses. Functional studies have shown that miRNAs play important roles in processes such as cell proliferation, fat metabolism, apoptosis, neuronal cell fate, insulin secretion, haematopoietic differentiation and developmental regulation. The detection of homologs of known miRNAs through comparative genomic approaches has proved relatively tractable. However, the ab initio prediction of miRNA precursors through computational methods poses several additional difficulties, not least the fact that not all thermodynamically plausible transcribed hairpins are processed to yield mature miRNAs. Until now it has not been possible to identify conserved sequence or structural elements that define consensus recognition elements for the enzymes that process miRNA precursors. In light of these observations, we wished to develop and improve methods for discriminating true miRNA precursor hairpins from spurious hairpins. Methods: We have developed an SVM (Support Vector Machine) that considers up to 74 features associated with the primary and secondary structures and thermodynamic characteristics of candidate hairpin structures. We use a standard heuristic approach to optimize the combinations of features used, and we train the SVM with sets of characterized hairpin miRNA precursors and known non-miRNA hairpins.
    Results: Our SVM shows highly promising results in discriminating true miRNA precursors from “spurious” hairpins (typically around 95% sensitivity) in various species. In particular, our rate of false positive predictions appears to be low relative to comparable methods.
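    The abstract does not list the 74 features. As a rough, hypothetical illustration of the kind of primary- and secondary-structure features an SVM could consume, the Python sketch below derives a few simple values from a candidate hairpin sequence and its dot-bracket structure (assumed to come from an RNA folding tool). The function name `hairpin_features` and the feature set are assumptions, not the published feature list; thermodynamic features such as minimum free energy would require a folding program and are omitted.

```python
def hairpin_features(seq, structure):
    """Compute a few simple features of a candidate hairpin (hypothetical feature set).

    seq: RNA sequence over A, C, G, U.
    structure: dot-bracket string of the same length ('(' and ')' are paired bases).
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = seq.upper()
    n = len(seq)
    stack, pairs = [], []
    # Match brackets to recover the list of base pairs (i, j).
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    gc_pairs = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"G", "C"})
    gu_pairs = sum(1 for i, j in pairs if {seq[i], seq[j]} == {"G", "U"})
    return {
        "length": n,
        "gc_content": (seq.count("G") + seq.count("C")) / n,
        "paired_fraction": 2 * len(pairs) / n,  # fraction of bases in a pair
        "gc_pair_fraction": gc_pairs / len(pairs) if pairs else 0.0,
        "gu_pair_fraction": gu_pairs / len(pairs) if pairs else 0.0,
    }


print(hairpin_features("GGGAAACCC", "(((...)))"))
```

    Vectors like this, one per candidate hairpin, would then be the input rows for training and applying a classifier.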

    Detection of A-to-I RNA editing in SARS-CoV-2

    ADAR1-mediated deamination of adenosines in long double-stranded RNAs plays an important role in modulating the innate immune response. However, recent investigations based on metatranscriptomic samples of COVID-19 patients and SARS-CoV-2-infected Vero cells have reported contrasting findings. Using RNA-seq data from time-course experiments of infected human cell lines and transcriptome data from Vero cells and clinical samples, we prove that A-to-G changes observed in SARS-CoV-2 genomes represent genuine RNA editing events, likely mediated by ADAR1. While the A-to-I editing rate is generally low, changes are distributed along the entire viral genome, are overrepresented in exonic regions, and are (in the majority of cases) nonsynonymous. The impact of RNA editing on virus–host interactions could be relevant for identifying potential targets for therapeutic interventions.
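    A minimal Python sketch of the first step of such an analysis, assuming reads have already been aligned and summarised as a per-position pileup: for each reference A site, compute the fraction of covering bases that read as G. The names `a_to_g_rates`, `pileup` and `min_depth` are hypothetical. This alone does not separate genuine editing from sequencing errors or viral variants, which needs further evidence such as the time-course and multi-dataset comparisons described above, and it only looks at A-to-G on the given strand (editing of the complementary strand would appear as T-to-C).

```python
from collections import Counter


def a_to_g_rates(reference, pileup, min_depth=10):
    """Per-position A-to-G mismatch rates (hypothetical helper).

    reference: reference genome string.
    pileup: dict mapping 0-based position -> string of read bases at that position.
    Returns {position: fraction of G among covering bases} for reference A sites
    with at least `min_depth` covering bases and at least one G.
    """
    rates = {}
    for pos, bases in pileup.items():
        if reference[pos].upper() != "A":
            continue  # only A sites can show A-to-G (A-to-I) changes
        counts = Counter(bases.upper())
        depth = sum(counts.values())
        if depth < min_depth or counts["G"] == 0:
            continue
        rates[pos] = counts["G"] / depth
    return rates


# Hypothetical toy pileup: position 0 is an A covered by 8 A reads and 2 G reads.
print(a_to_g_rates("ACGTA", {0: "AAAAAAAAGG", 1: "CCCCCCCCCC", 4: "AAAAAAAAAA"}))
# {0: 0.2}
```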

    ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences

    BACKGROUND: Currently available methods to predict splice sites are mainly based on the independent and progressive alignment of transcript data (mostly ESTs) to the genomic sequence. Apart from often being computationally expensive, this approach is vulnerable to several problems; hence the need to develop novel strategies. RESULTS: We propose a method, based on a novel multiple genome-EST alignment algorithm, for the detection of splice sites. To avoid the limitations of splice site prediction (mainly over-predictions) caused by aligning single ESTs independently to the genomic sequence, our approach performs a multiple alignment of transcript data to the genomic sequence based on the combined analysis of all available data. We recast the problem of predicting constitutive and alternative splicing as an optimization problem, where the optimal multiple transcript alignment minimizes the number of exons and hence of splice site observations. We have implemented a splice site predictor based on this algorithm in the software tool ASPIC (Alternative Splicing PredICtion). It is distinguished from other methods based on BLAST-like tools by the incorporation of entirely new ad hoc procedures for accurate and computationally efficient transcript alignment, and it adopts dynamic programming for the refinement of intron boundaries. ASPIC also provides the minimal set of non-mergeable transcript isoforms compatible with the detected splicing events. The ASPIC web resource is dynamically interconnected with the Ensembl and Unigene databases and also implements an upload facility. CONCLUSION: Extensive benchmarking shows that ASPIC outperforms other existing methods in the detection of novel splicing isoforms and in the minimization of over-predictions. ASPIC also requires less computation time for processing a single gene and an EST cluster. The ASPIC web resource is available at

    How much does the UK employ, spend and invest in design?

    Working Paper

    Accurate discrimination of conserved coding and non-coding regions through multiple indicators of evolutionary dynamics

    Background: The conservation of sequences between related genomes has long been recognised as an indication of functional significance, and recognition of sequence homology is one of the principal approaches used in the annotation of newly sequenced genomes. In the context of recent findings that the number of non-coding transcripts in higher organisms is likely to be much higher than previously imagined, discrimination between conserved coding and non-coding sequences is a topic of considerable interest. Additionally, it is desirable to discriminate between coding and non-coding conserved sequences without recourse to sequence similarity searches of protein databases, as such approaches exclude the identification of novel conserved proteins without characterized homologs and may be influenced by the presence in databases of sequences that are erroneously annotated as coding. Results: Here we present a machine learning-based approach for the discrimination of conserved coding sequences. Our method calculates various statistics related to the evolutionary dynamics of two aligned sequences. These features are considered by a Support Vector Machine, which classifies the alignment as coding or non-coding with an associated probability score. Conclusion: We show that our approach is both sensitive and accurate with respect to comparable methods, and we illustrate several situations in which it may be applied, including the identification of conserved coding regions in genome sequences and the discrimination of coding from non-coding cDNA sequences.
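    One intuition behind evolution-based coding/non-coding discrimination is that, in protein-coding alignments, substitutions tend to accumulate at third codon positions, where many changes are synonymous. The Python sketch below counts mismatches per codon position in a pairwise alignment assumed to start in frame; it is a single hypothetical feature chosen for illustration, not the statistics computed by the method described above, and the function name is an assumption.

```python
def substitutions_by_codon_position(seq_a, seq_b):
    """Count mismatches at codon positions 1, 2 and 3 of an in-frame pairwise alignment.

    Assumes both aligned sequences start at codon position 1. Columns with a gap
    ('-') in either sequence are skipped, as is a trailing partial codon.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    counts = [0, 0, 0]  # mismatches at codon positions 1, 2, 3
    usable = len(seq_a) - len(seq_a) % 3
    for i in range(usable):
        a, b = seq_a[i].upper(), seq_b[i].upper()
        if a == "-" or b == "-":
            continue
        if a != b:
            counts[i % 3] += 1
    return counts


# Hypothetical example: both differences fall on third codon positions.
print(substitutions_by_codon_position("ATGGCTAAA", "ATGGCCAAG"))  # [0, 0, 2]
```

    A strong excess at the third position is suggestive of coding sequence, whereas non-coding alignments would be expected to show no such periodicity.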

    Single cell transcriptomics reveals specific RNA editing signatures in the human brain

    While RNA editing by A-to-I deamination is a requisite for neuronal function in humans, it is under-investigated in single cells. Here we fill this gap by analysing RNA editing profiles of single cells from the brain cortex of living human subjects. We show that RNA editing levels per cell are bimodally distributed and distinguish between major brain cell types, thus providing new insights into neuronal dynamics.

    Glutamine synthetase gene evolution in bacteria

    The evolution of the prokaryotic glutamine synthetase (GS) genes, namely the GSI and GSII isoforms, has been investigated using second codon positions, which have previously proven to behave as a good molecular clock. Our data confirm the early divergence between prokaryotic and eukaryotic GSII before the split between plants and animals. The phylogenetic tree of the GSI isoforms shows Archaebacteria to be more closely related to Eubacteria than to Eukaryotes. This finding is confirmed by the phylogenetic analysis carried out on both large and small subunits of rRNA. However, unlike in the rRNA analyses, Crenarchaeota and Euryarchaeota Archaebacteria, as well as high- and low-GC gram-positive bacteria, appear to be polyphyletic. We provide evidence that the observed polyphyly of Archaebacteria might be only apparent, resulting from a gene duplication event that preceded the split between Archaebacteria and Eubacteria and was followed by the retention of only one isoform in the extant lineages. Both gram-negative bacteria and high-GC gram-positive bacteria, which appear closely related, have GS activity regulated by an adenylylation/deadenylylation mechanism. A lateral gene transfer from Archaebacteria to low-GC eubacteria is invoked to explain the observed polyphyly of gram-positive bacteria.
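    A minimal Python sketch of the data-preparation idea described above, assuming two coding sequences that are already aligned in frame: extract the second codon positions and compute an uncorrected p-distance between them. The function names are hypothetical, and an actual phylogenetic analysis would also apply a substitution model and a tree-building method, which are not shown.

```python
def second_codon_positions(seq):
    """Return the base at the second position of each complete codon (frame 0)."""
    usable = len(seq) - len(seq) % 3  # drop a trailing partial codon
    return seq[1:usable:3]


def p_distance(a, b):
    """Uncorrected proportion of differing sites, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


# Hypothetical in-frame aligned coding sequences.
a = "ATGGCTAAACCC"
b = "ATGGGTAAACAC"
print(second_codon_positions(a), second_codon_positions(b))  # TCAC TGAA
print(p_distance(second_codon_positions(a), second_codon_positions(b)))  # 0.5
```

    Second positions are always nonsynonymous, so they change more slowly and saturate less than third positions, which is one reason they can serve as a clock for deep divergences.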

    The mitochondrial genome of Phallusia mammillata and Phallusia fumigata (Tunicata, Ascidiacea): high genome plasticity at intra-genus level

    Background: Within Chordata, the subphyla Vertebrata and Cephalochordata (lancelets) are characterized by a remarkable stability of the mitochondrial (mt) genome, with constant gene content and almost invariant gene order, whereas the limited mitochondrial data on the subphylum Tunicata suggest frequent and extensive gene rearrangements, also observed within ascidians of the same genus. Results: To confirm this evolutionary trend and to better understand the evolutionary dynamics of the mitochondrial genome in Tunicata Ascidiacea, we have sequenced and characterized the complete mt genome of two congeneric ascidian species, Phallusia mammillata and Phallusia fumigata (Phlebobranchiata, Ascidiidae). The two mtDNAs are surprisingly rearranged, both with respect to one another and relative to those of other tunicates and chordates, with gene rearrangements affecting both protein-coding and tRNA genes. The new data highlight the extraordinary variability of the ascidian mt genome in base composition, tRNA secondary structure, tRNA gene content, and non-coding regions (number, size, sequence and location). Indeed, both Phallusia genomes lack the trnD gene, show loss or acquisition of the DHU arm in two tRNAs, and have a G+C content two-fold higher than other ascidians. Moreover, the mt genome of P. fumigata presents two identical copies of trnI, an extra tRNA gene with uncertain amino acid specificity, and four almost identical sequence regions. In addition, a truncated cytochrome b, lacking a C-terminal tail that commonly protrudes into the mt matrix, has been identified as a new mt feature probably shared by all tunicates.
    Conclusion: The frequent occurrence of major gene order rearrangements in ascidians, both at a high taxonomic level and within the same genus, makes this taxon an excellent model to study the mechanisms of gene rearrangement and renders the mt genome an invaluable phylogenetic marker to investigate molecular biodiversity and speciation events in this largely unexplored group of basal chordates.
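    One simple way to quantify gene order rearrangements between two mitochondrial genomes is the breakpoint count: the number of gene adjacencies in one genome that are absent from the other. The Python sketch below treats each mt genome as a circular, unsigned gene order over the same set of unique genes; genes present in only one genome (such as a missing trnD) or duplicated genes (such as two trnI copies) would have to be filtered out first. The function names and example gene orders are hypothetical, not data from the study.

```python
def circular_adjacencies(order):
    """Unordered adjacencies of a circular gene order (strand ignored)."""
    n = len(order)
    return {frozenset((order[i], order[(i + 1) % n])) for i in range(n)}


def breakpoint_count(order_a, order_b):
    """Number of adjacencies of order_a that do not occur in order_b."""
    if len(set(order_a)) != len(order_a) or len(set(order_b)) != len(order_b):
        raise ValueError("gene orders must not contain duplicate genes")
    if set(order_a) != set(order_b):
        raise ValueError("gene orders must contain the same genes")
    adj_b = circular_adjacencies(order_b)
    return sum(1 for adj in circular_adjacencies(order_a) if adj not in adj_b)


# Hypothetical gene orders, for illustration only.
genome_a = ["cox1", "cox2", "atp8", "atp6", "cox3"]
genome_b = ["cox1", "cox2", "cox3", "atp6", "atp8"]
print(breakpoint_count(genome_a, genome_b))  # 2
```

    A count of 0 means the two orders are identical up to rotation and reversal; larger counts indicate more disrupted gene neighbourhoods.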