
    On the weight of indels in genomic distances

    Dias Vieira Braga M, Machado R, Ribeiro LC, Stoye J. On the weight of indels in genomic distances. BMC Bioinformatics. 2011;12(Suppl 9: RECOMB-CG 2011): S13. Background: Classical approaches to compute the genomic distance are usually limited to genomes with the same content, without duplicated markers. However, differences in the gene content are frequently observed and can reflect important evolutionary aspects. A few polynomial time algorithms that include genome rearrangements, insertions and deletions (or substitutions) were already proposed. These methods often allow a block of contiguous markers to be inserted, deleted or substituted at once but result in distance functions that do not respect the triangular inequality and hence do not constitute metrics. Results: In the present study we discuss the disruption of the triangular inequality in some of the available methods and give a framework to establish an efficient correction for two models recently proposed, one that includes insertions, deletions and double cut and join (DCJ) operations, and one that includes substitutions and DCJ operations. Conclusions: We show that the proposed framework establishes the triangular inequality in both distances, by summing a surcharge on indel operations and on substitutions that depends only on the number of markers affected by these operations. This correction can be applied a posteriori, without interfering with the already available formulas to compute these distances. We claim that this correction leads to distances that are biologically more plausible.

    GISMO—gene identification using a support vector machine for ORF classification

    We present the novel prokaryotic gene finder GISMO, which combines searches for protein family domains with composition-based classification based on a support vector machine. GISMO is highly accurate, exhibiting high sensitivity and specificity in gene identification. We found that it performs well for complete prokaryotic chromosomes, irrespective of their GC content, and also for plasmids as short as 10 kb, short genes and for genes with atypical sequence composition. Using GISMO, we found several thousand new predictions for the published genomes that are supported by extrinsic evidence, which strongly suggests that these are very likely biologically active genes. The source code for GISMO is freely available under the GPL license.

    Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment

    Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique in bioinformatics used to infer related residues among biological sequences. Thus alignment accuracy is crucial to a vast range of analyses, often in ways difficult to assess in those analyses. To compare the performance of different aligners and help detect systematic errors in alignments, a number of benchmarking strategies have been pursued. Here we present an overview of the main strategies (based on simulation, consistency, protein structure, and phylogeny) and discuss their different advantages and associated risks. We outline a set of desirable characteristics for effective benchmarking, and evaluate each strategy in light of them. We conclude that there is currently no universally applicable means of benchmarking MSA, and that developers and users of alignment tools should base their choice of benchmark on the context of application, with a keen awareness of the assumptions underlying each benchmarking strategy.

    Louse (Insecta : Phthiraptera) mitochondrial 12S rRNA secondary structure is highly variable

    Lice are ectoparasitic insects hosted by birds and mammals. Mitochondrial 12S rRNA sequences obtained from lice show considerable length variation and are very difficult to align. We show that the louse 12S rRNA domain III secondary structure displays considerable variation compared to other insects, in both the shape and number of stems and loops. Phylogenetic trees constructed from tree edit distances between louse 12S rRNA structures do not closely resemble trees constructed from sequence data, suggesting that at least some of this structural variation has arisen independently in different louse lineages. Taken together with previous work on mitochondrial gene order and elevated rates of substitution in louse mitochondrial sequences, the structural variation in louse 12S rRNA confirms the highly distinctive nature of molecular evolution in these insects.

    BACCardI - a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison

    Bartels D, Kespohl S, Albaum S, et al. BACCardI - a tool for the validation of genomic assemblies, assisting genome finishing and intergenome comparison. Bioinformatics. 2005;21(7):853-859. Summary: We provide the graphical tool BACCardI for the construction of virtual clone maps from standard assembler output files or BLAST based sequence comparisons. This new tool has been applied to numerous genome projects to solve various problems including (a) validation of whole genome shotgun assemblies, (b) support for contig ordering in the finishing phase of a genome project, and (c) intergenome comparison between related strains when only one of the strains has been sequenced and a large insert library is available for the other. The BACCardI software can seamlessly interact with various sequence assembly packages. Motivation: Genomic assemblies generated from sequence information need to be validated by independent methods such as physical maps. The time-consuming task of building physical maps can be circumvented by virtual clone maps derived from read pair information of large insert libraries.

    Genomic distance under gene substitutions

    Dias Vieira Braga M, Machado R, Ribeiro LC, Stoye J. Genomic distance under gene substitutions. BMC Bioinformatics. 2011;12(Suppl 9: Proc. of RECOMB-CG 2011): S8. Background: The distance between two genomes is often computed by comparing only the common markers between them. Some approaches are also able to deal with non-common markers, allowing the insertion or the deletion of such markers. In these models, a deletion and a subsequent insertion that occur at the same position of the genome count for two sorting steps. Results: Here we propose a new model that sorts non-common markers with substitutions, which are more powerful operations that comprehend insertions and deletions. A deletion and an insertion that occur at the same position of the genome can be modeled as a substitution, counting for a single sorting step. Conclusions: Comparing genomes with unequal content, but without duplicated markers, we give a linear time algorithm to compute the genomic distance considering substitutions and double-cut-and-join (DCJ) operations. This model provides a parsimonious genomic distance to handle genomes free of duplicated markers, which is in practice a lower bound to the real genomic distances. The method could also be used to refine orthology assignments, since in some cases a substitution could actually correspond to an unannotated orthology.

    The extraordinary evolutionary history of the reticuloendotheliosis viruses

    The reticuloendotheliosis viruses (REVs) comprise several closely related amphotropic retroviruses isolated from birds. These viruses exhibit several highly unusual characteristics that have not so far been adequately explained, including their extremely close relationship to mammalian retroviruses, and their presence as endogenous sequences within the genomes of certain large DNA viruses. We present evidence for an iatrogenic origin of REVs that accounts for these phenomena. Firstly, we identify endogenous retroviral fossils in mammalian genomes that share a unique recombinant structure with REVs, unequivocally demonstrating that REVs derive directly from mammalian retroviruses. Secondly, through sequencing of archived REV isolates, we confirm that contaminated Plasmodium lophurae stocks have been the source of multiple REV outbreaks in experimentally infected birds. Finally, we show that both phylogenetic and historical evidence support a scenario wherein REVs originated as mammalian retroviruses that were accidentally introduced into avian hosts in the late 1930s, during experimental studies of P. lophurae, and subsequently integrated into the fowlpox virus (FWPV) and gallid herpesvirus type 2 (GHV-2) genomes, generating recombinant DNA viruses that now circulate in wild birds and poultry. Our findings provide a novel perspective on the origin and evolution of REV, and indicate that horizontal gene transfer between virus families can expand the impact of iatrogenic transmission events.

    SAMHD1 enhances nucleoside-analogue efficacy against HIV-1 in myeloid cells

    SAMHD1 is an intracellular enzyme that specifically degrades deoxynucleoside triphosphates into component nucleoside and inorganic triphosphate. In myeloid-derived dendritic cells and macrophages as well as resting T-cells, SAMHD1 blocks HIV-1 infection through this dNTP triphosphohydrolase activity by reducing the cellular dNTP pool to a level that cannot support productive reverse transcription. We now show that, in addition to this direct effect on virus replication, manipulating cellular SAMHD1 activity can significantly enhance or decrease the anti-HIV-1 efficacy of nucleotide analogue reverse transcription inhibitors presumably as a result of modulating dNTP pools that compete for recruitment by viral polymerases. Further, a variety of other nucleotide-based analogues, not normally considered antiretrovirals, such as the anti-herpes drugs Aciclovir and Ganciclovir and the anti-cancer drug Clofarabine are now revealed as potent anti-HIV-1 agents, under conditions of low dNTPs. This in turn suggests novel uses for nucleotide analogues to inhibit HIV-1 in differentiated cells low in dNTPs. This work was supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001042, FC001162, FC001178), the UK Medical Research Council (FC001042, FC001162, FC001178), and the Wellcome Trust (FC001042, FC001162, FC001178); and by the Wellcome Trust (108014/Z/15/Z and 108012/Z/15/Z).