
    PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change

    BACKGROUND: Phylogenies are commonly used to analyse the differences between genes, genomes and species. Patristic distances calculated from tree branch lengths describe the amount of genetic change represented by a tree and are commonly compared with other measures of mutation to investigate substitutional processes or the goodness of fit of a tree to the raw data. Until now no universal tool has been available for calculating patristic distances and correlating them with other genetic distance measures. RESULTS: PATRISTIC v1.0 is a Java program that calculates patristic distances from large trees in a range of file formats and allows graphical and statistical interpretation of distance matrices calculated by other programs. CONCLUSION: The software overcomes some logistical barriers to analysing signals in sequences. In addition to calculating patristic distances, it provides plots for any combination of matrices, calculates commonly used statistics, allows data such as isolation dates to be entered, and reorders matrices with matching species or gene labels. It will be used to analyse rates of mutation, substitutional saturation and the evolution of viruses. It is available at and requires the Java runtime environment.
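
    A patristic distance is the sum of branch lengths on the path joining two leaves. The sketch below is a minimal illustration under stated assumptions, not PATRISTIC's own code: the rooted tree is held as a hypothetical child-to-(parent, branch length) map rather than parsed from a tree file, and the comparison with a second distance matrix uses a plain Pearson correlation, one of several statistics such a tool might report.

```python
from itertools import combinations
from math import sqrt

# Hypothetical rooted tree: each node maps to (parent, branch length to parent).
# The root ("root") has no entry. Names and lengths are illustrative only.
TREE = {
    "A": ("n1", 0.1), "B": ("n1", 0.2),
    "C": ("n2", 0.3), "D": ("n2", 0.4),
    "n1": ("root", 0.05), "n2": ("root", 0.15),
}


def distances_to_ancestors(tree, node):
    """Map node and each of its ancestors to the path length from node."""
    dist = {node: 0.0}
    total = 0.0
    while node in tree:
        parent, length = tree[node]
        total += length
        dist[parent] = total
        node = parent
    return dist


def patristic(tree, a, b):
    """Sum of branch lengths on the path between leaves a and b (via their MRCA)."""
    up_a = distances_to_ancestors(tree, a)
    node, up_b = b, 0.0
    # Climb from b until reaching a node that is also an ancestor of a: the MRCA.
    while node not in up_a:
        parent, length = tree[node]
        up_b += length
        node = parent
    return up_a[node] + up_b


def patristic_matrix(tree, leaves):
    """Patristic distance for every unordered pair of leaves."""
    return {(x, y): patristic(tree, x, y) for x, y in combinations(leaves, 2)}


def pearson(xs, ys):
    """Pearson correlation between two equally ordered lists of pairwise distances."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

    For example, `patristic_matrix(TREE, ["A", "B", "C", "D"])` gives approximately 0.6 for the pair (A, C); its values can then be passed to `pearson` together with another distance measure (such as p-distances) listed for the same pairs in the same order.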

    Fidelity of Hyperbolic Space for Bayesian Phylogenetic Inference

    Bayesian inference for phylogenetics is a gold standard for computing distributions of phylogenies. It faces the challenging problem of moving through the high-dimensional space of trees. However, hyperbolic space offers a low-dimensional representation of tree-like data. In this paper, we embed genomic sequences into hyperbolic space and perform hyperbolic Markov chain Monte Carlo for Bayesian inference. The posterior probability is computed by decoding a neighbour-joining tree from the proposed embedding locations. We empirically demonstrate the fidelity of this method on eight data sets. The sampled posterior distribution closely recovers the splits and branch lengths. We investigated the effects of curvature and embedding dimension on the Markov chain's performance. Finally, we discuss the prospects for adapting this method to navigate tree space with gradients.
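
    The decoding step described above maps embedding locations to a tree through pairwise distances. A minimal sketch, assuming points in the Poincaré ball with curvature -1 (the paper may use another hyperbolic model, and it varies the curvature) and the textbook Saitou-Nei neighbour-joining update; the function names and the frozenset-keyed distance dictionary are illustrative choices, not the paper's implementation.

```python
import math
from itertools import combinations


def poincare_distance(u, v):
    """Geodesic distance between two points of the Poincare ball (curvature -1)."""
    def sq_norm(x):
        return sum(c * c for c in x)

    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1 + 2 * diff / ((1 - sq_norm(u)) * (1 - sq_norm(v))))


def neighbour_joining(names, dist):
    """Saitou-Nei neighbour joining.

    dist maps frozenset({a, b}) to the distance between taxa a and b.
    Returns the unrooted tree as a list of (node, node, branch_length) edges;
    internal nodes are named node0, node1, ...
    """
    d = dict(dist)
    active = list(names)
    edges = []
    counter = 0
    while len(active) > 2:
        n = len(active)
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) for i in active}
        # Join the pair that minimises the Q criterion.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        li = dij / 2 + (r[i] - r[j]) / (2 * (n - 2))
        u = f"node{counter}"
        counter += 1
        edges += [(u, i, li), (u, j, dij - li)]
        # Distances from the new internal node to the remaining taxa.
        for k in active:
            if k not in (i, j):
                d[frozenset((u, k))] = (d[frozenset((i, k))] + d[frozenset((j, k))] - dij) / 2
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[frozenset((a, b))]))
    return edges
```

    On an additive (exactly tree-like) distance matrix this recovers the true branch lengths; on distances taken from arbitrary embeddings, neighbour joining can return negative branch lengths, which a sampler would need to handle.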

    Automatic differentiation is no panacea for phylogenetic gradient computation

    Gradients of probabilistic model likelihoods with respect to their parameters are essential for modern computational statistics and machine learning. These calculations are readily available for arbitrary models via automatic differentiation implemented in general-purpose machine-learning libraries such as TensorFlow and PyTorch. Although these libraries are highly optimized, it is not clear whether their general-purpose nature will limit their algorithmic complexity or implementation speed for the phylogenetic case compared to phylogenetics-specific code. In this paper, we compare six gradient implementations of the phylogenetic likelihood functions, in isolation and also as part of a variational inference procedure. We find that although automatic differentiation can scale approximately linearly in tree size, it is much slower than the carefully implemented gradient calculation for tree likelihood and ratio transformation operations. We conclude that a mixed approach combining phylogenetic libraries with machine-learning libraries will provide the optimal combination of speed and model flexibility moving forward.
    Comment: 15 pages and 2 figures in main text, plus supplementary material.
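
    To make concrete what a hand-derived likelihood gradient looks like, here is a minimal sketch for the simplest case: one branch of length t between two aligned DNA sequences under the Jukes-Cantor model, with n_sites sites of which n_diff differ. It does not use TensorFlow or PyTorch and is not one of the six implementations compared in the paper; it only checks an analytic derivative against a central finite difference.

```python
import math


def jc_probs(t):
    """Jukes-Cantor transition probabilities for branch length t (substitutions/site)."""
    e = math.exp(-4.0 * t / 3.0)
    p_same = 0.25 + 0.75 * e  # P(base unchanged)
    p_diff = 0.25 - 0.25 * e  # P(change to one specific other base)
    return e, p_same, p_diff


def jc_loglik(t, n_sites, n_diff):
    """Log-likelihood of t, dropping the constant log(1/4) stationary term per site."""
    _, p_same, p_diff = jc_probs(t)
    return (n_sites - n_diff) * math.log(p_same) + n_diff * math.log(p_diff)


def jc_grad(t, n_sites, n_diff):
    """Analytic d(log-likelihood)/dt, using dp_same/dt = -e and dp_diff/dt = e/3."""
    e, p_same, p_diff = jc_probs(t)
    return (n_sites - n_diff) * (-e) / p_same + n_diff * (e / 3.0) / p_diff


def central_difference(f, t, h=1e-6):
    """Numerical derivative, used here only to check the analytic gradient."""
    return (f(t + h) - f(t - h)) / (2.0 * h)
```

    With 100 sites and 20 differences, `jc_grad(0.3, 100, 20)` agrees closely with `central_difference(lambda t: jc_loglik(t, 100, 20), 0.3)`, and the gradient is essentially zero at the closed-form estimate t = -(3/4) ln(1 - (4/3)(20/100)).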

    Escherichia coli ST8196 is a novel, locally evolved, and extensively drug resistant pathogenic lineage within the ST131 clonal complex

    The H30Rx subclade of Escherichia coli ST131 is a clinically important, globally dispersed pathogenic lineage that typically displays resistance to fluoroquinolones and extended-spectrum β-lactams. Isolates EC233 and EC234, variants of ST131-H30Rx with a novel sequence type (ST) 8196, were isolated from unrelated patients presenting with bacteraemia at a Sydney hospital in 2014 and are characterised here. EC233 and EC234 are phylogroup B2 and serotype O25:H4A, are resistant to ampicillin, amoxicillin, cefoxitin, ceftazidime, ceftriaxone, ciprofloxacin, norfloxacin and gentamicin, and are likely clonal. Both harbour an IncFII_2 plasmid (pSPRC_Ec234-FII), which carries most of the resistance genes on an IS26-associated translocatable unit, as well as two small plasmids and a novel IncI1 plasmid (pSPRC_Ec234-I). SNP-based phylogenetic analysis of the core genome of representatives within the ST131 clonal complex places both isolates in a subclade with three clinical Australian ST131-H30Rx clade C isolates. A MrBayes phylogenetic analysis of EC233 and EC234 indicates that ST8196 shares a most recent common ancestor with ST131-H30Rx strain EC70, isolated from the same hospital in 2013. Our study identified genomic hallmarks that define the ST131-H30Rx subclade in the ST8196 isolates and highlights the need for unbiased genomic surveillance approaches to identify novel high-risk multidrug-resistant (MDR) E. coli pathogens that affect healthcare facilities.
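
    Core-genome SNP analyses typically start from pairwise differences in a core-genome alignment. A minimal sketch, assuming the alignment is already built and held as equal-length strings keyed by isolate name, and treating '-' and 'N' as missing data; the abstract does not describe the study's SNP-calling and tree-building pipeline, so nothing here should be read as its method.

```python
from itertools import combinations


def snp_distance(a, b, missing="-N"):
    """Number of aligned positions where two sequences carry different called bases.

    Positions where either sequence has a character listed in `missing` are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x != y and x not in missing and y not in missing)


def snp_matrix(alignment):
    """Pairwise SNP distances for a dict mapping isolate name -> aligned sequence."""
    return {(p, q): snp_distance(alignment[p], alignment[q])
            for p, q in combinations(sorted(alignment), 2)}
```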

    Molecular epidemiology of clade 1 influenza A viruses (H5N1), southern Indochina Peninsula, 2004-2007

    To determine the origin of influenza A virus (H5N1) epizootics in Cambodia, we used maximum-likelihood and Bayesian methods to analyze the genetic sequences of subtype H5N1 strains from Cambodia and neighboring areas. Poultry movements, rather than repeated reintroduction of subtype H5N1 viruses by wild birds, appear to explain virus circulation and perpetuation.

    A comparison of common programming languages used in bioinformatics

    BACKGROUND: The performance of different programming languages has previously been benchmarked using abstract mathematical algorithms, but not using standard bioinformatics algorithms. We compared the memory usage and execution speed of three standard bioinformatics methods, each implemented in six different programming languages. Programs for the Sellers algorithm, the Neighbor-Joining tree construction algorithm and an algorithm for parsing BLAST output files were implemented in C, C++, C#, Java, Perl and Python. RESULTS: Implementations in C and C++ were fastest and used the least memory, although programs in these languages generally contained more lines of code. Java and C# appeared to be a compromise between the flexibility of Perl and Python and the fast performance of C and C++. The relative performance of the tested languages did not change between Windows and Linux, and no clear evidence was found that either operating system was faster. Source code and additional information are available from http://www.bioinformatics.org/benchmark/. CONCLUSION: This benchmark provides a comparison of six commonly used programming languages under two different operating systems. The overall comparison shows that a developer should choose a language carefully, taking into account the expected performance and the library availability for each language.
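
    Sellers' algorithm, one of the three benchmarked methods, finds every place in a text where a pattern matches with at most k edits. It is ordinary edit-distance dynamic programming except that the top row of the table is all zeros, so a match may begin at any text position. A minimal sketch of that textbook formulation with unit costs, keeping one column of the table at a time; it is not the benchmark's source code, which is distributed from the URL given in the abstract.

```python
def sellers(pattern, text, k):
    """Return the 1-based end positions j in text where some substring ending at j
    matches pattern with at most k insertions, deletions or substitutions."""
    m = len(pattern)
    # col[i] holds the edit distance between pattern[:i] and the best text
    # substring ending at the current position; row 0 is always 0.
    col = list(range(m + 1))
    hits = []
    for j, c in enumerate(text, start=1):
        prev_diag, col[0] = col[0], 0
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == c else 1
            cur = min(col[i] + 1,        # skip a text character
                      col[i - 1] + 1,    # skip a pattern character
                      prev_diag + cost)  # match or substitution
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            hits.append(j)
    return hits
```

    For instance, `sellers("ACG", "TTACGTT", 0)` returns `[5]` (the exact match), while allowing one edit, `sellers("ACG", "TTACGTT", 1)`, returns `[4, 5, 6]`.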

    The VirusBanker database uses a Java program to allow flexible searching through Bunyaviridae sequences

    BACKGROUND: Viruses of the Bunyaviridae have segmented negative-stranded RNA genomes, and several of them cause significant disease. Many partial sequences have been obtained from the segments, so GenBank searches give complex results. Sequence databases usually use HTML pages to mediate remote sorting, but this approach can be limiting and may discourage a user from exploring a database. RESULTS: The VirusBanker database contains Bunyaviridae sequences and alignments and is presented as two spreadsheets generated by a Java program that interacts with a MySQL database on a server. Sequences are displayed in rows and may be sorted using information displayed in columns, including the segment, gene, protein, species, strain, sequence length, terminal sequence, and date and country of isolation. Bunyaviridae sequences and alignments may be downloaded from the second spreadsheet, with titles defined by the user from the columns, or viewed by passing them directly to the sequence editor Jalview. CONCLUSION: VirusBanker allows large datasets of aligned nucleotide and protein sequences from the Bunyaviridae to be compiled and winnowed rapidly using criteria that are formulated heuristically.
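
    The abstract describes sorting rows by column values and winnowing large sets of sequences by user-chosen criteria. A minimal in-memory sketch of that idea, assuming a hypothetical record type holding a few of the listed columns; VirusBanker itself does this through a Java program backed by MySQL, and none of the names below come from it.

```python
from dataclasses import dataclass
from operator import attrgetter


@dataclass
class SequenceRecord:
    """Hypothetical row holding a few of the columns named in the abstract."""
    accession: str
    segment: str
    species: str
    length: int
    year: int
    country: str


def winnow(records, sort_by, **criteria):
    """Keep rows whose fields equal every keyword criterion, then sort them by the
    column names in `sort_by` (at least one), compared left to right."""
    kept = [r for r in records
            if all(getattr(r, field) == value for field, value in criteria.items())]
    return sorted(kept, key=attrgetter(*sort_by))
```

    A call such as `winnow(records, ("country", "year"), segment="S")` would keep only S-segment rows and order them by country, then by year of isolation.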