
    Development of computational techniques for genomic data analysis and visualisation in model and non-model organisms

    This thesis describes the work undertaken by the author between 2011 and 2018. With technological development, genome sequencing became affordable and accessible to the scientific community. This led to the generation of an enormous amount of genomic data, and of bioinformatics tools to analyse and visualise these data. However, most public resources are designed for model organisms and gold-standard curated genomes. These tools are designed to run in specifically configured environments and depend on specific data formats. Chapter 1 of my thesis introduces the state of the field, the existing tools, their functionalities, and the limitations that prompted the software developments presented in the following chapters. In Chapter 2, I discuss the TGAC Browser, an open-source genome browser, and wigExplorer, a BioJS plugin to visualise expression data. In Chapter 3, I move towards finding gene families using GeneSeqToFamily, a Galaxy workflow based on the Ensembl Compara GeneTrees pipeline. In Chapter 4, I focus on visualisation tools: Aequatus, an open-source homology browser for visualising gene families, and ViCTreeView, a plugin developed as part of the ViCTree project to visualise and explore phylogenetic trees. In Chapter 5, I discuss the availability and accessibility of these tools. All the tools and workflows I have developed are open-source, under a free licence, and available on GitHub and/or in the Galaxy ToolShed. I also discuss the impact that these tools have had on various research projects, and I take this opportunity to discuss possible future developments of these tools.

    ViCTree: an automated framework for taxonomic classification from protein sequences

    Motivation: The increasing rate of submission of genetic sequences into public databases is providing a growing resource for classifying the organisms that these sequences represent. To aid viral classification, we have developed ViCTree, which automatically integrates the relevant sets of sequences in NCBI GenBank and transforms them into an interactive maximum likelihood phylogenetic tree that can be updated automatically. ViCTree incorporates ViCTreeView, which is a JavaScript-based visualisation tool that enables the tree to be explored interactively in the context of pairwise distance data. Results: To demonstrate utility, ViCTree was applied to subfamily Densovirinae of family Parvoviridae. This led to the identification of six new species of insect virus. Availability: ViCTree is open-source and can be run on any Linux- or Unix-based computer or cluster. A tutorial, the documentation and the source code are available under a GPL3 license, and can be accessed at http://bioinformatics.cvr.ac.uk/victree_web/
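
    ViCTreeView lets the tree be explored alongside pairwise distance data. The abstract does not say which distance measure ViCTree uses, so as a minimal sketch, assuming the uncorrected p-distance over already-aligned protein sequences with gapped columns skipped, the pairwise values could be computed as follows. The sequence names and residues are hypothetical.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differing = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns entirely
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return differing / compared


def distance_matrix(aln: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise p-distances, keyed by (name1, name2) with name1 < name2."""
    return {
        (n1, n2): p_distance(aln[n1], aln[n2])
        for n1, n2 in combinations(sorted(aln), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from ViCTree.
    toy = {"virusA": "MKV-LT", "virusB": "MKVALS", "virusC": "MRV-LS"}
    for pair, d in distance_matrix(toy).items():
        print(pair, round(d, 3))
```

    The p-distance is chosen here only because it is the simplest measure to read; a model-corrected distance would be a drop-in replacement for `p_distance`.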

    PANC Study (Pancreatitis: A National Cohort Study): national cohort study examining the first 30 days from presentation of acute pancreatitis in the UK

    Background: Acute pancreatitis is a common, yet complex, emergency surgical presentation. Multiple guidelines exist and management can vary significantly. The aim of this first UK, multicentre, prospective cohort study was to assess the variation in management of acute pancreatitis to guide resource planning and optimize treatment. Methods: All patients aged 18 years or older presenting with acute pancreatitis, as per the Atlanta criteria, from March to April 2021 were eligible for inclusion and were followed up for 30 days. Anonymized data were uploaded to a secure electronic database in line with local governance approvals. Results: A total of 113 hospitals contributed data on 2580 patients, with an equal sex distribution and a mean age of 57 years. The aetiology was gallstones in 50.6 per cent, with idiopathic the next most common (22.4 per cent). In addition to the 7.6 per cent with a diagnosis of chronic pancreatitis, 20.1 per cent of patients had a previous episode of acute pancreatitis. One in 20 patients was classed as having severe pancreatitis, as per the Atlanta criteria. The overall mortality rate was 2.3 per cent at 30 days, but rose to one in three in the severe group. Predictors of death included male sex, increased age, and frailty; previous acute pancreatitis and gallstones as aetiologies were protective. Smoking status and body mass index did not affect mortality. Conclusion: Most patients presenting with acute pancreatitis have a mild, self-limiting disease. Rates of idiopathic pancreatitis are high. Recurrent attacks of pancreatitis are common but are likely to carry a reduced risk of death on subsequent admissions.

    GeneSeqToFamily: A Galaxy workflow to find gene families based on the Ensembl Compara GeneTrees pipeline

    Background: Gene duplication is a major factor contributing to evolutionary novelty, and the contraction or expansion of gene families has often been associated with morphological, physiological, and environmental adaptations. The study of homologous genes helps us to understand the evolution of gene families. Such studies play a vital role in finding ancestral gene duplication events as well as identifying genes that have diverged from a common ancestor under positive selection. There are various tools available, such as MSOAR, OrthoMCL, and HomoloGene, to identify gene families and visualize syntenic information between species, providing an overview of the evolution of syntenic regions at the family level. Unfortunately, none of them provides information about structural changes within genes, such as the conservation of ancestral exon boundaries among multiple genomes. The Ensembl GeneTrees computational pipeline generates gene trees based on coding sequences, provides details about exon conservation, and is used in the Ensembl Compara project to discover gene families. Findings: A certain amount of expertise is required to configure and run the Ensembl Compara GeneTrees pipeline via the command line. Therefore, we converted this pipeline into a Galaxy workflow, called GeneSeqToFamily, and provided additional functionality. This workflow uses existing tools from the Galaxy ToolShed, as well as providing the additional wrappers and tools that are required to run the workflow. Conclusions: GeneSeqToFamily represents the Ensembl GeneTrees pipeline as a set of interconnected Galaxy tools, so that they can be run interactively within Galaxy's user-friendly workflow environment while still providing the flexibility to tailor the analysis by changing configurations and tools if necessary. Additional tools allow users to subsequently visualize the gene families produced by the workflow using the Aequatus.js interactive tool, which has been developed as part of the Aequatus software project.
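
    The family-finding step can be illustrated with a simplified stand-in. In the GeneTrees approach, families are formed by clustering pairwise sequence-similarity hits before alignment and tree building. The sketch below assumes plain single-linkage clustering of (gene, gene, score) hits above a score threshold, implemented with union-find; it is not the clustering tool the workflow actually wraps. Gene names, scores and the threshold are hypothetical.

```python
from typing import Iterable


def cluster_families(genes: list[str],
                     hits: Iterable[tuple[str, str, float]],
                     min_score: float) -> list[list[str]]:
    """Single-linkage clustering: any two genes connected by a chain of
    hits scoring >= min_score end up in the same family."""
    parent = {g: g for g in genes}

    def find(g: str) -> str:
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for a, b, score in hits:
        if score >= min_score:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # merge the two families

    families: dict[str, list[str]] = {}
    for g in genes:
        families.setdefault(find(g), []).append(g)
    # Sort members and families for deterministic output.
    return sorted(sorted(f) for f in families.values())


if __name__ == "__main__":
    hits = [("g1", "g2", 250.0), ("g2", "g3", 180.0), ("g4", "g5", 40.0)]
    print(cluster_families(["g1", "g2", "g3", "g4", "g5"], hits, 100.0))
```

    With a threshold of 100, g1 and g3 join the same family through g2, while the weak g4/g5 hit is ignored, leaving both as singletons.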

    Aequatus: An open-source homology browser

    Background: Phylogenetic information inferred from the study of homologous genes helps us to understand the evolution of genes and gene families, including the identification of ancestral gene duplication events as well as regions under positive or purifying selection within lineages. Gene family and orthogroup characterization enables the identification of syntenic blocks, which can then be visualized with various tools. Unfortunately, currently available tools display only an overview of syntenic regions as a whole, limited to the gene level, and none provides further details about structural changes within genes, such as the conservation of ancestral exon boundaries amongst multiple genomes. Findings: We present Aequatus, an open-source web-based tool that provides an in-depth view of gene structure across gene families, with various options to render and filter visualizations. It relies on precalculated alignment and gene feature information typically held in, but not limited to, the Ensembl Compara and Core databases. We also offer Aequatus.js, a reusable JavaScript module that fulfills the visualization aspects of Aequatus and is available within the Galaxy web platform as a visualization plug-in, which can be used to visualize gene trees generated by the GeneSeqToFamily workflow.
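
    The idea of comparing exon boundaries across a gene family can be sketched by projecting each gene's boundaries onto a shared alignment. This is an illustration under stated assumptions, not Aequatus's code: it assumes protein-level alignments in which '-' marks a gap, and boundaries given as hypothetical residue counts k meaning "the exon ends after residue k". A boundary is then called conserved if every gene places one at the same alignment column.

```python
def boundary_columns(aligned: str, boundaries: list[int]) -> set[int]:
    """Map exon boundaries (after residue k, counted on the ungapped
    sequence) to the alignment column holding residue k."""
    wanted = set(boundaries)
    cols: set[int] = set()
    residue = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            residue += 1
            if residue in wanted:
                cols.add(col)  # last residue before the boundary
    return cols


def conserved_boundaries(genes: dict[str, tuple[str, list[int]]]) -> set[int]:
    """Alignment columns at which every gene has an exon boundary."""
    sets = [boundary_columns(aln, b) for aln, b in genes.values()]
    return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    # Hypothetical aligned proteins with hypothetical exon boundaries.
    family = {
        "geneA": ("MKV-LTQ", [3]),
        "geneB": ("MKVALSQ", [3, 5]),
        "geneC": ("MRV-LSQ", [3, 4]),
    }
    print(conserved_boundaries(family))
```

    Here all three genes end an exon at alignment column 2, so that boundary is reported as conserved; geneB's extra boundary at column 4 is shared only with geneC.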

    wigExplorer, a BioJS component to visualise wig data [version 3; referees: 1 approved, 2 approved with reservations, 1 not approved]

    Summary: wigExplorer is a BioJS component whose main purpose is to provide a platform for the visualisation of wig-formatted data. Wig files are used extensively by genome browsers such as the UCSC Genome Browser. wigExplorer follows the BioJS standard specification and requires only simple configuration and installation. wigExplorer provides an easy way to navigate the visible region of the canvas and allows interaction with other components via predefined events. Availability: http://biojs.io/d/biojs-vis-wigexplorer; http://dx.doi.org/10.5281/zenodo.851
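
    To show what wig-formatted data contains, here is a rough reader for the two wig section types described in the UCSC wiggle format documentation, variableStep and fixedStep. It is a sketch in Python, not wigExplorer's JavaScript code, and it assumes well-formed input: it converts the format's 1-based positions into 0-based, half-open intervals and ignores bedGraph sections.

```python
from typing import Iterable, Iterator, Tuple


def parse_wig(lines: Iterable[str]) -> Iterator[Tuple[str, int, int, float]]:
    """Yield (chrom, start, end, value) intervals, 0-based and half-open."""
    mode = None    # "variableStep" or "fixedStep" once declared
    params = {}    # key=value pairs from the current declaration line
    next_pos = 0   # 1-based position of the next fixedStep value
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        if line.startswith(("variableStep", "fixedStep")):
            fields = line.split()
            mode = fields[0]
            params = dict(f.split("=", 1) for f in fields[1:])
            params.setdefault("span", "1")  # span defaults to one base
            if mode == "fixedStep":
                next_pos = int(params["start"])
            continue
        if mode is None:
            raise ValueError(f"data line before any declaration: {line!r}")
        span = int(params["span"])
        if mode == "variableStep":
            pos_str, val_str = line.split()  # "position value"
            pos = int(pos_str)
        else:
            pos, val_str = next_pos, line    # value only; position implied
            next_pos += int(params["step"])
        # Convert the 1-based wig position to a 0-based half-open interval.
        yield params["chrom"], pos - 1, pos - 1 + span, float(val_str)


if __name__ == "__main__":
    sample = [
        "track type=wiggle_0 name=example",
        "variableStep chrom=chr1 span=5",
        "101 2.5",
        "201 3.0",
        "fixedStep chrom=chr2 start=11 step=10",
        "1.0",
        "4.0",
    ]
    for interval in parse_wig(sample):
        print(interval)
```

    A generator keeps memory use flat for large files, which matters because a viewer only needs the intervals that fall in the visible region.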

    GeneSeqToFamily.zip

    Supporting data for GeneSeqToFamily, the Ensembl Compara GeneTrees pipeline as a Galaxy workflow.

    Convergent loss of an EDS1/PAD4 signaling pathway in several plant lineages reveals coevolved components of plant immunity and drought response

    Plant innate immunity relies on nucleotide-binding leucine-rich repeat receptors (NLRs) that recognize pathogen-derived molecules and activate downstream signaling pathways. We analyzed the variation in NLR gene copy number and identified plants with a low number of NLR genes relative to sister species. We specifically focused on four plants from two distinct lineages, one monocot lineage (Alismatales) and one eudicot lineage (Lentibulariaceae). In these lineages, the loss of NLR genes coincides with loss of the well-known downstream immune signaling complex ENHANCED DISEASE SUSCEPTIBILITY 1 (EDS1)/PHYTOALEXIN DEFICIENT 4 (PAD4). We expanded our analysis across whole proteomes and found that other characterized immune genes were absent only in Lentibulariaceae and Alismatales. Additionally, we identified genes of unknown function that were convergently lost together with EDS1/PAD4 in five plant species. Gene expression analyses in Arabidopsis (Arabidopsis thaliana) and Oryza sativa revealed that several homologs of the candidates are differentially expressed during pathogen infection, drought, and abscisic acid treatment. Our analysis provides evolutionary evidence for the rewiring of plant immunity in some plant lineages, as well as the coevolution of the EDS1/PAD4 pathway and drought responses.

    StatsDB: platform-agnostic storage and understanding of next generation sequencing run metrics

    StatsDB is an open-source software package for storage and analysis of next-generation sequencing run metrics, allowing consolidated, multi-faceted querying and visualisation of QC and primary analysis data via concise APIs in Java and Perl.
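
    StatsDB's actual schema and its Java and Perl APIs are not described in this summary. To illustrate only the general idea of consolidating per-run QC metrics so they can be queried across runs, here is a sketch using Python's built-in sqlite3 module with a hypothetical single-table layout and hypothetical metric names.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """In-memory store with one row per (run, lane, metric) value."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE metric (run TEXT, lane INTEGER, name TEXT, value REAL)"
    )
    return db


def store(db: sqlite3.Connection, run: str, lane: int,
          metrics: dict[str, float]) -> None:
    """Insert every metric reported for one lane of one run."""
    db.executemany(
        "INSERT INTO metric VALUES (?, ?, ?, ?)",
        [(run, lane, k, v) for k, v in metrics.items()],
    )


def mean_by_run(db: sqlite3.Connection, name: str) -> list[tuple[str, float]]:
    """Average of one metric per run, across lanes."""
    return db.execute(
        "SELECT run, AVG(value) FROM metric WHERE name = ? "
        "GROUP BY run ORDER BY run",
        (name,),
    ).fetchall()


if __name__ == "__main__":
    db = make_db()
    store(db, "run1", 1, {"q30_pct": 90.0, "reads": 1000})
    store(db, "run1", 2, {"q30_pct": 80.0, "reads": 1200})
    store(db, "run2", 1, {"q30_pct": 95.0})
    print(mean_by_run(db, "q30_pct"))
```

    A long, narrow (run, lane, name, value) table lets new metrics be added without schema changes, at the cost of pivoting the data when several metrics are needed side by side.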