11 research outputs found

    Applications of Next-Generation Sequencing Technologies to Diagnostic Virology

    Novel DNA sequencing techniques, referred to as “next-generation” sequencing (NGS), provide high speed and throughput that can produce an enormous volume of sequences with many possible applications in research and diagnostic settings. In this article, we provide an overview of the many applications of NGS in diagnostic virology. NGS techniques have been used for high-throughput whole viral genome sequencing, such as sequencing of new influenza viruses, for detection of viral genome variability and evolution within the host, such as investigation of human immunodeficiency virus and human hepatitis C virus quasispecies, and monitoring of low-abundance antiviral drug-resistance mutations. NGS techniques have been applied to metagenomics-based strategies for the detection of unexpected disease-associated viruses and for the discovery of novel human viruses, including cancer-related viruses. Finally, the human virome in healthy and disease conditions has been described by NGS-based metagenomics

    An investigation into the biosynthesis of proximicins

    PhD Thesis. The proximicins are a family of three compounds (A-C) produced by two marine Actinomycete Verrucosispora strains, V. maris AB18-032 and V. sp. str. 37, and are characterised by the presence of 2,4-disubstituted furan rings. Proximicins demonstrate cell-arresting and antimicrobial ability, making them interesting leads for clinical drug development. Proximicin research has been largely overshadowed by other Verrucosispora strain secondary metabolites (SM), and despite the publication of the V. maris AB18-032 draft genome, the enzymatic machinery responsible for their production has not been established. Related research into a pyrrole-containing homolog, congocidine, has noted that, given the structural similarity, proximicins likely utilise a similar biosynthetic route. The initial aim of this research was to confirm the presumed pathway to proximicin biosynthesis. Following the sequencing, assembly and annotation of the second proximicin producer, Verrucosispora sp. str. MG37, and genome mining of V. maris AB18-032, no common clusters mimicked that of congocidine, casting doubt on the previously assumed analogous biosynthetic routes. A putative proximicin biosynthesis (ppb) cluster was identified, containing non-ribosomal peptide synthetase (NRPS) enzymes, exhibiting some homology with congocidine. NRPS systems represent a network of interacting proteins, which act as a SM assembly line: crucially, adenylation (A)-domain enzymes act as the ‘gate-keeper’, determining which precursors are included into the elongating peptide. To elucidate the route to proximicins, activity characterisation of the four A-domains present in the ppb cluster was attempted. The A-domain Ppb120 was shown to possess novel activity, demonstrating a high promiscuity towards heterocycle-containing precursors, in addition to the absence of an apparent essential domain.
This discovery refutes previous work outlining the core residues which dictate A-domain activity, while also presenting a facile route to novel heterocycle-containing compounds. Despite extensive work, the A-domains ppb195 and ppb210 could not be purified effectively in the active form, a result that informs future work on A-domain activity characterisation. Finally, the ppb220 A-domain, which lies at the border of ppb, was inactive, suggesting over-estimation of the cluster margins. To confirm ppb220 redundancy and the ppb boundaries, CRISPR/Cas gene editing studies were carried out. The gene responsible for the orange pigment of Verrucosispora strains was initially targeted and successfully deleted, and ppb studies commenced. The research here refutes the previously presumed route to proximicin biosynthesis; the ppb cluster instead comprises enzymes exhibiting unique activity and structure. The findings represent the foundations for allowing exploitation of chemistry exhibited within the proximicin family. The novelty exhibited can be utilised in the search for antimicrobial clinical leads, by allowing the production of compounds containing previously inaccessible heterocycle chemistry

    Novel Molecular Strategies for the Detection and Characterization of Tick-Borne Pathogens in Domestic Dogs

    Tick-borne diseases (TBD) are common across the United States and can result in critical and chronic disease states in a variety of veterinary patients, specifically domesticated dogs. Borreliosis, anaplasmosis, rickettsiosis, ehrlichiosis, and babesiosis have been cited as the most common TBDs. Despite recent reports revealing past exposure to TBD, there are no molecular epidemiological reports for dogs in Texas. Therefore, data to support the level of actively infected dogs in the population is inadequate. Limited molecular data for TBDs is due, in part, to the lack of consolidated molecular tools available to researchers. Real-time PCR (qPCR) assays are a commonly utilized tool for molecular detection of TBDs, and achieve species specificity by assigning each pathogen a unique fluorogenic label. However, current qPCR instruments can differentiate only a limited number of fluorogenic labels in a given reaction. As such, this dissertation explored the development of a qPCR methodology, termed layerplexing, that would allow for the simultaneous detection and characterization of 11 pathogens responsible for causing common TBDs in domestic dogs. Additionally, an endogenous internal positive control was designed and integrated into the assay for quality assurance of attained molecular results. Analysis revealed that the layerplex assay format was comparable in terms of target sensitivity and specificity to other qPCR assays utilized in the field. The layerplex assay was then applied to conducting a molecular prevalence investigation of TBDs affecting dogs across Texas ecoregions. By conducting molecular prevalence studies for TBDs, updated rates of active exposure and specific regions that may contain sentinels of disease could be identified.
Results obtained indicated molecular prevalence of borrelial, rickettsial, and babesial pathogens varied across the Texas study area and indicated specific regions where susceptible hosts may be at higher risk for infection with TBD. Furthermore, the layerplex assay led to the first reported molecular detection of Anaplasma platys in Texas and coinfection with Ehrlichia canis and A. platys in Texas dogs. Overall, findings from this dissertation provided substantial evidence that the layerplex technique can be utilized for grouping multiple targets under a single fluorogenic label without impeding diagnostic efficacy. The layerplex technique also demonstrated utility in facilitating large scale molecular analyses of animals in Texas. Surveillance data obtained from this study may aid public health agencies in updating maps depicting high-risk areas of disease and provide baseline data for future research aiming to characterize TBDs in additional animal models

    The development of computational methods for large-scale comparisons and analyses of genome evolution

    The last four decades have seen the development of a number of experimental methods for the deduction of the whole genome sequences of an ever-increasing number of organisms. These sequences have, in the first instance, allowed their investigators the opportunity to examine the molecular primary structure of areas of scientific interest, but with the increased sampling of organisms across the phylogenetic tree and the improved quality and coverage of genome sequences and their associated annotations, the opportunity to undertake detailed comparisons both within and between taxonomic groups has presented itself. The work described in this thesis details the application of comparative bioinformatics analyses on inter- and intra-genomic datasets, to elucidate those genomic changes that may underlie organismal adaptations and contribute to changes in the complexity of genome content and structure over time. The results contained herein demonstrate the power and flexibility of the comparative approach, utilising whole genome data, to elucidate the answers to some of the most pressing questions in the biological sciences today. As the volume of genomic data increases, both as a result of increased sampling of the tree of life and due to an increase in the quality and throughput of the sequencing methods, it has become clear that there is a necessity for computational analyses of these data. Manual analysis of this volume of data, which can extend beyond petabytes of storage space, is now impossible. Automated computational pipelines are therefore required to retrieve, categorise and analyse these data. Chapter two discusses the development of a computational pipeline named the Genome Comparison and Analysis Toolkit (GCAT). The pipeline was developed using the Perl programming language and is tightly integrated with the Ensembl Perl API allowing for the retrieval and analyses of their rich genomic resources.
In the first instance the pipeline was tested for its robustness by retrieving and describing various components of genomic architecture across a number of taxonomic groups. Additionally, the need for programmatically independent means of accessing data and in particular the need for Semantic Web based protocols and tools for the sharing of genomics resources is highlighted. This is not just for the requirements of researchers, but for improved communication and sharing between computational infrastructure. A prototype Ensembl REST web service was developed in collaboration with the European Bioinformatics Institute (EBI) to provide a means of accessing Ensembl’s genomic data without having to rely on their Perl API. A comparison of the runtime and memory usage of the Ensembl Perl API and prototype REST API was made relative to baseline raw SQL queries, which highlights the overheads inherent in building wrappers around the SQL queries. Differences in the efficiency of the approaches were highlighted, and the importance of investing in the development of Semantic Web technologies as a tool to improve access to data for the wider scientific community is discussed. Data highlighted in chapter two led to the identification of relative differences in the intron structure of a number of organisms including teleost fish. Chapter three encompasses a published, peer-reviewed study. Inter-genomic comparisons were undertaken utilising the 5 available teleost genome sequences in order to examine and describe their intron content. The number and sizes of introns were compared across these fish and a frequency distribution of intron size was produced that identified a novel expansion in the Zebrafish lineage of introns in the size range of approximately 500-2,000 bp. Further hypothesis-driven analyses of the introns across the whole distribution of intron sizes identified that the majority, but not all, of the introns were largely comprised of repetitive elements.
It was concluded that the introns in the Zebrafish peak were likely the result of an ancient expansion of repetitive elements that had since degraded beyond the ability of computational algorithms to identify them. Additional sampling throughout the teleost fish lineage will allow for more focused phylogenetically driven analyses to be undertaken in the future. In chapter four, phylogenetic comparative analyses of gene duplications were undertaken across primate and rodent taxonomic groups with the intention of identifying significantly expanded or contracted gene families. Changes in the size of gene families may indicate adaptive evolution. A larger number of expansions, relative to time since common ancestor, were identified in the branch leading to modern humans than in any other primate species. Due to the unique nature of the human data in terms of quantity and quality of annotation, additional analyses were undertaken to determine whether the expansions were methodological artefacts or real biological changes. Novel approaches were developed to test the validity of the data including comparisons to other highly annotated genomes. No similar expansion was seen in mouse when comparing with rodent data, though, as assemblies and annotations were updated, there were differences in the number of significant changes, which brings into question the reliability of the underlying assembly and annotation data. This emphasises the importance of an understanding that computational predictions, in the absence of supporting evidence, may be unlikely to represent the actual genomic structure, and instead be more an artefact of the software parameter space. In particular, significant shortcomings are highlighted due to the assumptions and parameters of the models used by the CAFE gene family analysis software.
We must bear in mind that genome assemblies and annotations are hypotheses that themselves need to be questioned and subjected to robust controls to increase the confidence in any conclusions that can be drawn from them. In addition, functional genomics analyses were undertaken to identify the role of significantly changed genes and gene families in primates, testing against a hypothesis that would see the majority of changes involving immune, sensory or reproductive genes. Gene Ontology (GO) annotations were retrieved for these data, which enabled highlighting the broad GO groupings and more specific functional classifications of these data. The results showed that the majority of gene expansions were in families that may have arisen due to adaptation, or were maintained due to their necessary involvement in developmental and metabolic processes. Comparisons were made to previously published studies to determine whether the Ensembl functional annotations were supported by the de novo analyses undertaken in those studies. The majority were not, with only a small number of previously identified functional annotations being present in the most recent Ensembl releases. The impact of gene family evolution on intron evolution was explored in chapter five, by analysing gene family data and intron characteristics across the genomes of 61 vertebrate species. General descriptive statistics and visualisations were produced, along with tests for correlation between change in gene family size and the number, size and density of their associated introns. There was shown to be very little impact of change in gene family size on the underlying intron evolution. Other, non-family effects were therefore considered. These analyses showed that introns were restricted to euchromatic regions, with heterochromatic regions such as the centromeres and telomeres being largely devoid of any such features.
A greater involvement of spatial mechanisms such as recombination, GC-bias across GC-rich isochores and biased gene conversion was thus proposed to play more of a role, though depending largely on population genetic and life history traits of the organisms involved. Additional population level sequencing and comparative analyses across a divergent group of species with available recombination maps and life history data would be a useful future direction in understanding the processes involved
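
The intron size comparison described in the abstract above can be illustrated with a minimal Python sketch. It is not the GCAT pipeline, which the abstract says was written in Perl against the Ensembl Perl API. The function names, the 1-based inclusive exon coordinates and the 500 bp bin width are all assumptions made for illustration only.

```python
from collections import Counter


def intron_sizes(exons):
    """Return intron lengths for one transcript.

    `exons` is a list of (start, end) pairs in 1-based inclusive
    coordinates (an assumed convention for this sketch).
    An intron is the gap between two consecutive exons.
    """
    ordered = sorted(exons)
    return [
        next_start - prev_end - 1
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def size_histogram(sizes, bin_width=500):
    """Count introns per size bin; keys are the lower edge of each bin."""
    counts = Counter((size // bin_width) * bin_width for size in sizes)
    return dict(sorted(counts.items()))


# Hypothetical transcript with three exons, hence two introns.
example_exons = [(100, 200), (701, 800), (2801, 2900)]
example_sizes = intron_sizes(example_exons)
example_hist = size_histogram(example_sizes)
```

Pooling such histograms per species would give a frequency distribution of the kind the abstract mentions; a real analysis would also need to handle strands, alternative transcripts and annotation quality.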

    Plataforma de supercomputación para bioinformática

    In 2007 the University of Málaga expanded its computing resources and moved them to a new centre devoted exclusively to research: the Supercomputing and Bioinnovation building located in the Parque Tecnológico de Andalucía. This building would also house the Plataforma Andaluza de Bioinformática together with other units and laboratories with highly specialised instrumentation. Since then I have worked as administrator of the centre's supercomputing resources and as part of the bioinformatics team, supporting a large number of researchers in their daily tasks. Seeing both sides made it easy to detect shortcomings in bioinformatics that could be covered by a suitable application of the available computing resources, and that is where the seed was planted that led us to begin the first pieces of work that make up this study. Having been carried out in an environment as oriented towards problem solving as the one described, this thesis has an eminently practical character: each contribution rests on substantial theoretical study but culminates in a concrete practical result that can be applied to everyday problems in bioinformatics or even in other areas of research. Thus, with the aim of making supercomputing resources more accessible to bioinformaticians, we have created an automatic generator of web interfaces for command-line programs, which allows jobs to be run on supercomputing resources transparently for the user. We also contribute a virtual desktop system that provides remote access to a set of preinstalled programs offering visual interfaces for analysing small data sets or visualising the more complex results generated with supercomputing resources.
To optimise the use of supercomputing resources we have designed a new algorithm for distributed task execution, which can be used both in the design of new tools and to optimise the execution of existing programs. In addition, concerned by the growth in the amount of data produced by high-throughput sequencing techniques, we contribute a new sequence compression format that, besides reducing the storage space used, allows any stored sequence to be searched for and extracted quickly without decompressing the whole file. In developing new algorithms to solve specific biological problems, we provide four new tools covering the search for divergent regions in alignments, the preprocessing and cleaning of reads obtained by high-throughput sequencing, the analysis of transcriptomes of non-model species obtained through de novo assemblies, and a prototype for annotating incomplete genomic sequences. As a solution for the dissemination and long-term storage of results obtained in various research projects, a generic system of virtual machines for transcriptomics databases has been developed and is already in use in several projects. Furthermore, in order to disseminate the results of our work, all the algorithms and tools produced in this thesis have been published as open source at https://github.com/dariogf

    Comparative analysis of algorithms for whole-genome assembly of pyrosequencing data

    Next-generation sequencing technologies have fostered an unprecedented proliferation of high-throughput sequencing projects and a concomitant development of novel algorithms for the assembly of short reads. In this context, an important issue is the need for a careful assessment of the accuracy of the assembly process. Here, we review the efficiency of a panel of assemblers, specifically designed to handle data from the GS FLX 454 platform, on three bacterial data sets with different characteristics in terms of read coverage and repeat content. Our aim is to investigate their strengths and weaknesses in the reconstruction of the reference genomes. In our benchmarking, we assess assemblers’ performance, quantifying and characterizing assembly gaps and errors, and evaluating their ability to solve complex genomic regions containing repeats. The final goal of this analysis is to highlight pros and cons of each method, in order to provide the final user with general criteria for the right choice of the appropriate assembly strategy, depending on the specific needs. A further aspect we have explored is the relationship between coverage of a sequencing project and quality of the obtained results. The final outcome suggests that, for a good tradeoff between costs and results, the planned genome coverage of an experiment should not exceed 20-30×
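
To make the coverage figure in the abstract above concrete, the following minimal Python sketch uses the standard definition of average sequencing coverage (total sequenced bases divided by genome length). The function names, the 5 Mb genome and the 400 bp mean read length are hypothetical values chosen for illustration, not figures from the study.

```python
import math


def expected_coverage(num_reads, mean_read_length, genome_size):
    """Average coverage = total sequenced bases / genome length."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * mean_read_length / genome_size


def reads_needed(target_coverage, mean_read_length, genome_size):
    """Smallest number of reads that reaches the target average coverage."""
    if mean_read_length <= 0:
        raise ValueError("mean_read_length must be positive")
    return math.ceil(target_coverage * genome_size / mean_read_length)


# Hypothetical bacterial project: 5 Mb genome, 400 bp mean read length.
GENOME_SIZE = 5_000_000
MEAN_READ_LENGTH = 400

reads_for_25x = reads_needed(25, MEAN_READ_LENGTH, GENOME_SIZE)
coverage_check = expected_coverage(reads_for_25x, MEAN_READ_LENGTH, GENOME_SIZE)
```

Under these assumed numbers, planning for 25× coverage corresponds to 312,500 reads; the same arithmetic can be used to check whether a planned run falls inside the range the abstract recommends.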

    Evolutionary dynamics of RNA viruses with zoonotic potential

    RNA viruses with zoonotic potential represent a public health threat throughout the world. High mutation rates, short generation times and large population sizes are three factors responsible for their high genetic variability and enormous adaptive capacity to new environmental conditions. Understanding the genetic properties and the evolutionary dynamics of RNA viruses with zoonotic potential is crucial to prevent, control, treat and lessen the damage to animal and human health. This thesis investigates the most essential aspects related to the evolution and epidemiology of two widespread zoonotic diseases caused by two RNA viruses: Avian Influenza and rabies. Through the application of bioinformatics tools, I analysed a large amount of sequence data, generated using both first- and second-generation sequencing technology, from viruses collected during four distinct epidemics: a fox-rabies virus epidemic that occurred in north-eastern Italy between 2008 and 2011, highly pathogenic H5N1 avian influenza outbreaks reported in Egypt between 2006 and 2010, and two avian influenza epidemics caused by two distinct subtypes that took place in northern Italy from 1999 to 2001 and 2002 to 2004. Through phylogenetic and Bayesian phylogeographic analyses of viral sequences sampled over multiple discrete spatio-temporal scales, the studies in this thesis reveal the co-circulation of multiple viral lineages, explore the viral gene flows and investigate the evolutionary dynamics of viruses under different selection pressures. In addition, to better understand the pattern of transmission of viral subpopulations from host to host, the intra-host variability and the evolution of viral pathogenicity, I employed an ultra-deep sequencing approach to assess the diversity of viral populations.
The data generated in this thesis provide important insights into a) the impact and efficacy of surveillance strategies and control measures implemented during an outbreak, b) differences in the evolutionary dynamics and spatial spread between distinct genetic groups, c) the emergence of amino acid mutations that may increase viral fitness, d) inter-host transmission of viral variants and e) the gain of virulence determinants. Finally, this thesis shows the great opportunity offered by next generation sequencing technology for dramatic advancement in our understanding of the complicated evolutionary dynamics of these pathogens

    BIOCONTROL OF ROOT ROT COMPLEX IN FIELD PEA AND LENTIL AND COMPLETE GENOME ANALYSIS OF BIOCONTROL BACTERIA

    Aphanomyces root rot (ARR), caused by the soil-borne oomycete pathogen Aphanomyces euteiches, is a destructive disease of legumes, most notably field pea (Pisum sativum L.) and lentil (Lens culinaris L.). It commonly occurs as root rot complex (RRC) along with other soil-borne pathogens, including Fusarium avenaceum and F. oxysporum, which collectively result in significant crop damage leading to complete loss of productivity. Currently, in Canada, the available management strategies against RRC are inadequate. However, a recent study at the University of Saskatchewan identified soil bacteria, Lysobacter capsici K-Hf-H2, Pseudomonas simiae K-Hf-L9 and Pantoea agglomerans PSV1-7, as potential biocontrol agents against ARR in field pea under controlled growth chamber conditions. Therefore, the purpose of this study was to i) investigate the potential for biological control of RRC caused by A. euteiches, F. avenaceum and F. oxysporum and ii) unravel the mechanisms by which biocontrol was achieved. To achieve these objectives, L. capsici K-Hf-H2, P. simiae K-Hf-L9 and P. agglomerans PSV1-7 were evaluated against RRC in field pea and lentil under controlled growth chamber conditions, and the strains’ whole genomes were sequenced, annotated, and comparatively analyzed using bioinformatics tools. Also, laboratory-based functional experiments assessing siderophore production, proteolytic and cellulolytic capacities, and desiccation tolerance were conducted. Additionally, the current state of the science on biological control of ARR was determined via a quantitative meta-analysis review using data extracted from published articles investigating the biocontrol of ARR in pea. My meta-analysis findings suggest potential for biological control of ARR and the need for more field trials to demonstrate the higher efficacy level observed under growth chamber conditions. Compared to P. simiae K-Hf-L9 and P. agglomerans PSV1-7, L.
capsici K-Hf-H2 demonstrated the highest significant biocontrol efficacy against RRC in field pea and lentil, with higher efficacy in field pea. Moreover, my genome analyses identified several genes and gene clusters encoding various traits potentially involved in the suppression of RRC. Such genetic determinants detected in the L. capsici K-Hf-H2 genome include genes encoding Heat Stable Antifungal Factor (HSAF), endoglucanase (cellulase), chitinase, extracellular zinc proteases (metalloendopeptidase), aminopeptidases and siderophores. In the P. simiae K-Hf-L9 and P. agglomerans PSV1-7 genomes, genes and gene clusters encoding iron acquisition, chitin metabolism and protein degradation were detected. I also found evidence that L. capsici K-Hf-H2, P. simiae K-Hf-L9 and P. agglomerans PSV1-7 chelate iron through siderophore production and hydrolyze protein via proteolytic activity. Furthermore, L. capsici K-Hf-H2 and P. simiae K-Hf-L9 were positive for cellulolytic activity. Therefore, my findings indicate the great potential of biological control of RRC in field pea and lentil. Also, the findings in this study represent a significant contribution to the effort of biological control of RRC in field pea and lentil in Canada