14 research outputs found

    Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome

    The article presents an application of Hidden Markov Models (HMMs) for pattern recognition on genome sequences. We apply HMMs to identify genes encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are the causative agents of sleeping sickness and several diseases in domestic and wild animals. They have a peculiar strategy for evading the host's immune system, which consists of periodically changing their predominant cell-surface protein (VSG). The motivation for using pattern recognition methods to identify these genes, instead of traditional homology-based ones, is that the level of sequence identity (amino acid and DNA sequence) among these genes is often below what is considered reliable for those methods. Among pattern recognition approaches, HMMs are particularly suitable for this problem because they handle the determination of gene edges more naturally. We evaluate the performance of the model using different numbers of states in the Markov model, as well as several performance metrics. The model is applied to public genomic data. Our empirical results show that the VSG genes of T. brucei can be reliably identified (high sensitivity and low false-positive rate) using HMMs. Comment: Accepted article in July, 2015 in Pattern Analysis and Applications, Springer. The article contains 23 pages, 4 figures, 8 tables and 51 references
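
    To illustrate how an HMM can mark gene edges, the following is a minimal sketch and not the authors' model. It decodes a two-state HMM with the Viterbi algorithm. The state names (N for non-gene, G for gene) and all probabilities are hypothetical toy values chosen for illustration, with the gene state simply favouring G and C.

```python
import math

# Hypothetical two-state model: "N" = non-gene, "G" = gene (toy values only).
STATES = ("N", "G")
START_P = {"N": 0.9, "G": 0.1}
TRANS_P = {"N": {"N": 0.9, "G": 0.1}, "G": {"N": 0.1, "G": 0.9}}
EMIT_P = {
    "N": {"A": 0.45, "T": 0.45, "G": 0.05, "C": 0.05},
    "G": {"A": 0.05, "T": 0.05, "G": 0.45, "C": 0.45},
}


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Return the most probable state path for seq under a discrete HMM."""
    # Log probabilities avoid numerical underflow on long sequences.
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    backpointers = []
    for symbol in seq[1:]:
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for landing in state s at this position.
            prev = max(states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            current[s] = (scores[-1][prev] + math.log(trans_p[prev][s])
                          + math.log(emit_p[s][symbol]))
            pointers[s] = prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    toy = "ATATAT" + "GCGCGCGC" + "ATATAT"
    print("".join(viterbi(toy, STATES, START_P, TRANS_P, EMIT_P)))
    # → NNNNNNGGGGGGGGNNNNNN
```

    In the decoded path, the positions where the label changes between N and G are the predicted gene edges. A realistic gene model would use more states and parameters estimated from annotated sequences.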

    In silico prediction of non-coding RNAs using supervised learning and feature ranking methods

    This thesis presents a novel method, RNAMultifold, for development of a non-coding RNA (ncRNA) classification model based on features derived from folding the consensus sequence of multiple sequence alignments using different folding programs: RNAalifold, CentroidFold, and RSpredict. The method ranks these folding features according to a Class Separation Measure (CSM) that quantifies the ability of the features to differentiate between samples from positive and negative test sets. The set of top-ranked features is then used to construct classification models: Naive Bayes, Fisher Linear Discriminant, and Support Vector Machine (SVM). These models are compared to the performance of the same models with a baseline feature set and with an existing classification tool, RNAz. The Support Vector Machine classification model with a radial basis function kernel, using the top 11 ranked features, is shown to be more sensitive than other models, including another ncRNA prediction program, RNAz, across all specificity values for the RNA families under study. In addition, the target feature set outperforms the baseline feature set of z score and structure conservation index across all classification methods, with the exception of Fisher Linear Discriminant. The RNAMultifold method is then used to search the genome of a Trypanosome species (Trypanosoma brucei) for novel ncRNAs. The results of this search are compared with known ncRNAs and with results from RNAz

    Kinetoplastid Phylogenomics and Evolution

    This Special Issue, Kinetoplastid Phylogenomics and Evolution, unites a series of research and review papers related to kinetoplastid parasites. The diverse topics represented in this collection display a variety of scientific questions and methodological approaches currently used to study these fascinating organisms

    Roles of R-loops in the Trypanosoma brucei genome and antigenic variation

    The genome of the eukaryotic parasite Trypanosoma brucei is both dynamic and unconventional in several aspects. In comparison with other eukaryotic genomes, where the majority of protein-coding genes are associated with their own transcriptional promoters, T. brucei transcribes almost all protein-coding genes polycistronically. Transcription initiates from broad regions that lack defined promoter sequences and RNA Polymerase II then traverses up to hundreds of genes, generating a pre-mRNA that then requires trans-splicing and polyadenylation to generate mature mRNAs. Termination of transcription, via virtually unknown processes, occurs where two multigene transcription units converge or, in some cases, adjacent to a downstream transcription initiation site. RNA Polymerase II transcribes the majority of protein-coding genes in this manner, negating any differential gene expression via transcriptional control. A further unusual aspect of the genome is the dedication of as much as a third of the coding capacity to elements of antigenic variation. When infecting the mammalian host, parasites express a dense protein coat of variant surface glycoprotein (VSG). In order to evade host immune elements, T. brucei switches expression to antigenically distinct VSGs, employing a repertoire of ~2,000 genes. Both transcriptional and recombination-based strategies enable the parasite to either switch transcription between ~15 expression sites, each housing a distinct VSG, or relocate VSG sequence from silent gene arrays into an active VSG expression site. Although multiple factors have been found to regulate these processes, the events which trigger a VSG switch by either pathway are unclear. R-loops are three-stranded structures containing an RNA-DNA hybrid and displaced single-stranded DNA. Although potentially deleterious to genome integrity, R-loops have been linked to transcription initiation and termination, DNA replication and recombination events.
In this study, the potential for R-loop involvement in these fundamental genome functions of T. brucei was investigated. Firstly, Ribonuclease (RNase) H enzymes, which resolve the RNA-DNA hybrid portion of R-loops, were characterised, revealing that T. brucei potentially expresses three distinct catalytic enzymes, two functioning in the nuclear genome and one in the kinetoplast (mitochondrial) genome. Nuclear RNase H activity was depleted by null mutation or RNAi-mediated knockdown of the nuclear RNase H enzymes, showing that while one RNase H, TbRH1, is non-essential, loss of the other, TbRH2, caused several growth and genome integrity defects. As loss of RNase H activity was hypothesised to increase levels of RNA-DNA hybrids in the genome, RNA-DNA hybrids were mapped in wild type parasites and those lacking RNases H using a specific antiserum, S9.6. This mapping identified the conserved formation of R-loops at centromeres, retrotransposon-associated genes, rRNA and tRNA genes. R-loop enrichment was also uncovered at RNA Polymerase II transcription start sites, as documented in mammalian genomes. DNA damage was specifically increased at these sites after TbRH2 depletion, indicating that efficient resolution of these transcription initiation-associated R-loops is critical for genome maintenance. In contrast, R-loops were not associated with DNA replication or transcription termination, suggesting RNA-DNA hybrids are not involved in these processes in T. brucei. The most abundant sites of R-loop enrichment were found at the nucleosome-depleted regions located between the coding regions of polycistronically transcribed genes; these sites are associated with polyadenylation and trans-splicing, highlighting a novel correlation of R-loops with pre-mRNA processing.
Lastly, R-loops were mapped to VSG expression sites where their abundance increased after ablation of RNase H activity, an effect that was associated with both increased DNA damage and VSG switching, uncovering an R-loop-driven mechanism of antigenic variation

    Bioinformatics

    This book is divided into different research areas relevant to Bioinformatics, such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here

    The impact of the International Livestock Research Institute

    Providing the first evidence-based global estimates of the many scientific, economic, policy, and capacity development impacts of livestock research in and for developing countries, this volume is an indispensable guide and reference for veterinarians, animal and forage scientists, and anyone working for the equitable and sustainable development of the world's poorer agricultural economies. Livestock is one of the fastest growing agricultural sectors, with most growth occurring in developing countries. For more than four and a half decades one global centre has been mandated to conduct research on leveraging the benefits and mitigating the costs of livestock production in poor countries. This book focuses on the achievements, failures and impacts of the International Livestock Research Institute (ILRI) and its predecessors, the International Livestock Centre for Africa (ILCA) and the International Laboratory for Research on Animal Diseases (ILRAD). The scientific and economic impacts of tropical livestock research detailed in this work reveal valuable lessons for reducing world hunger, poverty and environmental degradation. Describing the impacts of smallholder livestock systems on the global environment, the book also covers animal genetics, production, health and disease control, and livestock-related land management, public policy and economics, all with useful pointers for future livestock-for-development research

    A Comparison of Mitochondrial Heat Shock Protein 70 and Hsp70 Escort Protein 1 Orthologues from Trypanosoma brucei and Homo sapiens

    The causative agent of African trypanosomiasis, Trypanosoma brucei (T. brucei), has an expanded retinue of specialized heat shock proteins, which have been identified as crucial to the progression of the disease. These play a central role in disease progression and transmission through their involvement in cell-cycle pathways which bring about cell-cycle arrest and differentiation. Hsp70 proteins are essential for the maintenance of proteostasis in the cell. Mitochondrial Hsp70 (mtHsp70) is a highly conserved molecular chaperone required for both the translocation of nuclear encoded proteins across the two mitochondrial membranes and the subsequent folding of proteins in the matrix. The T. brucei genome encodes three copies of mtHsp70 which are 100% identical. MtHsp70 self-aggregates, a property unique to this isoform, and an Hsp70 escort protein (Hep1) is required to maintain the molecular chaperone in a soluble, functional state. This study aimed to compare the solubilizing interaction of Hep1 from T. brucei and Homo sapiens (H. sapiens). The recently introduced AlphaFold program was used to analyze the structures of mtHsp70 and Hep1 proteins and allowed observations of structures unavailable to other modelling techniques. The GVFEV motif found in the ATPase domain of mtHsp70s interacted with the linker region, resulting in aggregation. The AlphaFold models indicated that the replacement of the lysine (K) residue within the KTFEV motif of DnaK (prokaryotic Hsp70) with glycine (G) may abrogate bond formation between the motif and a region between lobe I and II of the ATPase domain. This may facilitate the aggregation reaction of mtHsp70 orthologues and provides a residue of interest for future studies. Both TbHep1 and HsHep1 reduced the thermal aggregation of TbmtHsp70 and mortalin (H. sapiens mtHsp70), respectively; however, TbHep1 was ~15% less effective than HsHep1 at higher concentrations (4 µM).
TbHep1 itself appeared to be aggregation-prone under conditions of thermal stress; AlphaFold models suggest this may be due to an N-terminal α-helical structure not present in HsHep1. These results indicate that TbHep1 is functionally similar to HsHep1; however, the orthologue may operate in a unique manner which requires further investigation. Thesis (MSc) -- Faculty of Science, Biotechnology Innovation Centre, 202

    Genetic diversity in Trypanosoma cruzi: marker development and applications; natural population structures, and genetic exchange mechanisms

    Chagas disease remains the most important parasitic infection in Latin America. The aetiological agent, Trypanosoma cruzi (Kinetoplastida: Trypanosomatidae), is a complex vector-borne zoonosis transmitted in the faeces of hematophagous triatomine bugs (Hemiptera: Reduviidae: Triatominae), and maintained by mammalian reservoir hosts ranging from the southern United States to Argentinean Patagonia. In the absence of chemotherapy, infection is life-long and can lead to a spectrum of pathological sequelae ranging from subclinical to lethal cardiac and/or gastrointestinal complications in up to 30% of patients. T. cruzi displays remarkable genetic diversity, which has long been suspected to contribute to the considerable variation in clinical symptoms observed between endemic regions. Currently, isolates of T. cruzi can be assigned to a minimum of six stable genetic lineages or discrete typing units (DTUs) (TcI-TcVI), which are broadly associated with disparate ecologies, transmission cycles and geographical distributions. The principal mode of reproduction among T. cruzi strains is the subject of an intense, decades-old debate. Despite the existence of two recent natural hybrid lineages (TcV and TcVI), which resemble meiotic F1 progeny, a pervasive view is that recombination has been restrained at an evolutionary scale and is of little epidemiological relevance to contemporary parasite populations. The aim of this PhD project was to investigate T. cruzi genetic diversity through significant development of phylogenetic markers and their application to the characterization of natural parasite population structures and genetic exchange mechanisms. Multiple, single-copy, chromosomally-independent, nuclear housekeeping genes were assessed initially for their ability to allocate isolates to DTU-level, to facilitate higher resolution intra-lineage analyses and finally for their inclusion alongside additional targets in a standardized T. 
cruzi multilocus sequence typing (nMLST) scheme. For the immediate future, nuclear MLST, using a panel of four to seven nuclear loci, is a robust, reproducible and highly discriminatory method that has potential to become the new gold standard for T. cruzi DTU assignment. To investigate natural parasite population structures and uncover evidence of genetic exchange, a high resolution mitochondrial MLST (mtMLST) scheme, based on ten gene fragments, was developed and evaluated against current nuclear markers (multilocus microsatellite typing; MLMT) using isolates belonging to the oldest and most widely distributed lineage (TcI). Observations of gross nuclear-mitochondrial phylogenetic incongruence indicate that recombination is ongoing, geographically widespread and continues to influence natural populations, challenging the traditional paradigm of clonality in T. cruzi. Application of this combined nuclear-mitochondrial methodology to intensively sampled, minimally-subdivided TcI populations revealed extensive mitochondrial introgression within a disease focus in North-East Colombia as well as among arboreal transmission cycles in Bolivia. Failure to detect any reciprocal nuclear hybridization among recombinant strains may be indicative of alternate, cryptic mating strategies in T. cruzi, which are challenging to reconcile with both the in vitro parasexual mechanisms of genetic exchange described and the patterns of Mendelian allele inheritance among natural hybrid DTUs. High resolution genotyping of TcI populations was also undertaken to explore the interaction between parasite genetic heterogeneity and ecological biodiversity, exposing the significant impact human activity has had on T. cruzi evolution. Reduced genetic diversity, accelerated parasite dissemination between densely populated areas and mitochondrial gene flow between domestic and sylvatic populations suggest humans may have played a crucial role in T. cruzi dispersal across the Bolivian highlands.
Parallel reductions in genetic diversity were observed among isolates from the Brazilian Atlantic Forest, attributable to ongoing anthropogenic habitat fragmentation. By comparison, domestic TcI isolates (TcIDOM) are divergent from their sylvatic counterparts, but also genetically homogeneous, and likely to have originated in North/Central America before distribution southwards. Molecular dating of Colombian TcIDOM clones confirmed that this clade emerged 23,000 ± 12,000 years ago, coinciding with the earliest human migration into South America. Lastly, Illumina amplicon deep sequencing markers were developed to explore the interaction between parasite multiclonality and clinical status of chronic Chagas disease. An unprecedented level of intra-host genetic diversity was detected, highlighting putative diversifying selection affecting antigenic surface proteases, which may facilitate survival in the mammalian host. In lieu of comparative genomics of representative T. cruzi field isolates, not yet a reality, as is the case with other more experimentally-tractable trypanosomatids, presented herein are some of the highest resolution genotyping techniques developed in T. cruzi to date, which have the potential to expand our current understanding of parasite genetic diversity and its relevance to clinical outcome of Chagas disease