Hidden Markov Models for Gene Sequence Classification: Classifying the VSG genes in the Trypanosoma brucei Genome
The article presents an application of Hidden Markov Models (HMMs) for
pattern recognition on genome sequences. We apply HMMs to identify genes
encoding the Variant Surface Glycoprotein (VSG) in the genomes of Trypanosoma
brucei (T. brucei) and other African trypanosomes. These parasitic protozoa are
the causative agents of sleeping sickness and several diseases in domestic and wild
animals. These parasites have a peculiar strategy to evade the host's immune
system that consists in periodically changing their predominant cellular
surface protein (VSG). The motivation for using pattern recognition methods to
identify these genes, instead of traditional homology-based ones, is that the
levels of sequence identity (amino acid and DNA sequence) amongst these genes
are often below what is considered reliable for these methods. Among pattern
recognition approaches, HMMs are particularly suitable for tackling this problem
because they handle the determination of gene edges more naturally. We
evaluate the performance of the model using different numbers of states in the
Markov model, as well as several performance metrics. The model is applied
using public genomic data. Our empirical results show that the VSG genes of T.
brucei can be reliably identified (high sensitivity and low rate of false
positives) using HMMs.
Comment: Accepted in July 2015 in Pattern Analysis and Applications,
Springer. The article contains 23 pages, 4 figures, 8 tables and 51
references.
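As a rough illustration of why an HMM yields gene edges directly, the sketch below decodes a DNA string with the Viterbi algorithm and reports the positions where the most probable path enters and leaves a "gene" state. It is not the authors' model: the two-state layout, the state names and every probability are hypothetical values chosen only to make the example run.

```python
import math

STATES = ("background", "gene")

# Hypothetical parameters for illustration only (not fitted to any genome).
START = {"background": 0.9, "gene": 0.1}
TRANS = {
    "background": {"background": 0.9, "gene": 0.1},
    "gene": {"background": 0.1, "gene": 0.9},
}
EMIT = {
    "background": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
    "gene": {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string (log space)."""
    scores = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []
    for base in seq[1:]:
        prev = scores[-1]
        row, ptr = {}, {}
        for s in STATES:
            # Best predecessor state for being in state s at this position.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            row[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][base])
            ptr[s] = best
        scores.append(row)
        back.append(ptr)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def gene_segments(path):
    """Collapse a state path into half-open (start, end) intervals labelled 'gene'."""
    segments, start = [], None
    for i, state in enumerate(path):
        if state == "gene" and start is None:
            start = i
        elif state != "gene" and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(path)))
    return segments
```

Calling `gene_segments(viterbi(sequence))` returns the predicted gene intervals. Comparing them with annotated intervals would give the sensitivity and false-positive rate mentioned in the abstract.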
In silico prediction of non-coding RNAs using supervised learning and feature ranking methods
This thesis presents a novel method, RNAMultifold, for development of a non-coding RNA (ncRNA) classification model based on features derived from folding the consensus sequence of multiple sequence alignments using different folding programs: RNAalifold, CentroidFold, and RSpredict. The method ranks these folding features according to a Class Separation Measure (CSM) that quantifies the ability of the features to differentiate between samples from positive and negative test sets. The set of top-ranked features is then used to construct classification models: Naive Bayes, Fisher Linear Discriminant, and Support Vector Machine (SVM). The performance of these models is compared with that of the same models built on a baseline feature set and with that of an existing classification tool, RNAz.
The Support Vector Machine classification model with a radial basis function kernel, using the top 11 ranked features, is shown to be more sensitive than other models, including another ncRNA prediction program, RNAz, across all specificity values for the RNA families under study. In addition, the target feature set outperforms the baseline feature set of z score and structure conservation index across all classification methods, with the exception of Fisher Linear Discriminant. The RNAMultifold method is then used to search the genome of a Trypanosome species (Trypanosoma brucei) for novel ncRNAs. The results of this search are compared with known ncRNAs and with results from RNAz.
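The feature-ranking step can be sketched as follows. The abstract does not define the Class Separation Measure, so this example substitutes a Fisher-style score (squared difference of class means divided by the summed class variances) as an assumption. The function names and the feature names in the usage note are hypothetical.

```python
from statistics import mean, pvariance


def separation_score(pos, neg):
    """Fisher-style score (an assumed stand-in for the thesis's CSM)."""
    gap = (mean(pos) - mean(neg)) ** 2
    spread = pvariance(pos) + pvariance(neg)
    if spread > 0:
        return gap / spread
    # Both classes are constant: perfectly separated if the means differ.
    return float("inf") if gap > 0 else 0.0


def rank_features(positives, negatives):
    """Rank feature names from most to least separating.

    positives / negatives map each feature name to its list of values
    in the positive and negative sample sets.
    """
    scores = {
        name: separation_score(positives[name], negatives[name])
        for name in positives
    }
    return sorted(scores, key=scores.get, reverse=True)
```

For example, `rank_features({"z_score": [...], "gc": [...]}, {"z_score": [...], "gc": [...]})` returns the feature names ordered by score. The top entries would then be passed to whichever classifier is being trained.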
Kinetoplastid Phylogenomics and Evolution
This Special Issue, Kinetoplastid Phylogenomics and Evolution, unites a series of research and review papers related to kinetoplastid parasites. The diverse topics represented in this collection display a variety of scientific questions and methodological approaches currently used to study these fascinating organisms.
Roles of R-loops in the Trypanosoma brucei genome and antigenic variation
The genome of the eukaryotic parasite Trypanosoma brucei is both dynamic and unconventional in several aspects. In comparison with other eukaryotic genomes, where the majority of protein-coding genes are associated with their own transcriptional promoters, T. brucei transcribes almost all protein-coding genes polycistronically. Transcription initiates from broad regions that lack defined promoter sequences and RNA Polymerase II then traverses up to hundreds of genes, generating a pre-mRNA that then requires trans-splicing and polyadenylation to generate mature mRNAs. Termination of transcription, via virtually unknown processes, occurs where two multigene transcription units converge or, in some cases, adjacent to a downstream transcription initiation site. RNA Polymerase II transcribes the majority of protein-coding genes in this manner, negating any differential gene expression via transcriptional control. A further unusual aspect of the genome is the dedication of as much as a third of the coding capacity to elements of antigenic variation. When infecting the mammalian host, parasites express a dense protein coat of variant surface glycoprotein (VSG). In order to evade host immune elements, T. brucei switches expression to antigenically distinct VSGs, employing a repertoire of ~2,000 genes. Both transcriptional and recombination-based strategies enable the parasite to either switch transcription between ~15 expression sites, each housing a distinct VSG, or relocate VSG sequence from silent gene arrays into an active VSG expression site. Although multiple factors have been found to regulate these processes, the events which trigger a VSG switch by either pathway are unclear.
R-loops are three-stranded structures containing an RNA-DNA hybrid and displaced single-stranded DNA. Although potentially deleterious to genome integrity, R-loops have been linked to transcription initiation and termination, DNA replication and recombination events. In this study, the potential for R-loop involvement in these fundamental genome functions of T. brucei was investigated. Firstly, Ribonuclease (RNase) H enzymes, which resolve the RNA-DNA hybrid portion of R-loops, were characterised, revealing that T. brucei expresses potentially three distinct catalytic enzymes, two functioning in the nuclear genome and one in the kinetoplast (mitochondrial) genome. Nuclear RNase H activity was depleted by null mutation or RNAi-mediated knockdown of the nuclear RNase H enzymes, showing that while one RNase H, TbRH1, is non-essential, loss of the other, TbRH2, caused several growth and genome integrity defects. As this loss was hypothesised to increase levels of RNA-DNA hybrids in the genome, RNA-DNA hybrids were mapped in wild type parasites and those lacking RNases H using a specific antiserum, S9.6. This mapping identified the conserved formation of R-loops at centromeres, retrotransposon-associated genes, rRNA and tRNA genes. R-loop enrichment was also uncovered at RNA Polymerase II transcription start sites, as documented in mammalian genomes. DNA damage was specifically increased at these sites after TbRH2 depletion, indicating that efficient resolution of these transcription initiation-associated R-loops is critical for genome maintenance. In contrast, R-loops were not associated with DNA replication or transcription termination, suggesting RNA-DNA hybrids are not involved in these processes in T. brucei. The most abundant sites of R-loop enrichment were found to be at the nucleosome-depleted regions located between the coding regions of polycistronically transcribed genes and are associated with polyadenylation and trans-splicing, highlighting a novel correlation of R-loops with pre-mRNA processing. Lastly, R-loops were mapped to VSG expression sites where their abundance increased after ablation of RNase H activity, an effect that was associated with both increased DNA damage and VSG switching, uncovering an R-loop-driven mechanism of antigenic variation.
Bioinformatics
This book is divided into different research areas relevant in Bioinformatics such as biological networks, next generation sequencing, high performance computing, molecular modeling, structural bioinformatics and intelligent data analysis. Each book section introduces the basic concepts and then explains their application to problems of great relevance, so both novice and expert readers can benefit from the information and research works presented here.
INTEGRATING CHEMICAL, BIOLOGICAL AND PHYLOGENETIC SPACES OF AFRICAN NATURAL PRODUCTS TO UNDERSTAND THEIR THERAPEUTIC ACTIVITY
Fatima Magdi Hamza Baldo
This research aims to utilise ligand-based target prediction to (i) understand the mechanism
of action of African natural products (ANPs), (ii) help identify patterns of phylogenetic use in
African traditional medicine and (iii) elucidate the mechanism of action of phenotypically
active small molecules and natural products with anti-trypanosomal activity.
In Chapter 2 the objective was to utilise ligand-based target prediction to understand the
mechanism of action of natural products (NPs) from African medicinal plants used against
cancer. The Random Forest classifier used in this work compares the similarity of the input
compounds from the natural product dataset with compound-target combinations in the
training set. The more similar they are in structure, the more likely they are to modulate the
same target. Natural products from plants used against cancer in Africa were predicted to
modulate targets and pathways directly associated with the disease, which helps explain their
mechanism of action, e.g. “flap endonuclease 1” and “Mcl-1”. The “Keap1-Nrf2 Pathway”
and “apoptosis modulation by HSP70”, two pathways previously linked to cancer (which are
not currently targeted by marketed drugs, but have been of increasing interest in recent years)
were predicted to be modulated by ANPs.
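The similarity principle described above (structurally similar compounds are likely to modulate the same target) can be illustrated with a deliberately simpler technique than the Random Forest used in the thesis: a nearest-neighbour lookup using Tanimoto similarity. This is a minimal sketch under stated assumptions. Fingerprints are represented as sets of integer feature IDs, and all names and data are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two sets of fingerprint features."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def predict_targets(query, training, k=1):
    """Return (target, similarity) for the k training compounds closest to the query.

    training is a list of (fingerprint_set, target_name) pairs.
    This is a nearest-neighbour stand-in, not the Random Forest of the thesis.
    """
    ranked = sorted(
        training,
        key=lambda item: tanimoto(query, item[0]),
        reverse=True,
    )
    return [(target, round(tanimoto(query, fp), 3)) for fp, target in ranked[:k]]
```

With `training = [({1, 2, 3, 4}, "target_A"), ({7, 8, 9}, "target_B")]`, the query `{1, 2, 3, 5}` shares three of five distinct features with the first compound and none with the second, so `target_A` is ranked first.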
In Chapter 3, we aimed to identify phylogenetic patterns in medicinal plant use and the role
this plays in predicting medicinal activity. We combined chemical, predicted target and
phylogenetic information of the natural products to identify patterns of use for plant families
containing plant species used against cancer in African, Malay and Indian (Ayurveda)
traditional medicine. Plant families that are close phylogenetically were found to produce
similar natural products that act on similar targets regardless of their origin. Additionally,
phylogenetic patterns were identified for African traditional plant families with medicinal
species used against cancer, malaria and human African trypanosomiasis (HAT). We
identified plant families that have more medicinal species than would statistically be expected
by chance and rationalised this by linking their activity to their unique phytochemistry, e.g.
the naphthylisoquinoline alkaloids, uniquely produced by Ancistrocladaceae and
Dioncophyllaceae, are responsible for anti-malarial and anti-trypanosome activity.
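One common way to ask whether a family holds more medicinal species than expected by chance is a one-sided hypergeometric test. The abstract does not say which test was used, so the sketch below is an assumption, and the function and parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(total_species, total_medicinal, family_species, family_medicinal):
    """P(X >= family_medicinal) under random sampling without replacement.

    X is the number of medicinal species that would fall in a family of
    family_species species if medicinal status were spread at random over
    total_species species, of which total_medicinal are medicinal.
    """
    upper = min(family_species, total_medicinal)
    tail = sum(
        comb(total_medicinal, k)
        * comb(total_species - total_medicinal, family_species - k)
        for k in range(family_medicinal, upper + 1)
    )
    return tail / comb(total_species, family_species)
```

A small p-value suggests over-representation. When many families are tested at once, a multiple-testing correction would normally be applied on top of this.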
In Chapter 4, information from target prediction and experimentally validated targets was
combined with orthologue data to predict targets of phenotypically active small molecules
and natural products screened against Trypanosoma brucei. The predicted targets were
prioritised based on their essentiality for the survival of the T. brucei parasite. We predicted
orthologues of targets that are essential for the survival of the trypanosome e.g. glycogen
synthase kinase 3 (GSK3) and rhodesain. We also identified the biological processes
predicted to be perturbed by the compounds e.g. “glycolysis”, “cell cycle”, “regulation of
symbiosis, encompassing mutualism through parasitism” and “modulation of development of
symbiont involved in interaction with host”.
In conclusion, in silico target prediction can be used to predict protein targets of natural
products to understand their molecular mechanism of action. Phylogenetic information and
phytochemical information of medicinal plants can be integrated to identify plant families
with more medicinal species than would be expected by chance.
The impact of the International Livestock Research Institute
Providing the first evidence-based global estimates of the many scientific, economic, policy, and capacity development impacts of livestock research in and for developing countries, this volume is an indispensable guide and reference for veterinarians, animal and forage scientists, and anyone working for the equitable and sustainable development of the world's poorer agricultural economies.
Livestock is one of the fastest growing agricultural sectors, with most growth occurring in developing countries. For more than four and a half decades one global centre has been mandated to conduct research on leveraging the benefits and mitigating the costs of livestock production in poor countries. This book focuses on the achievements, failures and impacts of the International Livestock Research Institute (ILRI) and its predecessors, the International Livestock Centre for Africa (ILCA) and the International Laboratory for Research on Animal Diseases (ILRAD). The scientific and economic impacts of tropical livestock research detailed in this work reveal valuable lessons for reducing world hunger, poverty and environmental degradation.
Describing the impacts of smallholder livestock systems on the global environment, the book also covers animal genetics, production, health and disease control, and livestock-related land management, public policy and economics, all with useful pointers for future livestock-for-development research.
A Comparison of Mitochondrial Heat Shock Protein 70 and Hsp70 Escort Protein 1 Orthologues from Trypanosoma brucei and Homo sapiens
The causative agent of African trypanosomiasis, Trypanosoma brucei (T. brucei), has an expanded retinue of specialized heat shock proteins, which have been identified as crucial to the progression of the disease. These play a central role in disease progression and transmission through their involvement in cell-cycle pathways which bring about cell-cycle arrest and differentiation. Hsp70 proteins are essential for the maintenance of proteostasis in the cell. Mitochondrial Hsp70 (mtHsp70) is a highly conserved molecular chaperone required for both the translocation of nuclear encoded proteins across the two mitochondrial membranes and the subsequent folding of proteins in the matrix. The T. brucei genome encodes three copies of mtHsp70 which are 100% identical. MtHsp70 self-aggregates, a property unique to this isoform, and an Hsp70 escort protein (Hep1) is required to maintain the molecular chaperone in a soluble, functional state. This study aimed to compare the solubilizing interaction of Hep1 from T. brucei and Homo sapiens (H. sapiens). The recently introduced AlphaFold program was used to analyze the structures of mtHsp70 and Hep1 proteins and allowed observations of structures unavailable to other modelling techniques. The GVFEV motif found in the ATPase domain of mtHsp70s interacted with the linker region, resulting in aggregation. The AlphaFold models indicated that the replacement of the lysine (K) residue within the KTFEV motif of DnaK (prokaryotic Hsp70) with glycine (G) may abrogate bond formation between the motif and a region between lobes I and II of the ATPase domain. This may facilitate the aggregation reaction of mtHsp70 orthologues and provides a residue of interest for future studies. Both TbHep1 and HsHep1 reduced the thermal aggregation of TbmtHsp70 and mortalin (H. sapiens mtHsp70), respectively; however, TbHep1 was ~15% less effective than HsHep1 at higher concentrations (4 µM).
TbHep1 itself appeared to be aggregation-prone under conditions of thermal stress; AlphaFold models suggest this may be due to an N-terminal α-helical structure not present in HsHep1. These results indicate that TbHep1 is functionally similar to HsHep1; however, the orthologue may operate in a unique manner which requires further investigation.
Thesis (MSc) -- Faculty of Science, Biotechnology Innovation Centre, 202
Genetic diversity in Trypanosoma cruzi: marker development and applications; natural population structures, and genetic exchange mechanisms
Chagas disease remains the most important parasitic infection in Latin America. The
aetiological agent, Trypanosoma cruzi (Kinetoplastida: Trypanosomatidae), causes a complex
vector-borne zoonosis transmitted in the faeces of hematophagous triatomine bugs
(Hemiptera: Reduviidae: Triatominae), and maintained by mammalian reservoir hosts
ranging from the southern United States to Argentinean Patagonia. In the absence of
chemotherapy, infection is life-long and can lead to a spectrum of pathological sequelae
ranging from subclinical to lethal cardiac and/or gastrointestinal complications in up to 30%
of patients.
T. cruzi displays remarkable genetic diversity, which has long been suspected to contribute to
the considerable variation in clinical symptoms observed between endemic regions.
Currently, isolates of T. cruzi can be assigned to a minimum of six stable genetic lineages or
discrete typing units (DTUs) (TcI-TcVI), which are broadly associated with disparate
ecologies, transmission cycles and geographical distributions. The principal mode of
reproduction among T. cruzi strains is the subject of an intense, decades-old debate. Despite
the existence of two recent natural hybrid lineages (TcV and TcVI), which resemble meiotic
F1 progeny, a pervasive view is that recombination has been restrained at an evolutionary
scale and is of little epidemiological relevance to contemporary parasite populations.
The aim of this PhD project was to investigate T. cruzi genetic diversity through significant
development of phylogenetic markers and their application to the characterization of natural
parasite population structures and genetic exchange mechanisms. Multiple, single-copy,
chromosomally-independent, nuclear housekeeping genes were assessed initially for their
ability to allocate isolates to DTU-level, to facilitate higher resolution intra-lineage analyses
and finally for their inclusion alongside additional targets in a standardized T. cruzi
nuclear multilocus sequence typing (nMLST) scheme. For the immediate future, nuclear MLST,
using a panel of four to seven nuclear loci, is a robust, reproducible and highly discriminatory
method that has potential to become the new gold standard for T. cruzi DTU assignment.
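The core bookkeeping of a multilocus sequence typing scheme can be sketched as follows: each distinct sequence at a locus receives an allele number, and the tuple of allele numbers across loci forms an isolate's profile. This is a simplified illustration, not the thesis's scheme. It assumes one sequence per locus per isolate and ignores the heterozygosity that diploid loci can show. The function name and input layout are hypothetical.

```python
def allele_profiles(isolates):
    """Map each isolate to a tuple of allele numbers, one per locus.

    isolates maps isolate name -> {locus name -> sequence string}.
    Allele numbers are assigned per locus in order of first appearance,
    and loci are reported in alphabetical order.
    """
    loci = sorted({locus for seqs in isolates.values() for locus in seqs})
    allele_ids = {locus: {} for locus in loci}
    profiles = {}
    for name, seqs in isolates.items():
        profile = []
        for locus in loci:
            ids = allele_ids[locus]
            # Reuse the number of a known sequence, else assign the next one.
            profile.append(ids.setdefault(seqs[locus], len(ids) + 1))
        profiles[name] = tuple(profile)
    return profiles
```

Isolates with identical profiles share a sequence type. Profiles that differ at only a few loci point to closely related genotypes.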
To investigate natural parasite population structures and uncover evidence of genetic
exchange, a high resolution mitochondrial MLST (mtMLST) scheme, based on ten gene
fragments, was developed and evaluated against current nuclear markers (multilocus
microsatellite typing; MLMT) using isolates belonging to the oldest and most widely
distributed lineage (TcI). Observations of gross nuclear-mitochondrial phylogenetic
incongruence indicate that recombination is ongoing, geographically widespread and
continues to influence natural populations, challenging the traditional paradigm of clonality
in T. cruzi.
Application of this combined nuclear-mitochondrial methodology to intensively sampled,
minimally-subdivided TcI populations revealed extensive mitochondrial introgression within
a disease focus in North-East Colombia as well as among arboreal transmission cycles in
Bolivia. Failure to detect any reciprocal nuclear hybridization among recombinant strains
may be indicative of alternate, cryptic mating strategies in T. cruzi, which are challenging to
reconcile with both in vitro parasexual mechanisms of genetic exchange described, and
patterns of Mendelian allele inheritance among natural hybrid DTUs.
High resolution genotyping of TcI populations was also undertaken to explore the interaction
between parasite genetic heterogeneity and ecological biodiversity, exposing the significant
impact human activity has had on T. cruzi evolution. Reduced genetic diversity, accelerated
parasite dissemination between densely populated areas and mitochondrial gene flow
between domestic and sylvatic populations, suggests humans may have played a crucial role
in T. cruzi dispersal across the Bolivian highlands. Parallel reductions in genetic diversity
were observed among isolates from the Brazilian Atlantic Forest, attributable to ongoing
anthropogenic habitat fragmentation. By comparison domestic TcI isolates (TcIDOM) are
divergent from their sylvatic counterparts, but also genetically homogeneous, and likely to
have originated in North/Central America before distribution southwards. Molecular dating
of Colombian TcIDOM clones confirmed that this clade emerged 23,000 ± 12,000 years ago,
coinciding with the earliest human migration into South America.
Lastly, Illumina amplicon deep sequencing markers were developed to explore the interaction
between parasite multiclonality and clinical status of chronic Chagas disease. An
unprecedented level of intra-host genetic diversity was detected, highlighting putative
diversifying selection affecting antigenic surface proteases, which may facilitate survival in
the mammalian host. In lieu of comparative genomics of representative T. cruzi field isolates,
not yet a reality, as is the case with other more experimentally-tractable trypanosomatids,
presented herein are some of the highest resolution genotyping techniques developed in T.
cruzi to date, which have the potential to expand our current understanding of parasite
genetic diversity and its relevance to clinical outcome of Chagas disease.