
    Spontaneous Emergence of Hierarchy in Biological Systems

    Hierarchy is widely observed in biological systems. In this thesis, evidence from nature is presented to show that protein interactions have become increasingly modular as evolution has proceeded over the last four billion years. The evolution of animal body plan development is considered. Results show that the genes that determine the phylum and superphylum characters evolve slowly, while those genes that determine classes, families, and speciation evolve more rapidly. This result supports the hypothesis that the hierarchical structure of developmental regulatory networks provides an organizing structure that guides the evolution of aspects of the body plan. Next, the world trade network is treated as an evolving system. The theory of modularity predicts that the trade network is more sensitive to recessionary shocks and recovers more slowly from them now than it did 40 years ago, due to structural changes in the world trade network induced by globalization. Economic data show that recession-induced change to the world trade network leads to an increased hierarchical structure of the global trade network for a few years after the recession. In the study of influenza virus evolution, an approach for early detection of new dominant strains is presented. This method is shown to be able to identify a cluster around an incipient dominant strain before it becomes dominant. Recently, CRISPR has been suggested to provide bacteria with an adaptive immune response. A population dynamics model is proposed that explains the biological observation that the leader-proximal end of CRISPR is more diversified and the leader-distal end of CRISPR is less diversified. Finally, the creation of diversity in the antibody repertoire is investigated. It is commonly believed that a heavy chain is generated by randomly combining V, D and J gene segments. However, using high-throughput sequence data in this study, the naive VDJ repertoire is shown to be strongly correlated between individuals, which suggests that VDJ recombination involves regulated mechanisms.

    Machine Learning with Digital Signal Processing for Rapid and Accurate Alignment-Free Genome Analysis: From Methodological Design to a Covid-19 Case Study

    In the field of bioinformatics, taxonomic classification is the scientific practice of identifying, naming, and grouping organisms based on their similarities and differences. The problem of taxonomic classification is of immense importance considering that nearly 86% of existing species on Earth and 91% of marine species remain unclassified. Due to the magnitude of the datasets, the need exists for an approach and software tool that is scalable enough to handle large datasets and can be used for rapid sequence comparison and analysis. We propose ML-DSP, a stand-alone alignment-free software tool that uses Machine Learning and Digital Signal Processing to classify genomic sequences. ML-DSP uses numerical representations to map genomic sequences to discrete numerical series (genomic signals), the Discrete Fourier Transform (DFT) to obtain magnitude spectra from the genomic signals, the Pearson Correlation Coefficient (PCC) as a dissimilarity measure to compute pairwise distances between the magnitude spectra of any two genomic signals, and supervised machine learning for the classification and prediction of the labels of new sequences. We first test ML-DSP by classifying 7396 full mitochondrial genomes at various taxonomic levels, from kingdom to genus, with an average classification accuracy of > 97%. We also provide preliminary experiments indicating the potential of ML-DSP to be used for other datasets, by classifying 4271 complete dengue virus genomes into subtypes with 100% accuracy, and 4710 bacterial genomes into phyla with 95.5% accuracy. Second, we propose another tool, MLDSP-GUI, whose additional features include: a user-friendly Graphical User Interface, Chaos Game Representation (CGR) to numerically represent DNA sequences, Euclidean and Manhattan distances as additional distance measures, phylogenetic tree output, oligomer frequency information to study the under- and over-representation of any particular sub-sequence in a selected sequence, and inter-cluster distance analysis, among others. We test MLDSP-GUI by classifying 7881 complete genomes of the Flavivirus genus into species with 100% classification accuracy. Third, we provide a proof of principle that MLDSP-GUI is able to classify newly discovered organisms by classifying the novel COVID-19 virus.
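
The pipeline described in this abstract (numerical representation, DFT magnitude spectrum, PCC-based dissimilarity) can be illustrated with a minimal sketch. This is not the ML-DSP implementation: the integer nucleotide mapping `NUC_TO_NUM`, the `(1 - PCC) / 2` normalization, the function names, and the requirement that both sequences have equal length are all assumptions made for illustration only.

```python
import cmath
import math

# Assumed numerical representation (hypothetical): one integer per nucleotide.
NUC_TO_NUM = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def genomic_signal(seq):
    """Map a DNA string to a discrete numerical series (a 'genomic signal')."""
    return [NUC_TO_NUM[base] for base in seq.upper()]


def magnitude_spectrum(signal):
    """Magnitudes of the Discrete Fourier Transform, computed naively in O(n^2)."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal)))
        for k in range(n)
    ]


def pearson(u, v):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    norm_u = math.sqrt(sum((a - mean_u) ** 2 for a in u))
    norm_v = math.sqrt(sum((b - mean_v) ** 2 for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("PCC is undefined for a constant vector")
    return cov / (norm_u * norm_v)


def pcc_distance(seq_a, seq_b):
    """Dissimilarity between two sequences; 0 means identical spectra.

    Assumption: (1 - PCC) / 2 is used so the value falls in [0, 1].
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch assumes equal-length sequences")
    spec_a = magnitude_spectrum(genomic_signal(seq_a))
    spec_b = magnitude_spectrum(genomic_signal(seq_b))
    return (1.0 - pearson(spec_a, spec_b)) / 2.0
```

A matrix of such pairwise distances could then be passed to any supervised classifier. Real genomes differ in length, so a practical version would need a length-normalization step, which this sketch omits.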

    Maize as production and delivery vehicle of edible vaccines against the enterotoxigenic Escherichia coli and the swine transmissible gastroenteritis (TGE)

    Plants are becoming increasingly important as a production system for biopharmaceuticals and industrially important proteins. The work presented in this dissertation showed that maize can be used as a source and delivery vehicle for oral vaccines. Antigenic proteins from two economically important pathogens, enterotoxigenic Escherichia coli (E. coli) and the swine transmissible gastroenteritis virus (TGEV), were expressed in transgenic maize. This study showed that subunits of the E. coli heat labile enterotoxin (LT) can be synthesized, correctly processed, and assembled in transgenic maize tissues. The role of regulatory sequences such as promoters, targeting and retention signals in the accumulation of LT-B in transgenic maize kernels was studied. The seed-specific 27 kDa gamma zein promoter achieved a significantly higher level of LT-B expression in kernels compared to the constitutive CaMV 35S promoter. The use of the endoplasmic reticulum retention motif SEKDEL significantly enhanced kernel accumulation of LT-B. The LT-B gene was normally transmitted over three generations. Maize-generated LT-B had the biochemical, biophysical, and immunogenic properties of the bacterial protein. Oral administration of transgenic maize expressing LT-B in BALB/c mice induced elevated titers of serum and mucosal antibodies, which protected the immunized animals from subsequent challenge with LT and cholera toxin (CT). Using two synthetic genes for the LT toxin subunits, LT-A and LT-B, a non-toxic derivative of the heat labile toxin, LTK63, was expressed in transgenic maize callus. This mutant toxin assembled in maize callus tissue, showing that complex folding of foreign antigens could be achieved in transgenic maize tissues. This mutant derivative was shown to be more immunogenic than the bacteria-derived LT-B. We fused an N-terminal domain of the spike (S) protein of the swine transmissible gastroenteritis virus to the A subunit of LT, and coexpressed this fusion with LT-B in transgenic maize callus. Expression of the fusion proteins and LT-B was observed in callus. This work demonstrates that maize, a key ingredient in the food and feed industry, can be used as a source and delivery vehicle of functional antigens for use as oral vaccines. Maize holds great potential for the generation of human and livestock vaccines, and this work lays the foundation for the development of vaccines against other pathogens in transgenic maize.

    Mining virus genomes for host predictive signals

    The total dependence of a virus on its host for its survival leads to a fundamental entanglement with its host’s cellular machinery. This drives a coevolutionary relationship that leaves an imprint of the host in viral genomes. The aim of this thesis was to develop machine learning approaches to identify and exploit these host predictive signals. We present methods that use these signals both to build classifiers that can assign putative information to virus genomes and to locate the discriminative features on viral proteins, thereby identifying regions that are important in the host relationship. The first step aimed to identify discriminative features that capture the different aspects of the virus-host relationship. We generated a range of feature sets from alternative representations of the viral genomes, each of which aimed to exploit a different level of the biological information present. We used a supervised machine learning approach to compare a range of feature sets for their ability to predict host taxonomic information. Next, we opened these “black box” classifiers to extract the discriminative information learnt by the model and to identify regions of a viral protein that are associated with the host relationship. We used the ‘local’ nature of some of the predictive feature sets to transform an amino acid sequence into host signals. Finally, we developed a multi-view generative mixture model, MVC, to tease apart the complex signals that are embedded in viral genomes via different evolutionary processes. This Bayesian approach uses the clustering of the data defined by labels of interest to guide the features associated with those labels into the "relevant view". The MVC model is able to identify features associated with weak effects in the data.

    Genotypic analysis of HIV-1 coreceptor usage

    The acquired immunodeficiency syndrome (AIDS) is one of the biggest medical challenges in the world today. Its causative pathogen, the human immunodeficiency virus (HIV), is responsible for millions of deaths per year. Although about two dozen antiviral drugs are currently available, they can only delay progression of the disease and cannot cure patients. In recent years, the new class of coreceptor antagonists has been added to the arsenal of antiretroviral drugs. These drugs block viral cell entry by binding to one of the receptors the virus requires for infection of a cell. However, some HIV variants can also use another coreceptor, so coreceptor usage has to be tested before administration of the drug. This thesis analyzes the use of statistical learning methods to infer HIV coreceptor usage from viral genotype. Improvements over existing methods are achieved by using sequence information from previously unused genomic regions, by using next-generation sequencing technologies, and by combining different existing prediction systems. In addition, HIV coreceptor usage prediction is analyzed with respect to clinical outcome in patients treated with coreceptor antagonists. The results demonstrate that inferring HIV coreceptor usage from viral genotype can be reliably used in daily clinical routine.

    Predicting and analyzing HIV-1 adaptation to broadly neutralizing antibodies and the host immune system using machine learning

    Thanks to its extraordinarily high mutation and replication rate, the human immunodeficiency virus type 1 (HIV-1) is able to rapidly adapt to the selection pressure imposed by the host immune system or antiretroviral drug exposure. With neither a cure nor a vaccine at hand, viral control is a major pillar in the fight against the HIV-1 pandemic. Without drug exposure, interindividual differences in viral control are partly influenced by host genetic factors like the human leukocyte antigen (HLA) system, and viral genetic factors like the predominant coreceptor usage of the virus. Thus, close monitoring of the viral population within patients and adjustments to the treatment regimens, as well as continuous development of new drug components, are indispensable measures to counteract the emergence of viral escape variants. To this end, a fast and accurate determination of viral adaptation is essential for successful treatment. This thesis is based upon four studies that aim to develop and apply statistical learning methods to (i) predict adaptation of the virus to broadly neutralizing antibodies (bNAbs), a promising new treatment option, (ii) advance antibody-mediated immunotherapy for clinical usage, and (iii) predict viral adaptation to the HLA system to further understand the switch in HIV-1 coreceptor usage. In total, this thesis comprises several statistical learning approaches to predict HIV-1 adaptation, thereby enabling better control of HIV-1 infections.

    One-class SVM and supervised machine learning models for uncovering associations of non-coding RNA with diseases

    The study of microRNAs (miRNAs), long non-coding RNAs (lncRNAs) and gene interactions may be expected to provide new technologies to serve as valuable biomarkers for personalized treatments of diseases and to aid in the prognosis of certain conditions. These molecules act at the genome level by regulating or suppressing protein expression. The primary challenge in the study of these non-coding molecules is the need for labeled data indicating positive and negative interactions when predicting interactions using machine-learning or deep-learning techniques. In practice, however, the available data are usually unbalanced or otherwise unstable for training these models. An additional problem involves the extraction of features derived from the binding of these non-coding RNAs and genes. In animals this binding usually occurs fully or partially, which makes the process considerably more complex to study. Therefore, the main objective of the present work is to demonstrate that it is possible to use features extracted from miRNA sequences to study the development of diseases such as breast cancer and breast neoplasms, and to examine whether there is any influence on immune genes related to SARS-CoV-2. We performed experiments focusing on the erb-b2 receptor tyrosine kinase 2 (ERBB2) gene involved in breast cancer. For this purpose, we gathered miRNA-mRNA information from the binding between these two genetic molecules. In this part of our research, we applied a One-Class SVM and an Isolation Forest to discriminate between weak interactions, outliers given by the one-class model, and strong interactions that could occur between miRNA and mRNA (messenger RNA). Additionally, this study aimed to differentiate between breast cancer cases and breast neoplasm conditions. In this section we used the information encoded in lncRNAs. The additional feature used in this part was the frequency of k-mers, i.e., short nucleotide subsequences, along with data on the energy released in miRNA folding. The models used to discriminate between these diseases were One-Class SVM, SVM, and Random Forest. In the final part of the present work, we described a subset of probable miRNA binding with SARS-CoV-2 RNA, focusing on those miRNAs related to genes involved in the human immune system. The models used as classifiers were One-Class SVM, SVM, and Random Forest. The results obtained in the present study are comparable to those found in the current literature and demonstrate the feasibility of using one-class models combined with features from the coupling of non-coding genes or mRNAs and their relationships with forms of breast cancer and viral infections. This work is expected to establish a basis for future avenues of research to apply one-class machine-learning models with feature extraction based on genomic sequences to the study of the relationship between non-coding RNAs and various diseases. School of Computing, Ph.D. (Computing)
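
The k-mer frequency feature mentioned in this abstract can be sketched as follows. This is an illustrative assumption, not the thesis's code: the function name `kmer_frequencies`, the RNA alphabet `ACGU`, the default `k=3`, and the normalization by the number of valid windows are all hypothetical choices.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return a fixed-length vector of relative k-mer frequencies.

    The vector has len(alphabet) ** k entries, ordered as produced by
    itertools.product over the alphabet. Windows containing symbols
    outside the alphabet are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    valid = set(all_kmers)
    # Slide a window of width k over the sequence and count each k-mer.
    counts = Counter(
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if seq[i:i + k] in valid
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]
```

Each sequence then maps to a vector of equal length, which is the shape that classifiers such as a One-Class SVM or a Random Forest expect as input.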

    Computational approaches for improving treatment and prevention of viral infections

    The treatment of infections with HIV or HCV is challenging. Thus, novel drugs and new computational approaches that support the selection of therapies are required. This work presents methods that support therapy selection as well as methods that advance novel antiviral treatments. geno2pheno[ngs-freq] identifies drug resistance from HIV-1 or HCV samples that were subjected to next-generation sequencing by interpreting their sequences either via support vector machines or a rules-based approach. geno2pheno[coreceptor-hiv2] determines the coreceptor that is used for viral cell entry by analyzing a segment of the HIV-2 surface protein with a support vector machine. openPrimeR is capable of finding optimal combinations of primers for multiplex polymerase chain reaction by solving a set cover problem and drawing on a new logistic regression model for determining amplification events arising from polymerase chain reaction. geno2pheno[ngs-freq] and geno2pheno[coreceptor-hiv2] enable the personalization of antiviral treatments and support clinical decision making. The application of openPrimeR to human immunoglobulin sequences has resulted in novel primer sets that improve the isolation of broadly neutralizing antibodies against HIV-1. The methods that were developed in this work thus constitute important contributions towards improving the prevention and treatment of viral infectious diseases.
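
The set cover formulation mentioned for primer selection can be illustrated with a short sketch. It uses the standard greedy heuristic, which gives an approximate rather than a guaranteed optimal cover, and it is not openPrimeR's algorithm. The function name, the primer and template labels, and the assumption that coverage is already known as plain sets are all hypothetical.

```python
def greedy_set_cover(templates, primer_coverage):
    """Greedily choose primers until every template is covered.

    templates: iterable of template identifiers to be amplified.
    primer_coverage: dict mapping a primer name to the templates it covers.
    Returns (chosen_primers, templates_left_uncovered).
    """
    uncovered = set(templates)
    chosen = []
    while uncovered:
        # Pick the primer that covers the most still-uncovered templates.
        best = max(
            primer_coverage,
            key=lambda name: len(set(primer_coverage[name]) & uncovered),
        )
        gained = set(primer_coverage[best]) & uncovered
        if not gained:
            break  # no remaining primer covers any uncovered template
        chosen.append(best)
        uncovered -= gained
    return chosen, uncovered


# Hypothetical example: four candidate primers, five templates.
coverage = {
    "p1": {"t1", "t2", "t3"},
    "p2": {"t3", "t4"},
    "p3": {"t4", "t5"},
    "p4": {"t1"},
}
selected, missing = greedy_set_cover({"t1", "t2", "t3", "t4", "t5"}, coverage)
```

In a fuller pipeline, the coverage sets would themselves come from a model that predicts which primer-template pairs amplify, such as the logistic regression model the abstract describes.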