
    Genetic comparison of transmissible gastroenteritis coronaviruses

    Transmissible gastroenteritis virus (TGEV) is a porcine coronavirus that threatens animal health and remains elusive despite years of research efforts. A systematic analysis of all available full-length genomes of TGEVs (43 in total) and porcine respiratory coronaviruses (PRCVs; 7 in total) showed that TGEVs fell into two independent phylogenetic clades, GI and GII. Viruses circulating in China (until 2021) clustered with the traditional or attenuated vaccine strains within the same clade (GI). In contrast, viruses recently isolated in the USA fell into the GII clade. The viruses circulating in China show lower similarity to those recently isolated in the USA across the entire viral genome. In addition, at least four potential genomic recombination events were identified, three of which occurred in the GI clade and one in the GII clade. TGEVs circulating in China are distinct from the viruses recently isolated in the USA at both the genomic nucleotide and antigenic levels. Genomic recombination serves as a factor driving the expansion of TGEV genomic diversity.

    Incomplete tricarboxylic acid cycle and proton gradient in Pandoravirus massiliensis: is it still a virus?

    The discovery of Acanthamoeba polyphaga Mimivirus, the first isolated giant virus of amoeba, challenged the historical hallmarks defining a virus. Giant virion sizes are known to reach up to 2.3 µm, making them visible by optical microscopy. Their large genome sizes of up to 2.5 Mb can encode proteins involved in the translation apparatus. We have investigated possible energy production in Pandoravirus massiliensis. Mitochondrial membrane markers allowed for the detection of a membrane potential in purified virions and this was enhanced by a regulator of the tricarboxylic acid cycle but abolished by the use of a depolarizing agent. Bioinformatics was employed to identify enzymes involved in virion proton gradient generation and this approach revealed that 8 putative P. massiliensis proteins exhibited low sequence identities with known cellular enzymes involved in the universal tricarboxylic acid cycle. Further, all 8 viral genes were transcribed during replication. The product of one of these genes, ORF132, was cloned and expressed in Escherichia coli, and shown to function as an isocitrate dehydrogenase, a key enzyme of the tricarboxylic acid cycle. Our findings show for the first time that a membrane potential can exist in Pandoraviruses, and this may be related to tricarboxylic acid cycle. The presence of a proton gradient in P. massiliensis makes this virus a form of life for which it is legitimate to ask the question ‘what is a virus?’

    The myxozoan minicollagen gene repertoire was not simplified by the parasitic lifestyle: computational identification of a novel myxozoan minicollagen gene

    Background: Lineage-specific gene expansions represent one of the driving forces in the evolutionary dynamics of unique phylum traits. Myxozoa, a cnidarian subphylum of obligate parasites, are evolutionarily altered and highly reduced organisms with a simple body plan including cnidarian-specific organelles and polar capsules (a type of nematocyst). Minicollagens, a group of structural proteins, are prominent constituents of nematocysts linking Myxozoa and Cnidaria. Despite recent advances in the identification of minicollagens in Myxozoa, the evolutionary history and diversity of minicollagens in Myxozoa and Cnidaria remain elusive. Results: We generated new transcriptomes of two myxozoan species using a novel pipeline for filtering of closely related contaminant species in RNA-seq data. Mining of our transcriptomes and published omics data confirmed the existence of myxozoan Ncol-4, reported only once previously, and revealed a novel noncanonical minicollagen, Ncol-5, which is exclusive to Myxozoa. Phylogenetic analyses support a close relationship between myxozoan Ncol-1-3 with minicollagens of Polypodium hydriforme, but suggest independent evolution in the case of the myxozoan minicollagens Ncol-4 and Ncol-5. Additional genome- and transcriptome-wide searches of cnidarian minicollagens expanded the dataset to better clarify the evolutionary trajectories of minicollagen. Conclusions: The development of a new approach for the handling of next-generation data contaminated by closely related species represents a useful tool for future applications beyond the field of myxozoan research. This data processing pipeline allowed us to expand the dataset and study the evolution and diversity of minicollagen genes in Myxozoa and Cnidaria. We identified a novel type of minicollagen in Myxozoa (Ncol-5). We suggest that the large number of minicollagen paralogs in some cnidarians is a result of several recent large gene multiplication events.
We revealed close juxtaposition of minicollagens Ncol-1 and Ncol-4 in myxozoan genomes, suggesting their common evolutionary history. The unique gene structure of myxozoan Ncol-5 suggests a specific function in the myxozoan polar capsule or tubule. Despite the fact that myxozoans possess only one type of nematocyst, their gene repertoire is similar to those of other cnidarians.

    Genome-based analysis for the bioactive potential of Streptomyces yeochonensis CN732, an acidophilic filamentous soil actinobacterium

    Acidophilic members of the genus Streptomyces can be a good source of novel secondary metabolites and of enzymes that degrade biopolymers. In this study, a genome-based approach was applied to Streptomyces yeochonensis CN732, a representative neutrotolerant acidophilic streptomycete, to examine its biosynthetic and enzymatic potential, and the presence of any genetic tools for adaptation to acidic environments. A high-quality draft genome (7.8 Mb) of S. yeochonensis CN732 was obtained with a G + C content of 73.53% and 6549 protein-coding genes. The in silico analysis predicted the presence of multiple biosynthetic gene clusters (BGCs), which showed similarity with those for antimicrobial, anticancer or antiparasitic compounds. However, the low levels of similarity with known BGCs in most cases suggested novelty of the metabolites from those predicted gene clusters. The production of various novel metabolites was also confirmed by combined high performance liquid chromatography-mass spectrometry analysis. Through comparative genome analysis with related Streptomyces species, genes specific to strain CN732 and also those specific to neutrotolerant acidophilic species could be identified, which showed that genes for metabolism in diverse environments were enriched among acidophilic species. In addition, the presence of strain-specific genes for carbohydrate-active enzymes (CAZymes) along with many other singletons indicated the uniqueness of the genetic makeup of strain CN732. The presence of cysteine transpeptidases (sortases) among the BGCs was also observed in this study, which implies their putative roles in the biosynthesis of secondary metabolites. This study highlights the bioactive potential of strain CN732, an acidophilic streptomycete, with regard to secondary metabolite production and biodegradation potential using a genomics-based approach.
The comparative genome analysis revealed genes specific to CN732 and also those shared among acidophilic species, which could give some insights into the adaptation of microbial life to acidic environments.

    Classifying distinct data types: textual streams, protein sequences and genomic variants

    Artificial Intelligence (AI) is an interdisciplinary field combining different research areas with the end goal of automating processes in everyday life and industry. The fundamental components of AI models are an “intelligent” model and a functional component defined by the end-application. That is, an intelligent model can be a statistical model that recognizes patterns in data instances in order to distinguish between these instances. For example, if the AI is applied in car manufacturing, then, based on an image of a car part, the model can categorize whether the part belongs to the front, middle or rear compartment of the car, as a human brain would do. For the same example application, the statistical model informs a mechanical arm, the functional component, of the current car compartment, and the arm in turn assembles this compartment based on predefined instructions, much as a human hand would follow neural signals from the human brain. A crucial step of AI applications is the classification of input instances by the intelligent model. The classification step in the intelligent model pipeline allows the subsequent steps to act in a similar fashion for instances belonging to the same category. We define classification as the module of the intelligent model that categorizes the input instances based on patterns of the instances that are either predefined by human experts or produced in a data-driven way. Irrespective of the method used to find patterns in data, classification is composed of four distinct steps: (i) input representation, (ii) model building, (iii) model prediction and (iv) model assessment. Based on these classification steps, we argue that applying classification to distinct data types poses different challenges. In this thesis, I focus on challenges for three distinct classification scenarios: (i) Textual Streams: how to advance the model building step, commonly used for a static distribution of data, to classify textual posts with a transient data distribution?
(ii) Protein Prediction: which biologically meaningful information can be used in the input representation step to overcome the limited training data challenge? (iii) Human Variant Pathogenicity Prediction: how to develop a classification system for the functional impact of human variants, by providing standardized and well-accepted evidence for the classification outcome and thus enabling the model assessment step? To answer these research questions, I present my contributions in classifying these different types of data: temporalMNB: I adapt the sequential prediction with expert advice paradigm to optimally aggregate complementary distributions, enhancing a Naive Bayes model so that it adapts to the drifting distribution of the characteristics of the textual posts. dom2vec: our proposal to learn embedding vectors for the protein domains using self-supervision. Based on the high performance achieved by the dom2vec embeddings in a quantitative intrinsic assessment of the captured biological information, I provide example evidence for an analogy between the local linguistic features in natural languages and the domain structure and function information in domain architectures. Last, I describe GenOtoScope, a bioinformatics software tool to automate standardized evidence-based criteria for the pathogenicity impact of variants associated with hearing loss. Finally, to increase the practical use of our last contribution, I develop easy-to-use software interfaces to be used, in research settings, by clinical diagnostics personnel.
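The four classification steps named in this abstract (input representation, model building, model prediction, model assessment) can be illustrated with a minimal sketch. The code below is an assumption-laden toy example, not the thesis's temporalMNB method: it uses a plain multinomial Naive Bayes with Laplace smoothing on a static distribution, and all function names (`represent`, `build_model`, `predict`, `assess`) and the training texts are hypothetical.

```python
import math
from collections import Counter, defaultdict


def represent(text):
    """Step (i), input representation: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def build_model(docs, labels, alpha=1.0):
    """Step (ii), model building: multinomial Naive Bayes counts.

    alpha is the Laplace smoothing constant (an illustrative default).
    """
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        bag = represent(doc)
        word_counts[label].update(bag)
        vocab.update(bag)
    return {"classes": class_counts, "words": word_counts,
            "vocab": vocab, "alpha": alpha, "n": len(labels)}


def predict(model, text):
    """Step (iii), model prediction: class with the highest log-posterior."""
    bag = represent(text)
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, count in model["classes"].items():
        total = sum(model["words"][label].values())
        score = math.log(count / model["n"])  # log prior
        for word, freq in bag.items():
            if word not in model["vocab"]:
                continue  # ignore words never seen in training
            p = ((model["words"][label][word] + model["alpha"])
                 / (total + model["alpha"] * vocab_size))
            score += freq * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def assess(model, docs, labels):
    """Step (iv), model assessment: plain accuracy on held-out data."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Hypothetical toy data, chosen only to make the sketch runnable.
train_docs = ["protein domain sequence", "protein fold structure",
              "variant allele pathogenic", "variant gene allele"]
train_labels = ["protein", "protein", "variant", "variant"]
model = build_model(train_docs, train_labels)
```

Calling `predict(model, "protein structure")` on this toy model selects the `protein` class. A model for drifting textual streams would additionally need to reweight or refresh these counts over time, which this static sketch does not attempt.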

    Development and application of genetic engineering methods for Actinoplanes sp. SE50/110

    Gren T. Development and application of genetic engineering methods for Actinoplanes sp. SE50/110. Bielefeld: Universität Bielefeld; 2017. The alpha-glucosidase inhibitor acarbose is used for the treatment of diabetes mellitus type 2, and is manufactured industrially with overproducing derivatives of _Actinoplanes_ sp. SE50/110. This strain was reportedly optimized through step-by-step conventional mutagenesis procedures in the past; however, this strategy seems to have reached its limits. Despite its high industrial significance, only limited information exists regarding acarbose metabolism and the function and regulation of these processes, due to the absence of proper genetic engineering methods and tools developed for this strain. In this work, a full toolkit and set of methods for genetic engineering of _Actinoplanes_ sp. SE50/110 were developed. A standardized protocol for DNA transfer through _E. coli_ - _Actinoplanes_ conjugation was adjusted and applied for the transfer of phiC31, phiBT1 and VWB actinophage-based integrative vectors and a pSG5-based replicative vector. Integration sites, occurring once per genome for all integrative vectors, were sequenced and characterized for the first time in _Actinoplanes_ sp. SE50/110. Notably, the studied plasmids were proven to be stable and neutral with respect to strain morphology and acarbose production, enabling future use for genetic manipulations of _Actinoplanes_ sp. SE50/110. To further broaden the spectrum of available tools, a GUS reporter system was established in _Actinoplanes_ sp. SE50/110. A set of different methods for gene knockouts was tested, which included integrative and replicative vector-based knockouts, ReDirect system-based knockouts and CRISPR-Cas9 genetic engineering. The ReDirect system was further used to create a library of _Actinoplanes_ single knockout strains.
Two of the strains, the _Actinoplanes_ _acbD_ and _Actinoplanes_ _cadC_ knockout mutants, were further characterized in detail regarding their phenotype. The developed gene cloning system offers multiple possibilities to solve fundamental questions regarding acarbose production, in particular the formulation and verification of the complete acarbose metabolism model, as well as the rational design of acarbose-overproducing strains.