
    Genomes of trombidid mites reveal novel predicted allergens and laterally-transferred genes associated with secondary metabolism

    Trombidid mites have a unique lifecycle in which only the larval stage is ectoparasitic. In the superfamily Trombiculoidea (“chiggers”), the larvae feed preferentially on vertebrates, including humans. Species in the genus Leptotrombidium are vectors of a potentially fatal bacterial infection, scrub typhus, which affects 1 million people annually. Moreover, chiggers can cause pruritic dermatitis (trombiculiasis) in humans and domesticated animals. In the Trombidioidea (velvet mites), the larvae feed on other arthropods and are potential biological control agents for agricultural pests. Here, we present the first trombidid mite genomes, obtained for both a chigger, Leptotrombidium deliense, and a velvet mite, Dinothrombium tinctorium.

    Proteomics investigations of immune activation


    Data Representation in Machine Learning Methods with its Application to Compilation Optimization and Epitope Prediction

    In this dissertation we explore the application of machine learning algorithms to compilation phase-order optimization and epitope prediction. The common thread running through these two disparate domains is the type of data being dealt with: in both we are dealing with categorical data, whose representation plays a significant role in the performance of classification algorithms. We first present a neuroevolutionary approach which orders optimization phases to generate compiled programs with performance superior to those compiled using LLVM's -O3 optimization level. Performance improvements, calculated as the speed of the compiled program's execution, ranged from 27% for the ccbench program to 40.8% for bzip2. This dissertation then explores the problem of data representation of 3D biological data, such as amino acids. A new approach for distributed representation of 3D biological data through the process of embedding is proposed and explored. Analogously to word embedding, we developed a system that uses atomic and residue coordinates to generate distributed representations for residues, which we call 3D Residue BioVectors. Preliminary results are presented which demonstrate that even low-dimensional 3D Residue BioVectors can be used to predict conformational epitopes and protein-protein interactions with promising accuracy. The generation of such 3D BioVectors, and the proposed methodology, opens the door to substantial future improvements and new application domains. The dissertation then explores the problem domain of linear B-cell epitope prediction, which deals with predicting epitopes based strictly on the protein sequence. We present the DRREP system, which demonstrates how an ensemble of shallow neural networks can be combined with string kernels and an analytical learning algorithm to produce state-of-the-art epitope prediction results. DRREP was tested on the SARS subsequence, HIV, Pellequer, and AntiJen datasets, and on the standard SEQ194 test dataset. AUC improvements achieved over the state of the art ranged from 3% to 8%. Finally, we present the SEEP epitope classifier, a multi-resolution SVM ensemble classifier which uses the conjoint triad feature representation and produces state-of-the-art classification results. SEEP leverages the domain-specific, knowledge-based protein sequence encoding developed within the protein-protein interaction research domain. Using an ensemble of multi-resolution SVMs and a sliding-window pre- and post-processing pipeline, SEEP achieves an AUC of 91.2 on the standard SEQ194 test dataset, a 24% improvement over the state of the art.

    Computational Analysis of T Cell Receptor Repertoire and Structure

    The human adaptive immune system has evolved to provide a sophisticated response to a vast body of pathogenic microbes and toxic substances. The primary mediators of this response are T and B lymphocytes. Antigenic peptides presented at the surface of infected cells by major histocompatibility complex (MHC) molecules are recognised by T cell receptors (TCRs) with exceptional specificity. This specificity arises from the enormous diversity in TCR sequence and structure generated through an imprecise process of somatic gene recombination that takes place during T cell development. Quantification of the TCR repertoire through the analysis of data produced by high-throughput RNA sequencing allows for a characterisation of the immune response to disease over time and between patients, and the development of methods for diagnosis and therapeutic design. The latest version of the software package Decombinator extracts and quantifies the TCR repertoire with improved accuracy and compatibility with complementary experimental protocols and external computational tools. The software has been extended for analysis of fragmented short-read data from single cells, comparing favourably with two alternative tools. The development of cell-based therapeutics and vaccines is incomplete without an understanding of molecular-level interactions. The breadth of TCR diversity and cross-reactivity presents a barrier for comprehensive structural resolution of the repertoire by traditional means. Computational modelling of TCR structures and TCR-pMHC complexes provides an efficient alternative. Four general-purpose protein-protein docking platforms were compared in their ability to accurately model TCR-pMHC complexes. Each platform was evaluated against an expanded benchmark of docking test cases and in the context of varying additional information about the binding interface. Continual innovation in structural modelling techniques sets the stage for novel automated tools for TCR design. A prototype platform has been developed, integrating structural modelling and an optimisation routine, to engineer desirable features into TCR and TCR-pMHC complex models.
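    At its simplest, quantifying a repertoire means collapsing sequenced reads into clonotypes and summarizing their frequency distribution. The sketch below is a minimal illustration of that idea, not Decombinator's actual algorithm (which identifies V/J gene tags and performs barcode error correction); the Shannon entropy shown is one common diversity summary.

```python
import math
from collections import Counter

def repertoire_summary(cdr3_reads):
    """Collapse reads into clonotypes; return frequencies and Shannon diversity."""
    counts = Counter(cdr3_reads)
    total = sum(counts.values())
    freqs = {cdr3: n / total for cdr3, n in counts.items()}
    # Shannon entropy of the clonotype distribution (natural log).
    shannon = -sum(p * math.log(p) for p in freqs.values())
    return freqs, shannon

# Toy input: three hypothetical CDR3 amino acid sequences with skewed counts.
reads = ["CASSLGTDTQYF"] * 5 + ["CASSIRSSYEQYF"] * 3 + ["CASSPGQGNYGYTF"] * 2
freqs, h = repertoire_summary(reads)
```

Tracking such frequency distributions across time points or patients is what enables the repertoire-level comparisons described above.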

    Soft Computing Techniques for the Protein Folding Problem on High Performance Computing Architectures

    The protein-folding problem has been studied extensively during the last fifty years. Understanding the dynamics of a protein's global shape and its influence on biological function can help us discover new and more effective drugs for diseases of pharmacological relevance. Different computational approaches have been developed to predict the three-dimensional arrangement of a protein's atoms from its sequence. However, the computational complexity of this problem makes the search for new models, novel algorithmic strategies, and hardware platforms that provide solutions in a reasonable time frame mandatory. In this review we present past and current trends in protein folding simulation from both perspectives, hardware and software. Of particular interest are the use of inexact solutions to this computationally hard problem and the hardware platforms that have been used to run such soft computing techniques. This work is jointly supported by the Fundación Séneca (Agencia Regional de Ciencia y Tecnología, Región de Murcia) under grants 15290/PI/2010 and 18946/JLI/13, by the Spanish MEC and European Commission FEDER under grants TEC2012-37945-C02-02 and TIN2012-31345, and by the Nils Coordinated Mobility programme under grant 012-ABEL-CM-2014A, financed in part by the European Regional Development Fund (ERDF). We also thank NVIDIA for hardware donation within the UCAM GPU educational and research centers.
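    A concrete example of the inexact approaches surveyed in such reviews is the 2D hydrophobic-polar (HP) lattice model, where a protein is a string of H/P beads folded on a grid and the energy is minus the number of non-adjacent H-H contacts. The sketch below pairs that energy function with a random-restart search, about the simplest possible soft computing baseline; it is an illustration under those assumptions, not a method from the review (real approaches use genetic algorithms, simulated annealing, or GPU-parallel variants).

```python
import random

MOVES = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def fold(directions):
    """Turn a direction string into lattice coordinates; None if self-intersecting."""
    pos = [(0, 0)]
    for d in directions:
        dx, dy = MOVES[d]
        nxt = (pos[-1][0] + dx, pos[-1][1] + dy)
        if nxt in pos:
            return None  # not a self-avoiding walk
        pos.append(nxt)
    return pos

def energy(seq, pos):
    """HP energy: -1 for every non-consecutive H-H pair on adjacent lattice sites."""
    e = 0
    for i in range(len(seq)):
        for j in range(i + 2, len(seq)):
            if seq[i] == seq[j] == "H":
                if abs(pos[i][0] - pos[j][0]) + abs(pos[i][1] - pos[j][1]) == 1:
                    e -= 1
    return e

def random_search(seq, iters=5000, seed=1):
    """Random-restart baseline: sample self-avoiding walks, keep the lowest energy."""
    rng = random.Random(seed)
    best_e, best_dirs = 0, None
    for _ in range(iters):
        dirs = "".join(rng.choice("UDLR") for _ in range(len(seq) - 1))
        pos = fold(dirs)
        if pos is not None:
            e = energy(seq, pos)
            if e < best_e:
                best_e, best_dirs = e, dirs
    return best_e, best_dirs

best_e, best_dirs = random_search("HPHPPHHPHH")
```

Even this toy formulation is NP-hard in general, which is why heuristic (inexact) searches and parallel hardware dominate the field.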

    Mapping the proteome with data-driven methods: A cycle of measurement, modeling, hypothesis generation, and engineering

    The living cell exhibits emergence of complex behavior, and its modeling requires a systemic, integrative approach if we are to thoroughly understand and harness it. The work in this thesis has had the narrower aim of quantitatively characterizing and mapping the proteome using data-driven methods, as proteins perform most functional and structural roles within the cell. Covered are the different parts of the cycle, from improving quantification methods, to deriving protein features from primary structure, to predicting the protein content solely from sequence data, and, finally, to developing theoretical protein engineering tools, leading back to experiment. High-throughput mass spectrometry platforms provide detailed snapshots of a cell's protein content, which can be mined towards understanding how the phenotype arises from genotype and the interplay between the various properties of the constituent proteins. However, these large and dense data present an increased analysis challenge, and current methods capture only a small fraction of the signal. The first part of my work has involved tackling these issues with the implementation of a GPU-accelerated and distributed signal decomposition pipeline, making factorization of large proteomics scans feasible and efficient. The pipeline yields individual analyte signals spanning the majority of the acquired signal, enabling high-precision quantification and further analytical tasks. Having such detailed snapshots of the proteome enables a multitude of undertakings. One application has been to use a deep neural network model to learn the amino acid sequence determinants of temperature adaptation, in the form of reusable deep model features. More generally, systemic quantities may be predicted from the information encoded in sequence by evolutionary pressure. Two studies taking inspiration from natural language processing have sought to learn the grammars behind the languages of expression, in one case predicting mRNA levels from DNA sequence, and in the other protein abundance from amino acid sequence. These two models helped build a quantitative understanding of the central dogma and, furthermore, in combination yielded an improved predictor of protein amount. Finally, a mathematical framework relying on the embedded space of a deep model has been constructed to assist guided mutation of proteins towards optimizing their abundance.
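    The guided-mutation idea in the final part can be illustrated with a greedy hill climb over point mutations: score single-residue substitutions with a predictor and keep any that improves the score. The scorer below is a deliberately trivial placeholder (it just rewards alanine content); in the thesis that role is played by a learned abundance model over the deep embedded space, so this is a sketch of the optimization loop only.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def greedy_mutate(seq, score, rounds=3):
    """Hill-climb: scan positions, accept any point mutation that improves
    the score, and repeat for a few rounds or until no improvement is found.
    `score` stands in for a learned sequence-to-abundance predictor."""
    best_seq, best_score = seq, score(seq)
    for _ in range(rounds):
        improved = False
        for i in range(len(best_seq)):
            for aa in AMINO_ACIDS:
                if aa == best_seq[i]:
                    continue
                cand = best_seq[:i] + aa + best_seq[i + 1:]
                s = score(cand)
                if s > best_score:
                    best_seq, best_score, improved = cand, s, True
        if not improved:
            break
    return best_seq, best_score

# Placeholder scorer for the demo: fraction of alanines (NOT a real model).
toy = lambda s: s.count("A") / len(s)
seq, val = greedy_mutate("MKTW", toy)
```

A real pipeline would also constrain mutations (e.g. to surface residues) and re-embed each candidate before scoring, but the accept-if-better loop is the same.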

    Computational approaches in high-throughput proteomics data analysis

    Proteins are key components in biological systems, as they mediate the signaling responsible for information processing in a cell and organism. In biomedical research, one goal is to elucidate the mechanisms of cellular signal transduction pathways in order to identify possible defects that cause disease. Advancements in technologies such as mass spectrometry and flow cytometry enable the measurement of multiple proteins from a system. Proteomics, the large-scale study of the proteins of a system, thus plays an important role in biomedical research. The analysis of all high-throughput proteomics data requires the use of advanced computational methods, and the combination of bioinformatics and proteomics has therefore become an important part of research on signal transduction pathways. The main objective of this study was to develop and apply computational methods for the preprocessing, analysis, and interpretation of high-throughput proteomics data. The methods focused on data from tandem mass spectrometry and single-cell flow cytometry, and on the integration of proteomics data with gene expression microarray data and information from various biological databases. Overall, the methods developed and applied in this study have led to new ways of managing and preprocessing proteomics data. Additionally, the available tools have successfully been used to help interpret biomedical data and to facilitate analyses that would have been cumbersome without computational methods.

    Proteins play an important role in biological systems, as they coordinate various processes of cells and organisms. One goal of biomedical research is to shed light on cellular signaling pathways and the changes in their function that occur in different diseases, so that such changes can be corrected. Proteomics is the large-scale study of the proteins of a cell, tissue, or organism. Proteomics methods such as mass spectrometry and flow cytometry are central biomedical research techniques that can measure multiple proteins from a sample simultaneously. Modern proteomics measurement technologies produce large result datasets and require computational methods for their analysis; bioinformatics methods have thus become an important part of proteomics analysis and of research on signaling pathways. The main aim of this work was to develop and apply efficient computational methods for the preprocessing, analysis, and interpretation of large-scale proteomics datasets. A preprocessing method for mass spectrometry data and an automated analysis method for flow cytometry data were developed, and protein-level information was integrated with measurements of gene transcription levels and with existing information drawn from biological databases. This dissertation shows that computational methods play a central role in the management, preprocessing, and analysis of proteomics data, and the analysis methods developed here considerably advance the broader exploitation and understanding of biomedical data.
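    As one concrete example of the kind of preprocessing involved, raw flow cytometry fluorescence intensities are commonly variance-stabilized with an arcsinh transform before downstream analysis. This is a generic illustration of that standard step, not code from the thesis; the cofactor (150, a conventional choice for fluorescence data) is an assumption.

```python
import math

def arcsinh_transform(values, cofactor=150.0):
    """Variance-stabilizing arcsinh transform for raw fluorescence
    intensities; behaves linearly near zero and logarithmically for
    large values, which tames the heavy right tail of cytometry data."""
    return [math.asinh(v / cofactor) for v in values]

out = arcsinh_transform([0.0, 150.0, 1500.0])
```

Unlike a plain log transform, arcsinh is defined at zero and for small negative values, which occur after background compensation.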

    Physella acuta, Comparative Immunology and Evolutionary Aspects of Gastropod Immune Function

    Gastropod immunobiology has benefitted from investigations focused on the planorbid snail Biomphalaria glabrata, intermediate host for the human parasite Schistosoma mansoni. Though such concentrated efforts have elucidated fascinating aspects of invertebrate immunity, they have not provided full knowledge of the evolution of immune function among other gastropod species. This dissertation demonstrates the importance of making strategic choices about which organisms to select for comparative immunology. Herein, the choice was made to investigate the immunobiology of Physella acuta, a freshwater snail of the Physidae, a sister family to the Planorbidae to which B. glabrata belongs. Benefiting greatly from the use of next-generation sequencing (NGS), the immunobiology of P. acuta was studied using 454 pyrosequencing, Illumina RNA-seq, experimental infections with Echinostoma paraensei (a trematode parasite), and other molecular techniques. These analyses revealed that many components of gastropod immunity have been conserved among physid and planorbid snails. P. acuta also displays differences in immune function, such as the use of fibrinogen-related proteins in response to trematode parasite exposure. Remarkably, P. acuta differentially expressed relatively large immune-relevant gene families (CD109/TEP, dermatopontin, and GTPase IMAP, among others) after exposure to E. paraensei. Inspection of the individual members of these gene families revealed complex transcriptional profiles that suggest parasite influence on host immune function and the capacity of a host to maintain homeostasis while supporting parasite development, an extended phenotype of E. paraensei. These lab-based studies represent the first large-scale characterizations of P. acuta immune function. The immune factors described through NGS approaches enable investigations of the ecoimmunology of P. acuta snails collected from the field. This approach uncovered many sequences that are differentially expressed by P. acuta in the field relative to the lab environment, with variation in the expression of certain antimicrobial factors and genes governing biological processes. Overall, this dissertation has expanded the scope of gastropod immunity and provides resources and insights that are accessible for the continued development and understanding of evolutionary and comparative immunology concepts.

    Next generation transcriptomes for next generation genomes using est2assembly

    Background: The decreasing costs of capillary-based Sanger sequencing and next generation technologies, such as 454 pyrosequencing, have prompted an explosion of transcriptome projects in non-model species, where even shallow sequencing of transcriptomes can now be used to examine a range of research questions. This rapid growth in data has outstripped the ability of researchers working on non-model species to analyze and mine transcriptome data efficiently. Results: Here we present a semi-automated platform, est2assembly, that processes raw sequence data from Sanger or 454 sequencing into a hybrid de-novo assembly, annotates it, and produces GMOD-compatible output, including a SeqFeature database suitable for GBrowse. Users are able to parameterize assembler variables, judge assembly quality, and determine the optimal assembly for their specific needs. We used est2assembly to process Drosophila and Bicyclus public Sanger EST data and then compared them to published 454 data as well as eight new insect transcriptome collections. Conclusions: Analysis of such a wide variety of data allows us to understand how these new technologies can assist EST project design. We determine that assembler parameterization is as essential as standardized methods to judge the output of EST projects. Further, even shallow sequencing using 454 produces sufficient data to be of wide use to the community. est2assembly is an important tool to assist manual curation of gene models, an important resource in their own right but especially for species which are due to acquire a genome project using next generation sequencing.
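    Judging assembly quality typically involves contiguity statistics such as N50: the contig length at which the contigs, taken from longest to shortest, cover at least half of the total assembly. The computation below is a generic illustration of the metric, not est2assembly's own reporting code.

```python
def n50(contig_lengths):
    """N50: the length L such that contigs of length >= L account for at
    least half of the summed assembly length."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0

val = n50([100, 200, 300, 400, 500])
```

Comparing N50 (alongside annotation-based checks) across assembler parameter settings is one simple way to choose the optimal assembly for a given project.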