26 research outputs found

    A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: using machine learning and graph analysis methodologies to reconstruct the bibliome

    Background: Despite their nutritional properties and broad availability, cereal crops have been associated with various alimentary disorders and symptoms, with most of the responsibility attributed to gluten. As a result, gluten-related literature continues to be produced at ever-growing rates, driven in part by recent exploratory studies that link gluten to non-traditional diseases and by the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical and structured information. In this context, the accelerated discovery of novel advances in diagnosis and treatment, together with exploratory studies, creates a favourable scenario for disinformation and misinformation.
    Objectives: Aligned with the European Union strategy “Delivering on EU Food Safety and Nutrition in 2050”, which emphasizes the inextricable links between imbalanced diets, the increased exposure to unreliable sources of information and misleading information, and the increased dependency on reliable sources of information, this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The platform integrates knowledge from external databases, bibliometric statistics and social media discussion to offer a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in the gluten domain.
    Methods: The study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, complemented by data from the social discussion.
    Results and conclusions: 5814 documents were manually annotated and 7424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes, based on the literature. In addition, the automatic processing of the literature, combined with the proposed knowledge representation methodologies, has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/
    Funding: Fundação para a Ciência e a Tecnologia (Ref. UIDB/50006/2020); Xunta de Galicia (Ref. ED481B-2019-032); Xunta de Galicia (Ref. ED431G2019/06); Xunta de Galicia (Ref. ED431C 2022/03); Universidade de Vigo/CISU

    A systems biology approach for the characterization of metabolic bottlenecks in recombinant protein production processes

    Doctoral thesis in Chemical and Biological Engineering.
    The main purpose of this thesis is to investigate the influence of recombinant protein production on the reorganization of metabolic activities and the resulting stress-induced responses in the bacterium Escherichia coli. More specifically, the focus is on the RelA-mediated stringent response, a stress response that is triggered by the sudden lack of intracellular amino acids and that has been associated with the metabolic burden imposed by recombinant processes. To identify the main metabolic bottlenecks in recombinant biosynthetic processes, which include maintenance of recombinant DNA and expression of heterologous genes, a systematic modelling approach is proposed, capable of predicting the amino acid shortages caused by recombinant processes and the consequent activation of the RelA-dependent guanosine pentaphosphate (ppGpp) synthesis. The view of ppGpp as a primary regulator of gene transcription has been expanded, and it is now clear that the response controlled by ppGpp is crucial for cell survival during adaptation to stressful conditions. Major advances have been achieved in understanding this regulatory system governing gene expression in response to environmental growth perturbations, but so far mainly transcriptome and proteome analyses have been applied to elucidate the stringent control mediated by ppGpp. Metabolomics analysis can provide substantial information on the impact of this stress response at the biochemical level, in particular during recombinant bioprocesses. Therefore, two metabolomics-based approaches were applied: metabolic profiling to evaluate the intracellular metabolic profiles and metabolic footprinting to estimate the profiles of extracellular metabolites. In these metabolomics studies, two E. coli strains (E. coli W3110 and the isogenic ΔrelA mutant) were used to investigate the influence of recombinant processes on the host cells’ metabolism, as well as the main metabolic activities affected by RelA activity under different growth conditions. The mutant strain presented a “relaxed” phenotype, characterized by an acute delay in most metabolic adaptations to transient growth conditions. Most importantly, it was shown that these mutant cells lack the metabolic adjustments that are often observed after metabolic burden phenomena. Nevertheless, this cellular system presented major advantages in terms of biomass yield and productivity, which imply a remarkable improvement in recombinant bioprocesses. Thus, alleviating stress responses can be beneficial if they impair the desired quality and quantity of the recombinant product. However, it must be pointed out that this is an alternative only as long as recombinant bioprocesses are designed to achieve a finer balance between strain improvement strategies and culturing conditions.
    Portuguese abstract, translated: The main aim of the work carried out in this thesis was to evaluate the metabolic changes related to recombinant protein production in Escherichia coli bacterial cells and the consequent activation of stress responses. Emphasis was placed on the stringent response promoted by the activity of the RelA enzyme, since it is one of the main stress responses induced by a decrease in the amount of amino acids available in the intracellular medium as a consequence of recombinant protein expression. Differences in amino acid composition between biomass proteins and recombinant proteins have been identified as the main causes of the metabolic imbalance that leads to the depletion of some metabolites, namely amino acids. To explore these phenomena and assess the impact of recombinant processes on host cell metabolism, a mathematical model capable of identifying bottlenecks in the metabolic network was proposed. These bottlenecks correspond to metabolic pathways with limited catalytic capacity that are essential to compensate for the disproportionate consumption of amino acids by recombinant protein synthesis. Associated with this phenomenon, the synthesis of guanosine pentaphosphate (ppGpp) nucleotides induced by amino acid scarcity in the intracellular medium was also described. The recognition of this nucleotide as a fundamental regulator of the transcription of genetic information has been widely described, and it has become evident that the cellular responses controlled by ppGpp are decisive for the survival and adaptation of organisms under adverse conditions. Accordingly, several studies have been carried out to elucidate the role of ppGpp in controlling these stress responses and the physiological changes that result from them, namely at the level of metabolism. Compared with the transcriptome or the proteome, analysis of the metabolome captures more directly the relationship between metabolic activities and the physiology of organisms, particularly in recombinant systems. In this work, studies were carried out using two distinct metabolomic approaches: metabolic profiling, which refers to the analysis of the intracellular metabolite profile, and metabolic footprinting, which refers to the analysis of the extracellular metabolite profile. These studies used two E. coli strains (W3110 and the isogenic strain carrying a mutation in the relA gene) cloned with the expression vector pTRC-His- AcGP1, which encodes the green fluorescent protein AcGFP1, derived from the AcGFP protein of Aequorea coerulescens. The main metabolic changes caused by induction of recombinant protein production and by the catalytic activity of the RelA enzyme were evaluated under several growth conditions. Comparing the metabolic profiles of the two strains revealed several significant differences that may prove critical during recombinant processes. The mutant strain showed behaviour typical of a “relaxed” phenotype, which is characterized by a significant delay in the adaptation of metabolism to changes in growth conditions. Nevertheless, the mutant strain gave better results in terms of biomass yield and productivity, which represents a notable advantage for the application of these recombinant bacterial systems at the industrial level. In summary, restricting stress responses can be beneficial when the quality and quantity of the product are at stake, but it should be stressed that this is not an absolute solution, and processing conditions must also be taken into account when implementing these bioprocesses.

    Knowledge representation and ontologies for lipids and lipidomics

    Master's, Master of Science

    Computational strategies for a system-level understanding of metabolism

    Cell metabolism is the biochemical machinery that provides energy and building blocks to sustain life. Understanding its fine regulation is of pivotal relevance in several fields, from metabolic engineering applications to the treatment of metabolic disorders and cancer. Sophisticated computational approaches are needed to unravel the complexity of metabolism. To this aim, a plethora of methods have been developed, yet it is generally hard to identify which computational strategy is most suited for the investigation of a specific aspect of metabolism. This review provides an up-to-date description of the computational methods available for the analysis of metabolic pathways, discussing their main advantages and drawbacks. In particular, attention is devoted to the identification of the appropriate scale and level of accuracy in the reconstruction of metabolic networks, and to the inference of model structure and parameters, especially when dealing with a shortage of experimental measurements. The choice of the proper computational methods to derive in silico data is then addressed, including topological analyses, constraint-based modeling and simulation of the system dynamics. A description of some computational approaches to gain new biological knowledge or to formulate hypotheses is finally provided

    A Semantic Framework for Declarative and Procedural Knowledge

    In any scientific domain, the full set of data and programs has reached an "-ome" status, i.e. it has grown massively. The original article on the Semantic Web describes the evolution of a Web of actionable information, i.e. information derived from data through a semantic theory for interpreting the symbols. In a Semantic Web, methodologies are studied for describing, managing and analyzing both resources (domain knowledge) and applications (operational knowledge) - without any restriction on what and where they are respectively suitable and available in the Web - as well as for realizing automatic and semantic-driven workflows of Web applications elaborating Web resources. This thesis attempts to provide a synthesis among Semantic Web technologies, Ontology Research, and Knowledge and Workflow Management. Such a synthesis is represented by Resourceome, a Web-based framework consisting of two components which strictly interact with each other: an ontology-based and domain-independent knowledge manager system (Resourceome KMS) - relying on a knowledge model where resource and operational knowledge are contextualized in any domain - and a semantic-driven workflow editor, manager and agent-based execution system (Resourceome WMS). The Resourceome KMS and the Resourceome WMS are exploited in order to realize semantic-driven formulations of workflows, where activities are semantically linked to any involved resource. On the whole, combining the use of domain ontologies and workflow techniques, Resourceome provides a flexible domain and operational knowledge organization, a powerful engine for semantic-driven workflow composition, and a distributed, automatic and transparent environment for workflow execution

    Biomedical Event Extraction with Machine Learning

    Biomedical natural language processing (BioNLP) is a subfield of natural language processing, an area of computational linguistics concerned with developing programs that work with natural language: written texts and speech. Biomedical relation extraction concerns the detection of semantic relations such as protein-protein interactions (PPI) from scientific texts. The aim is to enhance information retrieval by detecting relations between concepts, not just individual concepts as with a keyword search. In recent years, events have been proposed as a more detailed alternative for simple pairwise PPI relations. Events provide a systematic, structural representation for annotating the content of natural language texts. Events are characterized by annotated trigger words, directed and typed arguments and the ability to nest other events. For example, the sentence “Protein A causes protein B to bind protein C” can be annotated with the nested event structure CAUSE(A, BIND(B, C)). Converted to such formal representations, the information of natural language texts can be used by computational applications. Biomedical event annotations were introduced by the BioInfer and GENIA corpora, and event extraction was popularized by the BioNLP'09 Shared Task on Event Extraction. In this thesis we present a method for automated event extraction, implemented as the Turku Event Extraction System (TEES). A unified graph format is defined for representing event annotations and the problem of extracting complex event structures is decomposed into a number of independent classification tasks. These classification tasks are solved using SVM and RLS classifiers, utilizing rich feature representations built from full dependency parsing. Building on earlier work on pairwise relation extraction and using a generalized graph representation, the resulting TEES system is capable of detecting binary relations as well as complex event structures. 
We show that this event extraction system performs well, achieving first place in the BioNLP'09 Shared Task on Event Extraction. Subsequently, TEES has achieved several first ranks in the BioNLP'11 and BioNLP'13 Shared Tasks, and has shown competitive performance in the binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared tasks. The Turku Event Extraction System is published as a freely available open-source project, documenting the research in detail as well as making the method available for practical applications. In particular, in this thesis we describe the application of the event extraction method to PubMed-scale text mining, showing that the developed approach not only performs well but is also generalizable and applicable to large-scale real-world text mining projects. Finally, we discuss related literature, summarize the contributions of the work and present some thoughts on future directions for biomedical event extraction. This thesis includes and builds on six original research publications. The first of these introduces the analysis of dependency parses that led to the development of TEES. The entries in the three BioNLP Shared Tasks, as well as in the DDIExtraction 2011 task, are covered in four publications, and the sixth one demonstrates the application of the system to PubMed-scale text mining.
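
The nested event structure mentioned in the abstract, CAUSE(A, BIND(B, C)), can be illustrated with a small data structure. The following is a minimal sketch only: the class names `Entity` and `Event` and the `render` helper are hypothetical names chosen for this illustration, argument roles are omitted for brevity, and none of this is the TEES graph format or its API.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity such as a protein (hypothetical illustration class)."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a type, a trigger word, and ordered arguments.

    Arguments may be entities or other events, which is what allows nesting.
    """
    type: str
    trigger: str
    args: Tuple["Node", ...]


Node = Union[Entity, Event]


def render(node: Node) -> str:
    """Render an entity or a (possibly nested) event in functional notation."""
    if isinstance(node, Entity):
        return node.name
    return f"{node.type}({', '.join(render(arg) for arg in node.args)})"


# "Protein A causes protein B to bind protein C"
a, b, c = Entity("A"), Entity("B"), Entity("C")
bind = Event("BIND", "bind", (b, c))
cause = Event("CAUSE", "causes", (a, bind))

print(render(cause))  # CAUSE(A, BIND(B, C))
```

Because an `Event` can appear as an argument of another `Event`, the same two classes cover both flat pairwise relations and arbitrarily deep nested structures.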

    Development of a Hepatitis C Virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance

    Philosophiae Doctor - PhD
    Ameliorating Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the plethora of rich data published in the biomedical literature to gain a greater understanding of HCV pathobiological mechanisms. The multitudes of metadata originating from HCV clinical trials, as well as from low- and high-throughput experiments embedded in text corpora, can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthy and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. This research thesis reports the development of two freely available HCV-specific web-based resources: (i) Dragon Exploratory System on Hepatitis C Virus (DESHCV), accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/ and (ii) Hepatitis C Virus Protein Interaction Database (HCVpro), accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence-based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols, to enable HCV-specific knowledge exploration. The DESHCV query inputs consist of user-defined keywords, phrases and concepts. DESHCV is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance. Additionally, users can retrieve a list of abstracts containing tagged concepts that can be used to overcome the herculean task of manual biocuration. DESHCV has been used to simulate the previously reported thalidomide-chronic hepatitis C hypothesis and also to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. Users can retrieve enriched information, including interaction metadata, from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used in detecting the interactions, PubMed IDs of journal articles reporting the interactions, annotated protein interaction IDs from external databases, and via “string searches”. The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that seem to support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes (ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1) have been recommended for possible investigation to evaluate their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based candidate HCC gene prioritization for possible validation by experimental biologists.
    South Africa
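
The dictionary-based, co-occurrence style of text mining described in this abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the DESHCV implementation: the function name `cooccurrence_counts` is hypothetical, concept matching is reduced to case-insensitive substring search, and the example sentences are invented placeholders rather than PubMed abstracts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(abstracts, dictionary):
    """Count how often each pair of dictionary concepts appears in the same abstract.

    Matching is a plain case-insensitive substring test, a deliberate
    simplification of real named concept recognition.
    """
    counts = Counter()
    for text in abstracts:
        lowered = text.lower()
        # Sort the matched concepts so each pair has one canonical order.
        found = sorted(term for term in dictionary if term.lower() in lowered)
        counts.update(combinations(found, 2))
    return counts


# Invented placeholder texts, used only to exercise the function.
toy_abstracts = [
    "Toy text mentioning thalidomide and hepatitis C.",
    "Toy text mentioning amantadine and hepatitis C.",
    "Toy text mentioning thalidomide together with amantadine.",
    "Another toy text on hepatitis C and thalidomide.",
]
toy_dictionary = {"thalidomide", "amantadine", "hepatitis C"}

pair_counts = cooccurrence_counts(toy_abstracts, toy_dictionary)
for pair, n in pair_counts.most_common():
    print(pair, n)
```

Pairs that co-occur often are candidates for a closer look; a count alone says nothing about the nature of the association, which is why such output is treated as hypothesis generation rather than evidence.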
