A novel gluten knowledge base of potential biomedical and health-related interactions extracted from the literature: using machine learning and graph analysis methodologies to reconstruct the bibliome
Background
Despite their nutritional properties and broad availability, cereal crops have been associated with different alimentary disorders and symptoms, with most of the responsibility being attributed to gluten. Consequently, gluten-related literature continues to be produced at ever-growing rates, driven in part by recent exploratory studies that link gluten to non-traditional diseases and by the popularity of gluten-free diets, making it increasingly difficult to access and analyse practical and structured information. In this sense, the accelerated discovery of novel advances in diagnosis and treatment, as well as exploratory studies, produces a favourable scenario for disinformation and misinformation.
Objectives
Aligned with the European Union strategy “Delivering on EU Food Safety and Nutrition in 2050”, which emphasizes the inextricable links between imbalanced diets, the increased exposure to unreliable sources of information and misleading information, and the increased dependency on reliable sources of information, this paper presents GlutKNOIS, a public and interactive literature-based database that reconstructs and represents the experimental biomedical knowledge extracted from the gluten-related literature. The developed platform includes knowledge from different external databases, bibliometric statistics and social media discussion to propose a novel and enhanced way to search, visualise and analyse potential biomedical and health-related interactions in relation to the gluten domain.
Methods
For this purpose, the presented study applies a semi-supervised curation workflow that combines natural language processing techniques, machine learning algorithms, ontology-based normalization and integration approaches, named entity recognition methods, and graph knowledge reconstruction methodologies to process, classify, represent and analyse the experimental findings contained in the literature, which is also complemented by data from the social discussion.
Results and conclusions
In this sense, 5814 documents were manually annotated and 7424 were fully automatically processed to reconstruct the first online gluten-related knowledge database of evidenced health-related interactions that produce health or metabolic changes based on the literature. In addition, the automatic processing of the literature combined with the knowledge representation methodologies proposed has the potential to assist in the revision and analysis of years of gluten research. The reconstructed knowledge base is public and accessible at https://sing-group.org/glutknois/
Fundação para a Ciência e a Tecnologia | Ref. UIDB/50006/2020
Xunta de Galicia | Ref. ED481B-2019-032
Xunta de Galicia | Ref. ED431G2019/06
Xunta de Galicia | Ref. ED431C 2022/03
Universidade de Vigo/CISU
A systems biology approach for the characterization of metabolic bottlenecks in recombinant protein production processes
Doctoral thesis in Chemical and Biological Engineering
The main purpose of this thesis is to investigate the influence of recombinant protein production
in the reorganization of the metabolic activities and the resulting stress-induced responses in the
bacterium Escherichia coli. More specifically, the focus is on the RelA-mediated stringent
response, a stress response that is triggered by the sudden lack of intracellular amino acids and
that has been associated with the metabolic burden imposed by recombinant processes.
To identify the main metabolic bottlenecks in recombinant biosynthetic processes, which include
maintenance of recombinant DNA and expression of heterologous genes, a systematic modelling
approach is proposed, capable of predicting the amino acid shortages caused by recombinant
processes and the consequent activation of the RelA-dependent guanosine pentaphosphate
(ppGpp) synthesis.
The view of ppGpp as a primary regulator of gene transcription has been expanded and it is now
clear that the response controlled by ppGpp is crucial for cell survival during the adaptation to
stressful conditions. Major advances have been achieved to understand this regulatory system
governing gene expression in response to environmental growth perturbations, but so far mainly
transcriptome and proteome analyses have been applied to elucidate the stringent control
mediated by ppGpp. Metabolomics analysis can provide substantial information on the impact of
this stress response at the biochemical level, in particular during recombinant bioprocesses.
Therefore, two metabolomics-based approaches were applied: metabolic profiling to evaluate the
intracellular metabolic profiles and metabolic footprinting to estimate the profiles of extracellular
metabolites.
In these metabolomics studies two E. coli strains (E. coli W3110 and the isogenic ΔrelA mutant)
were used to investigate the influence of recombinant processes on the host cells’ metabolism,
as well as the main metabolic activities affected by the RelA activity under different growth conditions.
The mutant strain presented a “relaxed” phenotype, characterized by an acute delay in most
metabolic adaptations to transient growth conditions. Most
importantly, it was shown that these mutant cells lack metabolic adjustments that are often
observed after metabolic burden phenomena. Nevertheless, this cellular system presented major
advantages in terms of biomass yield and productivity, which imply a remarkable improvement in
recombinant bioprocesses. Thus, alleviating stress responses can be beneficial if they impair the
desired quality and quantity of the recombinant product. However, it must be pointed out that
this may be an alternative as long as recombinant bioprocesses are designed to achieve a finer
balance between strain improvement strategies and culturing conditions.
The main purpose of the work carried out in this thesis was to evaluate the metabolic changes associated with recombinant protein production in Escherichia coli bacterial cells and the consequent activation of stress responses. Emphasis was placed on the stringent response promoted by the activity of the RelA enzyme, since it is one of the main stress responses induced by the decrease in the amount of amino acids available in the intracellular medium as a consequence of recombinant protein expression. Differences in amino acid composition between biomass proteins and recombinant proteins have been pointed to as the main causes of the metabolic imbalance that leads to the depletion of some metabolites, namely amino acids.
To explore these phenomena and assess the impact of recombinant processes on the metabolism of the host cells, a mathematical model capable of identifying bottlenecks in the metabolic network was proposed. These points correspond to metabolic pathways with limited catalytic capacity that will be essential to compensate for the disproportionate consumption of amino acids by recombinant protein synthesis. Associated with this phenomenon, the description of the synthesis of guanosine pentaphosphate (ppGpp) nucleotides induced by the scarcity of amino acids in the intracellular medium was also considered.
The recognition of this nucleotide as a fundamental regulator of the transcription of genetic information has been widely described, and it has become evident that the cellular responses controlled by ppGpp are decisive for the survival and adaptation of organisms to adverse conditions. Accordingly, several studies have been carried out to elucidate the role of ppGpp in controlling these stress responses and in the physiological changes that arise from these processes, namely at the level of metabolism. Metabolome analysis, compared with the transcriptome or the proteome, is able to capture more directly the relationship between metabolic activities and the physiology of organisms, particularly in recombinant systems.
In this work, studies were carried out applying two distinct metabolomic approaches: metabolic profiling, which refers to the analysis of the profile of intracellular metabolites; and metabolic footprinting, which refers to the analysis of the profile of extracellular metabolites. In these studies two E. coli strains (W3110 and the isogenic strain with a mutation in the relA gene) were used, cloned with an expression vector, pTRC-His-AcGP1, that encodes the green fluorescent protein AcGFP1, derived from the AcGFP protein of Aequorea coerulescens. The main metabolic changes caused by the induction of recombinant protein production and by the catalytic activity of the RelA enzyme were evaluated under various growth conditions. Comparing the metabolic profiles of the two strains, several significant differences were found that may prove critical during recombinant processes. The mutant strain showed behaviour typical of a “relaxed” phenotype, which is characterized by a significant delay in the adaptation of metabolism to changes in growth conditions. Nevertheless, the mutant strain showed better results in terms of biomass yield and productivity, which represents a notable advantage for the application of these recombinant bacterial systems at the industrial level. In summary, restricting stress responses can bring benefits if the quality and quantity of the product are at stake, but it should be stressed that this is not an absolute solution, since processing conditions must also be taken into account when implementing these bioprocesses.
Knowledge representation and ontologies for lipids and lipidomics
Master's, Master of Science
Computational strategies for a system-level understanding of metabolism
Cell metabolism is the biochemical machinery that provides energy and building blocks to sustain life. Understanding its fine regulation is of pivotal relevance in several fields, from metabolic engineering applications to the treatment of metabolic disorders and cancer. Sophisticated computational approaches are needed to unravel the complexity of metabolism. To this aim, a plethora of methods have been developed, yet it is generally hard to identify which computational strategy is most suited for the investigation of a specific aspect of metabolism. This review provides an up-to-date description of the computational methods available for the analysis of metabolic pathways, discussing their main advantages and drawbacks. In particular, attention is devoted to the identification of the appropriate scale and level of accuracy in the reconstruction of metabolic networks, and to the inference of model structure and parameters, especially when dealing with a shortage of experimental measurements. The choice of the proper computational methods to derive in silico data is then addressed, including topological analyses, constraint-based modeling and simulation of the system dynamics. A description of some computational approaches to gain new biological knowledge or to formulate hypotheses is finally provided
A Semantic Framework for Declarative and Procedural Knowledge
In any scientific domain, the full set of data and programs has reached an "-ome" status, i.e. it has grown massively. The original article on the Semantic Web describes the evolution of a Web of actionable information, i.e. information derived from data through a semantic theory for interpreting the symbols. In a Semantic Web, methodologies are studied for describing, managing and analyzing both resources (domain knowledge) and applications (operational knowledge) - without any restriction on what and where they are respectively suitable and available in the Web - as well as for realizing automatic and semantic-driven workflows of Web applications elaborating Web resources.
This thesis attempts to provide a synthesis among Semantic Web technologies, Ontology Research, Knowledge and Workflow Management. Such a synthesis is represented by Resourceome, a Web-based framework consisting of two components which strictly interact with each other: an ontology-based and domain-independent knowledge manager system (Resourceome KMS) - relying on a knowledge model where resource and operational knowledge are contextualized in any domain - and a semantic-driven workflow editor, manager and agent-based execution system (Resourceome WMS).
The Resourceome KMS and the Resourceome WMS are exploited in order to realize semantic-driven formulations of workflows, where activities are semantically linked to any involved resource. On the whole, combining the use of domain ontologies and workflow techniques, Resourceome provides a flexible domain and operational knowledge organization, a powerful engine for semantic-driven workflow composition, and a distributed, automatic and transparent environment for workflow execution.
Biomedical Event Extraction with Machine Learning
Biomedical natural language processing (BioNLP) is a subfield of natural
language processing, an area of computational linguistics concerned with
developing programs that work with natural language: written texts and
speech. Biomedical relation extraction concerns the detection of semantic
relations such as protein-protein interactions (PPI) from scientific texts.
The aim is to enhance information retrieval by detecting relations between
concepts, not just individual concepts as with a keyword search.
In recent years, events have been proposed as a more detailed alternative
for simple pairwise PPI relations. Events provide a systematic, structural
representation for annotating the content of natural language texts. Events
are characterized by annotated trigger words, directed and typed arguments
and the ability to nest other events. For example, the sentence “Protein A
causes protein B to bind protein C” can be annotated with the nested event
structure CAUSE(A, BIND(B, C)). Converted to such formal representations,
the information of natural language texts can be used by computational
applications. Biomedical event annotations were introduced by the
BioInfer and GENIA corpora, and event extraction was popularized by the
BioNLP'09 Shared Task on Event Extraction.
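The nested annotation in the example sentence can be sketched as a small recursive data structure. This is an illustrative Python sketch only, not the annotation format of TEES or of the BioNLP corpora: the class names `Entity` and `Event`, the role labels `Cause` and `Theme`, and the `render` helper are assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity mentioned in the text, e.g. a protein."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event anchored to a trigger word, with typed arguments.

    Each argument is a (role, target) pair; the target may itself be an
    Event, which is what allows events to nest.
    """
    type: str
    trigger: str
    args: Tuple[Tuple[str, Union[Entity, "Event"]], ...]


def render(node):
    """Render an entity or a (possibly nested) event as TYPE(arg, ...)."""
    if isinstance(node, Entity):
        return node.name
    inner = ", ".join(render(target) for _role, target in node.args)
    return f"{node.type}({inner})"


# "Protein A causes protein B to bind protein C"
a, b, c = Entity("A"), Entity("B"), Entity("C")
bind = Event("BIND", "bind", (("Theme", b), ("Theme", c)))      # hypothetical roles
cause = Event("CAUSE", "causes", (("Cause", a), ("Theme", bind)))

print(render(cause))
```

The outer event takes the inner `BIND` event as one of its arguments, which is the nesting that a flat pairwise relation cannot express.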
In this thesis we present a method for automated event extraction, implemented
as the Turku Event Extraction System (TEES). A unified graph
format is defined for representing event annotations and the problem of
extracting complex event structures is decomposed into a number of independent
classification tasks. These classification tasks are solved using SVM
and RLS classifiers, utilizing rich feature representations built from full dependency
parsing. Building on earlier work on pairwise relation extraction
and using a generalized graph representation, the resulting TEES system is
capable of detecting binary relations as well as complex event structures.
We show that this event extraction system has good performance, reaching
the first place in the BioNLP'09 Shared Task on Event Extraction.
Subsequently, TEES has achieved several first ranks in the BioNLP'11 and
BioNLP'13 Shared Tasks, as well as shown competitive performance in the
binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared
tasks.
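The idea of representing events as a graph and splitting extraction into independent classification tasks can be illustrated with a toy sketch. It assumes, for illustration only, that triggers and entities become nodes and that arguments become typed directed edges; the function name `to_graph`, the node identifiers and the role labels are hypothetical and are not taken from TEES.

```python
def to_graph(node, nodes, edges):
    """Add `node` to the graph and return its node id.

    A node is either an entity name (str) or a nested event written as
    (event_type, trigger_word, [(role, argument), ...]).
    """
    if isinstance(node, str):
        node_id = f"entity:{node}"
        nodes.setdefault(node_id, "Entity")
        return node_id
    event_type, trigger, arguments = node
    node_id = f"trigger:{trigger}"
    nodes[node_id] = event_type
    for role, argument in arguments:
        target_id = to_graph(argument, nodes, edges)
        edges.append((node_id, role, target_id))
    return node_id


# CAUSE(A, BIND(B, C)) from "Protein A causes protein B to bind protein C"
bind = ("BIND", "bind", [("Theme", "B"), ("Theme", "C")])
cause = ("CAUSE", "causes", [("Cause", "A"), ("Theme", bind)])

nodes, edges = {}, []
root = to_graph(cause, nodes, edges)

# In a decomposed pipeline, each node label and each edge label could be
# predicted by a separate classifier; here they are only enumerated.
for node_id, label in nodes.items():
    print("node", node_id, "->", label)
for source, role, target in edges:
    print("edge", source, f"-{role}->", target)
```

Under this representation a binary relation is simply a graph with a single edge, so the same structure covers both pairwise relations and nested events.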
The Turku Event Extraction System is published as a freely available
open-source project, documenting the research in detail as well as making
the method available for practical applications. In particular, in this thesis
we describe the application of the event extraction method to PubMed-scale
text mining, showing how the developed approach not only shows good
performance, but is generalizable and applicable to large-scale real-world
text mining projects.
Finally, we discuss related literature, summarize the contributions of the
work and present some thoughts on future directions for biomedical event
extraction. This thesis includes and builds on six original research publications.
The first of these introduces the analysis of dependency parses that
leads to development of TEES. The entries in the three BioNLP Shared
Tasks, as well as in the DDIExtraction 2011 task are covered in four publications,
and the sixth one demonstrates the application of the system to
PubMed-scale text mining.
Development of a Hepatitis C Virus knowledgebase with computational prediction of functional hypothesis of therapeutic relevance
Philosophiae Doctor - PhD
Ameliorating Hepatitis C Virus (HCV) therapeutic and diagnostic challenges requires robust intervention strategies, including approaches that leverage the plethora of rich data published in biomedical literature to gain greater understanding of HCV pathobiological mechanisms. The multitudes of metadata originating from HCV clinical trials as well as low and high-throughput experiments embedded in text corpora can be mined as data sources for the implementation of HCV-specific resources. HCV-customized resources may support the generation of worthy and testable hypotheses and reveal potential research clues to augment the pursuit of efficient diagnostic biomarkers and therapeutic targets. This thesis reports the development of two freely available HCV-specific web-based resources: (i) Dragon Exploratory System on Hepatitis C Virus (DESHCV) accessible via http://apps.sanbi.ac.za/DESHCV/ or http://cbrc.kaust.edu.sa/deshcv/ and (ii) Hepatitis C Virus Protein Interaction Database (HCVpro) accessible via http://apps.sanbi.ac.za/hcvpro/ or http://cbrc.kaust.edu.sa/hcvpro/. DESHCV is a text mining system implemented using named concept recognition and co-occurrence-based approaches to computationally analyze about 32,000 HCV-related abstracts obtained from PubMed. As part of DESHCV development, the pre-constructed dictionaries of the Dragon Exploratory System (DES) were enriched with HCV biomedical concepts, including HCV proteins, name variants and symbols, to enable HCV-specific knowledge exploration. The DESHCV query inputs consist of user-defined keywords, phrases and concepts. DESHCV is therefore an information extraction tool that enables users to computationally generate associations between concepts and supports the prediction of potential hypotheses with diagnostic and therapeutic relevance.
Additionally, users can retrieve a list of abstracts containing tagged concepts that can be used to overcome the herculean task of manual biocuration. DESHCV has been used to simulate a previously reported thalidomide-chronic hepatitis C hypothesis and also to model a potentially novel thalidomide-amantadine hypothesis. HCVpro is a relational knowledgebase dedicated to housing experimentally detected HCV-HCV and HCV-human protein interaction information obtained from other databases and curated from biomedical journal articles. Additionally, the database contains consolidated biological information consisting of hepatocellular carcinoma (HCC) related genes, comprehensive reviews on HCV biology and drug development, functional genomics and molecular biology data, and cross-referenced links to canonical pathways and other essential biomedical databases. Users can retrieve enriched information including interaction metadata from HCVpro by using protein identifiers, gene chromosomal locations, experiment types used in detecting the interactions, PubMed IDs of journal articles reporting the interactions, annotated protein interaction IDs from external databases, and via “string searches”. The utility of HCVpro has been demonstrated by harnessing integrated data to suggest putative baseline clues that seem to support current diagnostic exploratory efforts directed towards vimentin. Furthermore, eight genes, namely ACLY, AZGP1, DDX3X, FGG, H19, SIAH1, SERPING1 and THBS1, have been recommended for possible investigation to evaluate their diagnostic potential. The data archived in HCVpro can be utilized to support protein-protein interaction network-based candidate HCC gene prioritization for possible validation by experimental biologists.
South Africa
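The dictionary-based concept recognition and co-occurrence analysis described for DESHCV can be illustrated with a minimal sketch. It is not the DESHCV implementation: the tiny dictionary, the placeholder texts and the function names `tag_concepts` and `cooccurrence` are all assumptions made for illustration, and simple substring matching stands in for whatever matching the real system uses.

```python
from collections import Counter
from itertools import combinations

# Hypothetical concept dictionary; a real one would hold many name variants.
DICTIONARY = {"thalidomide", "amantadine", "hepatitis c", "vimentin"}


def tag_concepts(text, dictionary=DICTIONARY):
    """Return the set of dictionary concepts mentioned in `text`."""
    lowered = text.lower()
    return {term for term in dictionary if term in lowered}


def cooccurrence(abstracts):
    """Count how many abstracts mention each unordered pair of concepts."""
    counts = Counter()
    for text in abstracts:
        concepts = sorted(tag_concepts(text))
        for pair in combinations(concepts, 2):
            counts[pair] += 1
    return counts


# Placeholder texts invented for this example, not real abstracts.
toy_abstracts = [
    "Placeholder text one mentions thalidomide and hepatitis C.",
    "Placeholder text two mentions amantadine, thalidomide and hepatitis C.",
    "Placeholder text three mentions vimentin only.",
]

counts = cooccurrence(toy_abstracts)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that co-occur often are candidates for an association worth inspecting, and the abstracts behind each pair are the ones a curator would read first.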