357 research outputs found
Towards Interoperability in E-health Systems: a three-dimensional approach based on standards and semantics
Proceedings of: HEALTHINF 2009 (International Conference on Health Informatics), Porto (Portugal), January 14-17, 2009, part of BIOSTEC (International Joint Conference on Biomedical Engineering Systems and Technologies).
The interoperability problem in eHealth can only be addressed by combining standards and technology. However, these alone do not suffice: an appropriate framework that articulates this combination is required. In this paper, we adopt a three-dimensional (information, conference and inference) approach for such a framework, based on OWL as the formal language for terminological and ontological health resources, SNOMED CT as the lexical backbone for all such resources, and the CEN 13606 standard for representing EHRs. Based on this framework, we propose a novel way of creating and supporting networks of clinical terminologies. Additionally, we propose a number of software modules to semantically process and exploit EHRs, including NLP-based search and inference, which can support medical applications in heterogeneous and distributed eHealth systems.
This work has been funded as part of the Spanish nationally funded projects ISSE (FIT-350300-2007-75) and CISEP (FIT-350301-2007-18). We also acknowledge IST-2005-027595 EU project NeO
Blueprint: describing the complexity of metabolic regulation through the reconstruction of integrated metabolic and regulatory models
Doctoral thesis in Biomedical Engineering.
Genome-Scale Metabolic (GEM) models can predict the phenotypic behavior of organisms. However,
these models can lead to incorrect predictions, as certain metabolic processes are controlled by regulatory
mechanisms. Accordingly, many methodologies have been developed to extend the reconstruction and
analysis of GEM models via the integration of Transcriptional Regulatory Networks (TRNs). Nevertheless,
the perspective of reconstructing integrated genome-scale regulatory and metabolic models for diverse
prokaryotes is still an open challenge.
In this work, we propose several tools to assist the reconstruction and analysis of regulatory and
metabolic models. We start by describing BioISO, a novel tool to assist the manual curation of GEM
models. BioISO uses a recursive relation-like algorithm and Flux Balance Analysis (FBA) to evaluate and
guide debugging of in silico phenotype predictions. Hence, this tool can reduce the number of artifacts in
GEM models, decreasing the burdens of model refinement and curation.
A state-of-the-art repository of TRNs for prokaryotes was implemented to support the reconstruction
and integration of TRNs into GEM models. The ProTReND repository comprises several tools to extract
and process regulatory information available in several resources. More importantly, this repository contains a data integration system to unify the regulatory data into standardized TRNs at the genome scale.
In addition, ProTReND contains a web application with full access to the regulatory data.
Finally, we have developed a new modeling framework to define, simulate and analyze GEnome-scale
Regulatory and Metabolic (GERM) models in MEWpy. The GERM model framework can read a GEM
model, as well as a TRN from different file formats. This framework assembles a GERM model using
the regulatory interactions and Genes-Proteins-Reactions (GPR) rules encoded into the GEM model and
TRN. In addition, this modeling framework supports several methods of phenotype prediction designed
for regulatory-metabolic models.
I would like to thank Fundação para a Ciência e Tecnologia for the Ph.D. studentship I was awarded
(SFRH/BD/139198/2018)
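The abstract above describes assembling an integrated model from regulatory interactions and GPR rules. A minimal sketch of that idea follows. It is not the MEWpy API: the rule strings, gene names, regulator names and the helper functions `evaluate_rule` and `active_reactions` are all hypothetical, and gene names are assumed to be valid Python identifiers.

```python
import ast

def evaluate_rule(rule, states):
    """Evaluate a boolean rule such as "(g1 and g2) or not g3".

    Names missing from `states` are treated as inactive (False).
    Only and/or/not over plain names is accepted.
    """
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BoolOp):
            values = [walk(v) for v in node.values]
            return all(values) if isinstance(node.op, ast.And) else any(values)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.Not):
            return not walk(node.operand)
        if isinstance(node, ast.Name):
            return bool(states.get(node.id, False))
        raise ValueError(f"unsupported syntax in rule: {rule!r}")

    return walk(ast.parse(rule, mode="eval"))

# Hypothetical toy network: target gene -> rule over regulators.
REGULATORY_RULES = {"geneA": "not repressor", "geneB": "activator"}
# Hypothetical GPR rules: reaction -> rule over genes.
GPR_RULES = {"R1": "geneA and geneB", "R2": "geneA or geneC"}

def active_reactions(regulator_states, constitutive_genes=()):
    """Propagate regulator states to genes, then genes to reactions."""
    gene_states = {gene: True for gene in constitutive_genes}
    for gene, rule in REGULATORY_RULES.items():
        gene_states[gene] = evaluate_rule(rule, regulator_states)
    return {rxn: evaluate_rule(rule, gene_states) for rxn, rule in GPR_RULES.items()}

if __name__ == "__main__":
    print(active_reactions({"repressor": True, "activator": True}))
```

In a phenotype simulation, reactions evaluated as inactive would typically have their flux bounds constrained to zero before the metabolic model is solved. That step is omitted here.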
Availability and Preservation of Scholarly Digital Resources
The dynamic, decentralized world-wide-web has become an essential part of scientific research and communication, representing a relatively new medium for the conveyance of scientific thought and discovery. Researchers create thousands of web sites every year to share software, data and services. Unlike those for books and journals, however, the preservation systems for these resources are not yet mature. This carries implications that go to the core of science: the ability to examine another's sources to understand and reproduce their work. These valuable resources have been documented as disappearing over time in several subject areas. This dissertation examines the problem by performing a cross-disciplinary investigation, testing the effectiveness of existing remedies and introducing new ones. As part of the investigation, 14,489 unique web pages found in the abstracts within Thomson Reuters’ Web of Science citation index were accessed. The median lifespan of these web pages was found to be 9.3 years, with 62% of them being archived. Survival analysis and logistic regression identified significant predictors of URL lifespan, including the year a URL was published, the number of times it was cited, its depth, and its domain. Statistical analysis revealed biases in current static web-page solutions
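To illustrate how a median lifespan can be estimated when some URLs are still alive at the time of measurement, here is a minimal sketch of a Kaplan-Meier estimate. The abstract only says "survival analysis", so the choice of estimator is an assumption, and the durations below are hypothetical toy values, not data from the dissertation.

```python
def kaplan_meier_median(durations, observed):
    """Return the first time at which estimated survival drops to 0.5 or below.

    durations: lifespan of each URL (for example, in years)
    observed:  True if the URL was seen to die, False if it was still
               alive when last checked (right-censored)
    Returns None if survival never reaches 0.5.
    """
    records = sorted(zip(durations, observed))
    at_risk = len(records)
    survival = 1.0
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        removed = 0
        # Group all records sharing the same time point.
        while i < len(records) and records[i][0] == t:
            deaths += 1 if records[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            if survival <= 0.5:
                return t
        at_risk -= removed
    return None

if __name__ == "__main__":
    # Hypothetical lifespans in years; False marks URLs still reachable.
    lifespans = [2, 4, 4, 6, 9, 10]
    died = [True, True, False, True, True, False]
    print(kaplan_meier_median(lifespans, died))
```

Censored URLs still count toward the at-risk total until the time they were last checked, which is why they cannot simply be dropped from the sample.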
A framework to extract biomedical knowledge from gluten-related tweets: the case of dietary concerns in digital era
Journal pre-proof.
The importance and potential of big data keep growing, driven by the explosive increase in the volume of information generated on the Internet in recent years. Many experts agree that social media networks are among the fastest-growing areas of the Internet and are expected to keep growing significantly in the coming years. Social media sites are also quickly becoming one of the most popular platforms for discussing health issues and exchanging social support. In this context, this work presents a new methodology to process, classify, visualise and analyse the big data knowledge produced by the sociome on social media platforms. The methodology combines natural language processing techniques, ontology-based named entity recognition methods, machine learning algorithms and graph mining techniques to: (i) reduce the irrelevant messages by identifying and focusing the analysis only on individuals and patient experiences from the public discussion; (ii) reduce, through the use of domain ontologies, the lexical noise produced by the different ways in which users express themselves; (iii) infer the demographic data of the individuals through the combined analysis of textual, geographical and visual profile information; (iv) perform community detection and evaluate the health topic under study by combining the semantic processing of the public discourse with knowledge graph representation techniques; and (v) gain information about the shared resources by combining social media statistics with the semantic analysis of the web contents. The practical relevance of the proposed methodology has been demonstrated in a study of 1.1 million unique messages from more than 400,000 distinct users related to one of the most popular dietary fads, which has evolved into a multibillion-dollar industry, i.e., gluten-free food.
Besides, this work analysed one of the least studied research fields on Twitter concerning public health (i.e., allergies or immunological diseases such as celiac disease), discovering a wide range of health-related conclusions.
SING group thanks CITI (Centro de Investigacion, Transferencia e Innovacion) from the University of Vigo for hosting its IT infrastructure. This work was supported by: the Associate Laboratory for Green Chemistry-LAQV, which is financed by national funds from the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of [UIDB/50006/2020] and [UIDB/04469/2020] units, and BioTecNorte operation [NORTE010145FEDER000004] funded by the European Regional Development Fund under the scope of Norte2020 - Programa Operacional Regional do Norte, the Xunta de Galicia (Centro singular de investigacion de Galicia accreditation 2019-2022) and the European Union (European Regional Development Fund - ERDF) - Ref. [ED431G2019/06], and Conselleria de Educacion, Universidades e Formacion Profesional (Xunta de Galicia) under the scope of the strategic funding of [ED431C2018/55GRC] Competitive Reference Group. The authors also acknowledge the post-doctoral fellowship [ED481B2019032] of Martin Perez-Perez, funded by the Xunta de Galicia. Funding for open access charge: Universidade de Vigo/CISUG
Service composition for biomedical applications
Doctorate in Informatics Engineering.
The demand for innovation in the biomedical software domain has been a
driver of the evolution of information technologies over the last decades. The challenges
associated with the effective management, integration, analyses and
interpretation of the wealth of life sciences information stemming from modern
hardware and software technologies require concerted efforts. From gene
sequencing hardware to pharmacology research up to patient electronic health
records, the ability to accurately explore data from these environments is vital
to further improve our understanding of human health. This thesis discusses
how to build better informatics strategies to address these
challenges, primarily in the context of service composition, including
warehousing and federation strategies for resource integration, as well as web
services or LinkedData for software interoperability.
Service composition is introduced as a general principle, geared towards data
integration and software interoperability. Concerning the latter, this research
covers the service composition requirements within the pharmacovigilance
field, namely on the European EU-ADR project. The contributions to this area,
the definition of a new interoperability standard and the creation of a new
workflow-wrapping engine, are behind the successful construction of the EU-ADR
Web Platform, a workspace for delivering advanced pharmacovigilance
studies. In the context of the European GEN2PHEN project, this research
tackles the challenges associated with the integration of heterogeneous and
distributed data in the human variome field. For this matter, a new lightweight
solution was created: WAVe, Web Analysis of the Variome, provides a rich
collection of genetic variation data through an innovative portal and an
advanced API. The development of the strategies underlying these products
highlighted clear opportunities in the biomedical software field: enhancing the
software implementation process with rapid application development
approaches and improving the quality and availability of data with the adoption
of the Semantic Web paradigm.
COEUS crosses the boundaries of integration and interoperability as it provides
a framework for the flexible acquisition and translation of data into a semantic
knowledge base, as well as a comprehensive set of interoperability services,
from REST to LinkedData, to fully exploit gathered data semantically. By
combining the lightness of rapid application development strategies with the
richness of its "Semantic Web in a box" approach, COEUS is a pioneering
framework to enhance the development of the next generation of biomedical
applications
The Pharmacoepigenomics Informatics Pipeline and H-GREEN Hi-C Compiler: Discovering Pharmacogenomic Variants and Pathways with the Epigenome and Spatial Genome
Over the last decade, biomedical science has been transformed by the epigenome and spatial genome, but the discipline of pharmacogenomics, the study of the genetic underpinnings of pharmacological phenotypes like drug response and adverse events, has not. Scientists have begun to use omics atlases of increasing depth, and inferences relating to the bidirectional causal relationship between the spatial epigenome and gene expression, as a foundational underpinning for genetics research. The epigenome and spatial genome are increasingly used to discover causative regulatory variants in the significance regions of genome-wide association studies, for the discovery of the biological mechanisms underlying these phenotypes and the design of genetic tests to predict them. Such variants often have more predictive power than coding variants, but in the area of pharmacogenomics, such advances have been radically underapplied. The majority of pharmacogenomics tests are designed manually on the basis of mechanistic work with coding variants in candidate genes, and where genome wide approaches are used, they are typically not interpreted with the epigenome.
This work describes a series of analyses of pharmacogenomics association studies with the tools and datasets of the epigenome and spatial genome, undertaken with the intent of discovering causative regulatory variants to enable new genetic tests. It describes the potent regulatory variants discovered thereby to have a putative causative and predictive role in a number of medically important phenotypes, including analgesia and the treatment of depression, bipolar disorder, and traumatic brain injury with opiates, anxiolytics, antidepressants, lithium, and valproate, and in particular the tendency for such variants to cluster into spatially interacting, conceptually unified pathways which offer mechanistic insight into these phenotypes.
It describes the Pharmacoepigenomics Informatics Pipeline (PIP), an integrative multiple omics variant discovery pipeline designed to make this kind of analysis easier and cheaper to perform, more reproducible, and amenable to the addition of advanced features. It describes the successes of the PIP in rediscovering manually discovered gene networks for lithium response, as well as discovering a previously unknown genetic basis for warfarin response in anticoagulation therapy.
It describes the H-GREEN Hi-C compiler, which was designed to analyze spatial genome data and discover the distant target genes of such regulatory variants, and its success in discovering spatial contacts not detectable by preceding methods and using them to build spatial contact networks that unite disparate TADs with phenotypic relationships.
It describes a potential feature set for a future pipeline, using the latest epigenome research and the lessons of the previous pipeline. It describes my thinking about how to use the output of a multiple omics variant pipeline to design genetic tests that also incorporate clinical data. And it concludes by describing a long-term vision for a comprehensive pharmacophenomic atlas, to be constructed by applying a variant pipeline and machine learning test design system, such as is described, to thousands of phenotypes in parallel.
Scientists struggled to assay genotypes for the better part of a century, and in the last twenty years, succeeded. The struggle to predict phenotypes on the basis of the genotypes we assay remains ongoing. The use of multiple omics variant pipelines and machine learning models with omics atlases, genetic association, and medical records data will be an increasingly significant part of that struggle for the foreseeable future.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/145835/1/ariallyn_1.pd
Developing the MAR databases – Augmenting Genomic Versatility of Sequenced Marine Microbiota
This thesis introduces the MAR databases as marine-specific resources in the genomic landscape. Paper 1 describes the curation effort and development that led to the creation of the MAR databases. It results in the highly valued reference database MarRef, the broader MarDB, and the marine gene catalog MarCat. The definition of a marine environment, the curation process, and the Marine Metagenomics Portal as a public web service are described. The portal helps scientists find marine sequence data for prokaryotes, explore rich contextual information, secondary metabolites and updated taxonomy, and evaluate genome quality. Many of these database advancements are covered in Paper 2. This includes new entries and the development of specific databases on marine fungi (MarFun) and salmon-related prokaryotes (SalDB). The implementation of metagenome-assembled and single amplified genomes leads up to the database quality evaluation discussed in Paper 3. There, the lack of quality control in primary databases is discussed based on estimated completeness and contamination in the genomes of the MAR databases.
Paper 4 explores the microbiota of the skin and gut mucosa of Atlantic salmon. In a database-dependent amplicon analysis, the full-length 16S rRNA gene proved accurate, but not a game-changer in taxonomic classification for this environmental niche. The proportion of dataset sequences lacking clear taxonomic classification suggests a lack of diversity in current-day databases and inadequate phylogenetic resolution. Advancing phylogenetic resolution was the subject of Paper 5. Here the highly similar species of the genus Aliivibrio were delineated using six genes in a multilocus sequence analysis. Five potentially novel species could be delineated in this way, which coincided with recent genome-wide taxonomy listings. Thus, Papers 4 and 5 parallel the work on the MAR databases by providing insight into the inter-relational framework of bioinformatic analysis and marine database sources
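The multilocus idea mentioned above can be sketched in a simplified form: aligned sequences from several loci are concatenated per strain and then compared pairwise. This is an illustration only. The thesis does not state how its six-gene analysis was implemented, a real analysis would normally build a phylogenetic tree, and the strain names, locus names and sequences below are hypothetical.

```python
def concatenate_loci(loci_by_strain, locus_order):
    """Join each strain's aligned locus sequences in a fixed order."""
    return {
        strain: "".join(loci[name] for name in locus_order)
        for strain, loci in loci_by_strain.items()
    }

def pairwise_identity(seq_a, seq_b):
    """Fraction of identical positions between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / len(seq_a)

if __name__ == "__main__":
    # Hypothetical toy alignment with two loci (the thesis used six genes).
    strains = {
        "strain_a": {"locus1": "ACGTAC", "locus2": "GGCA"},
        "strain_b": {"locus1": "ACGTAT", "locus2": "GGCA"},
    }
    joined = concatenate_loci(strains, ["locus1", "locus2"])
    print(pairwise_identity(joined["strain_a"], joined["strain_b"]))
```

Concatenating several loci gives more informative sites than any single gene, which is the motivation for using it on closely related species.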
Image analysis platforms for exploring genetic and neuronal mechanisms regulating animal behavior
An important aim of neuroscience is to understand how gene interactions and neuronal networks regulate animal behavior. The larvae of the marine annelid Platynereis dumerilii provide a convenient system for such integrative studies. These larvae exhibit a wide range of behaviors, including phototaxis, chemotaxis and gravitaxis and at the same time exhibit relatively simple nervous system organization. Due to its small size and transparent body, the Platynereis larva is compatible with whole-body light microscopic imaging following tissue staining protocols. It is also suitable for serial electron microscopic imaging and subsequent neuronal connectome reconstruction. Despite advances in imaging techniques, automated computational tools for large data analysis are not well-established in Platynereis. In the current work, I developed image analysis software for exploring genetic and nervous system mechanisms modulating Platynereis behavior.
Exploring gene expression patterns
Current labeling and imaging techniques restrict the number of gene expression patterns that can be labelled and visualized in a single specimen, which hinders the study of behaviors driven by multi-molecular interactions. To address this problem, I employed image registration to generate a gene expression atlas that integrates gene expression information from multiple specimens in a common reference space. The gene expression atlas was used to investigate mechanisms regulating larval locomotion, settlement and phototaxis in Platynereis. The atlas can assist in the identification of inter-individual and inter-species variations in gene expression. To provide a representation convenient for exploring gene expression patterns, I created a model of the atlas using 3D graphics software, which enabled convenient data visualization and efficient data storage and sharing.
Exploring neuronal networks regulating behavior
Neuronal circuitry can be reconstructed from the images obtained from electron microscopy, which resolves very fine structures such as neuron morphology or synapses. The amount of data resulting from electron microscopy and the complexity of neuronal networks represent a significant challenge for manual analysis. To solve this problem, I developed the NeuroDetective software, which models a neuronal circuitry and analyzes the information flow within it. The software combines the advantages of 3D visualization and graph analysis software by integrating neuron morphology and spatial distribution together with synaptic connectivity. NeuroDetective allowed studying the neuronal circuitry responsible for phototaxis in Platynereis larvae, revealing the connections and the neurons important for the network functionality. NeuroDetective facilitated the establishment of a relationship between the function and the structure of the neuronal circuitry in Platynereis phototaxis.
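One simple form of graph analysis on synaptic connectivity is tracing the shortest chain of synapses between two neurons. The sketch below uses breadth-first search on a directed graph. It is not NeuroDetective's algorithm, which the abstract does not describe, and the neuron names and connections are hypothetical.

```python
from collections import deque

def shortest_synaptic_path(synapses, source, target):
    """Breadth-first search over a directed synaptic graph.

    synapses: dict mapping a presynaptic neuron to its postsynaptic partners
    Returns the list of neurons on a shortest path, or None if unreachable.
    """
    queue = deque([[source]])
    visited = {source}
    while queue:
        path = queue.popleft()
        neuron = path[-1]
        if neuron == target:
            return path
        for partner in synapses.get(neuron, ()):
            if partner not in visited:
                visited.add(partner)
                queue.append(path + [partner])
    return None

if __name__ == "__main__":
    # Hypothetical toy circuit from a sensory cell to a motor neuron.
    circuit = {
        "photoreceptor": ["interneuron_1", "interneuron_2"],
        "interneuron_2": ["interneuron_1"],
        "interneuron_1": ["motorneuron"],
    }
    print(shortest_synaptic_path(circuit, "photoreceptor", "motorneuron"))
```

Neurons that appear on many such paths are natural candidates for being important to network function, though a real analysis would also weigh synapse counts and morphology.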
Integrating gene expression patterns with neuronal connectivity
Neuronal circuitry and its associated modulating biomolecules, such as neurotransmitters and neuropeptides, are thought to be the main factors regulating animal behavior. It was therefore important to integrate both genetic and neuronal information in order to fully understand how biomolecules, in conjunction with neuronal anatomy, elicit specific animal behaviors. To reconcile the differences between specimen preparation for gene expression imaging and for electron microscopy, I developed an image registration procedure to match the signals from these two different datasets. This method enabled the integration of the spatial distribution of specific modulators into the analysis of neuronal networks, leading to an improved understanding of the genetic and neuronal mechanisms modulating behavior in Platynereis