Foreword
The aim of this Workshop is to focus on building and evaluating resources used to facilitate biomedical text mining, including their design, update, delivery, quality assessment, evaluation, and dissemination. Key resources of interest are lexical and knowledge repositories (controlled vocabularies, terminologies, thesauri, ontologies) and annotated corpora, including both task-specific resources and repositories reengineered from biomedical or general-language resources. Of particular interest is the process of building annotated resources, including designing guidelines and annotation schemas (aiming at both syntactic and semantic interoperability) and relying on language engineering standards. Challenging aspects are updates and evolution management of resources, as well as their documentation, dissemination, and evaluation.
Cloud-Based Benchmarking of Medical Image Analysis
Medical imaging
Conceptualization of Computational Modeling Approaches and Interpretation of the Role of Neuroimaging Indices in Pathomechanisms for Pre-Clinical Detection of Alzheimer Disease
With swift advancements in next-generation sequencing technologies and the voluminous growth of biological data, a variety of data resources such as databases and web services have been created to facilitate data management, accessibility, and analysis. However, interoperability between dynamically growing data resources is an increasingly rate-limiting step in biomedicine, specifically concerning neurodegeneration. Over the years, massive investments and technological advancements in dementia research have resulted in large proportions of unmined data. Accordingly, there is an essential need for intelligent as well as integrative approaches to mine the available data and substantiate novel research outcomes. Semantic frameworks provide a unique possibility to integrate multiple heterogeneous, high-resolution data resources with semantic integrity, using standardized ontologies and vocabularies for context-specific domains. In this work, (i) the functionality of a semantically structured terminology for mining pathway-relevant knowledge from the literature, called the Pathway Terminology System, is demonstrated, and (ii) a context-specific, high-granularity semantic framework for neurodegenerative diseases, known as NeuroRDF, is presented. Neurodegenerative disorders are especially complex, as they are characterized by widespread manifestations and the potential for dramatic alterations in disease progression over time. Early detection and prediction strategies through clinical pointers can provide promising solutions for effective treatment of Alzheimer's disease (AD). In this work, we present the importance of bridging the gap between clinical and molecular biomarkers to effectively contribute to dementia research. Moreover, we address the need for a formalized framework, called NIFT, to automatically mine relevant clinical knowledge from the literature for substantiating high-resolution cause-and-effect models.
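The semantic-framework idea above can be sketched as a tiny triple store with pattern queries. The predicates and identifiers below are illustrative assumptions, not the actual NeuroRDF vocabulary:

```python
# Minimal sketch of semantic data integration in the spirit of NeuroRDF.
# Subjects, predicates, and objects below are invented for illustration.

# A tiny triple store: (subject, predicate, object)
triples = {
    ("gene:MAPT", "is_associated_with", "disease:AlzheimersDisease"),
    ("gene:APP", "is_associated_with", "disease:AlzheimersDisease"),
    ("gene:MAPT", "participates_in", "pathway:TauPhosphorylation"),
    ("biomarker:CSF_tau", "indicates", "disease:AlzheimersDisease"),
}

def query(pattern):
    """Match triples against a pattern; None acts as a wildcard."""
    s, p, o = pattern
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# All entities linked to Alzheimer's disease, regardless of source resource:
linked = query((None, None, "disease:AlzheimersDisease"))
```

In a real deployment the triples would live in an RDF store and the pattern query would be a SPARQL query, but the integration principle, heterogeneous sources reduced to a common graph model, is the same.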
Multimodal Information Retrieval in Medical Imaging Repositories
The proliferation of digital medical imaging modalities in hospitals and other
diagnostic facilities has created huge repositories of valuable data, often
not fully explored. Moreover, recent years have shown a growing trend
in data production. As such, studying new ways to index, process, and
retrieve medical images has become an important subject for the wider
community of radiologists, scientists, and engineers. Content-based
image retrieval, which encompasses various methods, can exploit the visual
information of a medical imaging archive, and is known to be beneficial to
practitioners and researchers. However, the integration of the latest systems
for medical image retrieval into clinical workflows is still rare, and their
effectiveness still shows room for improvement.
This thesis proposes solutions and methods for multimodal information
retrieval, in the context of medical imaging repositories. The major
contributions are a search engine for medical imaging studies supporting
multimodal queries in an extensible archive; a framework for automated
labeling of medical images for content discovery; and an assessment and
proposal of feature learning techniques for concept detection from medical
images, exhibiting greater potential than the feature extraction algorithms
previously used in similar tasks. These contributions, each in their
own dimension, seek to narrow the scientific and technical gap towards
the development and adoption of novel multimodal medical image retrieval
systems, to ultimately become part of the workflows of medical practitioners,
teachers, and researchers in healthcare.
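The content-based retrieval described above can be illustrated as nearest-neighbour search over learned feature vectors. The identifiers and vectors below are made-up stand-ins for descriptors produced by feature learning (e.g. a CNN) or hand-crafted extraction:

```python
import math

# Hedged sketch: content-based image retrieval as similarity ranking in a
# feature space. Archive entries and vectors are invented for illustration.

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

archive = {
    "ct_thorax_001": [0.9, 0.1, 0.0],
    "ct_thorax_002": [0.8, 0.2, 0.1],
    "mri_brain_001": [0.1, 0.9, 0.3],
}

def retrieve(query_vec, k=2):
    """Rank archive images by visual similarity to the query vector."""
    ranked = sorted(archive, key=lambda img: cosine(query_vec, archive[img]),
                    reverse=True)
    return ranked[:k]

results = retrieve([0.85, 0.15, 0.05])
```

A production system would index millions of vectors with an approximate nearest-neighbour structure rather than a linear scan, but the ranking principle is the same.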
Federated Architectures for Biomedical Data Integration
The last decades have been characterized by a continuous adoption of
IT solutions in the healthcare sector, which resulted in the proliferation
of tremendous amounts of data over heterogeneous systems. Distinct
data types are currently generated, manipulated, and stored in the
several institutions where patients are treated. Data sharing and
integrated access to this information will allow the extraction of relevant
knowledge that can lead to better diagnostics and treatments.
This thesis proposes new integration models for gathering information
and extracting knowledge from multiple and heterogeneous biomedical
sources.
The scenario complexity led us to split the integration problem according
to the data type and to the usage specificity. The first contribution is a
cloud-based architecture for exchanging medical imaging services. It
offers a simplified registration mechanism for providers and services,
promotes remote data access, and facilitates the integration of
distributed data sources. Moreover, it is compliant with international
standards, ensuring the platform interoperability with current medical
imaging devices. The second proposal is a sensor-based architecture
for integration of electronic health records. It follows a federated
integration model and aims to provide a scalable solution to search and
retrieve data from multiple information systems. The last contribution is
an open architecture for gathering patient-level data from dispersed and
heterogeneous databases. All the proposed solutions were deployed
and validated in real-world use cases.
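The federated integration model described above can be sketched as a broker that fans a query out to heterogeneous source adapters and merges the answers into one patient-level view. The source names and record fields here are assumptions for the example, not the thesis's actual interfaces:

```python
# Illustrative sketch of the federated search pattern: one broker,
# many source adapters, merged results. All identifiers are invented.

def query_pacs(patient_id):
    # Stands in for a medical imaging archive adapter.
    return [{"patient": patient_id, "type": "imaging", "study": "CT-2019-04"}]

def query_ehr(patient_id):
    # Stands in for an electronic-health-record adapter.
    return [{"patient": patient_id, "type": "ehr", "entry": "diagnosis:J18.9"}]

SOURCES = [query_pacs, query_ehr]

def federated_search(patient_id):
    """Fan out to every registered source and merge the answers."""
    merged = []
    for source in SOURCES:
        merged.extend(source(patient_id))
    return merged

records = federated_search("P001")
```

Registering a new institution then amounts to appending another adapter to `SOURCES`, which is what makes the model scalable across information systems.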
Development of a text mining approach to disease network discovery
Scientific literature is one of the major sources of knowledge for systems biology, in the form of papers, patents and other types of written reports. Text mining methods aim at automatically extracting relevant information from the literature. The hypothesis of this thesis was that biological systems could be elucidated by the development of text mining solutions that can automatically extract relevant information from documents. The first objective consisted in developing software components to recognize biomedical entities in text, which is the first step to generate a network about a biological system. To this end, a machine learning solution was developed, which can be trained for specific biological entities using an annotated dataset, obtaining high-quality results. Additionally, a rule-based solution was developed, which can be easily adapted to various types of entities.
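The rule-based recognition idea can be sketched as a small dictionary plus simple surface patterns. The lexicon entries and the miRNA pattern below are illustrative assumptions, not the thesis's actual rules:

```python
import re

# Hedged sketch of rule-based biomedical entity recognition:
# dictionary lookup plus a regular-expression pattern.
# The terms are invented for illustration.

LEXICON = {"cystic fibrosis": "Disease", "CFTR": "Gene"}
MIRNA_PATTERN = re.compile(r"\bmiR-\d+\b")  # matches e.g. "miR-145"

def recognize(text):
    """Return (surface form, entity type) pairs found in the text."""
    entities = []
    for term, etype in LEXICON.items():
        for m in re.finditer(re.escape(term), text, re.IGNORECASE):
            entities.append((m.group(), etype))
    for m in MIRNA_PATTERN.finditer(text):
        entities.append((m.group(), "miRNA"))
    return entities

ents = recognize("miR-145 regulates CFTR expression in cystic fibrosis.")
```

Adapting the recognizer to a new entity type reduces to adding dictionary entries or patterns, which is the ease-of-adaptation property the abstract highlights.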
The second objective consisted in developing an automatic approach to link the recognized entities to a reference knowledge base. A solution based on the PageRank algorithm was developed in order to match the entities to the concepts that most contribute to the overall coherence.
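The coherence-based linking step can be sketched with a plain power-iteration PageRank over a candidate graph. The graph below is a toy assumption; the thesis's actual relatedness measure is not reproduced here:

```python
# Sketch of PageRank-based entity linking: candidate concepts form a
# graph, and the candidate that most coheres with the other recognized
# entities accumulates the highest rank. Graph contents are invented.

def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over an adjacency-list graph."""
    nodes = list(graph)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new = {}
        for n in nodes:
            incoming = sum(rank[m] / len(graph[m])
                           for m in nodes if n in graph[m])
            new[n] = (1 - damping) / len(nodes) + damping * incoming
        rank = new
    return rank

# Two candidate concepts for the ambiguous mention "CF": the one that is
# well connected to the other recognized entities should win.
graph = {
    "CysticFibrosis": ["CFTR", "Lung"],
    "CFTR": ["CysticFibrosis", "Lung"],
    "Lung": ["CysticFibrosis", "CFTR"],
    "CarbonFiber": ["CysticFibrosis"],
}

scores = pagerank(graph)
best = max(["CysticFibrosis", "CarbonFiber"], key=scores.get)
```

The isolated candidate receives only the teleportation mass, so the densely connected concept is selected, matching the "most contribute to the overall coherence" criterion.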
The third objective consisted in automatically extracting relations between entities, to generate knowledge graphs about biological systems. Due to the lack of annotated datasets available for this task, distant supervision was employed to train a relation classifier using a corpus of documents and a knowledge base. The applicability of this approach was demonstrated in two case studies: microRNA-gene relations for cystic fibrosis, obtaining a network of 27 relations using the abstracts of 51 recently published papers; and cell-cytokine relations for tolerogenic cell therapies, obtaining a network of 647 relations from 3264 abstracts.
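The distant-supervision labeling step can be sketched as follows: any sentence mentioning both members of a knowledge-base pair is treated as a positive training example. The pair and sentences below are invented for the example:

```python
# Hedged sketch of distant supervision for relation extraction:
# knowledge-base pairs provide noisy labels for unannotated sentences.
# The KB pair and the sentences are illustrative, not real data.

KB_PAIRS = {("miR-509", "CFTR")}

sentences = [
    "miR-509 downregulates CFTR in airway cells.",
    "miR-509 was measured in serum samples.",
]

def label(sentence, pairs):
    """Positive if both members of any KB pair co-occur in the sentence."""
    return any(a in sentence and b in sentence for a, b in pairs)

labels = [label(s, KB_PAIRS) for s in sentences]
```

These noisy labels then train a conventional relation classifier; the trade-off is that some co-occurrences are labeled positive even when the sentence does not actually assert the relation.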
Through a manual evaluation, the information contained in these networks was determined to be relevant. Additionally, a solution combining deep learning techniques with ontology information was developed, to take advantage of the domain knowledge provided by ontologies.
This thesis contributed with several solutions that demonstrate the usefulness of text mining methods to systems biology by extracting domain-specific information from the literature. These solutions make it easier to integrate various areas of research, leading to a better understanding of biological systems
An Autoethnographic Account of Innovation at the US Department of Veterans Affairs
The history of the U.S. Department of Veterans Affairs (VA) health information technology (HIT) has been characterized by both enormous successes and catastrophic failures. While the VA was once hailed as the way to the future of twenty-first-century health care, many programs have been mismanaged, delayed, or flawed, resulting in the waste of hundreds of millions of taxpayer dollars. Since 2015, the U.S. Government Accountability Office (GAO) has designated HIT at the VA as being susceptible to waste, fraud, and mismanagement. The timely central research question I ask in this study is: can healthcare IT at the VA be healed? To address this question, I investigate a HIT case study at the VA Center of Innovation (VACI), originally designed to be the flagship initiative of the open government transformation at the VA. The Open Source Electronic Health Record Alliance (OSEHRA) was designed to promote an open innovation ecosystem through public-private-academic partnership. Based on my fifteen years of experience at the VA, I use an autoethnographic methodology to make a significant value-added contribution to understanding and modeling the VA's approach to innovation. I use several theoretical information systems frameworks, including People, Process, and Technology (PPT); Technology, Organization, and Environment (TOE); and the Technology Acceptance Model (TAM), and propose a new adaptive theory to understand the inability of VA HIT to innovate. From the perspective of people and culture, I study retaliation against whistleblowers, organizational behavioral integrity, and lack of transparency in communications. I examine the VA's processes, including the different software development methodologies used and the development and operations (DevOps) process of an open-source application developed at VACI, the Radiology Protocol Tool Recorder (RAPTOR), a Veterans Health Information Systems and Technology Architecture (VistA) radiology workflow module.
I find that the VA has chosen to migrate away from in-house application software and buy commercial software instead. These People, Process, and Technology findings are representative of larger systemic failings and serve as appropriate examples of the issues associated with IT innovation at the VA. This autoethnographic account builds on first-hand project experience and literature-based insights.
A Life Cycle Approach to the Development and Validation of an Ontology of the U.S. Common Rule (45 C.F.R. § 46)
Requirements for the protection of human research subjects stem directly from federal regulation by the Department of Health and Human Services in Title 45 of the Code of Federal Regulations (C.F.R.) part 46. Fifteen other federal agencies include subpart A of part 46 verbatim in their own bodies of regulation. Hence, 45 C.F.R. part 46 subpart A has come to be called, colloquially, the 'Common Rule.'
Overall motivation for this study began as a desire to facilitate the ethical sharing of biospecimen samples from large biospecimen collections by using ontologies. Previous work demonstrated that, in general, the informed consent process and subsequent decision-making about data and specimen release still rely heavily on paper-based informed consent forms and processes. Consequently, well-validated computable models are needed to provide an enhanced foundation for data sharing.
This dissertation describes the development and validation of a Common Rule Ontology (CRO), expressed in the OWL 2 Web Ontology Language, which is intended to provide a computable semantic knowledge model for assessing and representing components of the information artifacts required as part of regulated research under 45 C.F.R. § 46. I examine whether the alignment of this ontology with the Basic Formal Ontology and other ontologies from the Open Biomedical Ontology (OBO) Foundry provides a good fit for the regulatory aspects of the Common Rule Ontology.
The dissertation also examines and proposes a new method for the ongoing evaluation of ontologies such as the CRO across the ontology development lifecycle, and suggests methods to achieve high-quality, validated ontologies.
While the CRO is not in itself intended to be a complete solution to the data and specimen sharing problems outlined above, it is intended to produce a well-validated computationally grounded framework upon which others can build. This model can be used in future work to build decision support systems to assist Institutional Review Boards (IRBs), regulatory personnel, honest brokers, tissue bank managers, and other individuals in the decision-making process involving biorepository specimen and data sharing
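The kind of computable reasoning such an ontology enables can be gestured at with a toy subclass hierarchy and a transitive subsumption check. The class names below are illustrative stand-ins, not the CRO's actual terms, and the BFO alignment is only suggested via a shared upper-level root:

```python
# Toy sketch of OWL-style subsumption reasoning over an asserted
# class hierarchy. All class names are hypothetical examples.

SUBCLASS_OF = {
    "InformedConsentForm": "InformationArtifact",            # CRO-level
    "InformationArtifact": "GenericallyDependentContinuant",  # BFO-style
    "GenericallyDependentContinuant": "Entity",               # upper root
}

def is_a(cls, ancestor):
    """Transitive subclass check over the asserted hierarchy."""
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        if cls == ancestor:
            return True
    return False

result = is_a("InformedConsentForm", "Entity")
```

A decision-support tool built on the CRO would rely on exactly this kind of entailment, computed by an OWL reasoner rather than hand-written code, to classify the artifacts of a regulated study.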