A chemical specialty semantic network for the Unified Medical Language System
Background: Terms representing chemical concepts found in the Unified Medical Language System (UMLS) are used to derive an expanded semantic network with mutually exclusive semantic types. The UMLS Semantic Network (SN) is composed of a collection of broad categories called semantic types (STs) that are assigned to concepts. Within the UMLS’s coverage of the chemical domain, many concepts are assigned more than one ST, so the extent of a given ST may contain concepts with widely varying semantics. A methodology is presented for expanding the chemical subhierarchy of the SN into a finer-grained categorization of mutually exclusive types with semantically uniform extents. We call this network a Chemical Specialty Semantic Network (CSSN). A CSSN is derived automatically from the existing chemical STs and their assignments. The methodology incorporates a threshold value governing the minimum size of extent a type needs for inclusion in the CSSN. Thus, different CSSNs can be created by choosing different threshold values to suit varying requirements. Results: A complete CSSN with 68 STs is derived using a threshold value of 300. It is used effectively to provide high-level categorizations for a random sample of compounds from the “Chemical Entities of Biological Interest” (ChEBI) ontology. The effect of threshold values between one and 500 on the size of the CSSN is shown. Conclusions: The methodology has several potential applications, including its use to derive a pre-coordinated guide for ST assignments to new UMLS chemical concepts, as a tool for auditing existing concepts, for inter-terminology mapping, and as an upper-level network for ChEBI.
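The threshold-governed derivation can be illustrated with a minimal sketch: concepts are grouped by their exact combination of semantic types, and only combinations whose extent reaches the threshold become CSSN types. The concept IDs, ST names, and the `derive_cssn_types` helper below are hypothetical; the actual CSSN derivation involves further steps.

```python
from collections import defaultdict

def derive_cssn_types(assignments, threshold):
    """Group concepts by their exact semantic-type combination and
    keep only combinations whose extent meets the threshold."""
    extents = defaultdict(set)
    for concept, types in assignments.items():
        extents[frozenset(types)].add(concept)
    return {combo: concepts for combo, concepts in extents.items()
            if len(concepts) >= threshold}

# Toy assignments (hypothetical concept IDs and ST names).
assignments = {
    "C01": {"Organic Chemical", "Pharmacologic Substance"},
    "C02": {"Organic Chemical", "Pharmacologic Substance"},
    "C03": {"Organic Chemical"},
}
cssn = derive_cssn_types(assignments, threshold=2)
```

With threshold 2, only the combination shared by two concepts survives; lowering the threshold to 1 keeps both combinations, mirroring how the threshold parameter controls the size of the resulting network.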
An interactive retrieval system for clinical trial studies with context-dependent protocol elements.
A well-defined protocol is essential for a clinical trial to yield a successful outcome report. When designing a protocol, most researchers refer to electronic databases and extract protocol elements using a keyword search. However, state-of-the-art database systems offer only text-based searches for user-entered keywords. In this study, we present a database system with a context-dependent protocol-element-selection function for successfully designing a clinical trial protocol. To do this, we first introduce a database for a protocol retrieval system constructed from individual protocol data extracted from 184,634 clinical trials and 13,210 frame structures of clinical trial protocols. The database contains a variety of semantic information that allows protocols to be filtered during the search operation. Based on this database, we developed a web application called the clinical trial protocol database system (CLIPS; available at https://corus.kaist.edu/clips), which enables an interactive search utilizing protocol elements. To support an interactive search over combinations of protocol elements, CLIPS offers the options for the next element conditioned on the previously selected element, in the form of a connected tree. The validation results show that our method achieves better performance than existing databases in predicting phenotypic features.
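The connected-tree idea, where each selected element determines the options offered next, can be sketched roughly as follows. The element names and tree fragment are invented for illustration, not taken from CLIPS.

```python
# Hypothetical fragment of a protocol-element tree: each key is a
# selected element, and its children are the options offered next.
element_tree = {
    "Condition: Diabetes": {
        "Intervention: Metformin": {
            "Outcome: HbA1c change": {},
            "Outcome: Fasting glucose": {},
        },
        "Intervention: Insulin": {
            "Outcome: HbA1c change": {},
        },
    },
}

def next_options(tree, path):
    """Walk the tree along the elements chosen so far and return
    the options for the next selection step."""
    node = tree
    for element in path:
        node = node[element]
    return sorted(node)

options = next_options(element_tree, ["Condition: Diabetes"])
```

Each click extends `path` by one element, so the interface only ever presents combinations that actually occur together in stored protocols.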
Using structural and semantic methodologies to enhance biomedical terminologies
Biomedical terminologies and ontologies underlie various Health Information Systems (HISs), Electronic Health Record (EHR) Systems, Health Information Exchanges (HIEs) and health administrative systems. Moreover, the proliferation of interdisciplinary research efforts in the biomedical field is fueling the need to overcome terminological barriers when integrating knowledge from different fields into a unified research project. Therefore, well-developed and well-maintained terminologies are in high demand. Most biomedical terminologies are large and complex, which makes it impossible for human experts to manually detect and correct all errors and inconsistencies. Automated and semi-automated Quality Assurance methodologies that focus on the areas most likely to contain errors and inconsistencies are therefore important.
In this dissertation, structural and semantic methodologies are used to enhance biomedical terminologies. The dissertation work is divided into three major parts. The first part consists of structural auditing techniques for the Semantic Network of the Unified Medical Language System (UMLS), which serves as a vocabulary knowledge base for biomedical research in various applications. Research techniques are presented on how to automatically identify and prevent erroneous semantic type assignments to concepts. The Web-based adviseEditor system is introduced to help UMLS editors to make correct multiple semantic type assignments to concepts. It is made available to the National Library of Medicine for future use in maintaining the UMLS.
The second part of this dissertation is on how to enhance the conceptual content of SNOMED CT by methods of semantic harmonization. By 2015, SNOMED CT will become the standard terminology for EHR encoding of diagnoses and problem lists. In order to enrich the semantics and coverage of SNOMED CT for clinical and research applications, the problem of semantic harmonization between SNOMED CT and six reference terminologies is approached by 1) comparing the vertical density of SNOMED CT with the reference terminologies to find potential concepts for export and import; and 2) categorizing the relationships between structurally congruent concepts from pairs of terminologies, with SNOMED CT being one terminology in the pair. Six kinds of configurations are observed, e.g., alternative classifications and suggested synonyms. For each configuration, a corresponding solution is presented for enhancing one or both of the terminologies.
The third part applies Quality Assurance techniques based on “Abstraction Networks” to biomedical ontologies in BioPortal. The National Center for Biomedical Ontology provides BioPortal as a repository of over 350 biomedical ontologies covering a wide range of domains. It is extremely difficult to design a new Quality Assurance methodology for each ontology in BioPortal. Fortunately, groups of ontologies in BioPortal share common structural features, so they can be grouped into families based on combinations of these features. A uniform Quality Assurance methodology designed for each family achieves improved efficiency, which is critical given the limited Quality Assurance resources available to most ontology curators. In this dissertation, a family-based framework covering 186 BioPortal ontologies, together with accompanying Quality Assurance methods based on abstraction networks, is presented to tackle this problem.
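The family-grouping idea, in which ontologies sharing the same combination of structural features receive one uniform Quality Assurance design, can be sketched as follows. The ontology names and features are invented; BioPortal's actual feature set is richer.

```python
from collections import defaultdict

# Hypothetical structural features; real BioPortal ontologies differ.
ontology_features = {
    "OntologyA": ("has_hierarchy", "has_attributes"),
    "OntologyB": ("has_hierarchy",),
    "OntologyC": ("has_hierarchy", "has_attributes"),
}

def group_into_families(features):
    """Group ontologies that share the exact same combination of
    structural features into one family."""
    families = defaultdict(list)
    for name, combo in features.items():
        families[frozenset(combo)].append(name)
    return families

families = group_into_families(ontology_features)
```

One Quality Assurance method designed for the `{has_hierarchy, has_attributes}` family would then apply to every ontology in it, rather than being re-designed per ontology.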
Conceptual graph-based knowledge representation for supporting reasoning in African traditional medicine
Although African patients use conventional (modern) and traditional healthcare simultaneously, an estimated 80% of people rely on African traditional medicine (ATM). ATM includes medical activities stemming from practices, customs and traditions that were integral to the distinctive African cultures. It is based mainly on the oral transfer of knowledge, with the risk of losing critical knowledge. Moreover, practices differ according to the region and the availability of medicinal plants. Therefore, it is necessary to compile the tacit, disseminated and complex knowledge of various Tradi-Practitioners (TP) in order to determine interesting patterns for treating a given disease. Knowledge engineering methods for traditional medicine are useful for modeling complex information needs, formalizing the knowledge of domain experts and highlighting effective practices for their integration into conventional medicine. The work described in this paper presents an approach that addresses two issues. First, it proposes a formal representation model of ATM knowledge and practices to facilitate their sharing and reuse. Second, it provides a visual reasoning mechanism for selecting the best available procedures and medicinal plants to treat diseases. The approach is based on the Delphi method for capturing knowledge from various experts, which necessitates reaching a consensus. Conceptual graph formalism is used to model ATM knowledge with visual reasoning capabilities and processes. Nested conceptual graphs are used to visually express the semantic meaning of Computational Tree Logic (CTL) constructs that are useful for the formal specification of temporal properties of ATM domain knowledge. Our approach has the advantage of mitigating knowledge loss, with conceptual development assistance to improve the quality of ATM care (medical diagnosis and therapeutics) as well as patient safety (drug monitoring).
The iOSC3 system: using ontologies and SWRL rules for intelligent supervision and care of patients with acute cardiac disorders
[Abstract] Physicians in the Intensive Care Unit (ICU) are specially trained to deal constantly with very large and complex quantities of clinical data and make quick decisions as they face complications. However, the amount of information generated and the way the data are presented may overload the cognitive skills of even experienced professionals and lead to inaccurate or erroneous actions that put patients’ lives at risk. In this paper, we present the design, development, and validation of iOSC3, an ontology-based system for intelligent supervision and treatment of critical patients with acute cardiac disorders. The system analyzes the patient’s condition and provides a recommendation about the treatment that should be administered to achieve the fastest possible recovery. If the recommendation is accepted by the doctor, the system automatically modifies the quantity of drugs that are being delivered to the patient. The knowledge base is constituted by an OWL ontology and a set of SWRL rules that represent the expert’s knowledge. iOSC3 has been developed in collaboration with experts from the Cardiac Intensive Care Unit (CICU) of the Meixoeiro Hospital, one of the most significant hospitals in the northwest region of Spain.

Funding: Instituto de Salud Carlos III (FIS-PI10/02180); Programa Iberoamericano de Ciencia y Tecnología para el Desarrollo (209RT0366); Galicia, Consellería de Cultura, Educación e Ordenación Universitaria (CN2012/217); Xunta, Consellería de Cultura, Educación e Ordenación Universitaria (CN2011/034); Galicia, Consellería de Cultura, Educación e Ordenación Universitaria (CN2012/21).
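The rule-based pattern behind SWRL (antecedent conditions implying a consequent) can be mimicked in a minimal sketch. The condition, threshold, and actions below are purely illustrative and are not real clinical logic from iOSC3.

```python
# Illustrative only: hypothetical rules in the SWRL style
# "antecedent conditions -> recommended action". The parameter name,
# the threshold of 65, and the actions are invented, not clinical advice.
def recommend(patient):
    rules = [
        # (condition over patient data, recommended action)
        (lambda p: p["mean_arterial_pressure"] < 65,
         "increase vasopressor infusion rate"),
        (lambda p: p["mean_arterial_pressure"] >= 65,
         "maintain current infusion rate"),
    ]
    for condition, action in rules:
        if condition(patient):
            return action

suggestion = recommend({"mean_arterial_pressure": 60})
```

In the real system the antecedents are SWRL atoms over OWL individuals and the consequent feeds the drug-delivery adjustment only after the physician accepts it; this sketch only shows the forward-rule shape.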
Addendum to Informatics for Health 2017: Advancing both science and practice
This article presents the presentation and poster abstracts that were mistakenly omitted from the original publication.
Terminology Services: Standard Terminologies to Control Medical Vocabulary. “Words are Not What they Say but What they Mean”
Data entry is an obstacle to the usability of electronic health record (EHR) applications and to their acceptance by physicians, who prefer to document using “free text”. Natural language is huge and very rich in detail, but it is also ambiguous: it depends heavily on context and uses jargon and acronyms. Healthcare information systems should capture clinical data in a structured and preferably coded format. This is crucial for data exchange between health information systems, epidemiological analysis, quality and research, clinical decision support systems, administrative functions, etc. To address this point, numerous terminological systems for the systematic recording of clinical data have been developed. These systems interrelate the concepts of a particular domain and provide references to related terms, along with possible definitions and codes. The purpose of terminology services is to represent facts that happen in the real world through database management in a way that achieves Semantic Interoperability: different systems understand the information they are processing through the use of codes from clinical terminologies. Standard terminologies thus allow medical vocabulary to be controlled. But how do we do this? What do we need? Terminology services are a fundamental piece of health data management in healthcare environments.
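A minimal sketch of the coding step a terminology service performs, mapping a free-text entry to a concept code via a synonym list. The concept code `C0001`, the terms, and the `encode` helper are a toy example, not drawn from any real terminology.

```python
# Hypothetical mini terminology: each concept has a code, a preferred
# term, and synonyms; free-text entries are normalized to the code.
terminology = {
    "C0001": {  # hypothetical code, not from a real vocabulary
        "preferred": "Hypertensive disorder",
        "synonyms": {"hypertension", "high blood pressure", "htn"},
    },
}

def encode(free_text):
    """Return the concept code whose preferred term or synonyms
    match the entered text, or None if no concept matches."""
    text = free_text.strip().lower()
    for code, entry in terminology.items():
        if text in entry["synonyms"] or text == entry["preferred"].lower():
            return code
    return None

code = encode("High Blood Pressure")
```

The physician keeps typing "free text", while downstream systems exchange the stable code: this is the separation between interface terms and reference concepts that the article argues for.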
Named Entity Recognition and Linking in a Multilingual Biomedical Setting
Master’s thesis, Bioinformatics and Computational Biology, Universidade de Lisboa, Faculdade de Ciências, 2021.

Information analysis is an essential process for all researchers and physicians. However, the amount of biomedical literature currently available, and the format in which it is found, make this process difficult. It is therefore essential to apply text mining tools to automatically obtain information from these documents. The problem is that most of these tools are not designed to deal with non-English languages, which is critical in the biomedical literature, since many of these documents are written in the authors’ native language. Although several shared tasks have been organized in which text mining tools were developed for Spanish, the same has not happened for Portuguese. However, due to the lexical similarity between the two languages, it is possible to hypothesize that the tools for the two languages may be similar and that annotation transfer between Portuguese and Spanish is possible. To contribute to the development of text mining tools for Portuguese and Spanish, this dissertation presents the ICERL (Iberian Cancer-related Entity Recognition and Linking) system, a deep-learning NERL (Named Entity Recognition and Linking) system composed of two similar pipelines, one for each language, and the parallel ICR (Iberian Cancer-related) corpus. Both are focused on the oncology domain. The application of the ICERL system to the ICR corpus resulted in 3,999 annotations in Spanish and 3,287 in Portuguese. The similarities between the annotations in the two languages, and the F1-score of 0.858 obtained when comparing the Portuguese annotations with the Spanish ones, confirm the hypothesis initially presented.

Researchers and physicians disseminate their findings through various documents, such as books, articles, patents and other types of publications. For researchers to stay up to date in their area of interest, it is essential that they analyse these documents quickly and effectively: the more efficient this phase is, the better the results obtained, and the faster it is, the more time can be devoted to other components of their work. However, the speed at which these documents are published, and the fact that their text is expressed in natural language, make this task difficult. It therefore becomes essential to apply text mining tools for information extraction. Text mining tools comprise several steps, such as Named Entity Recognition (NER) and Named Entity Linking (NEL). The NER step corresponds to the identification of an entity in the text; NEL consists of linking entities to a knowledge base. State-of-the-art NER systems are deep learning methods and normally use the BiLSTM-CRF architecture, while state-of-the-art NEL systems use not only deep learning methods but also graph-based methods. Most of the text mining systems currently available are designed only for English, which is problematic because biomedical literature is often written in the authors’ native language.

To address this problem, shared tasks have emerged to develop text mining systems for languages other than English. One of the main target languages of these shared tasks has been Spanish, the second language in the world by number of native speakers, with a high number of biomedical publications available. One example of a shared task for Spanish is CANTEMIST, whose goal is the identification of entities in the oncology domain and their linking to the Clasificación Internacional de Enfermedades para Oncología (CIE-O). Portuguese, on the other hand, has not been a major target of these shared tasks. Because Portuguese and Spanish both derive from Latin, there is a high lexical similarity between the two languages (89%). It is therefore possible to assume that the solutions found for Spanish can be adapted or reused for Portuguese, and that annotation transfer exists between the two languages. The goal of this work is thus to create tools that validate this hypothesis: the ICERL (Iberian Cancer-related Entity Recognition and Linking) system and the ICR (Iberian Cancer-related) corpus. The ICERL system is a bilingual Portuguese–Spanish NERL (Named Entity Recognition and Linking) system, while ICR is a parallel corpus for the same languages. Both tools are designed for the oncology domain. The first step in developing the ICERL system was the creation of a NERL pipeline for Spanish specific to the oncology domain. This pipeline was based on the work of the LasigeBioTM team in the CANTEMIST shared task, which used the Flair framework for the NER task and the Personalized PageRank (PPR) algorithm for the NEL task.

Flair is a tool that allows different embeddings (vector representations of words) from different models to be combined into one for the NER task. PPR is a variation of the PageRank algorithm, which is used to rank the importance of web pages. The PageRank algorithm is applied to a graph: originally, each node of the graph represented a web page and the edges between nodes represented hyperlinks between pages. The algorithm estimates the coherence of each node in the graph, that is, its relevance. In the context of the NEL task, the graph is composed of candidates for the entities of interest. LasigeBioTM used Flair to train embeddings on Spanish documents from PubMed. These embeddings were integrated into a NER model trained on the training and development sets of the CANTEMIST corpus. The trained model was then applied to the test set of the CANTEMIST corpus to obtain annotation files with the recognised entities. A search was then made for NEL candidates for the recognised entities in three knowledge bases: CIE-O, the Health Sciences Descriptors (DeCS) and the International Classification of Diseases (ICD). From these candidates a graph was built, the candidates were ranked with the PPR algorithm, and the best candidate was chosen to link each entity. This pipeline was improved by adding new embeddings, extending the training of the NER model, and correcting errors in the system’s NEL code. Although these changes contributed to a significant increase in NEL performance (F-measure from 0.0061 to 0.665), the same did not happen for NER (F-measure from 0.741 to 0.754).

The final version of the ICERL system is composed of a pipeline for Portuguese and the pipeline that was tested on the CANTEMIST corpus, with a slight difference in the NEL task: instead of choosing a single candidate for each entity, a list of candidates from CIE-O and DeCS is chosen. In the Portuguese pipeline, candidates are chosen from DeCS and the Classificação Internacional de Doenças (CID). This difference in the NEL task is due to the method used to evaluate the performance of the ICERL system, and to avoid restricting the system to a single candidate and a single vocabulary. To build the Portuguese pipeline, three models for the NER task were tested, and it was concluded that the best approach was to combine a model similar to the one used in the Spanish pipeline with the BioBERTpt model. Given the high lexical similarity between the two languages, the hypothesis of using the same pipeline for both languages was also tested; however, using the NLPStatTest software it was concluded that a language-specific pipeline yields a 58 percent improvement in F-measure for Portuguese texts. The ICR corpus is composed of 1,555 documents per language, retrieved from SciELO. Since the Spanish pipeline was trained on files from the CANTEMIST corpus, it was also necessary to retrieve documents from SciELO and PubMed to train the Portuguese pipeline. The ICERL system was applied to the ICR corpus, and the evaluation method consisted of comparing the Portuguese annotations with the Spanish ones; this was justified because the performance of the Spanish pipeline could be evaluated on the CANTEMIST corpus, where the results obtained were close to the state of the art. The application of the ICERL system to the ICR corpus resulted in 3,999 annotations in Spanish (216 of them unique) and 3,287 in Portuguese (171 of them unique).

In addition, the entity "câncer" (cancer) is the most frequent entity in both languages. Together with these similarities in the annotations, the F-measure of 0.858 obtained in the evaluation supports the conclusion that annotation transfer exists between the two languages and that similar text mining tools can be used for both.
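The candidate-ranking step with Personalized PageRank can be sketched as follows. The graph, candidate names, and restart weights are invented for illustration; the actual system builds its graph from CIE-O, DeCS and ICD candidates for the recognised entities.

```python
# A minimal Personalized PageRank sketch for NEL candidate ranking.
def personalized_pagerank(graph, personalization, damping=0.85, iters=50):
    """graph: node -> list of neighbours (treated as outgoing links);
    personalization: restart distribution over nodes."""
    nodes = list(graph)
    rank = {n: personalization.get(n, 0.0) for n in nodes}
    for _ in range(iters):
        new = {}
        for n in nodes:
            # Mass flowing into n from every node m that links to it.
            incoming = sum(rank[m] / len(graph[m])
                           for m in graph if n in graph[m])
            new[n] = (1 - damping) * personalization.get(n, 0.0) \
                     + damping * incoming
        rank = new
    return rank

# Hypothetical candidate concepts; edges link candidates that are
# related in the knowledge base.
graph = {
    "neoplasm": ["carcinoma"],
    "carcinoma": ["neoplasm", "lung carcinoma"],
    "lung carcinoma": ["carcinoma"],
}
personalization = {"neoplasm": 0.5, "carcinoma": 0.5, "lung carcinoma": 0.0}
scores = personalized_pagerank(graph, personalization)
best = max(scores, key=scores.get)
```

The candidate that is most connected to the other candidates accumulates the highest score and is chosen as the link for the entity, which is the intuition behind using PPR coherence for disambiguation.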