3,209 research outputs found
Towards semantic interpretation of clinical narratives with ontology-based text mining
In the realm of knee pathology, magnetic resonance imaging (MRI) has the advantage of visualising all structures within the knee joint, which makes it a valuable tool for increasing diagnostic accuracy and planning surgical treatments. Therefore, clinical narratives found in MRI reports convey valuable diagnostic information. A range of studies have proven the feasibility of natural language processing for information extraction from clinical narratives. However, no study focused specifically on MRI reports in relation to knee pathology, possibly due to the complexity of knee anatomy and a wide range of conditions that may be associated with different anatomical entities.
In this thesis, we describe KneeTex, an information extraction system that operates in this domain. As an ontology-driven information extraction system, KneeTex makes active use of an ontology to strongly guide and constrain text analysis. We used automatic term recognition to facilitate the development of a domain-specific ontology with sufficient detail and coverage for text mining applications. In combination with the ontology, high regularity of the sublanguage used in knee MRI reports allowed us to model its processing by a set of sophisticated lexico-semantic rules with minimal syntactic analysis. The main processing steps involve named entity recognition combined with coordination, enumeration, ambiguity and co-reference resolution, followed by text segmentation. Ontology-based semantic typing is then used to drive the template filling process. We adopted an existing ontology, TRAK (Taxonomy for RehAbilitation of Knee conditions), for use within KneeTex. The original TRAK ontology expanded from 1,292 concepts, 1,720 synonyms and 518 relationship instances to 1,621 concepts, 2,550 synonyms and 560 relationship instances. This provided KneeTex with a very fine-grained lexicosemantic knowledge base, which is highly attuned to the given sublanguage. Information extraction results were evaluated on a test set of 100 MRI reports. A gold standard consisted of 1,259 filled template records with the following slots: finding, finding qualifier, negation, certainty, anatomy and anatomy qualifier. KneeTex extracted information with precision of 98.00%, recall of 97.63% and F-measure of 97.81%, the values of which are in line with human-like performance.
To demonstrate the utility of formally structuring clinical narratives and possible applications in epidemiology, we describe an implementation of KneeBase, a web-based information retrieval system that supports complex searches over the results obtained via KneeTex. It is the structured nature of extracted information that allows queries that encode not only search terms, but also relationships between them (e.g. between clinical findings and anatomical locations). This is of particular value for large-scale epidemiology studies based on qualitative evidence, whose main bottleneck involves manual inspection of many text documents.
The two systems presented in this dissertation, KneeTex and KneeBase, operate in a specific domain, but illustrate generic principles for rapid development of clinical text mining systems. The key enabler of such systems is the existence of an appropriate ontology. To tackle this issue, we proposed a strategy for ontology expansion, which proved effective in fast–tracking the development of our information extraction and retrieval systems
Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples
Machine Learning has been a big success story during the AI resurgence. One
particular stand out success relates to learning from a massive amount of data.
In spite of early assertions of the unreasonable effectiveness of data, there
is increasing recognition for utilizing knowledge whenever it is available or
can be created purposefully. In this paper, we discuss the indispensable role
of knowledge for deeper understanding of content where (i) large amounts of
training data are unavailable, (ii) the objects to be recognized are complex,
(e.g., implicit entities and highly subjective content), and (iii) applications
need to use complementary or related data in multiple modalities/media. What
brings us to the cusp of rapid progress is our ability to (a) create relevant
and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP
techniques. Using diverse examples, we seek to foretell unprecedented progress
in our ability for deeper understanding and exploitation of multimodal data and
continued incorporation of knowledge in learning techniques.Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International
Conference on Web Intelligence (WI). arXiv admin note: substantial text
overlap with arXiv:1610.0770
Ranking Significant Discrepancies in Clinical Reports
Medical errors are a major public health concern and a leading cause of death
worldwide. Many healthcare centers and hospitals use reporting systems where
medical practitioners write a preliminary medical report and the report is
later reviewed, revised, and finalized by a more experienced physician. The
revisions range from stylistic to corrections of critical errors or
misinterpretations of the case. Due to the large quantity of reports written
daily, it is often difficult to manually and thoroughly review all the
finalized reports to find such errors and learn from them. To address this
challenge, we propose a novel ranking approach, consisting of textual and
ontological overlaps between the preliminary and final versions of reports. The
approach learns to rank the reports based on the degree of discrepancy between
the versions. This allows medical practitioners to easily identify and learn
from the reports in which their interpretation most substantially differed from
that of the attending physician (who finalized the report). This is a crucial
step towards uncovering potential errors and helping medical practitioners to
learn from such errors, thus improving patient-care in the long run. We
evaluate our model on a dataset of radiology reports and show that our approach
outperforms both previously-proposed approaches and more recent language models
by 4.5% to 15.4%.Comment: ECIR 2020 (short
Knowledge-based best of breed approach for automated detection of clinical events based on German free text digital hospital discharge letters
OBJECTIVES:
The secondary use of medical data contained in electronic medical records, such as hospital discharge letters, is a valuable resource for the improvement of clinical care (e.g. in terms of medication safety) or for research purposes. However, the automated processing and analysis of medical free text still poses a huge challenge to available natural language processing (NLP) systems. The aim of this study was to implement a knowledge-based best of breed approach, combining a terminology server with integrated ontology, a NLP pipeline and a rules engine.
METHODS:
We tested the performance of this approach in a use case. The clinical event of interest was the particular drug-disease interaction "proton-pump inhibitor [PPI] use and osteoporosis". Cases were to be identified based on free text digital discharge letters as source of information. Automated detection was validated against a gold standard.
RESULTS:
Precision of recognition of osteoporosis was 94.19%, and recall was 97.45%. PPIs were detected with 100% precision and 97.97% recall. The F-score for the detection of the given drug-disease-interaction was 96,13%.
CONCLUSION:
We could show that our approach of combining a NLP pipeline, a terminology server, and a rules engine for the purpose of automated detection of clinical events such as drug-disease interactions from free text digital hospital discharge letters was effective. There is huge potential for the implementation in clinical and research contexts, as this approach enables analyses of very high numbers of medical free text documents within a short time period
Foreword
The aim of this Workshop is to focus on building and evaluating resources used to facilitate biomedical text mining, including their design, update, delivery, quality assessment, evaluation and dissemination. Key resources of interest are lexical and knowledge repositories (controlled vocabularies, terminologies, thesauri, ontologies) and annotated corpora, including both task-specific resources and repositories reengineered from biomedical or general language resources. Of particular interest is the process of building annotated resources, including designing guidelines and annotation schemas (aiming at both syntactic and semantic interoperability) and relying on language engineering standards. Challenging aspects are updates and evolution management of resources, as well as their documentation, dissemination and evaluation
Recommended from our members
Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: NL
Proceedings ICPW'07: 2nd International Conference on the Pragmatic Web, 22-23 Oct. 2007, Tilburg: N
Information extraction from medication leaflets
Tese de mestrado integrado. Engenharia Informática e Computação. Faculdade de Engenharia. Universidade do Porto. 201
- …