
5 research outputs found

Dictionary construction and identification of possible adverse drug events in Danish clinical narrative text

Authors: Brunak Søren; Eriksson Robert; Jensen Lars Juhl; Jensen Peter Bjødstrup; Pletscher-Frankild Sune
Publication venue: 'BMJ'
Publication date: 01/01/2013

OBJECTIVE: Drugs have tremendous potential to cure and relieve disease, but the risk of unintended effects is always present. Healthcare providers increasingly record data in electronic patient records (EPRs), in which we aim to identify possible adverse events (AEs) and, specifically, possible adverse drug events (ADEs). MATERIALS AND METHODS: Based on the undesirable effects section from the summary of product characteristics (SPC) of 7446 drugs, we have built a Danish ADE dictionary. Starting from this dictionary, we have developed a pipeline for identifying possible ADEs in unstructured clinical narrative text. We use a named entity recognition (NER) tagger to identify dictionary matches in the text and post-coordination rules to construct ADE compound terms. Finally, we apply post-processing rules and filters to handle, for example, negations and sentences about subjects other than the patient. Moreover, this method allows synonyms to be identified, and anatomical location descriptions can be merged so that effects in the same location are grouped appropriately. RESULTS: The method identified 1 970 731 (35 477 unique) possible ADEs in a large corpus of 6011 psychiatric hospital patient records. Validation was performed through manual inspection of possible ADEs, resulting in a precision of 89% and a recall of 75%. DISCUSSION: The presented dictionary-building method could be used to construct other ADE dictionaries. The complication of compound words in Germanic languages was addressed. Additionally, collapsing synonyms and anatomical locations improves the method. CONCLUSIONS: The developed dictionary and method can be used to identify possible ADEs in Danish clinical narratives.
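
The dictionary-matching and negation-filtering steps described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the dictionary entries, the negation cues, and the function name `find_possible_ades` are hypothetical, English stands in for Danish, and the sentence-level negation filter is a deliberately crude assumption.

```python
import re

# Hypothetical toy dictionary: surface form -> canonical term.
# Mapping several surface forms to one term mimics synonym collapsing.
ADE_DICTIONARY = {
    "nausea": "nausea",
    "feeling sick": "nausea",
    "headache": "headache",
    "dry mouth": "dry mouth",
}

# Hypothetical negation cues; a real rule set would be far richer.
NEGATION_CUES = ("no", "not", "denies", "without")


def find_possible_ades(text):
    """Return canonical terms matched in sentences that contain no negation cue."""
    found = []
    for sentence in re.split(r"[.!?]+", text.lower()):
        tokens = re.findall(r"\w+", sentence)
        if any(cue in tokens for cue in NEGATION_CUES):
            continue  # crude filter: drop the whole sentence if it is negated
        normalized = " ".join(tokens)
        for surface, canonical in ADE_DICTIONARY.items():
            # Whole-word match of the dictionary surface form.
            if re.search(r"\b" + re.escape(surface) + r"\b", normalized):
                found.append(canonical)
    return found


print(find_possible_ades("Patient reports feeling sick and headache. Denies dry mouth."))
```

Dropping an entire sentence on any negation cue trades recall for simplicity; a scoped negation rule would keep non-negated mentions in the same sentence.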

Crossref

Copenhagen University Research Information System

PubMed Central

Online Research Database In Technology

HFST-SweNER – A New NER Resource for Swedish

Authors: Borin Lars; Hardwick Sam; Kokkinakis Dimitrios; Linden Krister; Niemi Jyrki
Publication venue: European Language Resources Association (ELRA)
Publication date: 26/05/2014

Named entity recognition (NER) is a knowledge-intensive information extraction task that recognizes textual mentions of entities belonging to a predefined set of categories, such as locations, organizations and time expressions. NER is a challenging yet essential preprocessing technology for many natural language processing applications, and it is particularly crucial for language understanding. NER has been actively explored in academia and in industry, especially in recent years with the advent of social media data. This paper describes the conversion, modeling and adaptation of a Swedish NER system from a hybrid environment, with integrated functionality from various processing components, to the Helsinki Finite-State Transducer Technology (HFST) platform. This new HFST-based NER (HFST-SweNER) is a full-fledged open source implementation that supports a variety of generic named entity types and consists of multiple, reusable resource layers, e.g., various n-gram-based named entity lists (gazetteers).

Helsingin yliopiston digitaalinen arkisto

Projecting named entity tags from a resource rich language to a resource poor language

Authors: Abu Bakar Zainab; Oxley Alan; Zamin Norshuhani
Publication venue: Universiti Utara Malaysia Press
Publication date: 01/01/2012

Named Entities (NE) are the prominent entities appearing in textual documents. Automatic classification of NE in a textual corpus is a vital process in Information Extraction and Information Retrieval research. Named Entity Recognition (NER) is the identification of words in text that correspond to a pre-defined taxonomy such as person, organization, location, date, time, etc. This article focuses on the person (PER), organization (ORG) and location (LOC) entities for a Malay journalistic corpus of terrorism. A projection algorithm, using the Dice Coefficient function and a bigram scoring method with domain-specific rules, is suggested to map the NE information from the English corpus to the Malay corpus of terrorism. The English corpus is the translated version of the Malay corpus. Hence, these two corpora are treated as parallel corpora. The method computes the string similarity between the English words and the list of available lexemes in a pre-built lexicon to approximate the best NE mapping. The algorithm was evaluated on our own tagged terrorism corpus; it achieved satisfactory results in terms of precision, recall, and F-measure. An evaluation of the selected open source NER tool for English is also presented.
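
The Dice coefficient mentioned in this abstract can be sketched in Python as follows. The abstract does not say whether the bigrams are character or word bigrams, so this sketch assumes character bigrams; the function names `bigrams`, `dice_coefficient` and `best_match` and the sample lexicon are hypothetical, and the domain-specific rules are not modeled.

```python
from collections import Counter


def bigrams(word):
    """Return the character bigrams of a word, lowercased."""
    w = word.lower()
    return [w[i:i + 2] for i in range(len(w) - 1)]


def dice_coefficient(a, b):
    """Dice similarity 2*|X & Y| / (|X| + |Y|) over character-bigram multisets."""
    x, y = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(x.values()) + sum(y.values())
    if total == 0:
        return 0.0  # both strings are shorter than two characters
    overlap = sum((x & y).values())  # multiset intersection size
    return 2.0 * overlap / total


def best_match(word, lexicon):
    """Return the lexicon entry most similar to the given word."""
    return max(lexicon, key=lambda entry: dice_coefficient(word, entry))


# Hypothetical lexicon, for illustration only.
print(dice_coefficient("night", "nacht"))
print(best_match("Indonesia", ["Malaysia", "India", "Indonesia"]))
```

Here "night" and "nacht" share one bigram ("ht") out of four each, giving 2 * 1 / 8 = 0.25.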

UUM Repository

Crossref

Directory of Open Access Journals

Named Entity Recognition for the Mainland Scandinavian Languages

Authors: Bick Eckhard; Björk Jónsdottir Andra; Bondi Johannessen Janne; Haaland Åsne; Hagen Kristin; Hansen Dorte Haltrup; Kokkinakis Dimitrios; Meurer Paul; Nøklestad Anders
Publication date: 01/01/2004

Copenhagen University Research Information System