1,512 research outputs found
Errors lingüístics en el domini biomèdic: Cap a una tipologia d’errors per a l’espanyol
The objective of this work is the analysis of errors contained in a corpus of medical reports in
natural language and the design of a typology of errors, as there was no systematic review on
verification and correction of errors in clinical documentation in Spanish. In the development
of automatic detection and correction systems, it is of great interest to delve into the nature of
the linguistic errors that occur in clinical reports, in order to detect and treat them properly.
The results show that omission errors are the most frequent ones in the analyzed sample, and
that word length certainly influences error frequency. The typification of error patterns provided
is enabling the development of a module based on linguistic knowledge, which is currently in
progress. This will help to improve the performance of error detection and correction systems for
the biomedical domain.

This work was supported by the Spanish National Research Agency (AEI) through project LaTe4PSP (PID2019-107652RB-I00/AEI/10.13039/501100011033). Furthermore, the main author is supported by the Ministerio de Universidades of Spain through the national program Ayudas para la formación de profesorado universitario (FPU), with reference FPU16/0332
Restoring the intended structure of Hungarian ophthalmology documents
Clinical documents have been an emerging target of natural language applications. Information stored in documents created at clinical settings can be very useful for doctors or medical experts. However, the way these documents are created and stored is often a hindrance to accessing their content. In this paper, an automatic method for restoring the intended structure of Hungarian ophthalmology documents is described. The statements in these documents in their original form appeared under various subheadings. We successfully applied our method for reassigning the correct heading to each line based on its content. The results show that the categorization was correct for 81.99% of the statements in our test set, compared to a human categorization.
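The line-to-heading reassignment described above can be approximated with a simple supervised text classifier. The sketch below is not the paper's method; it is a minimal multinomial naive Bayes over bag-of-words features, and the ophthalmology headings and training lines are hypothetical examples:

```python
from collections import Counter, defaultdict
import math

class LineHeadingClassifier:
    """Minimal multinomial naive Bayes that assigns each statement
    line to the most likely subheading based on its words."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # heading -> word frequencies
        self.heading_counts = Counter()          # heading -> number of lines
        self.vocab = set()

    def train(self, lines, headings):
        for line, heading in zip(lines, headings):
            self.heading_counts[heading] += 1
            for w in line.lower().split():
                self.word_counts[heading][w] += 1
                self.vocab.add(w)

    def predict(self, line):
        total = sum(self.heading_counts.values())
        best, best_lp = None, float("-inf")
        for h, n in self.heading_counts.items():
            lp = math.log(n / total)  # log prior of the heading
            denom = sum(self.word_counts[h].values()) + len(self.vocab)
            for w in line.lower().split():
                # Laplace smoothing so unseen words do not zero out a heading
                lp += math.log((self.word_counts[h][w] + 1) / denom)
            if lp > best_lp:
                best, best_lp = h, lp
        return best
```

In use, the classifier is trained on lines whose correct subheading is known and then applied to lines whose heading was lost.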
Nem felügyelt módszerek alkalmazása releváns kifejezések azonosítására és csoportosítására klinikai dokumentumokban
The processing of clinical documents created in hospital settings has recently become a central research area of language technology. However, off-the-shelf tools built for processing texts of a different kind, written in general language, cannot be applied, or perform poorly, on specialised medical texts. Furthermore, there are many tasks in which identifying technical terms and determining the relations between them is a very important step, yet these can only be solved with the help of external lexical resources, thesauri and ontologies. For smaller languages such as Hungarian, no such knowledge bases are available. Annotating and organising the information in the texts therefore requires human expert work. In this paper we show how raw documents can be transformed by statistical methods into a preprocessed, partially structured form that makes this human work easier. The modules, which rely on the corpus alone, recognise and resolve abbreviations, identify multiword expressions and determine their similarity. Finally, we created a higher-level representation of the texts in which each expression is replaced by the identifier of the cluster formed on the basis of their similarity; in this way the texts can be simplified and the general form of frequently recurring patterns can be determined.
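The final step described in this abstract, replacing each expression with the identifier of its similarity cluster, can be sketched minimally. This is a toy illustration rather than the paper's statistical modules: it clusters terms greedily by character-trigram Jaccard similarity, and the threshold and sample terms are assumptions:

```python
def char_ngrams(term, n=3):
    # Character n-grams with padding, a cheap string-similarity feature.
    t = f" {term.lower()} "
    return {t[i:i + n] for i in range(len(t) - n + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def cluster_terms(terms, threshold=0.4):
    # Greedy single-pass clustering: a term joins the first cluster whose
    # representative is similar enough, otherwise it starts a new cluster.
    clusters = []  # list of (representative n-gram set, member list)
    label = {}
    for t in terms:
        g = char_ngrams(t)
        for idx, (rep, members) in enumerate(clusters):
            if jaccard(g, rep) >= threshold:
                members.append(t)
                label[t] = idx
                break
        else:
            label[t] = len(clusters)
            clusters.append((g, [t]))
    return label

def abstract_text(tokens, label):
    # Replace each clustered expression with its cluster identifier so
    # frequently recurring patterns share one surface form.
    return [f"<C{label[t]}>" if t in label else t for t in tokens]
```

Substituting cluster identifiers for the expressions yields the simplified, higher-level representation in which recurring patterns become visible.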
A Resource for Detecting Misspellings and Denoising Medical Text Data
In this paper we propose a method for collecting a dictionary to deal with noisy medical text documents. The quality of such Italian Emergency Room Reports is so poor that in most cases they can hardly be elaborated automatically; this also holds for other languages (e.g., English), with the notable difference that no Italian dictionary has been proposed to deal with this jargon. In this work we introduce and evaluate a resource designed to fill this gap.
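A frequency-induced lexicon of the kind this entry proposes can be sketched very simply: tokens that recur across the noisy reports are taken as valid forms, and everything else is flagged for correction. The `min_freq` cutoff and the sample lines are assumptions, not the paper's actual procedure:

```python
from collections import Counter

def build_dictionary(corpus_lines, min_freq=2):
    # Frequency-based lexicon induction: tokens seen at least min_freq
    # times across the noisy reports are assumed to be valid forms.
    counts = Counter(tok.lower() for line in corpus_lines for tok in line.split())
    return {tok for tok, n in counts.items() if n >= min_freq}

def flag_noise(line, dictionary):
    # Return the tokens of a report line that fall outside the induced
    # dictionary; these are candidates for misspelling correction.
    return [tok for tok in line.split() if tok.lower() not in dictionary]
```

The design choice here is that recurrent jargon stays in the dictionary even when it is absent from standard lexica, which is exactly what generic spell-checkers get wrong on clinical text.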
An improved Levenshtein algorithm for spelling correction word candidate list generation
Candidate list generation in spelling correction is the process of finding words in a lexicon that are close to an incorrect word. The most widely used algorithm for generating the candidate list for an incorrect word is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that the Levenshtein algorithm involves creating an array and filling its cells by comparing the characters of the incorrect word with the characters of a word from the lexicon. Since most lexicons contain millions of words, these operations are repeated millions of times for each incorrect word to generate its candidate list. This dissertation improves the Levenshtein algorithm by incorporating a new operational technique into it. The proposed technique reduces the processing time of the algorithm without affecting its accuracy: it cuts down the operations required to compute the cell values in the first, second and third rows and columns of the Levenshtein array. The improved algorithm was evaluated against the original. Experimental results show that the proposed algorithm outperforms the Levenshtein algorithm in processing time by 36.45% while the accuracy of both algorithms remains the same.
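For reference, the baseline that the dissertation improves on can be sketched as follows. This is the standard two-row dynamic-programming formulation of Levenshtein distance plus a naive candidate-list generator; the proposed operational technique itself is not reproduced here:

```python
def levenshtein(a: str, b: str) -> int:
    # Classic dynamic-programming edit distance: `prev` holds the
    # previous row of the Levenshtein array, `curr` the row being filled.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution
        prev = curr
    return prev[-1]

def candidates(word: str, lexicon, max_dist: int = 2):
    # Naive candidate-list generation: compare the misspelled word
    # against every lexicon entry, keeping those within max_dist edits.
    return sorted(w for w in lexicon if levenshtein(word, w) <= max_dist)
```

The inner loop is the cost the abstract points at: it runs once per lexicon word, so any cells that can be skipped or precomputed translate directly into time saved per misspelling.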
Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results
This paper describes part of an ongoing effort to improve the readability of Swedish electronic health records (EHRs). An EHR contains systematic documentation of a single patient's medical history across time, entered by healthcare professionals with the purpose of enabling safe and informed care. Linguistically, medical records exemplify a highly specialised domain, which can be superficially characterised as having telegraphic sentences involving displaced or missing words, abundant abbreviations, spelling variations including misspellings, and terminology. We report results on lexical simplification of Swedish EHRs, by which we mean detecting the unknown, out-of-dictionary words and trying to resolve them either as compounded known words, abbreviations or misspellings.
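The resolution cascade in the last sentence, treating an unknown word as a compound of known words, an abbreviation, or a misspelling, can be sketched as follows. The lexicon entries and abbreviation list below are toy assumptions, not the paper's resources:

```python
def resolve_unknown(word, lexicon, abbreviations):
    """Try to resolve an out-of-dictionary word as a known abbreviation
    or as a closed compound of known words; otherwise flag it as a
    likely misspelling for downstream correction."""
    w = word.lower()
    if w in abbreviations:
        return ("abbreviation", abbreviations[w])
    # Swedish-style closed compounds: try every split into two known parts.
    for i in range(2, len(w) - 1):
        head, tail = w[:i], w[i:]
        if head in lexicon and tail in lexicon:
            return ("compound", (head, tail))
    return ("misspelling?", w)
```

A real compound splitter would also handle linking morphemes and more than two parts; the two-way split is enough to show why resolving compounds first shrinks the set of words left over as misspelling candidates.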
"I Saw You": searching for lost love via practices of reading, writing and responding
How do emotions move and how do emotions move us? How are feelings and recognitions distributed socio-materially? Based on a multi-site ethnographic study of a romantic correspondence system, this article explores the themes of love, privacy, identity and public displays. Informed by ethnomethodology and actor-network theory, its investigations into these informal affairs are somewhat unusual in that much of the research carried out by those bodies of work concentrates on institutional settings such as laboratories, offices and courtrooms. In common with ethnomethodology it attempts to re-specify some topics of interest in the social sciences and humanities; in this case, documents and practices of writing and reading those documents. A key element of the approach taken is restoring to reading and writing their situated nature as observable, knowable, distributed community practices. Re-specifying topics for the social sciences involves the detailed description of several situated ways in which the romantic correspondence system is used. Detailing the translations, transformations and transportations of documents as 'quasi-objects' through several orderings, the article suggests that documents have no essential meaning and that making them meaningful is part of the work of those settings.