1,512 research outputs found

    Errors lingüístics en el domini biomèdic: Cap a una tipologia d’errors per a l’espanyol

    The objective of this work is the analysis of errors contained in a corpus of medical reports in natural language and the design of a typology of errors, as there has been no systematic review of error verification and correction in clinical documentation in Spanish. In the development of automatic detection and correction systems, it is of great interest to examine the nature of the linguistic errors that occur in clinical reports in order to detect and treat them properly. The results show that omission errors are the most frequent in the analyzed sample, and that word length clearly influences error frequency. The typology of error patterns provided is enabling the development of a module based on linguistic knowledge, currently in progress, which will help to improve the performance of error detection and correction systems for the biomedical domain.
    This work was supported by the Spanish National Research Agency (AEI) through project LaTe4PSP (PID2019-107652RB-I00/AEI/10.13039/501100011033). Furthermore, the main author is supported by the Ministerio de Universidades of Spain through the national program Ayudas para la formación de profesorado universitario (FPU), reference FPU16/0332.

    Restoring the intended structure of Hungarian ophthalmology documents

    Clinical documents have been an emerging target of natural language applications. Information stored in documents created in clinical settings can be very useful for doctors or medical experts. However, the way these documents are created and stored is often a hindrance to accessing their content. In this paper, an automatic method for restoring the intended structure of Hungarian ophthalmology documents is described. The statements in these documents in their original form appeared under various subheadings. We successfully applied our method for reassigning the correct heading to each line based on its content. The results show that the categorization was correct for 81.99% of the statements in our test set, compared to a human categorization.
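The line-to-heading reassignment described above can be sketched as a simple frequency-based classifier. The paper does not specify its exact method, so the heading names and training lines below are illustrative assumptions, not the authors' data:

```python
from collections import Counter, defaultdict

# Hypothetical (line, heading) training pairs; content is illustrative only.
TRAIN = [
    ("visus od 0.8 os 1.0", "visual_acuity"),
    ("tensio od 14 hgmm os 15 hgmm", "intraocular_pressure"),
    ("cataracta incipiens", "diagnosis"),
]

def train(pairs):
    """Count word frequencies per heading (a unigram model)."""
    freq = defaultdict(Counter)
    for line, heading in pairs:
        freq[heading].update(line.split())
    return freq

def assign_heading(line, freq):
    """Pick the heading whose vocabulary best overlaps the line's words."""
    words = line.split()
    def score(heading):
        total = sum(freq[heading].values())
        return sum(freq[heading][w] / total for w in words)
    return max(freq, key=score)

model = train(TRAIN)
print(assign_heading("visus od 0.5", model))  # → visual_acuity
```

A real system would need smoothing and far more training lines, but the core idea — scoring each candidate heading against the line's content and taking the argmax — is the same.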

    Nem felügyelt módszerek alkalmazása releváns kifejezések azonosítására és csoportosítására klinikai dokumentumokban

    Processing clinical documents created in hospital settings has recently become a central research area of language technology. However, off-the-shelf tools built for processing general-language texts cannot be applied, or perform poorly, on specialised medical texts. Furthermore, there are many tasks in which identifying technical terms and determining the relations between them is a crucial step, yet these can only be solved with external lexical resources, thesauri, and ontologies. For smaller languages such as Hungarian, no such knowledge bases are available, so annotating and organising the information in the texts requires human expert work. In this paper we show how statistical methods can transform raw documents into a preprocessed, partially structured form that makes this human work easier. The modules, which rely solely on the corpus itself, recognise and resolve abbreviations, identify multi-word expressions, and determine their similarity. Finally, we created a higher-level representation of the texts in which each expression is replaced by the identifier of the cluster formed on the basis of similarity; this simplifies the texts and makes it possible to determine the general form of frequently recurring patterns.

    A Resource for Detecting Misspellings and Denoising Medical Text Data

    In this paper we propose a method for collecting a dictionary to deal with noisy medical text documents. The quality of such Italian Emergency Room Reports is so poor that in most cases they can hardly be processed automatically; this also holds for other languages (e.g., English), with the notable difference that no Italian dictionary has been proposed to deal with this jargon. In this work we introduce and evaluate a resource designed to fill this gap.

    An improved Levenshtein algorithm for spelling correction word candidate list generation

    Candidate list generation in spelling correction is the process of finding words from a lexicon that are close to an incorrect word. The most widely used algorithm for generating a candidate list for an incorrect word is based on Levenshtein distance. However, this algorithm takes too much time when there is a large number of spelling errors. The reason is that computing the Levenshtein distance involves creating an array and filling its cells by comparing the characters of the incorrect word with the characters of a word from the lexicon. Since most lexicons contain millions of words, these operations are repeated millions of times for each incorrect word to generate its candidate list. This dissertation improves the Levenshtein algorithm by designing an operational technique incorporated into the algorithm. The proposed technique reduces the algorithm's processing time without affecting its accuracy: it cuts the operations required to compute the cell values in the first, second, and third rows and columns of the Levenshtein array. The improved algorithm was evaluated against the original. Experimental results show that the proposed algorithm outperforms the standard Levenshtein algorithm in processing time by 36.45%, while the accuracy of both algorithms remains the same.
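For reference, the baseline being improved is the classic dynamic-programming edit distance, applied to every lexicon word. This is a minimal sketch of that baseline (the dissertation's row/column optimisation is not reproduced here); the `max_dist` threshold and the toy lexicon are illustrative assumptions:

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (two-row variant)."""
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def candidates(word, lexicon, max_dist=2):
    """Candidate list: every lexicon word within max_dist edits."""
    return sorted(w for w in lexicon if levenshtein(word, w) <= max_dist)

print(candidates("speling", ["spelling", "spell", "spoiling", "table"]))
# → ['spelling', 'spoiling']
```

Because `candidates` runs the full distance computation against every lexicon word, its cost grows linearly with lexicon size — exactly the bottleneck the dissertation targets.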

    Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results

    This paper describes part of an ongoing effort to improve the readability of Swedish electronic health records (EHRs). An EHR contains systematic documentation of a single patient's medical history across time, entered by healthcare professionals with the purpose of enabling safe and informed care. Linguistically, medical records exemplify a highly specialised domain, which can be superficially characterised as having telegraphic sentences involving displaced or missing words, abundant abbreviations, spelling variations including misspellings, and specialised terminology. We report results on lexical simplification of Swedish EHRs, by which we mean detecting unknown, out-of-dictionary words and trying to resolve them as compounds of known words, abbreviations, or misspellings.
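The resolution cascade described above — known word, then compound split, then abbreviation — can be sketched as follows. The tiny lexicon and abbreviation table are invented for illustration, and the final misspelling step (e.g. nearest dictionary word by edit distance) is omitted for brevity:

```python
# Illustrative resources only; a real system would load full dictionaries.
LEXICON = {"hjärt", "infarkt", "patient", "behandling"}
ABBREVS = {"pat": "patient", "beh": "behandling"}

def compound_split(word, lexicon):
    """Try to split an unknown word into two known parts."""
    for i in range(2, len(word) - 1):
        if word[:i] in lexicon and word[i:] in lexicon:
            return (word[:i], word[i:])
    return None

def resolve(word, lexicon=LEXICON, abbrevs=ABBREVS):
    """Resolve an out-of-dictionary token as known / compound / abbreviation."""
    if word in lexicon:
        return ("known", word)
    parts = compound_split(word, lexicon)
    if parts:
        return ("compound", parts)
    if word in abbrevs:
        return ("abbreviation", abbrevs[word])
    return ("unresolved", word)       # misspelling check would go here

print(resolve("hjärtinfarkt"))  # splits into two known words
print(resolve("pat"))           # resolved via the abbreviation table
```

Swedish compounds are written as single tokens, which is why the compound-split step comes before the abbreviation lookup in this sketch; the ordering is a design assumption, not something the paper specifies.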

    "I Saw You": searching for lost love via practices of reading, writing and responding

    How do emotions move, and how do emotions move us? How are feelings and recognitions distributed socio-materially? Based on a multi-site ethnographic study of a romantic correspondence system, this article explores the themes of love, privacy, identity and public displays. Informed by ethnomethodology and actor-network theory, its investigations into these informal affairs are somewhat unusual in that much of the research carried out within those bodies of work concentrates on institutional settings such as laboratories, offices and courtrooms. In common with ethnomethodology, it attempts to re-specify some topics of interest in the social sciences and humanities; in this case, documents and the practices of writing and reading those documents. A key element of the approach taken is restoring to reading and writing their situated nature as observable, knowable, distributed community practices. Re-specifying topics for the social sciences involves the detailed description of several situated ways in which the romantic correspondence system is used. Detailing the translations, transformations and transportations of documents as 'quasi-objects' through several orderings, the article suggests that documents have no essential meaning and that making them meaningful is part of the work of those settings.