3,974 research outputs found

    Unsupervised Context-Sensitive Spelling Correction of English and Dutch Clinical Free-Text with Word and Character N-Gram Embeddings

    Full text link
    We present an unsupervised context-sensitive spelling correction method for clinical free-text that uses word and character n-gram embeddings. Our method generates misspelling replacement candidates and ranks them according to their semantic fit, by calculating a weighted cosine similarity between the vectorized representation of a candidate and the misspelling context. To tune the parameters of this model, we generate self-induced spelling error corpora. We perform our experiments for two languages. For English, we greatly outperform off-the-shelf spelling correction tools on a manually annotated MIMIC-III test set, and counter the frequency bias of a noisy channel model, showing that neural embeddings can be successfully exploited to improve upon the state-of-the-art. For Dutch, we also outperform an off-the-shelf spelling correction tool on manually annotated clinical records from the Antwerp University Hospital, but can offer no empirical evidence that our method counters the frequency bias of a noisy channel model in this case as well. However, both our context-sensitive model and our implementation of the noisy channel model obtain high scores on the test set, establishing a state-of-the-art for Dutch clinical spelling correction with the noisy channel model.Comment: Appears in volume 7 of the CLIN Journal, http://www.clinjournal.org/biblio/volum

    A Resource for Detecting Misspellings and Denoising Medical Text Data

    Get PDF
    In this paper we propose a method for collecting a dictionary to deal with noisy medical text documents. The quality of such Italian Emergency Room Reports is so poor that in most cases these can be hardly automatically elaborated; this also holds for other languages (e.g., English), with the notable difference that no Italian dictionary has been proposed to deal with this jargon. In this work we introduce and evaluate a resource designed to fill this gap.In questo lavoro illustriamo un metodo per la costruzione di un dizionario dedicato all’elaborazione di documenti medici, la porzione delle cartelle cliniche annotata nei reparti di pronto soccorso. Questo tipo di documenti è così rumoroso che in genere le cartelle cliniche difficilmente posono essere direttamente elaborate in maniera automatica. Pur essendo il problema di ripulire questo tipo di documenti un problema rilevante e diffuso, non esisteva un dizionario completo per trattare questo linguaggio settoriale. In questo lavoro proponiamo e valutiamo una risorsa finalizzata a condurre questo tipo di elaborazione sulle cartelle cliniche

    Errors lingüístics en el domini biomèdic: Cap a una tipologia d’errors per a l’espanyol

    Get PDF
    L’objectiu d’aquest treball és l’anàlisi d’errors continguts en un corpus d’informes mèdics en llenguatge natural i el disseny d’una tipologia d’errors, ja que no hi va haver una revisió sistemàtica sobre verificació i correcció d’errors en documentació clínica en castellà. En el desenvolupament de sistemes automàtics de detecció i correcció, és d’interès aprofundir en la naturalesa dels errors lingüístics que es produeixen en els informes clínics per tal de detectar-los i tractar-los adequadament. Els resultats mostren que els errors d’omissió són els més freqüents en la mostra analitzada i que la longitud de la paraula sens dubte influeix en la freqüència d’error. La tipificació dels patrons d’error proporcionats permet el desenvolupament d’un mòdul basat en coneixements lingüístics, actualment en curs, que serà capaç de millorar el rendiment dels sistemes de correcció de detecció i correcció d’errors per al domini biomèdicThe objective of this work is the analysis of errors contained in a corpus of medical reports in natural language and the design of a typology of errors, as there was no systematic review on verification and correction of errors in clinical documentation in Spanish. In the development of automatic detection and correction systems, it is of great interest to delve into the nature of the linguistic errors that occur in clinical reports, in order to detect and treat them properly. The results show that omission errors are the most frequent ones in the analyzed sample, and that word length certainly influences error frequency. The typification of error patterns provided is enabling the development of a module based on linguistic knowledge, which is currently in progress. This will help to improve the performance of error detection and correction systems for the biomedical domain.This work was supported by the Spanish National Research Agency (AEI) through project LaTe4PSP (PID2019-107652RB-I00/AEI/10.13039/501100011033). Furthermore, the main autor is supported by Ministerio de Universidades of Spain through the national program Ayudas para la formación de profesorado universitario (FPU), with reference FPU16/0332

    Semantic Sentiment Analysis of Twitter Data

    Full text link
    Internet and the proliferation of smart mobile devices have changed the way information is created, shared, and spreads, e.g., microblogs such as Twitter, weblogs such as LiveJournal, social networks such as Facebook, and instant messengers such as Skype and WhatsApp are now commonly used to share thoughts and opinions about anything in the surrounding world. This has resulted in the proliferation of social media content, thus creating new opportunities to study public opinion at a scale that was never possible before. Naturally, this abundance of data has quickly attracted business and research interest from various fields including marketing, political science, and social studies, among many others, which are interested in questions like these: Do people like the new Apple Watch? Do Americans support ObamaCare? How do Scottish feel about the Brexit? Answering these questions requires studying the sentiment of opinions people express in social media, which has given rise to the fast growth of the field of sentiment analysis in social media, with Twitter being especially popular for research due to its scale, representativeness, variety of topics discussed, as well as ease of public access to its messages. Here we present an overview of work on sentiment analysis on Twitter.Comment: Microblog sentiment analysis; Twitter opinion mining; In the Encyclopedia on Social Network Analysis and Mining (ESNAM), Second edition. 201

    Data Mining in Electronic Commerce

    Full text link
    Modern business is rushing toward e-commerce. If the transition is done properly, it enables better management, new services, lower transaction costs and better customer relations. Success depends on skilled information technologists, among whom are statisticians. This paper focuses on some of the contributions that statisticians are making to help change the business world, especially through the development and application of data mining methods. This is a very large area, and the topics we cover are chosen to avoid overlap with other papers in this special issue, as well as to respect the limitations of our expertise. Inevitably, electronic commerce has raised and is raising fresh research problems in a very wide range of statistical areas, and we try to emphasize those challenges.Comment: Published at http://dx.doi.org/10.1214/088342306000000204 in the Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    The Educational Leader, Vol. 8, No. 2: Commerce and Business Administration, Foreign Languages, and Music Number

    Get PDF
    The Educational Leader, March 1945, Vol. 8, No. 2, 43 pages, a bulletin by Kansas State Teacher College of Pittsburg

    Spartan Daily, April 2, 1990

    Get PDF
    Volume 94, Issue 46https://scholarworks.sjsu.edu/spartandaily/7976/thumbnail.jp

    The Development of Student Personal Education Plans into Portfolios through the Use of the Pathway Model

    Get PDF
    The purpose of this project was to create a Personalized Education Plan and portfolio system to support secondary students in quest for successful skill development and job employment upon graduation. The project was specifically designed for freshmen students and the teaching staff at Auburn High Schooi in Auburn, Washington
    corecore