41,843 research outputs found

    In no uncertain terms : a dataset for monolingual and multilingual automatic term extraction from comparable corpora

    Get PDF
    Automatic term extraction is a productive field of research within natural language processing, but it still faces significant obstacles regarding datasets and evaluation, which require manual term annotation. This is an arduous task, made even more difficult by the lack of a clear distinction between terms and general language, which results in low inter-annotator agreement. There is a large need for well-documented, manually validated datasets, especially in the rising field of multilingual term extraction from comparable corpora, which presents a unique new set of challenges. In this paper, a new approach is presented for both monolingual and multilingual term annotation in comparable corpora. The detailed guidelines with different term labels, the domain- and language-independent methodology and the large volumes annotated in three different languages and four different domains make this a rich resource. The resulting datasets are not just suited for evaluation purposes but can also serve as a general source of information about terms and even as training data for supervised methods. Moreover, the gold standard for multilingual term extraction from comparable corpora contains information about term variants and translation equivalents, which allows an in-depth, nuanced evaluation

    The systematic guideline review: method, rationale, and test on chronic heart failure

    Get PDF
    Background: Evidence-based guidelines have the potential to improve healthcare. However, their de-novo-development requires substantial resources-especially for complex conditions, and adaptation may be biased by contextually influenced recommendations in source guidelines. In this paper we describe a new approach to guideline development-the systematic guideline review method (SGR), and its application in the development of an evidence-based guideline for family physicians on chronic heart failure (CHF). Methods: A systematic search for guidelines was carried out. Evidence-based guidelines on CHF management in adults in ambulatory care published in English or German between the years 2000 and 2004 were included. Guidelines on acute or right heart failure were excluded. Eligibility was assessed by two reviewers, methodological quality of selected guidelines was appraised using the AGREE instrument, and a framework of relevant clinical questions for diagnostics and treatment was derived. Data were extracted into evidence tables, systematically compared by means of a consistency analysis and synthesized in a preliminary draft. Most relevant primary sources were re-assessed to verify the cited evidence. Evidence and recommendations were summarized in a draft guideline. Results: Of 16 included guidelines five were of good quality. A total of 35 recommendations were systematically compared: 25/35 were consistent, 9/35 inconsistent, and 1/35 un-rateable (derived from a single guideline). Of the 25 consistencies, 14 were based on consensus, seven on evidence and four differed in grading. Major inconsistencies were found in 3/9 of the inconsistent recommendations. We re-evaluated the evidence for 17 recommendations (evidence-based, differing evidence levels and minor inconsistencies) - the majority was congruent. Incongruity was found where the stated evidence could not be verified in the cited primary sources, or where the evaluation in the source guidelines focused on treatment benefits and underestimated the risks. The draft guideline was completed in 8.5 man-months. The main limitation to this study was the lack of a second reviewer. Conclusion: The systematic guideline review including framework development, consistency analysis and validation is an effective, valid, and resource saving-approach to the development of evidence-based guidelines

    Spanish named entity recognition in the biomedical domain

    Get PDF
    Named Entity Recognition in the clinical domain and in languages different from English has the difficulty of the absence of complete dictionaries, the informality of texts, the polysemy of terms, the lack of accordance in the boundaries of an entity, the scarcity of corpora and of other resources available. We present a Named Entity Recognition method for poorly resourced languages. The method was tested with Spanish radiology reports and compared with a conditional random fields system.Peer ReviewedPostprint (author's final draft

    Improving the translation environment for professional translators

    Get PDF
    When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side. This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project

    Short-term Heart Rate Turbulence Analysis Versus Variability and Baroreceptor Sensitivity in Patients With Dilated Cardiomyopathy

    Get PDF
    New methods for the analysis of arrhythmias and their hemodynamic consequences have been applied in risk stratification, in particular to patients after myocardial infarction. This study investigates the suitability of short-term heart rate turbulence (HRT) analysis in comparison to heart rate and blood pressure variability as well as baroreceptor sensitivity analyses to characterise the regulatory differences between patients with dilated cardiomyopathy (DCM) and healthy controls. In this study, 30 minutes data of non-invasive continuous blood pressure and ECGs of 37 DCM patients and 167 controls measured under standard resting conditions were analysed. The results show highly significant differences between DCM patients and controls in heart rate and blood pressure variability as well as in baroreceptor sensitivity parameters. Applying a combined heart rate-blood pressure trigger, ventricular premature beats were detected in 24.3% (9) of the DCM patients and 11.3% (19) of the controls. This fact demonstrates the limited applicability of short-term HRT analyses. However, the HRT parameters showed significant differences in this subgroup with ventricular premature beats (turbulence onset: DCM: 1.80±2.72, controls: - 4.34±3.10, p<0.001; turbulence slope: DCM: 6.75±5.50, controls: 21.30±17.72, p=0.021). Considering all (including HRT) parameters in the subgroup with ventricular beats, a discrimination rate between DCM patients and controls of 88.0% was obtained (max. 6 parameters). The corresponding value obtained for the total group was 86.3% (without HRT parameters). Comparable classification rates and high correlations between heart rate turbulence and variability and baroreflex parameters point to a more universal applicability of the latter methods

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Get PDF
    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case-detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall)
    • …
    corecore