160 research outputs found

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process with existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

    Volumetric and three-dimensional examination of sella turcica by cone-beam computed tomography: reference data for guidance to pathologic pituitary morphology

    Background: The aim of the study was to assess the dimensions and volume of sella turcica in healthy Caucasian adults with normal occlusion and facial appearance from cone-beam computed tomography (CBCT) images. Materials and methods: CBCT images of 80 Caucasian adult patients (40 males, 40 females) with normal facial appearance and occlusion, taken previously for diagnostic purposes, were evaluated. Two groups were constructed according to gender. The volume, length, diameter, and depth of the sella turcica were measured with the Romexis software programme. Mann-Whitney U test and independent t-tests were used for statistical analysis. Results: The mean lengths of the sella were 9.9 mm and 10.2 mm, depths were 9.2 mm and 8.8 mm, and diameters were 12.3 mm and 12.1 mm in the female and male groups, respectively. Between the genders, no statistically significant differences were found for any of the linear measurements. The volume of sella turcica was significantly higher in males than in females (1102 ± 285.3 mm³ and 951.3 ± 278.5 mm³, respectively). Conclusions: The dimensions of sella turcica in healthy Caucasian adults with normal occlusion and facial appearance revealed nonsignificant differences between the genders. Individual variability in dimensions and gender differences in the volume are of importance when comparing patients with craniofacial syndromes and aberrations. Knowledge of the dimensions and volume of sella turcica will be clinically relevant as guidance for recognizing pituitary disorders.

    Evaluation of the anatomical measurements of the temporomandibular joint by cone-beam computed tomography

    Background: To examine the detailed anatomy of the normal temporomandibular joint (TMJ) in a large series of patients divided into different age groups. Materials and methods: Cone-beam computed tomography images of 100 patients were included in the study. Morphometric analysis of the mandibular condyle and mandibular fossa, articular tubercle, and zygomatic arch was performed. The volumetric and surface parameters of the mandibular condyles (total tissue volume [TV], total bone volume [BV], bone surface area [BS], and percentage of bony tissue of the mandibular condyle [BV/TV]) were also measured. Results: Statistical analysis was performed and statistically significant differences according to the side of the joint, sex, and age groups were reported. Additionally, correlations between aging and all of these parameters were determined. Conclusions: TV, BV, BS, and BV/TV parameters according to side, age, and sex groups were defined for the normal TMJ, which may help in understanding the onset and progress of TMJ disorders.

    Automatic de-identification of textual documents in the electronic health record: a review of recent research

    Background: In the United States, the Health Insurance Portability and Accountability Act (HIPAA) protects the confidentiality of patient data and requires the informed consent of the patient and approval of the Internal Review Board to use data for research purposes, but these requirements can be waived if data is de-identified. For clinical data to be considered de-identified, the HIPAA "Safe Harbor" technique requires 18 data elements (called PHI: Protected Health Information) to be removed. The de-identification of narrative text documents is often realized manually, and requires significant resources. Well aware of these issues, several authors have investigated automated de-identification of narrative text documents from the electronic health record, and a review of recent research in this domain is presented here. Methods: This review focuses on recently published research (after 1995), and includes relevant publications from bibliographic queries in PubMed, conference proceedings, the ACM Digital Library, and interesting publications referenced in already included papers. Results: The literature search returned more than 200 publications. The majority focused only on structured data de-identification instead of narrative text, on image de-identification, or described manual de-identification, and were therefore excluded. Finally, 18 publications describing automated text de-identification were selected for detailed analysis of the architecture and methods used, the types of PHI detected and removed, the external resources used, and the types of clinical documents targeted. All text de-identification systems aimed to identify and remove person names, and many included other types of PHI. Most systems used only one or two specific clinical document types, and were mostly based on two different groups of methodologies: pattern matching and machine learning. Many systems combined both approaches for different types of PHI, but the majority relied only on pattern matching, rules, and dictionaries. Conclusions: In general, methods based on dictionaries performed better with PHI that is rarely mentioned in clinical text, but are more difficult to generalize. Methods based on machine learning tend to perform better, especially with PHI that is not mentioned in the dictionaries used. Finally, the issues of anonymization, sufficient performance, and "over-scrubbing" are discussed in this publication.

    Text Mining the History of Medicine

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly accessible thanks to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data efficiently. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results, or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences and evolutions in vocabulary, terminology, language structure, and style, compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants, and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system. The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.