    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only coded parts of EMRs for case detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic curve in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic curve 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
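
As a rough illustration of the two metrics named in this abstract, the sketch below computes sensitivity (recall) and positive predictive value (precision) for a codes-only rule and for a codes-plus-keyword rule. The records, the keyword, and the function name are hypothetical and invented for this example; they are not data or methods from the review, and a plain substring match like this one would also match negated mentions.

```python
from typing import List, Tuple


def sensitivity_and_ppv(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, positive predictive value) for binary case detection."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum((not p) and a for p, a in zip(predicted, actual))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    return sensitivity, ppv


# Hypothetical toy records: (has_diagnosis_code, free_text_note, is_true_case)
RECORDS = [
    (True, "Known rheumatoid arthritis, stable", True),
    (False, "History suggests rheumatoid arthritis", True),
    (False, "No joint pain reported", False),
    (True, "Code entered in error, osteoarthritis only", False),
    (False, "Family history unremarkable", False),
    (True, "Rheumatoid arthritis follow-up", True),
]
KEYWORD = "rheumatoid arthritis"  # assumed search term, for illustration only

actual = [is_case for _, _, is_case in RECORDS]
codes_only = [has_code for has_code, _, _ in RECORDS]
codes_or_text = [has_code or KEYWORD in note.lower() for has_code, note, _ in RECORDS]

if __name__ == "__main__":
    print("codes only:  ", sensitivity_and_ppv(codes_only, actual))
    print("codes + text:", sensitivity_and_ppv(codes_or_text, actual))
```

In this toy set the keyword rule recovers the one uncoded case, so sensitivity rises while the miscoded record still counts as a false positive under both rules.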

    PhenDisco: phenotype discovery system for the database of genotypes and phenotypes.

    The database of genotypes and phenotypes (dbGaP) developed by the National Center for Biotechnology Information (NCBI) is a resource that contains information on various genome-wide association studies (GWAS) and is currently available via NCBI's dbGaP Entrez interface. The database is an important resource, providing GWAS data that can be used for new exploratory research or cross-study validation by authorized users. However, finding studies relevant to a particular phenotype of interest is challenging, as phenotype information is presented in a non-standardized way. To address this issue, we developed PhenDisco (phenotype discoverer), a new information retrieval system for dbGaP. PhenDisco consists of two main components: (1) text processing tools that standardize phenotype variables and study metadata, and (2) information retrieval tools that support queries from users and return ranked results. In a preliminary comparison involving 18 search scenarios, PhenDisco showed promising performance for both unranked and ranked search comparisons with dbGaP's search engine Entrez. The system can be accessed at http://pfindr.net

    The Portuguese Severe Asthma Registry: Development, Features, and Data Sharing Policies

    The Portuguese Severe Asthma Registry (Registo de Asma Grave Portugal, RAG) was developed by an open collaborative network of asthma specialists. RAG collects data from adult and pediatric severe asthma patients who, despite treatment optimization and adequate management of comorbidities, require step 4/5 treatment according to GINA recommendations. In this paper, we describe the development and implementation of RAG, its features, and data sharing policies. The contents and structure of RAG were defined in a multistep consensus process. A pilot version was pretested and iteratively improved. The selection of data elements for RAG considered other severe asthma registries, aiming at characterizing the patient's clinical status whilst avoiding overloading the standard workflow of the clinical appointment. Features of RAG include automatic assessment of eligibility, easy data input, exportable data in natural language that can be pasted directly into patients' electronic health records, and security features to enable data sharing (among researchers and with other international databases) without compromising patients' confidentiality. RAG is a national web-based disease registry of severe asthma patients, available at asmagrave.pt. It allows prospective clinical data collection, promotes standardized care and collaborative clinical research, and may contribute to inform evidence-based healthcare policies for severe asthma.

    Knowledge will Propel Machine Understanding of Content: Extrapolating from Current Examples

    Machine Learning has been a big success story during the AI resurgence. One particular standout success relates to learning from a massive amount of data. In spite of early assertions of the unreasonable effectiveness of data, there is increasing recognition of the value of utilizing knowledge whenever it is available or can be created purposefully. In this paper, we discuss the indispensable role of knowledge for deeper understanding of content where (i) large amounts of training data are unavailable, (ii) the objects to be recognized are complex (e.g., implicit entities and highly subjective content), and (iii) applications need to use complementary or related data in multiple modalities/media. What brings us to the cusp of rapid progress is our ability to (a) create relevant and reliable knowledge and (b) carefully exploit knowledge to enhance ML/NLP techniques. Using diverse examples, we seek to foretell unprecedented progress in our ability for deeper understanding and exploitation of multimodal data and continued incorporation of knowledge in learning techniques. Comment: Pre-print of the paper accepted at 2017 IEEE/WIC/ACM International Conference on Web Intelligence (WI). arXiv admin note: substantial text overlap with arXiv:1610.0770

    The ecology of suffering: developmental disorders of structured stress, emotion, and chronic inflammation

    'Punctuated equilibrium' models of cognitive process, adapted from the Large Deviations Program of probability theory, are applied to the interaction between immune function and emotion in the context of culturally structured psychosocial stress. The analysis suggests: (1) Chronic inflammatory diseases should be comorbid and synergistic with characteristic emotional dysfunction, and may form a collection of joint disorders most effectively treated at the individual level using multifactorial 'mind/body' strategies. (2) Culturally constructed psychosocial stress can literally write an image of itself onto the punctuated etiology and progression of such composite disorders, beginning a trajectory to disease in utero or early childhood, and continuing throughout the life course, suggesting that, when moderated by 'social exposures', these are developmental disorders. (3) At the community level of organization, strategies for prevention and control of the spectrum of emotional/inflammatory developmental disorders must include redress of cross-sectional and longitudinal (i.e., historical) patterns of inequality and injustice which generate structured psychosocial stress. Evidence further suggests that within 'Westernized' or 'market economy' societies, such stress will inevitably entrain high as well as lower status subpopulations into a unified ecology of suffering.