    A Study Into the Feasibility of Using Natural Language Processing and Machine Learning for the Identification of Alcohol Misuse in Trauma Patients

    Alcohol misuse is a leading cause of premature death in the United States, with nearly a third of trauma patients found to have elevated blood alcohol levels upon admission. However, timely intervention has been shown to reduce such misuse, so it is important to be able to screen patients quickly. Many medical centers use standardized questionnaires to identify alcohol misuse, but because these instruments are not usually part of routine care, screening is often not performed. In this study, large quantities of notes were processed with natural language processing and machine learning methods to identify important social and behavioral determinants of health. The result is a system that discriminates well between patients with and without alcohol misuse.

    Text Mining Infrastructure in R

    During the last decade text mining has become a widely used discipline utilizing statistical and machine learning methods. We present the tm package which provides a framework for text mining applications within R. We give a survey on text mining facilities in R and explain how typical application tasks can be carried out using our framework. We present techniques for count-based analysis methods, text clustering, text classification and string kernels.

    Creating Data from Unstructured Text with Context Rule Assisted Machine Learning (CRAML)

    Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, and expert classification of any documents with any scheme. To demonstrate this process for building data from text with Machine Learning, we publish open-source resources: the software, a new public document corpus, and a replicable analysis to build an interpretable classifier of suspected “no poach” clauses in franchise documents.