32 research outputs found

    Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

    PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of the pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM) which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
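
    The abstract above mentions a bag-of-ngrams representation, a 60%/20%/20% train/tune/evaluate split, and accuracy as the share of correctly classified papers. The following is only a minimal sketch of those three pieces, not the authors' pipeline: the function names, whitespace tokenization, maximum n-gram length, and random seed are all assumptions made for illustration, and the SVM training step itself is left out.

```python
import random
from collections import Counter


def ngram_counts(text, n_max=2):
    """Count word n-grams of length 1..n_max (hypothetical tokenization:
    lowercase, split on whitespace)."""
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def split_60_20_20(items, seed=0):
    """Shuffle and split into train / tune / evaluate parts (60/20/20).
    Integer arithmetic avoids floating-point rounding of the cut points."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 60 // 100
    n_tune = len(items) * 20 // 100
    train = items[:n_train]
    tune = items[n_train:n_train + n_tune]
    evaluate = items[n_train + n_tune:]
    return train, tune, evaluate


def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the annotated label."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)
```

    In this sketch, each title-plus-abstract string would be passed to `ngram_counts` to obtain a sparse feature vector, and a linear classifier would then be fitted on the training part and tuned on the tuning part.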

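    Improving Employee Workload Through a Fishbone Diagram in the Primary Maintenance Department of PT Djarum Kudus (Perbaikan Workload Karyawan Melalui Fish Bone Diagram Pada Bagian Primary Maintenance PT Djarum Kudus)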

    The era of globalization demands that all parties make improvements in order to deliver better results. Improvements can be made after identifying and analyzing the problems being faced. One company that frequently makes improvements in every area is PT. Djarum Kudus. To carry out this identification and analysis, PT. Djarum uses Key Performance Indicators (KPI) as a picture of what the organization is facing. This picture covers the condition of the organization as a whole. The object of this study is the Primary Maintenance department of PT. Djarum. The study was conducted to follow up on issues that arose concerning employee workload in that department. These issues were then compared with the picture given by the Key Performance Indicators. For this study, the researcher collected only data on employee workload, the factors related to workload, the formulation of the problem, and the improvements that could be made to address it. To complement the existing data, the researcher also used data from observation and from interviews with the relevant parties. This is a descriptive study that does not attempt to explain the magnitude of influence among the variables. The problem was formulated using the fishbone diagram (Ishikawa diagram) method. It is hoped that this method will reveal everything related to high or low employee workload, because a single problem can have many causes. The aim of this study is to produce an improvement related to employee workload. With an appropriate workload, the work output of Primary Maintenance employees at PT. Djarum is expected to be maximized.

    Total Recall, Language Processing, and Software Engineering

    A broad class of software engineering problems can be generalized as the "total recall problem". This short paper claims that identifying and exploring total recall language processing problems in software engineering is an important task with wide applicability. To make that case, we show that, by applying and adapting state-of-the-art active learning and text mining, solutions to the total recall problem can help solve two important software engineering tasks: (a) supporting large literature reviews and (b) identifying software security vulnerabilities. Furthermore, we conjecture that (c) test case prioritization and (d) static warning identification can also be categorized as total recall problems. The widespread applicability of "total recall" to software engineering suggests that there exists some underlying framework that encompasses not just natural language processing, but a wide range of important software engineering tasks. Comment: 4 pages, 2 figures. Submitted to NL4SE@ESEC/FSE 201

    Risk of bias reporting in the recent animal focal cerebral ischaemia literature

    BACKGROUND: Findings from in vivo research may be less reliable where studies do not report measures to reduce risks of bias. The experimental stroke community has been at the forefront of implementing changes to improve reporting, but it is not known whether these efforts are associated with continuous improvements. Our aims here were firstly to validate an automated tool to assess risks of bias in published works, and secondly to assess the reporting of measures taken to reduce the risk of bias within recent literature for two experimental models of stroke. METHODS: We developed and used text analytic approaches to automatically ascertain reporting of measures to reduce risk of bias from full-text articles describing animal experiments inducing middle cerebral artery occlusion (MCAO) or modelling lacunar stroke. RESULTS: Compared with previous assessments, there were improvements in the reporting of measures taken to reduce risks of bias in the MCAO literature but not in the lacunar stroke literature. Accuracy of automated annotation of risk of bias in the MCAO literature was 86% (randomization), 94% (blinding) and 100% (sample size calculation); and in the lacunar stroke literature accuracy was 67% (randomization), 91% (blinding) and 96% (sample size calculation). DISCUSSION: There remains substantial opportunity for improvement in the reporting of animal research modelling stroke, particularly in the lacunar stroke literature. Further, automated tools perform sufficiently well to identify whether studies report blinded assessment of outcome, but improvements are required in the tools to ascertain whether randomization and a sample size calculation were reported.
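
    The abstract above does not say which text analytic approaches were used. Purely as an illustration of the task, the sketch below assumes simple keyword matching with regular expressions over the full text and compares the automated flags against manual annotation. The patterns, dictionary keys, and function names are hypothetical and are not taken from the study.

```python
import re

# Hypothetical patterns, one per risk-of-bias measure named in the abstract.
PATTERNS = {
    "randomization": re.compile(
        r"\brandom(?:ly|i[sz]ed|i[sz]ation)\b", re.IGNORECASE
    ),
    "blinding": re.compile(
        r"\b(?:blind(?:ed|ing)?|masked)\b", re.IGNORECASE
    ),
    "sample_size_calculation": re.compile(
        r"\b(?:sample size calculation|power (?:analysis|calculation))\b",
        re.IGNORECASE,
    ),
}


def reported_measures(full_text):
    """Flag each measure as reported if its pattern occurs anywhere in the text."""
    return {name: bool(p.search(full_text)) for name, p in PATTERNS.items()}


def annotation_accuracy(automated, manual):
    """Share of articles where the automated flag agrees with the manual one."""
    agree = sum(a == m for a, m in zip(automated, manual))
    return agree / len(manual)
```

    Keyword matching of this kind cannot tell "animals were randomized" from "animals were not randomized", which is one plausible reason a real tool would need more than the patterns shown here.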

    Automating Systematic Literature Reviews with Natural Language Processing and Text Mining: a Systematic Literature Review

    Objectives: An SLR is presented focusing on text mining based automation of SLR creation. The present review identifies the objectives of the automation studies and the aspects of those steps that were automated. In so doing, the various ML techniques used, challenges, limitations and scope of further research are explained. Methods: Accessible published literature studies that primarily focus on automation of study selection, study quality assessment, data extraction and data synthesis portions of SLR. Twenty-nine studies were analyzed. Results: This review identifies the objectives of the automation studies, steps within the study selection, study quality assessment, data extraction and data synthesis portions that were automated, the various ML techniques used, challenges, limitations and scope of further research. Discussion: We describe uses of NLP/TM techniques to support increased automation of systematic literature reviews. This area has attracted increased attention in the last decade due to significant gaps in the applicability of TM to automate steps in the SLR process. There are significant gaps in the application of TM and related automation techniques in the areas of data extraction, monitoring, quality assessment and data synthesis. There is thus a need for continued progress in this area, and this is expected to ultimately significantly facilitate the construction of systematic literature reviews.