32 research outputs found
Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes
PURPOSE: The medical literature relevant to germline genetics is growing
exponentially. Clinicians need tools for monitoring and prioritizing the literature
to understand the clinical implications of the pathogenic genetic variants. We
developed and evaluated two machine learning models to classify abstracts as
relevant to the penetrance (risk of cancer for germline mutation carriers) or
prevalence of germline genetic mutations. METHODS: We conducted literature
searches in PubMed and retrieved paper titles and abstracts to create an
annotated dataset for training and evaluating the two machine learning
classification models. Our first model is a support vector machine (SVM) which
learns a linear decision rule based on the bag-of-ngrams representation of each
title and abstract. Our second model is a convolutional neural network (CNN)
which learns a complex nonlinear decision rule based on the raw title and
abstract. We evaluated the performance of the two models on the classification
of papers as relevant to penetrance or prevalence. RESULTS: For penetrance
classification, we annotated 3740 paper titles and abstracts and used 60% for
training the model, 20% for tuning the model, and 20% for evaluating the model.
The SVM model achieves 89.53% accuracy (percentage of papers that were
correctly classified) while the CNN model achieves 88.95% accuracy. For
prevalence classification, we annotated 3753 paper titles and abstracts. The
SVM model achieves 89.14% accuracy while the CNN model achieves 89.13%
accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts
as relevant to penetrance or prevalence. By facilitating literature review,
this tool could help clinicians and researchers keep abreast of the burgeoning
knowledge of gene-cancer associations and keep the knowledge bases for clinical
decision support tools up to date.
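The bag-of-ngrams representation, the 60/20/20 split, and the accuracy metric described in this abstract can be illustrated with a minimal sketch. The tokenization (lowercasing and whitespace splitting), the n-gram range, and the function names below are assumptions made for illustration; the abstract does not specify them, and the SVM and CNN models themselves are not reproduced here.

```python
import random
from collections import Counter


def bag_of_ngrams(text, max_n=2):
    """Count word n-grams (n = 1..max_n) in lowercased, whitespace-split text.

    The tokenization and max_n are illustrative assumptions.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def split_60_20_20(items, seed=0):
    """Shuffle items and split them into training, tuning, and evaluation parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    # Integer arithmetic avoids floating-point rounding in the split sizes.
    n_train = len(shuffled) * 6 // 10
    n_tune = len(shuffled) * 2 // 10
    train = shuffled[:n_train]
    tune = shuffled[n_train:n_train + n_tune]
    evaluate = shuffled[n_train + n_tune:]
    return train, tune, evaluate


def accuracy(gold_labels, predicted_labels):
    """Fraction of papers whose predicted label matches the annotated label."""
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


if __name__ == "__main__":
    print(bag_of_ngrams("BRCA1 mutation carriers"))
    train, tune, evaluate = split_60_20_20(range(3740))
    print(len(train), len(tune), len(evaluate))
```

With the 3740 annotated penetrance papers, this split yields 2244 training, 748 tuning, and 748 evaluation items.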
Improving Employee Workload Using a Fishbone Diagram in the Primary Maintenance Division of PT Djarum Kudus
The era of globalization demands that all parties make improvements in order to deliver better results. Improvements can be made after identifying and analyzing the problems being faced. One company that frequently makes improvements in every area is PT. Djarum Kudus. To carry out identification and analysis, PT. Djarum uses Key Performance Indicators (KPI) as a picture of what the organization is facing. This picture covers the condition of the organization as a whole. The object of this study is the Primary Maintenance division of PT. Djarum. The study was conducted to follow up on an issue that arose concerning employee workload in the Primary Maintenance division of PT. Djarum. This issue was then compared with the picture given by the Key Performance Indicators. For this study, the researchers collected only data on employee workload, the factors related to workload, the problem formulation, and the improvements that could be made to address the existing problems. To complement these data, the researchers also used data from observation and from interviews with the relevant parties. This is a descriptive study that does not attempt to explain the magnitude of the influence between the variables involved. The problem formulation in this study uses the fishbone diagram (Ishikawa diagram) method. It is hoped that the fishbone diagram method will reveal everything related to high or low employee workload, since a single problem can have many causes. The aim of this study is to produce an improvement related to employee workload. With an appropriate workload, the work output of Primary Maintenance employees at PT. Djarum is expected to be maximized.
Total Recall, Language Processing, and Software Engineering
A broad class of software engineering problems can be generalized as the
"total recall problem". This short paper claims that identifying and exploring
total recall language processing problems in software engineering is an
important task with wide applicability.
To make that case, we show that, by applying and adapting state-of-the-art
active learning and text mining, solutions to the total recall problem can
help solve two important software engineering tasks: (a) supporting large
literature reviews and (b) identifying software security vulnerabilities.
Furthermore, we conjecture that (c) test case prioritization and (d) static
warning identification can also be categorized as the total recall problem.
The widespread applicability of "total recall" to software engineering
suggests that there exists some underlying framework that encompasses not just
natural language processing, but a wide range of important software engineering
tasks. Comment: 4 pages, 2 figures. Submitted to NL4SE@ESEC/FSE 201
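The idea of using active learning to find all relevant items while reviewing as few as possible can be sketched as a small simulation. This is not the method of the paper: the term-count scoring, the `oracle` callback standing in for a human reviewer, and the stopping rule (which relies on knowing `total_relevant`, something available only in a simulation) are all assumptions chosen to keep the example short.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (an illustrative choice)."""
    return text.lower().split()


def score(doc_tokens, pos_counts, neg_counts):
    """Evidence from terms seen in relevant docs minus terms seen in irrelevant ones."""
    return sum(pos_counts[t] - neg_counts[t] for t in set(doc_tokens))


def review_until_found(docs, oracle, seed_index, total_relevant):
    """Review documents most-likely-relevant-first and return the review order.

    oracle(i) plays the human reviewer: it returns True when docs[i] is relevant.
    total_relevant is known only because this is a simulation.
    """
    tokens = [tokenize(d) for d in docs]
    pos, neg = Counter(), Counter()
    unreviewed = set(range(len(docs)))
    order, found = [], 0
    next_doc = seed_index
    while unreviewed and found < total_relevant:
        unreviewed.discard(next_doc)
        order.append(next_doc)
        # Update the term statistics with the reviewer's label.
        if oracle(next_doc):
            found += 1
            pos.update(set(tokens[next_doc]))
        else:
            neg.update(set(tokens[next_doc]))
        if unreviewed:
            # Highest score first; ties go to the lowest index for determinism.
            next_doc = max(sorted(unreviewed),
                           key=lambda i: score(tokens[i], pos, neg))
    return order


if __name__ == "__main__":
    docs = [
        "buffer overflow in parser",
        "update readme typo",
        "heap overflow in decoder",
        "rename variable across tests",
        "integer overflow check missing",
    ]
    relevant = {0, 2, 4}
    print(review_until_found(docs, lambda i: i in relevant, 0, len(relevant)))
```

In this toy run, all three relevant items are found after reviewing three of the five documents. In practice the number of relevant items is unknown, so a real system would need an estimated stopping criterion.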
Risk of bias reporting in the recent animal focal cerebral ischaemia literature
BACKGROUND: Findings from in vivo research may be less reliable where studies do not report measures to reduce risks of bias. The experimental stroke community has been at the forefront of implementing changes to improve reporting, but it is not known whether these efforts are associated with continuous improvements. Our aims here were firstly to validate an automated tool to assess risks of bias in published works, and secondly to assess the reporting of measures taken to reduce the risk of bias within recent literature for two experimental models of stroke. METHODS: We developed and used text analytic approaches to automatically ascertain reporting of measures to reduce risk of bias from full-text articles describing animal experiments inducing middle cerebral artery occlusion (MCAO) or modelling lacunar stroke. RESULTS: Compared with previous assessments, there were improvements in the reporting of measures taken to reduce risks of bias in the MCAO literature but not in the lacunar stroke literature. Accuracy of automated annotation of risk of bias in the MCAO literature was 86% (randomization), 94% (blinding) and 100% (sample size calculation); and in the lacunar stroke literature accuracy was 67% (randomization), 91% (blinding) and 96% (sample size calculation). DISCUSSION: There remains substantial opportunity for improvement in the reporting of animal research modelling stroke, particularly in the lacunar stroke literature. Further, automated tools perform sufficiently well to identify whether studies report blinded assessment of outcome, but improvements are required in the tools to ascertain whether randomization and a sample size calculation were reported.
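One simple way to picture automated ascertainment of reporting is pattern matching over the article text, followed by a comparison against human annotation. The sketch below assumes regular expressions as the text analytic approach; the abstract does not say which techniques or patterns the authors used, so the `PATTERNS` table and function names are hypothetical.

```python
import re

# Hypothetical patterns: the abstract does not list the expressions the tool uses.
PATTERNS = {
    "randomization": re.compile(
        r"\brandom(ly|i[sz]ed|i[sz]ation)?\b", re.IGNORECASE),
    "blinding": re.compile(
        r"\bblind(ed|ing)?\b|\bmasked\b", re.IGNORECASE),
    "sample_size_calculation": re.compile(
        r"\b(sample size|power) (calculation|analysis)\b", re.IGNORECASE),
}


def reported_measures(full_text):
    """Return, for each measure, whether the text appears to report it."""
    return {name: bool(pattern.search(full_text))
            for name, pattern in PATTERNS.items()}


def accuracy(predicted, gold):
    """Fraction of articles where the automated label matches the human label."""
    matches = sum(p == g for p, g in zip(predicted, gold))
    return matches / len(gold)


if __name__ == "__main__":
    text = ("Animals were randomly allocated to treatment and outcomes "
            "were assessed by a blinded observer.")
    print(reported_measures(text))
```

A match shows only that a term appears, not that the measure was applied correctly, which is one reason such tools are validated against human annotation as described above.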
Automating Systematic Literature Reviews with Natural Language Processing and Text Mining: a Systematic Literature Review
Objectives: We present a systematic literature review (SLR) focusing on
text-mining-based automation of SLR creation. The present review identifies the objectives of the automation
studies and the aspects of those steps that were automated. In so doing, the
various ML techniques used, challenges, limitations and scope of further
research are explained.
Methods: We reviewed accessible published studies that primarily focus on
automating the study selection, study quality assessment, data extraction and
data synthesis portions of the SLR process. Twenty-nine studies were analyzed.
Results: This review identifies the objectives of the automation studies,
steps within the study selection, study quality assessment, data extraction and
data synthesis portions that were automated, the various ML techniques used,
challenges, limitations and scope of further research.
Discussion: We describe uses of NLP/TM techniques to support increased
automation of systematic literature reviews. This area has attracted increased
attention in the last decade due to significant gaps in the applicability of TM
to automate steps in the SLR process. There are significant gaps in the
application of TM and related automation techniques in the areas of data
extraction, monitoring, quality assessment and data synthesis. There is thus a
need for continued progress in this area, and this is expected to ultimately
significantly facilitate the construction of systematic literature reviews.