5 research outputs found
Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes
PURPOSE: The medical literature relevant to germline genetics is growing
exponentially. Clinicians need tools that monitor and prioritize the literature
to understand the clinical implications of pathogenic genetic variants. We
developed and evaluated two machine learning models to classify abstracts as
relevant to the penetrance (risk of cancer for germline mutation carriers) or
prevalence of germline genetic mutations. METHODS: We conducted literature
searches in PubMed and retrieved paper titles and abstracts to create an
annotated dataset for training and evaluating the two machine learning
classification models. Our first model is a support vector machine (SVM) which
learns a linear decision rule based on the bag-of-ngrams representation of each
title and abstract. Our second model is a convolutional neural network (CNN)
which learns a complex nonlinear decision rule based on the raw title and
abstract. We evaluated the performance of the two models on the classification
of papers as relevant to penetrance or prevalence. RESULTS: For penetrance
classification, we annotated 3740 paper titles and abstracts and used 60% for
training the model, 20% for tuning the model, and 20% for evaluating the model.
The SVM model achieves 89.53% accuracy (percentage of papers that were
correctly classified) while the CNN model achieves 88.95% accuracy. For
prevalence classification, we annotated 3753 paper titles and abstracts. The
SVM model achieves 89.14% accuracy while the CNN model achieves 89.13%
accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts
as relevant to penetrance or prevalence. By facilitating literature review,
this tool could help clinicians and researchers keep abreast of the burgeoning
knowledge of gene-cancer associations and keep the knowledge bases for clinical
decision support tools up to date.
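The bag-of-ngrams representation, the 60/20/20 split, and the accuracy metric described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The whitespace tokenizer, the n-gram range, the random seed, and the function names `bag_of_ngrams`, `split_60_20_20`, and `accuracy` are all assumptions made for illustration, and no classifier is trained here.

```python
import random
from collections import Counter


def bag_of_ngrams(text, max_n=2):
    """Count word n-grams (n = 1..max_n) in a title or abstract.

    Assumption: lowercasing and whitespace tokenization; the abstract
    does not say how the text was tokenized.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


def split_60_20_20(items, seed=0):
    """Shuffle items and split them into train / tune / test partitions."""
    items = list(items)
    random.Random(seed).shuffle(items)
    train_end = len(items) * 6 // 10   # first 60% for training
    tune_end = len(items) * 8 // 10    # next 20% for tuning
    return items[:train_end], items[train_end:tune_end], items[tune_end:]


def accuracy(predicted, gold):
    """Percentage of items whose predicted label equals the gold label."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical usage: 3740 placeholder ids stand in for the annotated papers.
train, tune, test = split_60_20_20(range(3740))
features = bag_of_ngrams("BRCA1 mutation carriers")
```

Under this integer split, 3740 items divide into 2244 for training, 748 for tuning, and 748 for evaluation. These partition sizes follow from the arithmetic of the sketch and are not figures reported in the abstract.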
A natural language processing pipeline for pairing measurements uniquely across free-text CT reports
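Pengukuran Keserupaan Semantik Menggunakan TACSim dan Struktural Menggunakan Graph Edit Distance-Greedy pada Diagram Kasus Penggunaan [Measuring Semantic Similarity with TACSim and Structural Similarity with Graph Edit Distance-Greedy on Use Case Diagrams]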
Unified Modeling Language (UML) is a set of diagrams used for software design. UML defines several diagram types, one of which is the use case diagram. Software design is essential when building a system. In education, particularly in design-oriented subjects such as Software Engineering, UML is an important topic. Much teaching today is delivered through digital systems such as e-learning. E-learning platforms offer many features, including automatic assessment of multiple-choice, selection-list, fill-in, and essay questions. In Software Engineering courses, some answers take the form of use case diagrams, but automatic assessment of diagram answers is not yet available. A method for automatically assessing such answers is therefore needed.
This study proposes a method for calculating the similarity between two use case diagrams that model the same system. The measurement compares a student's answer with the answer key along two aspects: semantic and structural. Semantic similarity is measured by comparing the lexical information of the two use case diagrams, represented as graphs, using the Topology-Attributed Coupling Similarity (TACSim) method. Structural similarity is measured by modeling each use case diagram as a graph and computing similarity with a combination of Graph Edit Distance and a Greedy algorithm.
The test results show that the proposed method for calculating the similarity between two use case diagrams is as reliable as Software Engineering instructors in assessing students' use case diagram answers. The agreement value between the instructors and the proposed method reaches 0.86, which indicates almost perfect agreement.
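The structural measurement described above can be illustrated with a minimal Python sketch of a greedy approximation to graph edit distance. It is not the method from the study. The unit edit costs, the exact-label matching, the normalization in `structural_similarity`, and the sample diagrams are all assumptions made for illustration, and the TACSim semantic measure is not covered.

```python
from itertools import product


def greedy_ged(nodes_a, edges_a, nodes_b, edges_b):
    """Approximate graph edit distance using a greedy node assignment.

    nodes_*: dict mapping node id -> label (actor or use case name).
    edges_*: set of (source id, target id) pairs.
    Assumption: every substitution, insertion, and deletion costs 1.
    """
    # Rank candidate node pairs so that pairs with equal labels come first.
    pairs = sorted(
        product(nodes_a, nodes_b),
        key=lambda p: nodes_a[p[0]] != nodes_b[p[1]],
    )
    mapping, used_b, cost = {}, set(), 0
    for a, b in pairs:
        if a not in mapping and b not in used_b:
            mapping[a] = b
            used_b.add(b)
            cost += nodes_a[a] != nodes_b[b]  # substitution cost (0 or 1)
    # Nodes left unmatched are deleted from A or inserted from B.
    cost += (len(nodes_a) - len(mapping)) + (len(nodes_b) - len(used_b))
    # Edges of A with an unmatched endpoint are deleted outright.
    cost += sum(1 for u, v in edges_a if u not in mapping or v not in mapping)
    # Remaining edges are compared after translating them through the mapping.
    mapped = {(mapping[u], mapping[v]) for u, v in edges_a
              if u in mapping and v in mapping}
    cost += len(mapped ^ edges_b)
    return cost


def structural_similarity(nodes_a, edges_a, nodes_b, edges_b):
    """Scale the distance to [0, 1]; 1.0 means no edits were needed."""
    total = len(nodes_a) + len(edges_a) + len(nodes_b) + len(edges_b)
    if total == 0:
        return 1.0
    return 1.0 - greedy_ged(nodes_a, edges_a, nodes_b, edges_b) / total


# Hypothetical answer key and student answer for the same system.
key_nodes = {"a1": "Customer", "u1": "Login", "u2": "Place Order"}
key_edges = {("a1", "u1"), ("a1", "u2")}
ans_nodes = {"b1": "Customer", "v1": "Login", "v2": "Pay"}
ans_edges = {("b1", "v1"), ("b1", "v2")}
distance = greedy_ged(key_nodes, key_edges, ans_nodes, ans_edges)
```

The greedy assignment commits to the cheapest available node pair at each step and never backtracks, so the result is an upper bound on the exact edit distance under the same cost model rather than the optimum.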