5 research outputs found

Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Author: Acevedo Francisco
Armengol Victor Diego
Bao Yujia
Barzilay Regina
Braun Danielle
Deng Zhengyi
Hughes Kevin S
Kim Heeyoon
Ouardaoui Nofal
Parmigiani Giovanni
Wang Cathy
Wang Yan
Publication date: 24/04/2019

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations.
METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence.
RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified), while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy, while the CNN model achieves 89.13% accuracy.
CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
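
The first model described in the abstract (a linear decision rule over a bag-of-ngrams representation, with a 60/20/20 split) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the function names, the whitespace tokenizer, the hyperparameters, and the toy texts are hypothetical, and the simple hinge-loss trainer is a stand-in for a real SVM solver, not the authors' implementation.

```python
import random
from collections import Counter


def ngram_bag(text, n_max=2):
    """Bag of word n-grams (n = 1..n_max) for one title+abstract string."""
    tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


def split_60_20_20(items, seed=0):
    """Shuffle, then split into train / tune / test parts (60% / 20% / 20%)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10
    n_tune = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_tune],
            items[n_train + n_tune:])


def score(weights, bias, bag):
    """Linear decision value: bias + dot(weights, n-gram counts)."""
    return bias + sum(weights.get(f, 0.0) * v for f, v in bag.items())


def train_linear(examples, epochs=20, lr=0.1, reg=0.01):
    """Hinge-loss subgradient descent; labels are +1 (relevant) or -1."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for bag, label in examples:
            if label * score(weights, bias, bag) < 1:  # margin violated
                for f, v in bag.items():
                    weights[f] = weights.get(f, 0.0) + lr * label * v
                bias += lr * label
        for f in weights:  # L2 shrinkage once per epoch (a simplification)
            weights[f] *= 1 - lr * reg
    return weights, bias


def accuracy(weights, bias, examples):
    """Fraction of examples whose predicted sign matches the label."""
    correct = sum(1 for bag, label in examples
                  if (score(weights, bias, bag) > 0) == (label > 0))
    return correct / len(examples)


# Toy, made-up texts for illustration only; not the paper's data.
toy = [
    (ngram_bag("cancer risk in mutation carriers"), +1),
    (ngram_bag("penetrance of brca1 mutation carriers"), +1),
    (ngram_bag("surgical technique for knee repair"), -1),
    (ngram_bag("knee surgery outcomes"), -1),
]
weights, bias = train_linear(toy)

# Split sizes for a dataset the size of the penetrance set (3740 items).
train, tune, test = split_60_20_20(range(3740))
```

In practice the tuning portion would be used to pick values such as `n_max`, `lr`, and `reg`, and the accuracy figure would be reported on the held-out test portion only.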

arXiv.org e-Print Archive

DSpace@MIT

A natural language processing pipeline for pairing measurements uniquely across free-text CT reports

Author: Abajian
Al-Haddad
Andrea Cowhy
Berger
Breiman
Carletta
Chang
Danforth
Dang
Dreyer
Friedman
Friedman
Friedman
Hall
Hasegawa
Hripcsak
Hsu
Hsu
Jaffe
Jaffe
Jeffrey Bozeman
Kushner
Levy
Levy
Mabotuwana
Manning
Merlijn Sevenster
Reiner
Rubin
Savova
Sevenster
Sevenster
Sevenster
Sponsler
Sullivan
Taira
Therasse
Travis
Warner
William Trost
Zimmerman
Publication venue: 'Elsevier BV'

Crossref

Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building

Publication venue: BioMed Central
Publication date: 17/11/2016

Springer - Publisher Connector

Introducing Explorer of Taxon Concepts with a case study on spider measurement matrix building

Author: A Hardisty
AE Thessen
AR Deans
BC WorkShop
Bertram Ludäscher
DG Howe
Dongfang Xu
DR Maddison
Eduardo M. Soto
F Huang
FM Labarque
H Cui
Hong Cui
J Liu
JA Blake
JA Miller
James A. Macklin
JB Bowes
JL Salle
JP Balhoff
L Màrquez
M Palmer
M Sevenster
Martin Ramirez
MJ Ramírez
MM Wood
Nicolás Mongiardino Koch
O Uzuner
PC Sereno
RA Vos
Robert A. Morris
RW Kiger
S Aisen
S Soderland
Steven S. Chong
T Catapano
Thomas Rodenhausen
Y Bradford
Publication venue: 'Springer Science and Business Media LLC'

Crossref

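Measuring Semantic Similarity Using TACSim and Structural Similarity Using Graph Edit Distance-Greedy on Use Case Diagrams (Pengukuran Keserupaan Semantik Menggunakan TACSim dan Struktural Menggunakan Graph Edit Distance-Greedy pada Diagram Kasus Penggunaan)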

Author: Zulfa Fatimatus
Publication date: 04/02/2021

Unified Modeling Language (UML) is a set of diagrams used for software design. There are several types of UML diagrams, one of which is the use case diagram. Software design is essential when building a system. In education, particularly in design-oriented courses such as Software Engineering, UML is an important topic. Today, much learning takes place through digital systems such as e-learning. E-learning provides many learning features, including automatic grading of questions with multiple-choice, selection-list, fill-in, and essay answers. In Software Engineering courses, some answers take the form of use case diagrams, but automatic grading of diagram answers is not yet available. A method for automatically grading such answers is therefore needed. This study proposes a method for computing the similarity between two use case diagrams that model the same system. The measurement compares a student's answer with the answer key along two aspects, semantic and structural. Semantic similarity is measured by comparing the lexical information in the two use case diagrams, represented as graphs, using the Topology-Attributed Coupling Similarity (TACSim) method. Structural similarity is measured by modeling each use case diagram as a graph and computing similarity with a combination of Graph Edit Distance and a greedy algorithm. Test results show that the proposed method is as reliable as Software Engineering course instructors in grading students' use case diagram answers. The agreement value between the instructors and the proposed method reaches 0.86, which indicates almost perfect agreement.
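
The structural part of the method (a use case diagram modeled as a graph, compared with Graph Edit Distance and a greedy algorithm) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not give the cost model, so unit costs, exact label matching (instead of TACSim-based semantic scores), undirected edges, the normalization into a similarity, and all function names are hypothetical choices, not the thesis's actual algorithm.

```python
def greedy_node_mapping(nodes_a, nodes_b):
    """Greedily pair nodes, always taking the cheapest remaining pair.

    nodes_a / nodes_b: dict of node id -> label.
    Substitution cost is 0 for equal labels, 1 otherwise (assumption).
    """
    pairs = sorted(
        ((0 if label_a == label_b else 1), a, b)
        for a, label_a in nodes_a.items()
        for b, label_b in nodes_b.items()
    )
    mapping, used_b = {}, set()
    for _cost, a, b in pairs:
        if a not in mapping and b not in used_b:
            mapping[a] = b
            used_b.add(b)
    return mapping


def greedy_ged(nodes_a, edges_a, nodes_b, edges_b):
    """Approximate edit distance between two undirected labeled graphs."""
    mapping = greedy_node_mapping(nodes_a, nodes_b)
    # Node substitutions, plus deletions and insertions of unmatched nodes.
    cost = sum(1 for a, b in mapping.items() if nodes_a[a] != nodes_b[b])
    cost += (len(nodes_a) - len(mapping)) + (len(nodes_b) - len(mapping))
    # Carry each edge of A through the mapping; an edge that loses an
    # endpoint counts as one deletion.
    mapped_edges = set()
    for u, v in edges_a:
        if u in mapping and v in mapping:
            mapped_edges.add(frozenset((mapping[u], mapping[v])))
        else:
            cost += 1
    target_edges = {frozenset(edge) for edge in edges_b}
    # Edges present on only one side are deletions or insertions.
    cost += len(mapped_edges ^ target_edges)
    return cost


def structural_similarity(nodes_a, edges_a, nodes_b, edges_b):
    """Map the distance to [0, 1]; 1.0 means no edits were needed."""
    worst = len(nodes_a) + len(nodes_b) + len(edges_a) + len(edges_b)
    if worst == 0:
        return 1.0
    return 1 - greedy_ged(nodes_a, edges_a, nodes_b, edges_b) / worst


# Hypothetical answer key and student answer (one actor, two use cases each).
key_nodes = {"a1": "Customer", "u1": "Place Order", "u2": "Pay"}
key_edges = [("a1", "u1"), ("a1", "u2")]
ans_nodes = {"b1": "Customer", "v1": "Place Order", "v2": "Cancel Order"}
ans_edges = [("b1", "v1"), ("b1", "v2")]

distance = greedy_ged(key_nodes, key_edges, ans_nodes, ans_edges)
similarity = structural_similarity(key_nodes, key_edges, ans_nodes, ans_edges)
```

A greedy assignment is fast but not guaranteed to find the minimum edit cost, so the result is an upper bound on the true graph edit distance under this cost model.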

ITS Repository