
    Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

    PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools that monitor and prioritize the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM), which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN), which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13% accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.
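    The building blocks named in the abstract above (a bag-of-ngrams representation, a 60/20/20 split, and accuracy as the fraction of correctly classified papers) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the whitespace tokenizer, the choice of unigrams plus bigrams, and the random seed are all assumptions made for illustration, and the SVM and CNN classifiers themselves are omitted.

```python
import random
from collections import Counter

def ngram_counts(text, n_max=2):
    """Bag-of-ngrams: counts of all word n-grams for n = 1..n_max.

    Assumption: lowercase whitespace tokenization; the paper does not
    describe its tokenizer or n-gram range in the abstract.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

def split_60_20_20(items, seed=0):
    """Shuffle, then split into training / tuning / evaluation sets."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 6 // 10   # integer arithmetic avoids float rounding
    n_tune = len(items) * 2 // 10
    return (items[:n_train],
            items[n_train:n_train + n_tune],
            items[n_train + n_tune:])

def accuracy(predicted, gold):
    """Fraction of items whose predicted label equals the gold label."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

    Applied to the 3740 annotated penetrance papers, such a split would give 2244 training, 748 tuning, and 748 evaluation items.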

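    Measuring Semantic Similarity Using TACSim and Structural Similarity Using Graph Edit Distance-Greedy on Use Case Diagrams (Pengukuran Keserupaan Semantik Menggunakan TACSim dan Struktural Menggunakan Graph Edit Distance-Greedy pada Diagram Kasus Penggunaan)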

    Unified Modeling Language (UML) is a set of diagrams used for software design. There are several types of UML diagram, one of which is the use case diagram. Software design is essential when building a system. In education, particularly in design courses such as Software Engineering, UML is an important topic. Much teaching is now delivered through digital systems such as e-learning. E-learning platforms offer many learning features, including automatic grading of multiple-choice, choice-list, fill-in, and essay questions. In Software Engineering courses some answers take the form of use case diagrams, but automatic grading of diagram answers is not yet available, so a method for grading them automatically is needed. This study proposes a method for computing the similarity between two use case diagrams that model the same system. The student's answer is compared with the answer key along two dimensions, semantic and structural. Semantic similarity is measured by comparing the lexical information in the two use case diagrams, represented as graphs, using the Topology-Attributed Coupling Similarity (TACSim) method. Structural similarity is measured by modeling each use case diagram as a graph and computing similarity with a combination of Graph Edit Distance and a greedy algorithm. Testing shows that the proposed method is as reliable as Software Engineering instructors in grading students' use case diagram answers. The agreement value between the instructors and the proposed method reaches 0.86, which indicates almost perfect agreement.
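    The structural comparison described above, graph edit distance combined with a greedy algorithm, can be sketched roughly as follows. This is an illustrative approximation, not the method from the paper: the graph encoding, the unit edit costs, the exact-match label comparison, and the normalization into a similarity score are all assumptions, and the function names are hypothetical.

```python
def greedy_ged(nodes_a, edges_a, nodes_b, edges_b):
    """Approximate graph edit distance using a greedy node assignment.

    nodes_*: dict mapping node id -> label (e.g. actor or use case name).
    edges_*: set of (source_id, target_id) pairs.
    Assumed unit costs: 1 per node relabel, node insert/delete, and
    edge insert/delete.
    """
    # All candidate node pairs, cheapest substitution first.
    pairs = sorted(
        (0 if label_a == label_b else 1, a, b)
        for a, label_a in nodes_a.items()
        for b, label_b in nodes_b.items()
    )
    mapping, used_b, cost = {}, set(), 0
    for sub_cost, a, b in pairs:
        if a not in mapping and b not in used_b:
            mapping[a] = b
            used_b.add(b)
            cost += sub_cost
    # Unmatched nodes are deleted (graph A) or inserted (graph B).
    cost += (len(nodes_a) - len(mapping)) + (len(nodes_b) - len(used_b))
    # Edges of A whose endpoints are both mapped carry over to B's ids.
    mapped_edges = {(mapping[s], mapping[t]) for s, t in edges_a
                    if s in mapping and t in mapping}
    dropped = sum(1 for s, t in edges_a
                  if s not in mapping or t not in mapping)
    cost += dropped + len(mapped_edges - edges_b) + len(edges_b - mapped_edges)
    return cost

def structural_similarity(nodes_a, edges_a, nodes_b, edges_b):
    """Map the edit cost to [0, 1]; 1.0 means no edits were needed."""
    max_cost = len(nodes_a) + len(nodes_b) + len(edges_a) + len(edges_b)
    if max_cost == 0:
        return 1.0
    return 1 - greedy_ged(nodes_a, edges_a, nodes_b, edges_b) / max_cost

# Hypothetical example: a student answer missing one use case from the key.
key_nodes = {"k1": "Customer", "k2": "Place Order", "k3": "Cancel Order"}
key_edges = {("k1", "k2"), ("k1", "k3")}
ans_nodes = {"s1": "Customer", "s2": "Place Order"}
ans_edges = {("s1", "s2")}
score = structural_similarity(ans_nodes, ans_edges, key_nodes, key_edges)
```

    The greedy assignment picks the cheapest remaining node pair at each step, which is faster than an exact search but may not find the minimum edit cost.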