Search CORE

17,769 research outputs found

Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Author: Acevedo Francisco
Armengol Victor Diego
Bao Yujia
Barzilay Regina
Braun Danielle
Deng Zhengyi
Hughes Kevin S
Kim Heeyoon
Ouardaoui Nofal
Parmigiani Giovanni
Wang Cathy
Wang Yan
Publication venue
Publication date: 24/04/2019
Field of study

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools monitoring and prioritizing the literature to understand the clinical implications of the pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations. METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two machine learning classification models. Our first model is a support vector machine (SVM) which learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) which learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of the two models on the classification of papers as relevant to penetrance or prevalence. RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training the model, 20% for tuning the model, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers that were correctly classified) while the CNN model achieves 88.95 % accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy while the CNN model achieves 89.13 % accuracy. CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date

arXiv.org e-Print Archive

DSpace@MIT

Recommended from our members

Cancer Informatics for Cancer Centers (CI4CC): Building a Community Focused on Sharing Ideas and Best Practices to Improve Cancer Care and Patient Outcomes.

Author: Barnholtz-Sloan Jill S
Basu Amrita
Borowsky Alexander D
Bui Alex
DiGiovanna Jack
Garcia-Closas Montserrat
Genkinger Jeanine M
Gerke Travis
Induni Marta
Kibbe Warren A
Lacey James V, Jr
Mirel Lisa
Nadaf Sorena
Permuth Jennifer B
Rollison Dana E
Saltz Joel
Shenkman Elizabeth A
Ulrich Cornelia M
Zheng W Jim
Publication venue: eScholarship, University of California
Publication date: 01/02/2020
Field of study

Cancer Informatics for Cancer Centers (CI4CC) is a grassroots, nonprofit 501c3 organization intended to provide a focused national forum for engagement of senior cancer informatics leaders, primarily aimed at academic cancer centers anywhere in the world but with a special emphasis on the 70 National Cancer Institute-funded cancer centers. Although each of the participating cancer centers is structured differently, and leaders' titles vary, we know firsthand there are similarities in both the issues we face and the solutions we achieve. As a consortium, we have initiated a dedicated listserv, an open-initiatives program, and targeted biannual face-to-face meetings. These meetings are a place to review our priorities and initiatives, providing a forum for discussion of the strategic and pragmatic issues we, as informatics leaders, individually face at our respective institutions and cancer centers. Here we provide a brief history of the CI4CC organization and meeting highlights from the latest CI4CC meeting that took place in Napa, California from October 14-16, 2019. The focus of this meeting was "intersections between informatics, data science, and population science." We conclude with a discussion on "hot topics" on the horizon for cancer informatics

eScholarship - University of California

Machine learning classification of surgical pathology reports and chunk recognition for information extraction noise reduction

Author: Gavin Anna T.
Hamilton Peter
Marshall Adele
Napolitano Giulio
Publication venue: 'Elsevier BV'
Publication date: 01/06/2016
Field of study

Queen's University Belfast Research Portal

Natural Language Processing of Clinical Notes on Chronic Diseases: Systematic Review

Author: Dudley Joel T
Lavelli Alberto
Miotto Riccardo
Osmani Venet
Rinaldi Fabio
Sheikhalishahi Seyedmostafa
Publication venue
Publication date: 01/04/2019
Field of study

Novel approaches that complement and go beyond evidence-based medicine are required in the domain of chronic diseases, given the growing incidence of such conditions on the worldwide population. A promising avenue is the secondary use of electronic health records (EHRs), where patient data are analyzed to conduct clinical and translational research. Methods based on machine learning to process EHRs are resulting in improved understanding of patient clinical trajectories and chronic disease risk prediction, creating a unique opportunity to derive previously unknown clinical insights. However, a wealth of clinical histories remains locked behind clinical narratives in free-form text. Consequently, unlocking the full potential of EHR data is contingent on the development of natural language processing (NLP) methods to automatically transform clinical text into structured clinical data that can guide clinical decisions and potentially delay or prevent disease onset

arXiv.org e-Print Archive

Archivio della ricerca - Fondazione Bruno Kessler

ZORA

Doctor of Philosophy

Author: AAlAbdulsalam Abdulrahman Khalifa
Publication venue: University of Utah
Publication date: 01/01/2018
Field of study

dissertationThe primary objective of cancer registries is to capture clinical care data of cancer populations and aid in prevention, allow early detection, determine prognosis, and assess quality of various treatments and interventions. Furthermore, the role of cancer registries is paramount in supporting cancer epidemiological studies and medical research. Existing cancer registries depend mostly on humans, known as Cancer Tumor Registrars (CTRs), to conduct manual abstraction of the electronic health records to find reportable cancer cases and extract other data elements required for regulatory reporting. This is often a time-consuming and laborious task prone to human error affecting quality, completeness and timeliness of cancer registries. Central state cancer registries take responsibility for consolidating data received from multiple sources for each cancer case and to assign the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah to collect data for each diagnosed cancer case and consolidate multiple sources of information.Although software tools helping with the manual abstraction process exist, they mainly focus on cancer case findings based on pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage information, an important prognostic factor required before initiating clinical treatment. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new approach based on machine learning to consolidate TNM stages from multiple records at the patient level

The University of Utah: J. Willard Marriott Digital Library

A hybrid solution for extracting structured medical information from unstructured data in medical records via a double-reading/entry system

Author: Boulin Hou
Jiajia Hu
Ligang Luo
Liping Li
Lue Ping Zhao
Tianze Zhang
Xiaozhe Wang
Publication venue: Springer Nature
Publication date: 01/01/2016
Field of study

Springer - Publisher Connector