
Using Machine Learning and Natural Language Processing to Review and Classify the Medical Literature on Cancer Susceptibility Genes

Authors: Acevedo Francisco; Armengol Victor Diego; Bao Yujia; Barzilay Regina; Braun Danielle; Deng Zhengyi; Hughes Kevin S; Kim Heeyoon; Ouardaoui Nofal; Parmigiani Giovanni; Wang Cathy; Wang Yan
Publication date: 24/04/2019

PURPOSE: The medical literature relevant to germline genetics is growing exponentially. Clinicians need tools for monitoring and prioritizing the literature to understand the clinical implications of pathogenic genetic variants. We developed and evaluated two machine learning models to classify abstracts as relevant to the penetrance (risk of cancer for germline mutation carriers) or prevalence of germline genetic mutations.

METHODS: We conducted literature searches in PubMed and retrieved paper titles and abstracts to create an annotated dataset for training and evaluating the two classification models. Our first model is a support vector machine (SVM) that learns a linear decision rule based on the bag-of-ngrams representation of each title and abstract. Our second model is a convolutional neural network (CNN) that learns a complex nonlinear decision rule based on the raw title and abstract. We evaluated the performance of both models on classifying papers as relevant to penetrance or prevalence.

RESULTS: For penetrance classification, we annotated 3740 paper titles and abstracts and used 60% for training, 20% for tuning, and 20% for evaluating the model. The SVM model achieves 89.53% accuracy (percentage of papers correctly classified), while the CNN model achieves 88.95% accuracy. For prevalence classification, we annotated 3753 paper titles and abstracts. The SVM model achieves 89.14% accuracy, while the CNN model achieves 89.13% accuracy.

CONCLUSION: Our models achieve high accuracy in classifying abstracts as relevant to penetrance or prevalence. By facilitating literature review, this tool could help clinicians and researchers keep abreast of the burgeoning knowledge of gene-cancer associations and keep the knowledge bases for clinical decision support tools up to date.

arXiv.org e-Print Archive

DSpace@MIT
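The SVM approach described in the first abstract can be sketched in pure Python. That approach uses a linear decision rule over bag-of-ngrams features, a 60/20/20 train/tune/test split, and accuracy as the metric. This sketch is illustrative and does not reproduce the authors' pipeline. Several pieces are assumptions:

- the whitespace tokenizer and the unigram-plus-bigram features;
- the Pegasos-style stochastic subgradient solver, used here in place of a packaged SVM library;
- the regularisation grid `lams`;
- all function names.

Labels are assumed to be +1 for "relevant" and -1 for "not relevant".

```python
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Bag-of-ngrams counts for word n-grams of length 1..n_max, plus a constant bias feature.
    Whitespace tokenisation and n_max=2 are illustrative assumptions."""
    tokens = text.lower().split()
    feats = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    feats["<bias>"] = 1  # lets the linear rule learn an intercept
    return feats

def split_60_20_20(items, seed=0):
    """Shuffle, then split into 60% training, 20% tuning, 20% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    a, b = int(0.6 * len(items)), int(0.8 * len(items))
    return items[:a], items[a:b], items[b:]

def score(w, feats):
    """Linear decision value w . x over sparse feature dicts."""
    return sum(w.get(f, 0.0) * v for f, v in feats.items())

def train_linear_svm(data, lam=0.01, epochs=20, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularised hinge loss
    (an assumed solver). `data` is a list of (features, label), label in {-1, +1}."""
    rng = random.Random(seed)
    data = list(data)
    w, t = {}, 0
    for _ in range(epochs):
        rng.shuffle(data)
        for feats, y in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * score(w, feats) < 1  # margin check with current weights
            for f in w:  # regularisation step: shrink every weight
                w[f] *= 1.0 - eta * lam
            if violated:  # hinge-loss subgradient step
                for f, v in feats.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def accuracy(w, data):
    """Percentage of examples whose predicted sign matches the label."""
    correct = sum(1 for feats, y in data if (1 if score(w, feats) >= 0 else -1) == y)
    return 100.0 * correct / len(data)

def fit_and_evaluate(examples, lams=(0.1, 0.01, 0.001)):
    """Choose lam on the tuning split, retrain on the training split, report test accuracy."""
    data = [(ngrams(text), y) for text, y in examples]
    train, tune, test = split_60_20_20(data)
    best = max(lams, key=lambda lam: accuracy(train_linear_svm(train, lam=lam), tune))
    w = train_linear_svm(train, lam=best)
    return best, accuracy(w, test)
```

`fit_and_evaluate` takes (text, label) pairs and returns the chosen regularisation strength and the held-out accuracy in percent. This mirrors the split described in the abstract: hyperparameters are picked on the middle 20%, and only the final 20% is used for the reported figure.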

ALT-C 2010 - Conference Introduction and Abstracts

Authors: Blackey Hayden; Habib Laurence; Jefferies Amanda; Johnson Mark
Publication venue: 'Ovid Technologies (Wolters Kluwer Health)'
Publication date: 10/08/2010

ALT Open Access Repository

Simulation of site-specific irrigation control strategies with sparse input data

Authors: Hancock Nigel; McCarthy Alison; Raine Steven R.
Publication venue: 'Canadian Society for Bioengineering'
Publication date: 01/06/2010

Crop and irrigation water use efficiencies may be improved by managing irrigation application timing and volumes using physical and agronomic principles. However, the crop water requirement may be spatially variable due to different soil properties and genetic variations in the crop across the field. Adaptive control strategies can be used to locally control water applications in response to in-field temporal and spatial variability with the aim of maximising both crop development and water use efficiency. A simulation framework ‘VARIwise’ has been created to aid the development, evaluation and management of spatially and temporally varied adaptive irrigation control strategies (McCarthy et al., 2010). VARIwise enables alternative control strategies to be simulated with different crop and environmental conditions and at a range of spatial resolutions. An iterative learning controller and a model predictive controller have been implemented in VARIwise to improve the irrigation of cotton. The iterative learning control strategy uses the soil moisture response to the previous irrigation volume to adjust the irrigation volume applied at the next irrigation event. For field implementation this controller has low data requirements, as only soil moisture data are required after each irrigation event. In contrast, a model predictive controller has high data requirements, as measured soil and plant data are required at a high spatial resolution in a field implementation. Model predictive control uses a calibrated model to determine the irrigation application and/or timing that results in the highest predicted yield or water use efficiency. The implementation of these strategies is described, and a case study is presented to demonstrate the operation of the strategies with various levels of data availability.
It is concluded that in situations of sparse data, the iterative learning controller performs significantly better than a model predictive controller.

University of Southern Queensland ePrints
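The iterative learning strategy in the last abstract uses only the soil moisture response to the previous event to set the next volume. It can be illustrated with a minimal Python sketch. The abstract does not give VARIwise's actual update law, so the following are all hypothetical and do not come from the paper:

- a simple P-type (proportional) learning rule;
- a linear soil response in which a per-cell infiltration fraction stands in for spatial variability;
- illustrative values for the target moisture gain, root-zone depth and learning gain.

```python
def soil_moisture_gain(volume_mm, infiltration_fraction, root_zone_mm=600.0):
    """Hypothetical plant model: volumetric soil-moisture increase (m3/m3) after applying
    `volume_mm` of water, assuming a fixed fraction reaches a root zone of `root_zone_mm`."""
    return infiltration_fraction * volume_mm / root_zone_mm

def ilc_next_volume(prev_volume_mm, measured_gain, target_gain, learning_gain):
    """Assumed P-type iterative learning update: next volume = previous volume + gain * error."""
    error = target_gain - measured_gain
    return max(0.0, prev_volume_mm + learning_gain * error)  # cannot apply negative water

def run_ilc(cells, target_gain=0.05, initial_volume_mm=30.0, learning_gain=600.0, events=10):
    """Simulate `events` irrigation events per cell (cell id -> infiltration fraction).
    Only the measured soil-moisture response after each event is used; no crop model."""
    volumes = {cell: initial_volume_mm for cell in cells}
    errors = []
    for _ in range(events):
        event_errors = {}
        for cell, fraction in cells.items():
            measured = soil_moisture_gain(volumes[cell], fraction)
            event_errors[cell] = target_gain - measured
            volumes[cell] = ilc_next_volume(volumes[cell], measured, target_gain, learning_gain)
        errors.append(event_errors)
    return volumes, errors

# Two cells with different (hypothetical) infiltration fractions.
volumes, errors = run_ilc({"A": 0.8, "B": 0.6}, events=30)
```

The model is linear, so each event multiplies the error by (1 - L·b), where L is the learning gain and b is the moisture gain per millimetre applied. The controller therefore converges whenever 0 < L·b < 2. In the example, cell A settles near 37.5 mm and cell B near 50 mm without any crop model or plant measurements. This is the low-data property the abstract attributes to the iterative learning controller. A model predictive controller would instead need a calibrated crop model, which is not sketched here.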