Experimental comparison of machine learning approaches to medical domains: a case study of genotype influence on oral cancer development
Abstract
Research in medical domains faces new challenges as the available information increases in quantity and quality. In this context, Machine Learning methodologies can provide the right tools for data analysis and can cope with recurring problems in medical research, such as the integration of clinical and genetic data. In this study we provide an experimental comparison of a heterogeneous subset of Machine Learning methods. For this purpose, we chose a representative dataset for medical analysis concerning Head and Neck Squamous Cell Carcinoma (HNSCC). HNSCC is a kind of oral cancer associated with smoking and alcohol drinking habits; however, the individual risk could be modified by genetic polymorphisms of enzymes involved in the metabolism of tobacco carcinogens and in the DNA repair mechanisms. To study this relationship, the dataset comprises demographic and lifestyle data (age, gender, smoking and alcohol consumption) and genetic data (the individual genotype of 11 polymorphic genes) on 124 HNSCC patients and 231 healthy controls. We analyze the strengths and weaknesses of the different algorithms when applied to medical datasets such as the one considered, with particular attention to the issue of missing values.