805 research outputs found

Bagging linear sparse Bayesian learning models for variable selection in cancer diagnosis

Authors: Arus Carles; Devos Andy; Lu Chuan; Suykens Johan A. K.; Van Huffel Sabine
Publication date: 01/05/2007

Gene set based ensemble methods for cancer classification

Author: Duncan William Evans
Publication venue: LSU Digital Commons
Publication date: 01/01/2013

Diagnosis of cancer very often depends on conclusions drawn after both clinical and microscopic examinations of tissues to study the manifestation of the disease in order to place tumors in known categories. One factor which determines the categorization of cancer is the tissue from which the tumor originates. Information gathered from clinical exams may be partial or not completely predictive of a specific category of cancer. Further complicating the problem of categorizing various tumors is that the histological classification of the cancer tissue and description of its course of development may be atypical. Gene expression data gleaned from microarray analysis provides tremendous promise for more accurate cancer diagnosis. One hurdle in the classification of tumors based on gene expression data is that the data space is ultra-dimensional with relatively few points; that is, there are a small number of examples with a large number of genes. A second hurdle is expression bias caused by the correlation of genes. Analysis of subsets of genes, known as gene set analysis, provides a mechanism by which groups of differentially expressed genes can be identified. We propose an ensemble of classifiers whose base classifiers are ℓ1-regularized logistic regression models with restriction of the feature space to biologically relevant genes. Some researchers have already explored the use of ensemble classifiers to classify cancer, but the effect of the underlying base classifiers in conjunction with biologically-derived gene sets on cancer classification has not been explored.
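
The idea of an ensemble whose base classifiers are ℓ1-regularized logistic regression models, each restricted to one gene set, might be sketched as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the expression data are synthetic, the gene sets in `GENE_SETS` are hypothetical, the optimizer (proximal gradient descent with soft-thresholding) is a choice made here for brevity, and averaging the base-classifier probabilities is an assumed combination rule.

```python
import math
import random

# Hypothetical gene sets (column indices); genes 0-2 are made informative below.
GENE_SETS = [[0, 1, 2, 3], [2, 10, 11, 12], [20, 21, 22, 23]]
INFORMATIVE = {0, 1, 2}

def make_toy_expression(n, n_genes=30, seed=0):
    """Synthetic stand-in for expression data: few samples, many genes."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.0 if label else -1.0
        X.append([rng.gauss(shift if g in INFORMATIVE else 0.0, 1.0)
                  for g in range(n_genes)])
        y.append(label)
    return X, y

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v towards zero by t."""
    return math.copysign(max(abs(v) - t, 0.0), v)

def fit_l1_logistic(X, y, lam=0.05, step=0.1, n_iter=300):
    """Minimise mean log-loss + lam * ||w||_1 by proximal gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j in range(p):
                grad_w[j] += err * xi[j] / n
        b -= step * grad_b  # the intercept is not penalised
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

def predict_proba(model, x):
    w, b = model
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))

def fit_gene_set_ensemble(X, y, gene_sets, **kwargs):
    """Fit one base classifier per gene set, using only that set's columns."""
    ensemble = []
    for genes in gene_sets:
        X_sub = [[row[g] for g in genes] for row in X]
        ensemble.append((genes, fit_l1_logistic(X_sub, y, **kwargs)))
    return ensemble

def ensemble_predict(ensemble, x):
    """Average the base-classifier probabilities and threshold at 0.5."""
    probs = [predict_proba(model, [x[g] for g in genes])
             for genes, model in ensemble]
    return int(sum(probs) / len(probs) >= 0.5)

if __name__ == "__main__":
    X_train, y_train = make_toy_expression(80, seed=1)
    X_test, y_test = make_toy_expression(200, seed=2)
    ens = fit_gene_set_ensemble(X_train, y_train, GENE_SETS)
    hits = sum(ensemble_predict(ens, x) == t for x, t in zip(X_test, y_test))
    print("test accuracy:", hits / len(y_test))
```

Restricting each base model to a gene set keeps the number of fitted coefficients small relative to the sample size, while the ℓ1 penalty can still zero out uninformative genes within a set.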

Louisiana State University

An adaptive ensemble learner function via bagging and rank aggregation with applications to high dimensional data.

Author: Shah Jasmit SureshKumar
Publication venue: ThinkIR: The University of Louisville's Institutional Repository
Publication date: 01/08/2011

An ensemble consists of a set of individual predictors whose predictions are combined. Generally, different classification and regression models tend to work well for different types of data, and it is usually not known which algorithm will be optimal in any given application. In this thesis an ensemble regression function is presented which is adapted from Datta et al. 2010. The ensemble function is constructed by combining bagging and rank aggregation, and it is capable of changing its performance depending on the type of data that is being used. In the classification approach, the results can be optimized with respect to performance measures such as accuracy, sensitivity, specificity and area under the curve (AUC), whereas in the regression approach, they can be optimized with respect to measures such as mean square error and mean absolute error. The ensemble classifier and ensemble regressor perform at the level of the best individual classifier or regression model. For complex high-dimensional datasets, it may be advisable to combine a number of classification algorithms or regression algorithms rather than using one specific algorithm.

University of Louisville

Inferring gene regulatory networks using ensembles of feature selection techniques

Authors: Demeester Piet; Dhaene Tom; Geurts Pierre; Huynh-Thu Vân Anh; Ruyssinck Joeri; Saeys Yvan
Publication date: 01/01/2012

Ghent University Academic Bibliography

Breast Cancer Classification: Features Investigation using Machine Learning Approaches

Authors: Ahmad Norulhusna; Mashudi Nurul Amirah; Mohd Noor Norliza; Rossli Syaidathul Amaleena
Publication venue: Penerbit UTHM
Publication date: 20/05/2021

Breast cancer is the second most common cancer after lung cancer and one of the main causes of death worldwide. Women have a higher risk of breast cancer than men. Thus, early diagnosis with an accurate and reliable system is critical in breast cancer treatment. Machine learning techniques are well known and popular among researchers, especially for classification and prediction. An investigation was conducted to evaluate the performance of breast cancer classification of malignant and benign tumors using various machine learning techniques, namely k-Nearest Neighbors (k-NN), Random Forest, Support Vector Machine (SVM), and ensemble techniques, to compute the prediction of breast cancer survival by implementing 10-fold cross validation. This study used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset with 23 selected features measured from 569 patients, of whom 212 have malignant tumors and 357 have benign tumors. The analysis investigated the features of the tumors based on their mean, standard error, and worst values. Each of these three statistics is computed for ten properties: radius, texture, perimeter, area, smoothness, compactness, concavity, concave points, symmetry, and fractal dimension. The selection of features was considered to have a significant influence on breast cancer classification. The analysis is compared and evaluated with thirty features to determine the features used for breast cancer classification. The results show that AdaBoost obtained the highest accuracy for thirty features at 98.95%, for the ten mean features at 98.07%, and for the ten worst features at 98.77%, with the lowest error rate. Additionally, the proposed methods are evaluated using 2-fold, 3-fold, and 5-fold cross validation to find the best accuracy rate. Comparison results between all methods show that the AdaBoost ensemble method gave the highest accuracy at 98.77% for 10-fold cross validation, and at 98.41% and 98.24% for 2-fold and 3-fold cross validation, respectively. Nevertheless, the result with 5-fold cross validation shows that SVM produced the best accuracy rate at 98.60% with the lowest error rate.
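
The evaluation protocol described in this abstract, comparing accuracy under 2-, 3-, 5-, and 10-fold cross validation, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic two-class Gaussian samples rather than WDBC, only a plain k-NN classifier (one of the methods named above) is shown, and the helper names are hypothetical. It will not reproduce the reported accuracies.

```python
import random
from collections import Counter

def make_toy_data(n_per_class=60, n_features=5, seed=0):
    """Two Gaussian classes standing in for benign (0) and malignant (1)."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y

def stratified_folds(y, k, seed=0):
    """Split sample indices into k folds with similar class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)  # deal indices out round-robin
    return folds

def knn_predict(X_train, y_train, x, n_neighbors=5):
    """Majority vote among the nearest training points (squared Euclidean)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), lab)
        for row, lab in zip(X_train, y_train)
    )
    votes = Counter(lab for _, lab in dists[:n_neighbors])
    return votes.most_common(1)[0][0]

def cross_val_accuracy(X, y, k, n_neighbors=5, seed=0):
    """Each fold is held out once; accuracy is pooled over all samples."""
    correct = 0
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        X_tr = [X[i] for i in range(len(X)) if i not in held_out]
        y_tr = [y[i] for i in range(len(y)) if i not in held_out]
        correct += sum(
            knn_predict(X_tr, y_tr, X[i], n_neighbors) == y[i]
            for i in test_idx
        )
    return correct / len(X)

if __name__ == "__main__":
    X, y = make_toy_data()
    for k in (2, 3, 5, 10):
        print(f"{k}-fold accuracy: {cross_val_accuracy(X, y, k):.4f}")
```

Smaller k leaves less training data per fold, which is one reason accuracy can shift slightly between fold counts, as in the figures quoted above.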

Journals of Universiti Tun Hussein Onn Malaysia (UTHM)

International Journal of Integrated Engineering

Likelihood Adaptively Modified Penalties

Authors: Feng Yang; Li Tengfei; Ying Zhiliang
Publication date: 22/08/2013

A new family of penalty functions, adaptive to likelihood, is introduced for model selection in general regression models. It arises naturally through assuming certain types of prior distribution on the regression parameters. To study stability properties of the penalized maximum likelihood estimator, two types of asymptotic stability are defined. Theoretical properties, including the parameter estimation consistency, model selection consistency, and asymptotic stability, are established under suitable regularity conditions. An efficient coordinate-descent algorithm is proposed. Simulation results and real data analysis show that the proposed method has competitive performance in comparison with existing ones. Comment: 42 pages, 4 figures

arXiv.org e-Print Archive

CiteSeerX