Design of Machine Learning Algorithms with Applications to Breast Cancer Detection
Machine learning is concerned with the design and development of algorithms and
techniques that allow computers to 'learn' from experience with respect to some class
of tasks and performance measure. One application of machine learning is to improve
the accuracy and efficiency of computer-aided diagnosis systems to assist physicians,
radiologists, cardiologists, neuroscientists, and health-care technologists. This thesis
focuses on machine learning and its applications to breast cancer detection. Emphasis
is placed on preprocessing of features, pattern classification, and model selection.
Before the classification task, feature selection and feature transformation may be
performed to reduce the dimensionality of the features and to improve the classification
performance. Genetic algorithm (GA) can be employed for feature selection based
on different measures of data separability or the estimated risk of a chosen classifier.
A separate nonlinear transformation can be performed by applying kernel principal
component analysis and kernel partial least squares.
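As an illustration of the kernel-based nonlinear transformation mentioned above, the following is a minimal sketch using scikit-learn's KernelPCA on synthetic data; the dataset, dimensions, and parameter values are illustrative assumptions, not taken from the thesis:

```python
import numpy as np
from sklearn.decomposition import KernelPCA

# Synthetic stand-in for a feature matrix: 100 samples, 30 features
# (the sizes are illustrative only, not from the thesis).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 30))

# Nonlinear transformation with an RBF kernel, reducing the 30
# original features to 5 kernel principal components.
kpca = KernelPCA(n_components=5, kernel="rbf", gamma=0.1)
X_reduced = kpca.fit_transform(X)

print(X_reduced.shape)  # (100, 5)
```

The reduced representation would then be passed to a downstream classifier in place of the original features.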
Different classifiers are proposed in this work: The SOM-RBF network combines
self-organizing maps (SOMs) and radial basis function (RBF) networks, with the RBF
centers set as the weight vectors of neurons from the competitive layer of a trained
SOM. The pairwise Rayleigh quotient (PRQ) classifier seeks one discriminating boundary
by maximizing an unconstrained optimization objective, named the PRQ criterion,
formed with a set of pairwise constraints instead of individual training samples.
The strict 2-surface proximal (S2SP) classifier seeks two proximal planes that are not
necessarily parallel to fit the distribution of the samples in the original feature space or
a kernel-defined feature space, by maximizing two strict optimization objectives with
a 'square of sum' optimization factor. Two variations of the support vector data description
(SVDD) with negative samples (NSVDD) are proposed by involving different
forms of slack vectors, which learn a closed spherically shaped boundary, named the
supervised compact hypersphere (SCH), around a set of samples in the target class. We
extend the NSVDDs to solve the multi-class classification problems based on distances
between the samples and the centers of the learned SCHs in a kernel-defined feature
space, using a combination of linear discriminant analysis and the nearest-neighbor rule.
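The NSVDD variants and the SCH are the thesis's own constructions and are not available in standard libraries. As a related, hedged illustration, scikit-learn's one-class SVM with an RBF kernel learns a comparable closed boundary around a set of target-class samples (for RBF kernels, the one-class SVM and SVDD formulations are known to be equivalent); all data and parameter values below are synthetic assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Target-class samples clustered around the origin (synthetic stand-in;
# the NSVDD itself is not implemented in standard libraries).
rng = np.random.default_rng(1)
X_target = rng.normal(scale=0.5, size=(200, 2))

# With an RBF kernel, the one-class SVM boundary corresponds to a
# hypersphere around the data in the kernel-defined feature space.
ocsvm = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05).fit(X_target)

# A point near the cluster falls inside the learned boundary (+1);
# a distant point falls outside (-1).
inside = ocsvm.predict([[0.0, 0.0]])
outside = ocsvm.predict([[5.0, 5.0]])
print(inside, outside)
```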
The problem of model selection is studied to pick the best values of the hyperparameters
for a parametric classifier. To choose the optimal kernel or regularization
parameters of a classifier, we investigate different criteria, such as the validation error
estimate and the leave-one-out bound, as well as different optimization methods, such
as grid search, gradient descent, and GA. By viewing the tuning problem of the multiple
parameters of a 2-norm support vector machine (SVM) as an identification problem
of a nonlinear dynamic system, we design a tuning system by employing the extended
Kalman filter based on cross validation. Independent kernel optimization based on
different measures of data separability is also investigated for different kernel-based
classifiers.
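Of the optimization methods named above, grid search over a cross-validated error estimate is the simplest to sketch. The following minimal example uses scikit-learn's GridSearchCV to tune the kernel and regularization parameters of an RBF SVM on a synthetic dataset; the parameter ranges and data are illustrative assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic binary problem standing in for a diagnostic dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Grid search over the RBF kernel parameter gamma and the regularization
# parameter C, scored by 5-fold cross-validated classification accuracy.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Gradient descent, GA, or the Kalman-filter-based tuning described above would replace the exhaustive grid with a guided search over the same cross-validation objective.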
Numerous computer experiments using benchmark datasets verify the theoretical
results and compare the techniques in terms of classification accuracy or area
under the receiver operating characteristic (ROC) curve. Computational
requirements, such as computing time and the number of hyperparameters, are
also discussed.
All of the presented methods are applied to breast cancer detection from fine-needle
aspiration and in mammograms, as well as screening of knee-joint vibroarthrographic
signals and automatic monitoring of roller bearings with vibration signals. Experimental
results demonstrate that these methods achieve improved classification performance.
For breast cancer detection, instead of only providing a binary diagnostic decision
of 'malignant' or 'benign', we propose methods to assign a measure of confidence
of malignancy to an individual mass, by calculating probabilities of being benign and
malignant with a single classifier or a set of classifiers.
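One simple way to realize a confidence of malignancy from a set of classifiers is to average their predicted class probabilities (soft voting). The sketch below is an illustrative assumption, not the thesis's exact method, and uses synthetic data with label 1 standing in for 'malignant':

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

# Synthetic stand-in: label 1 = 'malignant', label 0 = 'benign'.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_new = X[:250], X[250:]
y_train = y[:250]

# A set of classifiers, each producing a probability of malignancy;
# averaging them gives a single confidence measure per mass.
models = [LogisticRegression(max_iter=1000),
          RandomForestClassifier(random_state=0),
          GaussianNB()]
probs = np.mean([m.fit(X_train, y_train).predict_proba(X_new)[:, 1]
                 for m in models], axis=0)

print(probs.shape)  # one probability of malignancy per new sample
```

Each new sample thus receives a graded confidence in [0, 1] rather than only a binary 'malignant'/'benign' label.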
Development of Features and Feature Reduction Techniques for Mammogram Classification
Breast cancer is one of the leading causes of death among women. Early detection and treatment are essential to reduce the death rate due to breast cancer. Recent developments in digital mammography imaging systems aim at better diagnosis of abnormalities present in the breast. At present, mammography is an effective and reliable method for accurate detection of breast cancer. Digital mammograms are computerized X-ray images of the breasts. Reading mammograms is a crucial task for radiologists, as it determines which patients are referred for biopsy. Studies have shown that different radiologists may report different interpretations of the same mammographic image. Mammogram interpretation is thus a repetitive task that requires close attention to avoid misinterpretation. Therefore, Computer-Aided Diagnosis (CAD) systems have become popular; they analyze mammograms using image processing and pattern recognition techniques and classify them into three classes, namely malignant, benign, and normal. A CAD system recognizes the type of tissue automatically by extracting and analyzing significant features from mammographic images.
In this thesis, the contributions aim at developing new and useful features from mammograms for classifying tissue patterns. Additionally, some feature reduction techniques are proposed to select a reduced set of significant features prior to classification. In this context, five different schemes are proposed for the extraction and selection of relevant features for subsequent classification. Using the relevant features, several classifiers are employed for classification of mammograms to derive an overall inference. Each scheme is validated in isolation using two standard databases, namely MIAS and DDSM.
The achieved results are very promising with respect to classification accuracy in comparison to existing schemes and are elaborated in each chapter. In Chapter 2, hybrid features are developed using the Two-Dimensional Discrete Wavelet Transform (2D-DWT) and the Gray-Level Co-occurrence Matrix (GLCM) in succession. Subsequently, relevant features are selected using the t-test, yielding a feature set of substantially lower dimension. On application of various classifiers, it is observed that the Back-Propagation Neural Network (BPNN) gives better classification accuracy than the others. In Chapter 3, Segmentation-based Fractal Texture Analysis (SFTA) is used to extract texture features from the mammograms, and a Fast Correlation-Based Filter (FCBF) method is used to generate a significant feature subset. Among all classifiers, the Support Vector Machine (SVM) yields superior classification accuracy. In Chapter 4, the Two-Dimensional Discrete Orthonormal S-Transform (2D-DOST) is used to extract features from mammograms. A feature selection methodology based on null-hypothesis testing with the statistical two-sample t-test is suggested to select the most significant features. These features with the AdaBoost and Random Forest (AdaBoost-RF) classifier outperform the other classifiers with respect to accuracy. In Chapter 5, features are derived from mammographic images using the Two-Dimensional Slantlet Transform (2D-SLT). The most significant features are selected using the Bayesian Logistic Regression (BLogR) method. Using these features, the LogitBoost and Random Forest (LogitBoost-RF) classifier gives the best classification accuracy among all the classifiers. In Chapter 6, the Fast Radial Symmetry Transform (FRST) is applied to mammographic images to derive radially symmetric features. A t-distributed Stochastic Neighbor Embedding (t-SNE) method is utilized to select the most relevant features.
Using these features, classification experiments are carried out with all the classifiers, and a Logistic Model Tree (LMT) classifier achieves the best results among them. An overall comparative analysis is also made among all the suggested features and feature reduction techniques, along with the corresponding classifiers for which they show superior results.
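The two-sample t-test feature selection used in the chapters above can be sketched as follows: each feature is tested for a significant difference between its class-conditional means, and only features where the null hypothesis of equal means is rejected are kept. The data, feature counts, and significance level below are illustrative assumptions, not from the thesis:

```python
import numpy as np
from scipy.stats import ttest_ind

# Synthetic feature matrix: 100 mammograms, 20 features; label 1 stands
# in for 'malignant' (all sizes here are illustrative only).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = rng.integers(0, 2, size=100)
X[y == 1, :5] += 2.0  # make the first 5 features genuinely discriminative

# Two-sample t-test per feature; keep features where the null hypothesis
# of equal class means is rejected at the 0.05 level.
_, p_values = ttest_ind(X[y == 0], X[y == 1], axis=0)
selected = np.where(p_values < 0.05)[0]
print(selected)
```

The reduced feature subset `selected` would then be fed to the downstream classifiers in place of the full feature set.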