236 research outputs found

    One-Class Classification: Taxonomy of Study and Review of Techniques

    One-class classification (OCC) algorithms aim to build classification models when the negative class is either absent, poorly sampled or not well defined. This unique situation constrains the learning of efficient classifiers, because the class boundary must be defined using knowledge of the positive class alone. The OCC problem has been considered and applied under many research themes, such as outlier/novelty detection and concept learning. In this paper we present a unified view of the general problem of OCC by presenting a taxonomy of study for OCC problems, which is based on the availability of training data, the algorithms used and the application domains. We further delve into each of the categories of the proposed taxonomy and present a comprehensive literature review of the OCC algorithms, techniques and methodologies with a focus on their significance, limitations and applications. We conclude our paper by discussing some open research problems in the field of OCC and present our vision for future research. Comment: 24 pages + 11 pages of references, 8 figures
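
To make the idea of learning a class boundary from positive samples only more concrete, here is a minimal sketch. It is not a method taken from the paper: it assumes the simplest possible boundary, a centroid plus a radius, and the names `fit_one_class`, `is_target` and the `quantile` parameter are hypothetical.

```python
import math


def fit_one_class(positives, quantile=0.95):
    """Fit a centroid-and-radius boundary using positive samples only."""
    dim = len(positives[0])
    centroid = [sum(p[i] for p in positives) / len(positives) for i in range(dim)]
    distances = sorted(math.dist(p, centroid) for p in positives)
    # Radius: the training distance at (roughly) the requested quantile.
    index = min(len(distances) - 1, int(quantile * len(distances)))
    return centroid, distances[index]


def is_target(sample, centroid, radius):
    """Accept a sample as the positive class if it lies inside the boundary."""
    return math.dist(sample, centroid) <= radius


# Toy positive-class data; no negative samples are needed for training.
positives = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.2), (0.9, 0.8)]
centroid, radius = fit_one_class(positives)
```

A point near the training data, such as `(1.0, 1.05)`, falls inside the boundary, while a distant point such as `(5.0, 5.0)` is rejected as an outlier.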

    ENSEMBLE CLASSIFICATION BASED MICROARRAY GENE RETRIEVAL SYSTEM

    Data mining plays an important role in distinguishing normal from cancerous samples using microarray gene data. Because this classification affects human lives, high sensitivity and specificity are mandatory. Taking this challenge into account, this work presents a technique to classify normal and cancerous samples by means of efficient feature selection and classification. Feature selection is performed with the Information Gain Ratio (IGR), and the selected features are forwarded to the classification stage, which uses ensemble classification. The classifiers combined in the ensemble are k-Nearest Neighbour (k-NN), Support Vector Machine (SVM) and Extreme Learning Machine (ELM). The performance of the proposed approach is analysed on three datasets (Leukemia, Colon and Breast cancer) in terms of accuracy, sensitivity and specificity. The experimental results indicate that the proposed work performs better than the existing techniques.
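
The feature-selection step named in this abstract can be sketched as follows. This is an illustration of the standard information gain ratio, not the authors' code, and it assumes the gene expression values have already been discretized into categories such as "high" and "low"; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())


def gain_ratio(feature, labels):
    """Information gain of `feature` about `labels`, divided by the split information."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Expected label entropy after splitting the samples on the feature.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    return info_gain / split_info if split_info > 0 else 0.0


labels = ["cancer", "cancer", "normal", "normal"]
informative_gene = ["high", "high", "low", "low"]    # separates the classes
uninformative_gene = ["high", "low", "high", "low"]  # carries no class signal
```

Ranking genes by `gain_ratio` and keeping the top scorers gives the reduced feature set that would then be passed to the classifiers.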

    Clustering in Conjunction with Wrapper Approach to Select Discriminatory Genes for Microarray Dataset Classification

    With the advent of microarray technology, it is possible to measure the expression levels of thousands of genes simultaneously. This helps us diagnose and classify some particular cancers directly using DNA microarrays. The high dimensionality and small sample size of microarray datasets have made the task of classification difficult. These datasets contain a large number of redundant and irrelevant genes. Efficient classification of samples therefore requires selecting a smaller set of relevant and non-redundant genes. In this paper, we propose a two-stage algorithm for finding a set of discriminatory genes responsible for classification of high-dimensional microarray datasets. In the first stage, redundancy is reduced by grouping correlated genes into clusters and selecting a representative gene from each cluster. The maximal information compression index is used to measure similarity between genes. In the second stage, a wrapper-based forward feature selection method is used to obtain a set of discriminatory genes for a given classifier. We have investigated three different techniques for clustering and four classifiers in our experiments. The proposed algorithm is tested on six well-known, publicly available datasets. Comparison with other state-of-the-art methods shows that our proposed algorithm achieves better classification accuracy with fewer genes.
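
The similarity measure used in the first stage can be illustrated with a short sketch. It assumes the usual definition of the maximal information compression index, namely the smallest eigenvalue of the 2x2 covariance matrix of two features; the abstract does not spell out the formula, so this is an assumption, and the function name `mici` is hypothetical.

```python
import math
from statistics import pvariance


def mici(x, y):
    """Smallest eigenvalue of the 2x2 covariance matrix of x and y.

    A value of zero means the two features are linearly dependent,
    so one of them is redundant given the other.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x, var_y = pvariance(x), pvariance(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    trace = var_x + var_y
    det = var_x * var_y - cov ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    discriminant = max(trace * trace - 4.0 * det, 0.0)
    return 0.5 * (trace - math.sqrt(discriminant))


gene_a = [1.0, 2.0, 3.0, 4.0]
gene_b = [2.0, 4.0, 6.0, 8.0]    # exactly 2 * gene_a, fully redundant
gene_c = [1.0, -1.0, 1.0, -1.0]
gene_d = [1.0, 1.0, -1.0, -1.0]  # uncorrelated with gene_c
```

Genes whose pairwise index is close to zero would be grouped into the same cluster, and one representative kept per cluster before the wrapper stage.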

    Using spectral imaging for the analysis of abnormalities for colorectal cancer: When is it helpful?

    © 2018 Awan et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. The spectral imaging technique has been shown to provide more discriminative information than RGB images and has been proposed for a range of problems. Many studies demonstrate its potential for abnormality detection in histopathology images, but there have also been discrepancies among previous studies. Many multispectral methods have been proposed for histopathology images, but the benefit of using the whole multispectral cube rather than a subset of bands or a single band is still debatable. We performed a comprehensive analysis using individual bands and different subsets of bands to determine the effectiveness of spectral information for detecting anomalies in colorectal images. Our multispectral colorectal dataset consists of four classes, each represented by infra-red spectrum bands in addition to the visual spectrum bands. We performed our analysis of spectral imaging by stratifying the abnormalities using both spatial and spectral information. For our experiments, we used a combination of texture descriptors with an ensemble classification approach that performed best on our dataset. We applied our method to another dataset and obtained results comparable with those of the state-of-the-art method and a convolutional neural network based method. Moreover, we explored the relationship between the number of bands and the problem complexity and found that a higher number of bands is required for a complex task to achieve improved performance. Our results demonstrate a synergy between the infra-red and visual spectra: incorporating the infra-red representation improves classification accuracy by 6%. We also highlight the importance of how the dataset should be divided into training and testing sets for evaluating histopathology image-based approaches, which has not been considered in previous studies on multispectral histopathology images. This publication was made possible using a grant from the Qatar National Research Fund through National Priority Research Program (NPRP) No. 6-249-1-053. The contents of this publication are solely the responsibility of the authors and do not necessarily represent the official views of the Qatar National Research Fund or Qatar University.

    Histopathological image analysis : a review

    Over the past decade, dramatic increases in computational power and improvement in image analysis algorithms have allowed the development of powerful computer-assisted analytical approaches to radiological data. With the recent advent of whole slide digital scanners, tissue histopathology slides can now be digitized and stored in digital image form. Consequently, digitized tissue histopathology has now become amenable to the application of computerized image analysis and machine learning techniques. Analogous to the role of computer-assisted diagnosis (CAD) algorithms in medical imaging to complement the opinion of a radiologist, CAD algorithms have begun to be developed for disease detection, diagnosis, and prognosis prediction to complement the opinion of the pathologist. In this paper, we review the recent state-of-the-art CAD technology for digitized histopathology. This paper also briefly describes the development and application of novel image analysis technology for a few specific histopathology-related problems being pursued in the United States and Europe.

    An effective method for lung cancer diagnosis from CT scan using deep learning-based support vector network

    The diagnosis of early-stage lung cancer is challenging due to its asymptomatic nature, especially given the repeated radiation exposure and high cost of computed tomography (CT). Examining the lung CT images to detect pulmonary nodules, especially the cell lung cancer lesions, is also tedious and prone to errors even by a specialist. This study proposes a cancer diagnostic model based on a deep learning-enabled support vector machine (SVM). The proposed computer-aided design (CAD) model identifies the physiological and pathological changes in the soft tissues of the cross-section in lung cancer lesions. The model is first trained to recognize lung cancer by measuring and comparing the selected profile values in CT images obtained from patients and control patients at their diagnosis. Then, the model is tested and validated using CT scans of both patients and control patients that were not shown in the training phase. The study investigates 888 annotated CT scans from the publicly available LIDC/IDRI database. The proposed deep learning-assisted SVM-based model yields 94% accuracy for pulmonary nodule detection representing early-stage lung cancer. It is found to be superior to other existing methods, including complex deep learning, simple machine learning, and the hybrid techniques used on lung CT images for nodule detection. Experimental results demonstrate that the proposed approach can greatly assist radiologists in detecting early lung cancer and facilitating the timely management of patients.

    An Ensemble of Classifiers with Genetic Algorithm Based Feature Selection

    Different data classification algorithms have been developed and applied in various areas to analyze and extract valuable information and patterns from large datasets with noise and missing values. However, none of them performs consistently well over all datasets. To this end, ensemble methods have been suggested as a promising approach. This paper proposes a novel hybrid algorithm that combines a multi-objective Genetic Algorithm (GA) and an ensemble classifier. The ensemble classifier, which consists of a decision tree classifier, an Artificial Neural Network (ANN) classifier, and a Support Vector Machine (SVM) classifier, is used as the classification committee, while the multi-objective Genetic Algorithm is employed as the feature selector to help the ensemble classifier improve the overall sample classification accuracy while also identifying the most important features in the dataset of interest. The proposed GA-Ensemble method is tested on three benchmark datasets and compared with each individual classifier as well as with methods based on mutual information theory, bagging and boosting. The results suggest that the GA-Ensemble method outperforms the other algorithms compared and is a useful method for classification and feature selection problems.
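
The interaction between a feature mask and a voting committee, as described in this abstract, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' method: the three lambda rules are toy stand-ins for the decision tree, ANN and SVM members, the mask plays the role of one GA chromosome, and only a single accuracy objective is evaluated (no GA search loop is shown). All names are hypothetical.

```python
from collections import Counter


def select(sample, mask):
    """Keep only the features whose mask bit is 1 (one GA chromosome)."""
    return [value for value, keep in zip(sample, mask) if keep]


def majority_vote(classifiers, sample):
    """Return the label predicted by most committee members."""
    votes = [clf(sample) for clf in classifiers]
    return Counter(votes).most_common(1)[0][0]


def committee_accuracy(classifiers, mask, samples, labels):
    """Fitness of a feature mask: committee accuracy on the masked samples."""
    hits = sum(
        majority_vote(classifiers, select(s, mask)) == y
        for s, y in zip(samples, labels)
    )
    return hits / len(samples)


# Toy stand-ins for the three committee members (not trained models).
committee = [
    lambda x: int(sum(x) > 1.0),
    lambda x: int(max(x) > 0.6),
    lambda x: int(x[0] > 0.5),
]

# Features 0 and 2 carry the class signal; feature 1 is misleading.
samples = [[0.9, 0.1, 0.8], [0.2, 0.9, 0.1], [0.8, 0.2, 0.7], [0.1, 0.8, 0.2]]
labels = [1, 0, 1, 0]

good_mask_score = committee_accuracy(committee, [1, 0, 1], samples, labels)
bad_mask_score = committee_accuracy(committee, [0, 1, 0], samples, labels)
```

A genetic algorithm would evolve a population of such masks, using a score like `committee_accuracy` (together with any further objectives, such as the number of selected features) to decide which masks survive.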