7 research outputs found

    Clasificación de prescripciones médicas en español (Classification of Medical Prescriptions in Spanish)

    This work describes the problem of classifying free-text medical records in Spanish and proposes a solution based on two text classification algorithms, Multinomial Naïve Bayes (NBM) and Support Vector Machines (SVMs), justifying these choices and presenting the results obtained with both methods. Track: XV Workshop de Agentes y Sistemas Inteligentes. Red de Universidades con Carreras de Informática (RedUNCI)

    On Multilabel Classification Methods of Incompletely Labeled Biomedical Text Data

    Multilabel classification is often hindered by incompletely labeled training datasets: for some items of such a dataset (or even for all of them), some labels may be omitted. In this case, we cannot know whether any given item is labeled fully and correctly. A classifier trained directly on an incompletely labeled dataset performs poorly. To overcome this problem, we added an extra step, training set modification, before training a classifier. In this paper, we try two algorithms for training set modification: weighted k-nearest neighbor (WkNN) and soft supervised learning (SoftSL). Both approaches are based on similarity measurements between data vectors. We performed the experiments on AgingPortfolio (a text dataset) and then rechecked them on Yeast (nontext genetic data). We tried SVM and RF classifiers on the original datasets and then on the modified ones. For each dataset, our experiments demonstrated that both classification algorithms performed considerably better when preceded by the training set modification step.
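    The training set modification step described in this abstract can be illustrated with a minimal sketch of similarity-weighted nearest-neighbor label completion. This is not the authors' WkNN implementation: the function names, the choice of cosine similarity, the `k` and `threshold` parameters, and the toy vectors and labels are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def wknn_complete_labels(vectors, labels, k=2, threshold=0.5):
    """Return new label sets, adding labels backed by similarity-weighted
    votes of the k nearest neighbors (hypothetical rule, for illustration)."""
    completed = []
    for i, vec in enumerate(vectors):
        # k most similar other items, as (similarity, index) pairs
        sims = sorted(
            ((cosine(vec, other), j) for j, other in enumerate(vectors) if j != i),
            reverse=True,
        )[:k]
        total = sum(s for s, _ in sims)
        new_labels = set(labels[i])  # copy, so the input is not mutated
        if total > 0:
            candidates = set().union(*(labels[j] for _, j in sims))
            for label in candidates:
                # share of neighbor similarity mass that carries this label
                score = sum(s for s, j in sims if label in labels[j]) / total
                if score >= threshold:
                    new_labels.add(label)
        completed.append(new_labels)
    return completed

# Toy data (invented): item 1 is missing the "genetics" label.
vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [{"aging", "genetics"}, {"aging"}, {"cancer"}, {"cancer"}]
completed = wknn_complete_labels(vectors, labels, k=1, threshold=0.5)
print(completed)
```

    The completed label sets would then be used to train the downstream classifier (SVM or RF in the abstract) in place of the original, incomplete ones.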

    A Roller Bearing Fault Diagnosis Method Based on LCD Energy Entropy and ACROA-SVM

    Recognition of Multiple Imbalanced Cancer Types Based on DNA Microarray Data Using Ensemble Classifiers

    A Probabilistic Multi-Objective Artificial Bee Colony Algorithm for Gene Selection

    Microarray technology is widely used to report gene expression data. A characteristic of this platform is that datasets contain many features and few samples. In order to identify significant genes for a particular disease, the high dimensionality of microarray data must be overcome. The Artificial Bee Colony (ABC) Algorithm is a successful meta-heuristic algorithm that solves optimization problems effectively. In this paper, we propose a hybrid gene selection method for discriminatively selecting genes. We propose a new probabilistic binary Artificial Bee Colony Algorithm, namely PrBABC, that is hybridized with three different filter methods. The proposed method is applied to nine microarray datasets in order to detect distinctive genes for classifying cancer data. Results are compared with other well-known meta-heuristic algorithms: Binary Differential Evolution Algorithm (BinDE), Binary Particle Swarm Optimization Algorithm (BinPSO), and Genetic Algorithm (GA), as well as with other methods in the literature. Experimental results show that the probabilistic self-adaptive learning strategy integrated into the employed-bee phase can boost classification accuracy with a minimal number of genes.

    A Novel Weighted Support Vector Machine Based on Particle Swarm Optimization for Gene Selection and Tumor Classification

    We develop a detection model based on support vector machines (SVMs) and particle swarm optimization (PSO) for gene selection and tumor classification problems. The proposed model consists of two stages. First, the well-known minimum redundancy-maximum relevance (mRMR) method is applied to preselect genes that have the highest relevance to the target class and are maximally dissimilar to each other. Then, PSO is used to form a novel weighted SVM (WSVM) to classify samples. In this WSVM, PSO not only discards redundant genes but also takes into account the degree of importance of each gene and assigns different weights to different genes. We also use PSO to find appropriate kernel parameters, since the choice of gene weights influences the optimal kernel parameters and vice versa. Experimental results show that the proposed mRMR-PSO-WSVM model achieves the highest classification accuracy on two popular leukemia and colon gene expression datasets obtained from DNA microarrays. We therefore conclude that the proposed method is very promising compared to previously reported results.

    Pareto optimal-based feature selection framework for biomarker identification

    Numerous computational techniques have been applied to identify the vital features of gene expression datasets, with the aim of increasing the efficiency of biomedical applications. The classification of microarray data samples is an important task for correctly recognising diseases by identifying small but clinically meaningful sets of genes. However, identification of disease-representative genes, or biomarkers, in high-dimensional microarray gene-expression datasets remains a challenging task. This thesis investigates the viability of Pareto optimisation in identifying relevant subsets of biomarkers in high-dimensional microarray datasets. A robust Pareto Optimal based feature selection framework for biomarker discovery is then proposed. First, a two-stage feature selection approach using ensemble filter methods and Pareto Optimality is proposed. The integration of the multi-objective approach employing Pareto Optimality starts with well-known filter methods applied to various microarray gene-expression datasets. Although filter methods provide ranked lists of features, they give no information about optimum subsets of features, which in this study are genes. To address this limitation, Pareto Optimality is incorporated along with the filter methods. The robustness of the proposed framework is demonstrated on several well-known microarray gene expression datasets, where it is shown to achieve comparable or up to 100% predictive accuracy with comparatively fewer features. It performs better than the other approaches compared, which are single-objective. Furthermore, cross-validation and k-fold approaches are integrated into the framework, which can mitigate over-fitting and subsequently make the gene selection process more accurate under various conditions.
    The proposed framework is then developed in several phases. The Sequential Forward Selection method (SFS) is first used to represent wrapper techniques, and the developed Pareto Optimality based framework is applied multiple times and tested on different data types. Given the nature of most real-life data, imbalanced classes are examined using the proposed framework. Using the proposed Pareto Optimal based feature selection framework, which has a novel structure for imbalanced classes, the classifier achieves similarly high performance across the different cases. Comparable or better gene subset sizes are obtained using the proposed framework. Finally, handling missing data within the proposed framework is investigated, and it is demonstrated that different data imputation methods can also help in the effective integration of various feature selection methods.
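    The idea of Pareto Optimality over gene subsets can be illustrated with a short sketch that keeps only the non-dominated candidates when two objectives are traded off: fewer genes and higher accuracy. This is not the thesis's framework. The two-objective setup, the function names, and the candidate subsets with their sizes and accuracies are invented for illustration and are not results from the work.

```python
def dominates(a, b):
    """True if candidate a is at least as good as b on both objectives
    (fewer or equal genes, higher or equal accuracy) and strictly better on one.
    Candidates are (name, n_genes, accuracy) tuples."""
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better

def pareto_front(candidates):
    """Return the candidates that no other candidate dominates."""
    return [c for c in candidates
            if not any(dominates(other, c) for other in candidates)]

# Hypothetical gene subsets: (name, number of genes, cross-validated accuracy)
candidates = [
    ("A", 5, 0.90),
    ("B", 10, 0.95),
    ("C", 10, 0.92),  # dominated by B: same size, lower accuracy
    ("D", 20, 0.95),  # dominated by B: same accuracy, more genes
    ("E", 3, 0.80),
]
front = pareto_front(candidates)
front_names = [c[0] for c in front]
print(front_names)
```

    A single-objective ranking would return one best subset, whereas the front keeps every subset that represents a distinct trade-off between size and accuracy.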