140 research outputs found

    A Multi-Population FA for Automatic Facial Emotion Recognition

    Automatic facial emotion recognition systems are popular in domains such as health care, surveillance and human-robot interaction. In this paper we present a novel multi-population FA for automatic facial emotion recognition. The overall system combines horizontal vertical neighborhood local binary patterns (hvnLBP) for feature extraction, the proposed multi-population FA for feature selection and diverse classifiers for emotion recognition. First, we extract features using hvnLBP, which are robust to illumination changes, scaling and rotation variations. Then, the proposed FA variant selects the most important and emotion-specific features. These selected features are used as input to the classifiers, which recognise seven basic emotions. The proposed system is evaluated on multiple facial expression datasets and compared with other state-of-the-art models.
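
The abstract does not define hvnLBP, so the following is only a minimal sketch of the classic 3x3 local binary pattern that such variants build on. The function names and the toy patch are illustrative assumptions, not taken from the paper. It also shows why LBP-style codes tolerate uniform illumination changes: adding a constant to every pixel leaves the code unchanged.

```python
# Classic 3x3 local binary pattern (LBP); this is NOT the hvnLBP variant
# named in the abstract, whose neighbourhood layout is not given there.

# Neighbour offsets (row, col), clockwise starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Return the 8-bit LBP code of the interior pixel at (r, c)."""
    center = image[r][c]
    code = 0
    for bit, (dr, dc) in enumerate(OFFSETS):
        # A neighbour at least as bright as the centre sets its bit.
        if image[r + dr][c + dc] >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """Return a 256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


# Hypothetical toy patch and a uniformly brightened copy of it.
patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
brighter = [[v + 50 for v in row] for row in patch]
print(lbp_code(patch, 1, 1), lbp_code(brighter, 1, 1))
```

The histogram of codes over an image region is what would typically be passed on to a feature-selection step such as the one the abstract describes.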

    Feature Selection Inspired Classifier Ensemble Reduction

    Classifier ensembles constitute one of the main research directions in machine learning and data mining. The use of multiple classifiers generally allows better predictive performance than that achievable with a single model. Several approaches exist in the literature that provide means to construct and aggregate such ensembles. However, these ensemble systems contain redundant members that, if removed, may further increase group diversity and produce better results. Smaller ensembles also relax the memory and storage requirements, reducing the system's run-time overhead while improving overall efficiency. This paper extends the ideas developed for feature selection problems to support classifier ensemble reduction, by transforming ensemble predictions into training samples and treating classifiers as features. The global heuristic harmony search is then used to select a reduced subset of such artificial features, while attempting to maximize the feature subset evaluation. The resulting technique is systematically evaluated using high-dimensional and large benchmark datasets, showing superior classification performance against both the original, unreduced ensembles and randomly formed subsets. © 2013 IEEE
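
The idea of treating classifiers as features can be sketched as follows. Each row of the prediction table is one sample and each column is one classifier's output, so selecting columns is selecting ensemble members. This sketch swaps in a simple greedy forward search for the harmony search used in the paper, and uses majority-vote accuracy as an assumed subset evaluation. All names and the toy data are hypothetical.

```python
from collections import Counter


def majority_vote(rows, members):
    """Combine the selected classifiers' predictions for each sample."""
    # Ties go to the label seen first, i.e. the earliest selected member.
    return [Counter(row[m] for m in members).most_common(1)[0][0] for row in rows]


def accuracy(pred, truth):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def reduce_ensemble(rows, truth, max_size):
    """Greedy forward selection over classifier columns (not harmony search)."""
    n_classifiers = len(rows[0])
    chosen, best = [], 0.0
    while len(chosen) < min(max_size, n_classifiers):
        candidates = [
            (accuracy(majority_vote(rows, chosen + [j]), truth), j)
            for j in range(n_classifiers) if j not in chosen
        ]
        # Prefer the higher score; break ties by the lower column index.
        score, j = max(candidates, key=lambda c: (c[0], -c[1]))
        if chosen and score <= best:
            break  # no remaining member improves the reduced ensemble
        chosen.append(j)
        best = score
    return chosen, best


# Hypothetical predictions: 4 samples (rows) by 3 classifiers (columns).
rows = [[0, 1, 0],
        [1, 1, 0],
        [1, 0, 1],
        [0, 0, 1]]
truth = [0, 1, 1, 0]
chosen, best = reduce_ensemble(rows, truth, max_size=3)
print(chosen, best)
```

Any feature-subset search could replace the greedy loop; only the evaluation of a candidate column subset is specific to the ensemble setting.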

    Laplacian Mixture Modeling for Network Analysis and Unsupervised Learning on Graphs

    Laplacian mixture models identify overlapping regions of influence in unlabeled graph and network data in a scalable and computationally efficient way, yielding useful low-dimensional representations. By combining Laplacian eigenspace and finite mixture modeling methods, they provide probabilistic or fuzzy dimensionality reductions or domain decompositions for a variety of input data types, including mixture distributions, feature vectors, and graphs or networks. Provable optimal recovery using the algorithm is analytically shown for a nontrivial class of cluster graphs. Heuristic approximations for scalable high-performance implementations are described and empirically tested. Connections to PageRank and community detection in network analysis demonstrate the wide applicability of this approach. The origins of fuzzy spectral methods, beginning with generalized heat or diffusion equations in physics, are reviewed and summarized. Comparisons to other dimensionality reduction and clustering methods for challenging unsupervised machine learning problems are also discussed. Comment: 13 figures, 35 references
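
As background for the Laplacian eigenspace mentioned above, here is a minimal sketch of one standard fact that spectral methods rely on. It is not the paper's algorithm. For the unnormalized graph Laplacian L = D - A, the indicator vector of each connected component lies in the null space of L, so the low end of the spectrum encodes cluster structure. The toy graph and helper names are assumptions.

```python
def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A from an adjacency matrix."""
    n = len(adj)
    return [
        [(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
        for i in range(n)
    ]


def matvec(matrix, vector):
    """Plain matrix-vector product."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


# Hypothetical graph: two disconnected triangles, nodes 0-2 and 3-5.
adj = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
L = laplacian(adj)

first_cluster = [1, 1, 1, 0, 0, 0]   # indicator of the first triangle
single_node = [1, 0, 0, 0, 0, 0]     # not a union of components

print(matvec(L, first_cluster))  # zero vector: a null-space direction
print(matvec(L, single_node))    # nonzero: cuts through a component
```

On real data the components are only weakly separated, so the relevant eigenvalues are small rather than exactly zero.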

    QSAR Classification Models for Predicting the Activity of Inhibitors of Beta-Secretase (BACE1) Associated with Alzheimer’s Disease

    Alzheimer’s disease is one of the most common neurodegenerative disorders in the elderly population. The β-site amyloid cleavage enzyme 1 (BACE1) is the major constituent of amyloid plaques and plays a central role in this brain pathogenesis, thus it constitutes an auspicious pharmacological target for its treatment. In this paper, a QSAR model for identifying potential inhibitors of the BACE1 protein is designed using classification methods. To build this model, a database of 215 molecules collected from different sources has been assembled. This dataset contains diverse compounds with different scaffolds and physical-chemical properties, covering a wide chemical space in the drug-like range. The most distinctive aspect of the applied QSAR strategy is the combination of hybridization with backward elimination of models, which helps improve the quality of the final QSAR model. Another relevant step is the visual analysis of the molecular descriptors, which helps ensure the absence of information redundancy in the model. The QSAR model performances have been assessed by traditional metrics, and the final proposed model has low cardinality and correctly classifies a high percentage of chemical compounds.
    Affiliation: Ponzoni, Ignacio. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Sebastián Pérez, Víctor. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
    Affiliation: Martínez, María J. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Instituto de Ciencias e Ingeniería de la Computación. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina. Universidad Nacional del Sur. Departamento de Ciencias e Ingeniería de la Computación. Instituto de Ciencias e Ingeniería de la Computación; Argentina
    Affiliation: Roca, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
    Affiliation: De la Cruz Pérez, Carlos. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain
    Affiliation: Cravero, Fiorella. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina
    Affiliation: Vazquez, Gustavo Esteban. Consejo Nacional de Investigaciones Científicas y Técnicas; Argentina. Universidad Católica del Uruguay; Uruguay
    Affiliation: Páez, Juan A. Consejo Superior de Investigaciones Científicas. Instituto de Química Médica; Spain
    Affiliation: Diaz, Monica Fatima. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Bahía Blanca. Planta Piloto de Ingeniería Química. Universidad Nacional del Sur. Planta Piloto de Ingeniería Química; Argentina. Universidad Nacional del Sur. Departamento de Ingeniería Química; Argentina
    Affiliation: Campillo Martín, Nuria Eugenia. Consejo Superior de Investigaciones Científicas. Centro de Investigaciones Biológicas; Spain

    Classification Model for Meticulous Presaging of Heart Disease Detection through SDA and NCA using Machine learning :CMSDANCA

    Computation time and prognostic accuracy are critical in the design and implementation of a CDSS. ML techniques are used to analyze large datasets for detecting and diagnosing disease. According to World Health Organization reports, HD is a major cause of death in both urban and rural areas worldwide, largely because of a shortage of doctors and delays in diagnosis. In this research work, heart disease is diagnosed at an early stage by applying data mining techniques to patients' clinical parameters. The aim is to develop a prediction model for coronary heart disease. The proposed work combines a self-diagnosis algorithm, a fuzzy artificial neural network, NCA, PCA and imputation methods. This combination can reduce the computation time for predicting coronary HD. Two datasets, Cleveland and Statlog, collected from the UCI ML repository and Kaggle, are used for the implementation. The datasets are used to measure the differences between variables and to determine whether they are correlated. The performance of the classification model is measured in terms of accuracy, precision, recall and specificity. The approach is evaluated on the heart disease datasets with the aim of improving accuracy. KNN+SDA+NCA+FuzzyANN achieved an accuracy of 98.56% on the Cleveland dataset and 98.66% on the Statlog dataset.
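
The four performance measures named above follow directly from the confusion-matrix counts of a binary classifier. The sketch below uses made-up labels (1 = disease present) and hypothetical function names; it does not reproduce the paper's data or results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification task."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall (sensitivity) and specificity."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        # Guard against empty denominators on degenerate inputs.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
    }


# Made-up labels for eight patients.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting specificity alongside recall matters in diagnosis, because a model can score high accuracy on imbalanced data while missing most positive cases.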

    A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0

    Information on the subcellular locations of proteins is important for in-depth studies of cell biology. It is also very useful for proteomics, systems biology and drug development. However, most existing methods for predicting protein subcellular location cover only 5 to 12 location sites. They are also limited to single-location proteins and hence fail for multiplex proteins, which can simultaneously exist at, or move between, two or more location sites. Multiplex proteins of this kind usually possess important biological functions worthy of special notice. A new predictor called “Euk-mPLoc 2.0” is developed by hybridizing the gene ontology information, functional domain information, and sequential evolutionary information through three different modes of pseudo amino acid composition. It can be used to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell wall, (3) centriole, (4) chloroplast, (5) cyanelle, (6) cytoplasm, (7) cytoskeleton, (8) endoplasmic reticulum, (9) endosome, (10) extracell, (11) Golgi apparatus, (12) hydrogenosome, (13) lysosome, (14) melanosome, (15) microsome, (16) mitochondria, (17) nucleus, (18) peroxisome, (19) plasma membrane, (20) plastid, (21) spindle pole body, and (22) vacuole. Compared with the existing methods for predicting eukaryotic protein subcellular localization, the new predictor is much more powerful and flexible, particularly in dealing with proteins with multiple locations and proteins without available accession numbers. For a newly constructed stringent benchmark dataset which contains both single- and multiple-location proteins and in which none of the proteins has pairwise sequence identity to any other in the same location, the overall jackknife success rate achieved by Euk-mPLoc 2.0 is more than 24% higher than that of any existing predictor. As a user-friendly web-server, Euk-mPLoc 2.0 is freely accessible at http://www.csbio.sjtu.edu.cn/bioinf/euk-multi-2/. For a query protein sequence of 400 amino acids, the web-server takes about 15 seconds to yield the predicted result; longer sequences usually take more time. It is anticipated that the novel approach and the powerful predictor presented in this paper will have a significant impact on Molecular Cell Biology, Systems Biology, Proteomics, Bioinformatics, and Drug Development.
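
The jackknife success rate cited above is a leave-one-out measure: each protein is held out in turn, predicted from all the others, and the fraction of correct predictions is reported. The sketch below illustrates this for the single-location case with a stand-in 1-nearest-neighbour predictor on a made-up one-dimensional feature. Euk-mPLoc 2.0's actual features, predictor and multi-location scoring are not reproduced here.

```python
def nearest_neighbor_predict(train, x):
    """Stand-in predictor: label of the training sample closest to x."""
    return min(train, key=lambda sample: abs(sample[0] - x))[1]


def jackknife_success_rate(samples, predict):
    """Leave-one-out test: hold out each sample and predict it from the rest."""
    hits = 0
    for i, (x, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if predict(rest, x) == label:
            hits += 1
    return hits / len(samples)


# Made-up (feature, location) pairs; real inputs would be sequence-derived.
samples = [(1.0, "cytoplasm"), (1.2, "cytoplasm"),
           (5.0, "nucleus"), (5.1, "nucleus"),
           (3.2, "cytoplasm")]
rate = jackknife_success_rate(samples, nearest_neighbor_predict)
print(rate)
```

Because every sample is tested exactly once and no random split is involved, the jackknife yields a single reproducible value for a given benchmark dataset.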