
    Estimation of obesity levels based on computational intelligence

    Obesity is a worldwide disease that affects people of all ages and genders; consequently, researchers have made great efforts to identify early the factors that cause it. In this study, an intelligent method is created, based on supervised and unsupervised data mining techniques such as Simple K-Means, Decision Trees (DT), and Support Vector Machines (SVM), to detect obesity levels and help people and health professionals pursue a healthier lifestyle in the face of this global epidemic. The primary data source was students between 18 and 25 years old at institutions in Colombia, Mexico, and Peru. The study uses a dataset covering the main causes of obesity: high caloric intake, decreased energy expenditure due to lack of physical activity, alimentary disorders, genetics, socioeconomic factors, and/or anxiety and depression. In the selected dataset, 178 students participated, 81 male and 97 female. Using the Decision Tree, Support Vector Machine (SVM), and Simple K-Means algorithms, the results provide a relevant tool for comparative analysis among these algorithms.

    Combining Multiple Clusterings via Crowd Agreement Estimation and Multi-Granularity Link Analysis

    The clustering ensemble technique aims to combine multiple clusterings into a probably better and more robust clustering and has been receiving increasing attention in recent years. Existing clustering ensemble approaches have two main limitations. Firstly, many approaches lack the ability to weight the base clusterings without access to the original data and can be affected significantly by low-quality, or even ill, clusterings. Secondly, they generally focus on the instance level or cluster level in the ensemble system and fail to integrate multi-granularity cues into a unified model. To address these two limitations, this paper proposes to solve the clustering ensemble problem via crowd agreement estimation and multi-granularity link analysis. We present the normalized crowd agreement index (NCAI) to evaluate the quality of base clusterings in an unsupervised manner and thus weight the base clusterings in accordance with their clustering validity. To explore the relationship between clusters, the source aware connected triple (SACT) similarity is introduced with regard to their common neighbors and the source reliability. Based on NCAI and multi-granularity information collected among base clusterings, clusters, and data instances, we further propose two novel consensus functions, termed weighted evidence accumulation clustering (WEAC) and graph partitioning with multi-granularity link analysis (GP-MGLA) respectively. The experiments are conducted on eight real-world datasets. The experimental results demonstrate the effectiveness and robustness of the proposed methods. Comment: The MATLAB source code of this work is available at: https://www.researchgate.net/publication/28197031
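The idea of weighting base clusterings during evidence accumulation can be illustrated with a weighted co-association matrix. The sketch below is not the paper's WEAC implementation: the function name `weighted_coassociation` is hypothetical, and the weights are taken as given inputs because the abstract does not state how NCAI is computed. It only shows how reliability weights would change how strongly each base clustering votes that two instances belong together.

```python
def weighted_coassociation(base_clusterings, weights):
    """Return an n x n matrix whose (i, j) entry is the weighted fraction
    of base clusterings that place instances i and j in the same cluster.

    base_clusterings: list of label lists, one per base clustering.
    weights: one non-negative weight per base clustering (assumed to come
             from some validity index such as NCAI; not computed here).
    """
    n = len(base_clusterings[0])
    total = sum(weights)
    co = [[0.0] * n for _ in range(n)]
    for labels, w in zip(base_clusterings, weights):
        share = w / total  # normalized vote of this base clustering
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    co[i][j] += share
    return co
```

A consensus clustering could then be obtained by running, for example, agglomerative clustering on `1 - co` as a dissimilarity; that step is omitted here.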

    Fuzzy-Citation-KNN: a fuzzy nearest neighbor approach for multi-instance classification

    This contribution deals with multi-instance classification, where the labeled data samples are bags composed of instances instead of labeled instances as in standard classification. Every bag contains a number of traditional instances (described by a number of attributes), and the number of instances is usually not the same in all bags. The whole bag is labeled, but the instances that compose it are not individually labeled. We propose a fuzzy-set-based extension of the well-known Citation-KNN algorithm, a reference method in multi-instance classification. Citation-KNN uses two types of examples in the classification rule: neighbors and citers of the bag to be classified. We analyze two versions of our proposal, one using both neighbors and citers, and the other using only neighbors. Our approach uses the Hausdorff distance and is based on the FuzzyKNN algorithm. Several datasets from the KEEL dataset repository are used in the experimental study, and we compare our proposals with the original Citation-KNN algorithm.
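The neighbor-and-citer rule can be sketched with the original (crisp) Citation-KNN, which is the baseline the abstract refers to rather than the fuzzy extension it proposes. The sketch assumes the maximal Hausdorff distance with Euclidean instance distance; the abstract does not say which Hausdorff variant is used, and the function names and the `n_refs`/`n_citers` parameters are illustrative.

```python
import math
from collections import Counter

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hausdorff(bag_a, bag_b):
    """Maximal Hausdorff distance between two bags of instances."""
    def directed(src, dst):
        return max(min(euclidean(s, d) for d in dst) for s in src)
    return max(directed(bag_a, bag_b), directed(bag_b, bag_a))

def citation_knn_predict(train_bags, train_labels, query, n_refs=2, n_citers=2):
    """Majority vote over references (nearest training bags of the query)
    and citers (training bags that rank the query among their own
    n_citers nearest bags)."""
    d_query = [hausdorff(query, bag) for bag in train_bags]
    order = sorted(range(len(train_bags)), key=lambda i: d_query[i])
    references = order[:n_refs]

    citers = []
    for i, bag in enumerate(train_bags):
        # Count training bags strictly closer to bag i than the query is.
        closer = sum(
            1 for j, other in enumerate(train_bags)
            if j != i and hausdorff(bag, other) < d_query[i]
        )
        if closer < n_citers:
            citers.append(i)

    votes = Counter(train_labels[i] for i in references + citers)
    return votes.most_common(1)[0][0]
```

A fuzzy variant would replace the hard vote with class memberships weighted by distance, as in FuzzyKNN; that part is not shown because the abstract gives no formula for it.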

    Vector-valued distribution regression: a simple and consistent approach

    We address the distribution regression problem (DRP): regressing on the domain of probability measures, in the two-stage sampled setup when only samples from the distributions are given. The DRP formulation offers a unified framework for several important tasks in statistics and machine learning, including multi-instance learning (MIL) and point estimation problems without analytical solution. Despite the large number of MIL heuristics, there is essentially no theoretically grounded approach to tackle the DRP in the two-stage sampled case. To the best of our knowledge, the only existing technique with consistency guarantees requires kernel density estimation as an intermediate step (which often scales poorly in practice), and requires the domain of the distributions to be compact Euclidean. We analyse a simple (analytically computable) ridge regression alternative for the DRP: we embed the distributions into a reproducing kernel Hilbert space, and learn the regressor from the embeddings to the outputs. We show that this scheme is consistent in the two-stage sampled setup under mild conditions, for probability measure inputs defined on separable, topological domains endowed with kernels, with vector-valued outputs belonging to an arbitrary separable Hilbert space. Specifically, choosing the kernel on the space of embedded distributions to be linear and the output space to be the real line, we get the consistency of set kernels in regression, which was a 15-year-old open question. In our talk we are going to present (i) the main ideas and results of consistency, (ii) concrete kernel constructions on mean embedded distributions, and (iii) two applications (supervised entropy learning, aerosol prediction based on multispectral satellite images) demonstrating the efficiency of our approach.
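The embed-then-ridge scheme can be sketched for the simplest case mentioned above: a linear kernel on the mean embeddings (the set kernel) and real-valued outputs. This is a minimal illustration under stated assumptions, not the authors' code: the Gaussian base kernel, its bandwidth `sigma`, one-dimensional samples, and all function names are choices made here. The coefficients follow the standard kernel ridge solution (K + n·λ·I)α = y.

```python
import math

def gauss(x, y, sigma=1.0):
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))

def set_kernel(bag_a, bag_b, sigma=1.0):
    """Inner product of empirical mean embeddings: the average of the
    base kernel over all cross pairs of samples."""
    total = sum(gauss(a, b, sigma) for a in bag_a for b in bag_b)
    return total / (len(bag_a) * len(bag_b))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def fit(bags, targets, lam=1e-6, sigma=1.0):
    """Kernel ridge regression on bags; returns the dual coefficients."""
    n = len(bags)
    K = [[set_kernel(bags[i], bags[j], sigma) for j in range(n)] for i in range(n)]
    for i in range(n):
        K[i][i] += n * lam  # ridge term
    return solve(K, targets)

def predict(bags, alpha, query, sigma=1.0):
    return sum(a * set_kernel(query, bag, sigma) for a, bag in zip(alpha, bags))
```

Each training input is a bag of samples from one distribution; with a small `lam` the fitted regressor nearly interpolates the training targets, while a larger `lam` trades fit for smoothness.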