18 research outputs found

    Robust EM algorithm for model-based curve clustering

    Full text link
    Model-based clustering approaches concern the paradigm of exploratory data analysis relying on the finite mixture model to automatically find a latent structure governing observed data. They are one of the most popular and successful approaches in cluster analysis. The mixture density estimation is generally performed by maximizing the observed-data log-likelihood by using the expectation-maximization (EM) algorithm. However, it is well-known that the EM algorithm initialization is crucial. In addition, the standard EM algorithm requires the number of clusters to be known a priori. Some solutions have been provided in [31, 12] for model-based clustering with Gaussian mixture models for multivariate data. In this paper we focus on model-based curve clustering approaches, when the data are curves rather than vectorial data, based on regression mixtures. We propose a new robust EM algorithm for clustering curves. We extend the model-based clustering approach presented in [31] for Gaussian mixture models, to the case of curve clustering by regression mixtures, including polynomial regression mixtures as well as spline or B-spline regressions mixtures. Our approach both handles the problem of initialization and the one of choosing the optimal number of clusters as the EM learning proceeds, rather than in a two-fold scheme. This is achieved by optimizing a penalized log-likelihood criterion. A simulation study confirms the potential benefit of the proposed algorithm in terms of robustness regarding initialization and funding the actual number of clusters.Comment: In Proceedings of the 2013 International Joint Conference on Neural Networks (IJCNN), 2013, Dallas, TX, US

    TWO-DIMENSIONAL GMM-BASED CLUSTERING IN THE PRESENCE OF QUANTIZATION NOISE

    Get PDF
    In this paper, unlike to the commonly considered clustering, wherein data attributes are accurately presented, it is researched how successful clustering can be performed when data attributes are represented with smaller accuracy, i.e. by using the small number of bits. In particular, the effect of data attributes quantization on the two-dimensional two-component Gaussian mixture model (GMM)-based clustering by using expectation–maximization (EM) algorithm is analyzed. An independent quantization of data attributes by using uniform quantizers with the support limits adjusted to the minimal and maximal attribute values is assumed. The analysis makes it possible to determine the number of bits for data presentation that provides the accurate clustering. These findings can be useful in clustering wherein before being grouped the data have to be represented with a finite small number of bits due to their transmission through the bandwidth-limited channel.

    A Learning-Based EM Clustering for Circular Data with Unknown Number of Clusters

    Get PDF
    Clustering is a method for analyzing grouped data. Circular data were well used in various applications, such as wind directions, departure directions of migrating birds or animals, etc. The expectation & maximization (EM) algorithm on mixtures of von Mises distributions is popularly used for clustering circular data. In general, the EM algorithm is sensitive to initials and not robust to outliers in which it is also necessary to give a number of clusters a priori. In this paper, we consider a learning-based schema for EM, and then propose a learning-based EM algorithm on mixtures of von Mises distributions for clustering grouped circular data. The proposed clustering method is without any initial and robust to outliers with automatically finding the number of clusters. Some numerical and real data sets are used to compare the proposed algorithm with existing methods. Experimental results and comparisons actually demonstrate these good aspects of effectiveness and superiority of the proposed learning-based EM algorithm

    Improved Correction of Atmospheric Pressure Data Obtained by Smartphones through Machine Learning

    Get PDF
    A correction method using machine learning aims to improve the conventional linear regression (LR) based method for correction of atmospheric pressure data obtained by smartphones. The method proposed in this study conducts clustering and regression analysis with time domain classification. Data obtained in Gyeonggi-do, one of the most populous provinces in South Korea surrounding Seoul with the size of 10,000 km2, from July 2014 through December 2014, using smartphones were classified with respect to time of day (daytime or nighttime) as well as day of the week (weekday or weekend) and the user’s mobility, prior to the expectation-maximization (EM) clustering. Subsequently, the results were analyzed for comparison by applying machine learning methods such as multilayer perceptron (MLP) and support vector regression (SVR). The results showed a mean absolute error (MAE) 26% lower on average when regression analysis was performed through EM clustering compared to that obtained without EM clustering. For machine learning methods, the MAE for SVR was around 31% lower for LR and about 19% lower for MLP. It is concluded that pressure data from smartphones are as good as the ones from national automatic weather station (AWS) network

    visClust: A visual clustering algorithm based on orthogonal projections

    Full text link
    We present a novel clustering algorithm, visClust, that is based on lower dimensional data representations and visual interpretation. Thereto, we design a transformation that allows the data to be represented by a binary integer array enabling the further use of image processing methods to select a partition. Qualitative and quantitative analyses show that the algorithm obtains high accuracy (measured with an adjusted one-sided Rand-Index) and requires low runtime and RAM. We compare the results to 6 state-of-the-art algorithms, confirming the quality of visClust by outperforming in most experiments. Moreover, the algorithm asks for just one obligatory input parameter while allowing optimization via optional parameters. The code is made available on GitHub.Comment: 23 page

    Estimation of 5G Core and RAN End-to-End Delay through Gaussian Mixture Models

    Get PDF
    Funding Information: This research was funded by Fundação para a Ciência e Tecnologia (FCT) under the projects 2022.08786.PTDC and UIDB/50008/2020. Publisher Copyright: © 2022 by the authors.Network analytics provide a comprehensive picture of the network’s Quality of Service (QoS), including the End-to-End (E2E) delay. In this paper, we characterize the Core and the Radio Access Network (RAN) E2E delay of 5G networks with the Standalone (SA) and Non-Standalone (NSA) topologies when a single known Probability Density Function (PDF) is not suitable to model its distribution. To this end, multiple PDFs, denominated as components, are combined in a Gaussian Mixture Model (GMM) to represent the distribution of the E2E delay. The accuracy and computation time of the GMM is evaluated for a different number of components and a number of samples. The results presented in the paper are based on a dataset of E2E delay values sampled from both SA and NSA 5G networks. Finally, we show that the GMM can be adopted to estimate a high diversity of E2E delay patterns found in 5G networks and its computation time can be adequate for a large range of applications.publishersversionpublishe

    RECOMED: A Comprehensive Pharmaceutical Recommendation System

    Full text link
    A comprehensive pharmaceutical recommendation system was designed based on the patients and drugs features extracted from Drugs.com and Druglib.com. First, data from these databases were combined, and a dataset of patients and drug information was built. Secondly, the patients and drugs were clustered, and then the recommendation was performed using different ratings provided by patients, and importantly by the knowledge obtained from patients and drug specifications, and considering drug interactions. To the best of our knowledge, we are the first group to consider patients conditions and history in the proposed approach for selecting a specific medicine appropriate for that particular user. Our approach applies artificial intelligence (AI) models for the implementation. Sentiment analysis using natural language processing approaches is employed in pre-processing along with neural network-based methods and recommender system algorithms for modeling the system. In our work, patients conditions and drugs features are used for making two models based on matrix factorization. Then we used drug interaction to filter drugs with severe or mild interactions with other drugs. We developed a deep learning model for recommending drugs by using data from 2304 patients as a training set, and then we used data from 660 patients as our validation set. After that, we used knowledge from critical information about drugs and combined the outcome of the model into a knowledge-based system with the rules obtained from constraints on taking medicine.Comment: 39 pages, 14 figures, 13 table

    NetCluster: A clustering-based framework to analyze internet passive measurements data

    Get PDF
    Internet measured data collected via passive measurement are analyzed to obtain localization information on nodes by clustering (i.e., grouping together) nodes that exhibit similar network path properties. Since traditional clustering algorithms fail to correctly identify clusters of homogeneous nodes, we propose the NetCluster novel framework, suited to analyze Internet measurement datasets. We show that the proposed framework correctly analyzes synthetically generated traces. Finally, we apply it to real traces collected at the access link of Politecnico di Torino campus LAN and discuss the network characteristics as seen at the vantage point

    Unsupervised online clustering and detection algorithms using crowdsourced data for malaria diagnosis

    Get PDF
    © . This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/Crowdsourced data in science might be severely error-prone due to the inexperience of annotators participating in the project. In this work, we present a procedure to detect specific structures in an image given tags provided by multiple annotators and collected through a crowdsourcing methodology. The procedure consists of two stages based on the Expectation–Maximization (EM) algorithm, one for clustering and the other one for detection, and it gracefully combines data coming from annotators with unknown reliability in an unsupervised manner. An online implementation of the approach is also presented that is well suited to crowdsourced streaming data. Comprehensive experimental results with real data from the MalariaSpot project are also included.Peer ReviewedPreprin
    corecore