
    Segmentation of sales for a mobile phone service through CART classification tree algorithm

    The work followed the CRISP-DM method in detail to identify optimal groups of customers who are more likely to migrate from a prepaid to a postpaid plan, in order to formulate an improvement plan for call management by sorting the database. Classification models, specifically the CART classification tree algorithm, were applied to analyze the characteristics generated by the purchase of the different services. As a result, groups differentiated by their probability of sales success (migrating from a prepaid to a postpaid plan) were found. These segments reflect particular needs and characteristics and can be used to design marketing actions aimed at increasing the effectiveness rate, contact information, and sales.
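    CART grows a classification tree by repeatedly choosing the split that most reduces node impurity. The sketch below shows one such step under stated assumptions: a single numeric feature, a binary target (1 = customer migrated to postpaid), and Gini impurity as the split criterion. The feature, the data, and the function names are hypothetical illustrations, not taken from the study.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(xs, ys):
    """Return (threshold, weighted_impurity) of the best binary split x <= t.

    threshold is None when no split improves on the parent node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best = (None, gini(list(ys)))  # baseline: impurity of the unsplit node
    for i in range(1, n):
        # Only place thresholds between distinct feature values.
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: monthly prepaid top-up amount vs. migration outcome.
topups = [5, 8, 10, 12, 30, 35, 40, 50]
migrated = [0, 0, 0, 0, 1, 1, 0, 1]
threshold, impurity = best_split(topups, migrated)
print(threshold, impurity)  # expected: 21.0 0.1875
```

    A full tree applies this search recursively to each child node over all candidate features; each leaf's share of migrated customers then gives the kind of per-segment success probability the abstract describes.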

    Simultaneous model-based clustering and visualization in the Fisher discriminative subspace

    Clustering in high-dimensional spaces is nowadays a recurrent problem in many scientific domains but remains a difficult task, from the standpoint of both clustering accuracy and interpretability of the results. This paper presents a discriminative latent mixture (DLM) model which fits the data in a latent orthonormal discriminative subspace with an intrinsic dimension lower than the dimension of the original space. By constraining model parameters within and between groups, a family of 12 parsimonious DLM models is obtained, which can fit a variety of situations. An estimation algorithm, called the Fisher-EM algorithm, is also proposed for estimating both the mixture parameters and the discriminative subspace. Experiments on simulated and real datasets show that the proposed approach performs better than existing clustering methods while providing a useful representation of the clustered data. The method is also applied to the clustering of mass spectrometry data.

    High-Dimensional Data Clustering

    Clustering in high-dimensional spaces is a difficult problem which is recurrent in many domains, for example in image analysis. The difficulty arises because high-dimensional data usually live in different low-dimensional subspaces hidden in the original space. This paper presents a family of Gaussian mixture models designed for high-dimensional data which combine the ideas of dimension reduction and parsimonious modeling. These models give rise to a clustering method based on the Expectation-Maximization algorithm, called High-Dimensional Data Clustering (HDDC). In order to fit the data correctly, HDDC estimates the specific subspace and the intrinsic dimension of each group. Our experiments on artificial and real datasets show that HDDC outperforms existing methods for clustering high-dimensional data.
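    Choosing the intrinsic dimension of each group is the step that distinguishes HDDC from a plain Gaussian mixture. A minimal sketch, assuming a Cattell-style scree test on the sorted eigenvalues of one cluster's covariance matrix: keep every axis up to the last eigenvalue gap that is at least a fraction `threshold` of the largest gap, and summarize the discarded directions by their mean eigenvalue (the noise variance). The function names, the threshold value 0.2, and the eigenvalues are hypothetical; in practice the eigenvalues would come from the EM-weighted covariance of each cluster.

```python
def scree_dimension(eigvals, threshold=0.2):
    """Cattell-style scree test: number of leading axes to keep (needs >= 2 eigenvalues)."""
    lam = sorted(eigvals, reverse=True)
    diffs = [lam[j] - lam[j + 1] for j in range(len(lam) - 1)]
    cutoff = threshold * max(diffs)
    # Last gap that is still "large"; +1 converts the 0-based index to a count.
    return max(j for j, g in enumerate(diffs) if g >= cutoff) + 1


def noise_variance(eigvals, d):
    """Mean of the discarded eigenvalues: the single noise term outside the subspace."""
    lam = sorted(eigvals, reverse=True)
    rest = lam[d:]
    return sum(rest) / len(rest)


# Hypothetical eigenvalues of one cluster's covariance matrix (p = 6).
eig = [9.0, 4.0, 0.5, 0.4, 0.35, 0.3]
d = scree_dimension(eig)
b = noise_variance(eig, d)
print(d, b)
```

    Replacing the p - d smallest eigenvalues by one shared value is what keeps the model parsimonious: each group needs only its d leading variances, its d leading eigenvectors, and one noise parameter.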

    Green efficiency performance analysis of the logistics industry in China: based on a kind of machine learning methods

    This paper analyzes the green efficiency performance of the logistics industry in 30 Chinese provinces from 2008 to 2017. We first evaluate the green efficiency of the logistics industry with the non-radial directional distance function (NDDF) method. Then, we use the functional clustering method funHDDC, a popular machine learning method, to divide the 30 provinces into 4 clusters and analyze the similarities and differences in green efficiency performance patterns among the groups. Further, we explore the driving factors of dynamic changes in green efficiency through a decomposition method. The main conclusions are as follows: (1) In general, the level of green efficiency is closely related to geographical location. The clustering results show that most eastern regions belong to the cluster with higher green efficiency, while most western regions belong to the cluster with lower green efficiency. However, the green efficiency performance of several regions with high economic levels, such as Beijing and Shanghai, is not satisfactory. (2) The decomposition results show that the innovation effect in China's logistics industry is the most pronounced, but efficiency change still needs to be improved and technical leadership should be strengthened. Based on these conclusions, we propose policy recommendations for the green development of the logistics industry in China.

    Robot Learning with Task-Parameterized Generative Models

    Task-parameterized models provide a representation of movement/behavior that can adapt to a set of task parameters describing the current situation encountered by the robot, such as location of objects or landmarks in its workspace. This paper gives an overview of the task-parameterized Gaussian mixture model (TP-GMM) introduced in previous publications, and introduces a number of extensions and ongoing challenges required to move the approach toward unconstrained environments. In particular, it discusses its generalization capability and the handling of movements with a high number of degrees of freedom. It then shows that the method is not restricted to movements in task space, but that it can also be exploited to handle constraints in joint space, including priority constraints
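    The adaptation to task parameters rests on a simple operation: each mixture component is stored once per task frame, mapped into the global frame with that frame's parameters (here a scale A_j and an origin b_j), and the resulting Gaussians are fused by a product of Gaussians. The sketch below shows this in one dimension so it stays dependency-free; the frame values are hypothetical, and a real TP-GMM uses matrices A_j, full covariances, and several components retrieved over time.

```python
def frame_to_global(mu_local, var_local, a, b):
    """Map a 1-D Gaussian from a task frame (scale a, origin b) to the global frame."""
    return a * mu_local + b, a * a * var_local


def product_of_gaussians(gaussians):
    """Fuse (mean, variance) pairs: precisions add, means are precision-weighted."""
    precision = sum(1.0 / var for _, var in gaussians)
    var = 1.0 / precision
    mean = var * sum(mu / v for mu, v in gaussians)
    return mean, var


# Hypothetical: one component observed from two frames (e.g. start pose and object).
frames = [
    # (local mean, local variance, frame scale A_j, frame origin b_j)
    (0.5, 0.04, 1.0, 0.0),
    (-0.2, 0.01, 1.0, 1.0),
]
global_gaussians = [frame_to_global(m, v, a, b) for m, v, a, b in frames]
mean, var = product_of_gaussians(global_gaussians)
print(mean, var)
```

    Because the second frame has the smaller variance, the fused mean lands closer to its prediction; moving that frame's origin b_j (for instance, relocating the object) shifts the fused result accordingly, which is how the model generalizes to new situations.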

    Model-based Clustering of High-Dimensional Data in Astrophysics

    The nature of data in Astrophysics has changed, as in other scientific fields, in the past decades due to the increase in measurement capabilities. As a consequence, data are nowadays frequently of high dimensionality and available in large volumes or as streams. Model-based techniques for clustering are popular tools which are renowned for their probabilistic foundations and their flexibility. However, classical model-based techniques behave disappointingly in high-dimensional spaces, mainly due to their dramatic over-parameterization. Recent developments in model-based classification overcome these drawbacks and make it possible to classify high-dimensional data efficiently, even in the “small n / large p” situation. This work presents a comprehensive review of these recent approaches, including regularization-based techniques, parsimonious modeling, subspace classification methods and classification methods based on variable selection. The use of these model-based methods is also illustrated on real-world classification problems in Astrophysics using R packages.

    Anomaly Detection Based on Confidence Intervals Using SOM with an Application to Health Monitoring

    We develop an application of SOM for the task of anomaly detection and visualization. To remove the effect of exogenous independent variables, we use a correction model which is more accurate than the usual one, since we apply a different linear model in each context cluster. We do not assume any particular probability distribution of the data, and the detection method is based on the distance of new data to the Kohonen map learned from corrected healthy data. We apply the proposed method to the detection of aircraft engine anomalies.
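    The detection rule can be sketched as a distance to the map: score each observation by its distance to the best-matching prototype, calibrate an alarm threshold on healthy data, and flag new observations above it. This is a minimal sketch under stated assumptions: the prototype vectors of an already-trained Kohonen map are given, the per-cluster linear correction is assumed to have been applied upstream, and a plain empirical quantile of healthy distances stands in for the paper's confidence-interval construction, which the abstract does not detail. All names and values are hypothetical.

```python
import math


def nearest_distance(x, prototypes):
    """Euclidean distance from x to its best-matching unit on the map."""
    return min(math.dist(x, w) for w in prototypes)


def fit_threshold(healthy, prototypes, level=0.95):
    """Empirical quantile of healthy-data distances (stand-in for a confidence bound)."""
    d = sorted(nearest_distance(x, prototypes) for x in healthy)
    idx = min(len(d) - 1, max(0, math.ceil(level * len(d)) - 1))
    return d[idx]


def is_anomaly(x, prototypes, threshold):
    """Flag x when it lies farther from the map than healthy data usually do."""
    return nearest_distance(x, prototypes) > threshold


# Hypothetical prototypes of a trained map and corrected healthy observations.
prototypes = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
healthy = [(0.1, 0.0), (0.0, 0.2), (1.0, 1.1), (2.2, 0.0), (1.95, 0.0)]
threshold = fit_threshold(healthy, prototypes)
print(threshold, is_anomaly((1.0, 3.0), prototypes, threshold))
```

    Using distances rather than a likelihood matches the abstract's point that no probability distribution is assumed for the data.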

    Object localization by subspace clustering of local descriptors

    This paper presents a probabilistic approach for object localization which combines subspace clustering with the selection of discriminative clusters. Clustering is often a key step in object recognition and is hampered by the high dimensionality of the descriptors. Indeed, local descriptors such as SIFT, which have shown excellent results in recognition, are high-dimensional and live in different low-dimensional subspaces. We therefore use a subspace clustering method called High-Dimensional Data Clustering (HDDC), which overcomes the curse of dimensionality. Furthermore, in many cases only a few of the clusters are useful to discriminate the object. We thus evaluate the discriminative capacity of clusters and use it to compute the probability that a local descriptor belongs to the object. Experimental results demonstrate the effectiveness of our probabilistic approach for object localization and show that subspace clustering gives better results than standard clustering methods. Furthermore, our approach outperforms existing results on the Pascal 2005 dataset.
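    One way to turn cluster discriminativeness into a per-descriptor probability is to marginalize over clusters: P(object | x) = sum_k P(cluster k | x) P(object | cluster k). The sketch below assumes P(object | cluster k) is estimated as the share of training descriptors in cluster k that lie on the object, and that the cluster posteriors P(cluster k | x) come from a fitted HDDC model. This estimator and all names and data are hypothetical illustrations, not necessarily the paper's exact formulation.

```python
def cluster_object_rates(assignments, is_object, n_clusters):
    """Estimate P(object | cluster k) as the share of cluster-k training descriptors on the object."""
    counts = [0] * n_clusters
    hits = [0] * n_clusters
    for k, obj in zip(assignments, is_object):
        counts[k] += 1
        hits[k] += int(obj)
    return [h / c if c else 0.0 for h, c in zip(hits, counts)]


def object_probability(posteriors, rates):
    """P(object | x) = sum_k P(cluster k | x) * P(object | cluster k)."""
    return sum(p * r for p, r in zip(posteriors, rates))


# Hypothetical training descriptors: hard cluster labels and object/background flags.
assignments = [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
is_object = [True, True, True, False, False, False, False, True, False, False]
rates = cluster_object_rates(assignments, is_object, 3)

# Hypothetical cluster posteriors for one new descriptor.
posteriors = [0.6, 0.3, 0.1]
p_object = object_probability(posteriors, rates)
print(rates, p_object)
```

    Clusters whose rate is near 0 or 1 are the discriminative ones; a descriptor that falls mostly in such clusters receives a confident score, while descriptors in mixed clusters stay near the background rate.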