19,864 research outputs found

    A Probabilistic Embedding Clustering Method for Urban Structure Detection

    Full text link
    Urban structure detection is a basic task in urban geography. Clustering is a core technology to detect the patterns of urban spatial structure, urban functional region, and so on. In big data era, diverse urban sensing datasets recording information like human behaviour and human social activity, suffer from complexity in high dimension and high noise. And unfortunately, the state-of-the-art clustering methods does not handle the problem with high dimension and high noise issues concurrently. In this paper, a probabilistic embedding clustering method is proposed. Firstly, we come up with a Probabilistic Embedding Model (PEM) to find latent features from high dimensional urban sensing data by learning via probabilistic model. By latent features, we could catch essential features hidden in high dimensional data known as patterns; with the probabilistic model, we can also reduce uncertainty caused by high noise. Secondly, through tuning the parameters, our model could discover two kinds of urban structure, the homophily and structural equivalence, which means communities with intensive interaction or in the same roles in urban structure. We evaluated the performance of our model by conducting experiments on real-world data and experiments with real data in Shanghai (China) proved that our method could discover two kinds of urban structure, the homophily and structural equivalence, which means clustering community with intensive interaction or under the same roles in urban space.Comment: 6 pages, 7 figures, ICSDM201

    Optimal Clustering under Uncertainty

    Full text link
    Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixtures of Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.Comment: 19 pages, 5 eps figures, 1 tabl

    Belief Hierarchical Clustering

    Get PDF
    In the data mining field many clustering methods have been proposed, yet standard versions do not take into account uncertain databases. This paper deals with a new approach to cluster uncertain data by using a hierarchical clustering defined within the belief function framework. The main objective of the belief hierarchical clustering is to allow an object to belong to one or several clusters. To each belonging, a degree of belief is associated, and clusters are combined based on the pignistic properties. Experiments with real uncertain data show that our proposed method can be considered as a propitious tool

    Statistical Traffic State Analysis in Large-scale Transportation Networks Using Locality-Preserving Non-negative Matrix Factorization

    Get PDF
    Statistical traffic data analysis is a hot topic in traffic management and control. In this field, current research progresses focus on analyzing traffic flows of individual links or local regions in a transportation network. Less attention are paid to the global view of traffic states over the entire network, which is important for modeling large-scale traffic scenes. Our aim is precisely to propose a new methodology for extracting spatio-temporal traffic patterns, ultimately for modeling large-scale traffic dynamics, and long-term traffic forecasting. We attack this issue by utilizing Locality-Preserving Non-negative Matrix Factorization (LPNMF) to derive low-dimensional representation of network-level traffic states. Clustering is performed on the compact LPNMF projections to unveil typical spatial patterns and temporal dynamics of network-level traffic states. We have tested the proposed method on simulated traffic data generated for a large-scale road network, and reported experimental results validate the ability of our approach for extracting meaningful large-scale space-time traffic patterns. Furthermore, the derived clustering results provide an intuitive understanding of spatial-temporal characteristics of traffic flows in the large-scale network, and a basis for potential long-term forecasting.Comment: IET Intelligent Transport Systems (2013
    • …
    corecore