Optimal Clustering under Uncertainty

Author: Benalcázar Marco E.
Dalton Lori A.
Dougherty Edward R.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

Classical clustering algorithms typically either lack an underlying probability framework to make them predictive or focus on parameter estimation rather than defining and minimizing a notion of error. Recent work addresses these issues by developing a probabilistic framework based on the theory of random labeled point processes and characterizing a Bayes clusterer that minimizes the number of misclustered points. The Bayes clusterer is analogous to the Bayes classifier. Whereas determining a Bayes classifier requires full knowledge of the feature-label distribution, deriving a Bayes clusterer requires full knowledge of the point process. When uncertain of the point process, one would like to find a robust clusterer that is optimal over the uncertainty, just as one may find optimal robust classifiers with uncertain feature-label distributions. Herein, we derive an optimal robust clusterer by first finding an effective random point process that incorporates all randomness within its own probabilistic structure and from which a Bayes clusterer can be derived that provides an optimal robust clusterer relative to the uncertainty. This is analogous to the use of effective class-conditional distributions in robust classification. After evaluating the performance of robust clusterers in synthetic mixture-of-Gaussians models, we apply the framework to granular imaging, where we make use of the asymptotic granulometric moment theory for granular images to relate robust clustering theory to the application.

Comment: 19 pages, 5 eps figures, 1 table
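The "number of misclustered points" mentioned in the abstract can be illustrated with a small sketch. It assumes one common way of counting clustering errors: cluster names are arbitrary, so the count is taken as the smallest number of label disagreements over all relabelings of the predicted clusters. The function name `misclustered_count` and the toy labels are hypothetical and are not taken from the paper.

```python
from itertools import permutations


def misclustered_count(true_labels, predicted_labels, num_clusters):
    """Smallest number of label disagreements over all relabelings
    of the predicted clusters (illustrative definition, an assumption)."""
    best = len(true_labels)
    # Try every way of renaming the predicted cluster labels.
    for perm in permutations(range(num_clusters)):
        mismatches = sum(
            1 for t, p in zip(true_labels, predicted_labels) if t != perm[p]
        )
        best = min(best, mismatches)
    return best


# Toy example: the predicted partition differs from the truth in one point
# once the two cluster names are swapped.
truth = [0, 0, 0, 1, 1, 1]
guess = [1, 1, 0, 0, 0, 0]
print(misclustered_count(truth, guess, 2))  # prints 1
```

The brute-force search over permutations is only practical for a handful of clusters; it is used here because it makes the permutation invariance explicit.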

arXiv.org e-Print Archive

Directory of Open Access Journals

FigShare

An anthology of non-local QFT and QFT on noncommutative spacetime

Author: Bahns
Bakamjian
Bert Schroer
Bloch
Borchers
Bros
Brunetti
Buchholz
Coester
Dimock
Doplicher
Guido
Haag
Halvorson
Hayashi
Hegerfeldt
Jordan
Kristensen
Lechner
Malament
Marnelius
Marques
Martinec
Mead
Mund
Newton
Piacitelli
Pohlmeyer
Polyzou
Schroer
Seiberg
Smirnov
Snyder
Sokolov
Steinmann
Wald
Weinberg
Yngvason
Publication venue: 'Elsevier BV'
Publication date: 01/01/2004

Ever since the appearance of renormalization theory, there have been several differently motivated attempts at non-localized (in the sense of not generated by point-like fields) relativistic particle theories, the most recent one being QFT on non-commutative Minkowski spacetime. The often conceptually uncritical and historically forgetful contemporary approach to these problems calls for a critical review in the light of previous results on this subject.

Comment: 33 pages tci-latex, improvements of formulations, shortening of sentences, addition of some references

arXiv.org e-Print Archive

CiteSeerX

Crossref

CERN Document Server

A Probabilistic Embedding Clustering Method for Urban Structure Detection

Author: H. Li
L. Gao
L. Zhao
M. Deng
X. Lin
Y. Zhang
Publication date: 12/07/2017

Urban structure detection is a basic task in urban geography. Clustering is a core technique for detecting patterns of urban spatial structure, urban functional regions, and so on. In the big data era, diverse urban sensing datasets that record information such as human behaviour and social activity suffer from high dimensionality and high noise. Unfortunately, state-of-the-art clustering methods do not handle high dimensionality and high noise concurrently. In this paper, a probabilistic embedding clustering method is proposed. First, we introduce a Probabilistic Embedding Model (PEM) that learns latent features from high-dimensional urban sensing data via a probabilistic model. The latent features capture the essential patterns hidden in high-dimensional data, and the probabilistic model reduces the uncertainty caused by high noise. Second, by tuning its parameters, the model can discover two kinds of urban structure, homophily and structural equivalence, that is, communities with intensive interaction and communities that play the same role in the urban structure. We evaluated the model on real-world data; experiments with data from Shanghai (China) showed that the method discovers both kinds of urban structure.

Comment: 6 pages, 7 figures, ICSDM201

arXiv.org e-Print Archive

Directory of Open Access Journals

Qualitative Effects of Knowledge Rules in Probabilistic Data Integration

Author: Keijzer A. de
Keulen M. van
Publication venue: Centre for Telematics and Information Technology, University of Twente
Publication date: 01/01/2008

One of the problems in data integration is data overlap: the fact that different data sources have data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Often advanced similarity measurement techniques are used to remove semantic duplicates from the integration result or to solve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, so that the integration result can already be used meaningfully. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set in an information retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on the integration quality. This shows that our approach indeed reduces development effort, and does not merely shift the effort to rule definition and threshold tuning, by demonstrating that setting rough safe thresholds and defining only a few rules suffices to produce a ‘good enough’ integration that can be meaningfully used.
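The role of "rough safe thresholds" described in the abstract can be pictured with a minimal sketch. It assumes a two-threshold scheme: pairs of records scoring above an upper threshold are merged, pairs below a lower threshold are kept apart, and everything in between is stored as uncertain with a probability. The function name `classify_pair`, the threshold values, and the use of the similarity score as the stored probability are all hypothetical and are not taken from the report.

```python
def classify_pair(similarity, lower=0.3, upper=0.9):
    """Return (decision, probability_same) for a pair of records.

    Illustrative assumption: the similarity score in [0, 1] doubles as
    the probability that both records describe the same entity.
    """
    if similarity >= upper:
        return "same", 1.0        # confident match: merge the records
    if similarity <= lower:
        return "different", 0.0   # confident non-match: keep both records
    # In between: keep both possibilities, weighted by the score.
    return "uncertain", similarity


for score in (0.95, 0.5, 0.1):
    print(score, classify_pair(score))
```

Widening the gap between the two thresholds leaves more pairs uncertain, which grows the stored result but lowers the risk of a wrong hard decision; this is the kind of trade-off between size and quality that the abstract says thresholds control.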

CiteSeerX

University of Twente Research Information