-MLE: A fast algorithm for learning statistical mixture models
We describe k-MLE, a fast and efficient local search algorithm for learning
finite statistical mixtures of exponential families such as Gaussian mixture
models. Mixture models are traditionally learned using the
expectation-maximization (EM) soft clustering technique, which monotonically
increases the incomplete likelihood by maximizing the expected complete
likelihood. Given prescribed
mixture weights, the hard clustering k-MLE algorithm iteratively assigns data
to the most likely weighted component and updates the component models using
Maximum Likelihood Estimators (MLEs). Using the duality between exponential
families and Bregman divergences, we prove that the local convergence of the
complete likelihood of k-MLE follows directly from the convergence of a dual
additively weighted Bregman hard clustering. The inner loop of k-MLE can be
implemented using any k-means heuristic, such as Lloyd's batched updates or
Hartigan's greedy swap updates. We then show how to update the mixture
weights by minimizing a cross-entropy criterion, which amounts to setting each
weight to the relative proportion of points in its cluster, and we iterate the
mixture parameter and mixture weight updates until convergence. Hard EM
is interpreted as a special case of k-MLE in which the component update and
the weight update are performed successively in the inner loop. To initialize
k-MLE, we propose k-MLE++, a careful initialization of k-MLE that
probabilistically guarantees a global bound on the best possible complete likelihood.
Comment: 31 pages, extends a preliminary paper presented at IEEE ICASSP 201
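The hard-assignment loop described in this abstract can be sketched for a 1D Gaussian mixture. This is an illustrative toy version, not the paper's general exponential-family implementation; the quantile initialization and the variance floor are our own choices:

```python
import math

def k_mle_step_1d(data, k, iters=50):
    """Toy hard-assignment mixture learning in the spirit of the
    abstract: assign each point to its most likely weighted Gaussian
    component, refit each component by its MLE, and set the weights to
    the relative cluster proportions."""
    n = len(data)
    s = sorted(data)
    # Quantile initialization (illustrative choice, not the paper's).
    means = [s[int(j * (n - 1) / max(k - 1, 1))] for j in range(k)]
    vars_ = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # Hard E-step: most likely weighted component per point.
        clusters = [[] for _ in range(k)]
        for x in data:
            scores = [math.log(weights[j])
                      - 0.5 * math.log(2 * math.pi * vars_[j])
                      - (x - means[j]) ** 2 / (2 * vars_[j])
                      for j in range(k)]
            clusters[scores.index(max(scores))].append(x)
        # M-step: per-cluster MLE; weights = relative proportions.
        for j, pts in enumerate(clusters):
            if pts:
                means[j] = sum(pts) / len(pts)
                vars_[j] = max(1e-6, sum((x - means[j]) ** 2
                                         for x in pts) / len(pts))
                weights[j] = len(pts) / n
    return means, vars_, weights
```

For two well-separated groups, e.g. data = [0.0, 0.1, -0.1, 4.9, 5.0, 5.1] with k = 2, the recovered means land near 0 and 5 with weights near 0.5 each.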
Semi-supervised model-based clustering with controlled clusters leakage
In this paper, we focus on finding clusters in partially categorized data
sets. We propose a semi-supervised version of the Gaussian mixture model, called
C3L, which retrieves natural subgroups of given categories. In contrast to
other semi-supervised models, C3L is parametrized by a user-defined leakage
level, which controls the maximal inconsistency between the initial
categorization and the resulting clustering. Our method can be implemented as
a module in practical expert systems to detect clusters that combine expert knowledge with the true
distribution of data. Moreover, it can be used for improving the results of
less flexible clustering techniques, such as projection pursuit clustering. The
paper presents an extensive theoretical analysis of the model and a fast
algorithm for its efficient optimization. Experimental results show that C3L
finds a high-quality clustering model that can be applied to discover
meaningful groups in partially classified data.
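One way to read the "leakage level" described above is as the fraction of initially categorized points whose final cluster disagrees with their starting category. This is our illustrative interpretation; the paper's formal definition may differ:

```python
def leakage(initial_labels, cluster_labels):
    """Hypothetical illustration of the 'leakage' idea: the fraction of
    initially categorized points whose final cluster assignment is
    inconsistent with their starting category.  None marks points that
    were never categorized, so they cannot leak."""
    labelled = [(a, b) for a, b in zip(initial_labels, cluster_labels)
                if a is not None]
    if not labelled:
        return 0.0
    return sum(a != b for a, b in labelled) / len(labelled)
```

For example, with initial categories [0, 0, 1, None, 1] and final clusters [0, 1, 1, 0, 1], one of the four categorized points moved, giving a leakage of 0.25.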
Segmentation of Fault Networks Determined from Spatial Clustering of Earthquakes
We present a new method of data clustering applied to earthquake catalogs,
with the goal of reconstructing the seismically active part of fault networks.
We first use an original method to separate clustered events from uncorrelated
seismicity using the distribution of volumes of tetrahedra defined by closest
neighbor events in the original and randomized seismic catalogs. The spatial
disorder of the complex geometry of fault networks is then taken into account
by defining faults as probabilistic anisotropic kernels, whose structures are
motivated by properties of discontinuous tectonic deformation and previous
empirical observations of the geometry of faults and of earthquake clusters at
many spatial and temporal scales. Combining this a priori knowledge with
information theoretical arguments, we propose the Gaussian mixture approach
implemented in an Expectation-Maximization (EM) procedure. A cross-validation
scheme is then used and allows the determination of the number of kernels that
should be used to provide an optimal data clustering of the catalog. This
three-step approach is applied to a high-quality relocated catalog of the
seismicity following the 1986 Mount Lewis event in California and
reveals that events cluster along planar patches of about 2 km, i.e.
comparable to the size of the main event. The finite thickness of those
clusters (about 290 m) suggests that events do not occur on well-defined
Euclidean fault core surfaces, but rather that the damage zone surrounding
faults may be seismically active at depth. Finally, we propose a connection
between our methodology and multi-scale spatial analysis, based on the
derivation of a spatial fractal dimension of about 1.8 for the set of
hypocenters in the Mount Lewis area, consistent with recent observations on
relocated catalogs.
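The cross-validation scheme for choosing the number of mixture kernels can be sketched in one dimension, scoring each candidate count by held-out average log-likelihood. This is a generic illustration; the paper works with anisotropic spatial kernels, and the fold scheme and initialization here are our own choices:

```python
import math
import random

def fit_gmm_1d(data, k, iters=100):
    """Plain EM for a 1D Gaussian mixture (quantile initialization,
    variance floored at 1e-6 to avoid degenerate components)."""
    n = len(data)
    s = sorted(data)
    mean_all = sum(data) / n
    means = [s[int(j * (n - 1) / max(k - 1, 1))] for j in range(k)]
    vars_ = [max(1e-3, sum((x - mean_all) ** 2 for x in data) / n)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: soft responsibilities of each component for each point.
        resp = []
        for x in data:
            dens = [weights[j]
                    * math.exp(-(x - means[j]) ** 2 / (2 * vars_[j]))
                    / math.sqrt(2 * math.pi * vars_[j])
                    for j in range(k)]
            z = sum(dens) or 1e-300
            resp.append([d / z for d in dens])
        # M-step: weighted MLE updates.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-9:
                continue
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            vars_[j] = max(1e-6, sum(r[j] * (x - means[j]) ** 2
                                     for r, x in zip(resp, data)) / nj)
            weights[j] = nj / n
    return weights, means, vars_

def avg_loglik(params, data):
    """Average log-likelihood of held-out points under a fitted mixture."""
    weights, means, vars_ = params
    total = 0.0
    for x in data:
        p = sum(w * math.exp(-(x - m) ** 2 / (2 * v))
                / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, vars_))
        total += math.log(max(p, 1e-300))
    return total / len(data)

def select_k_by_cv(data, k_values, folds=3, seed=0):
    """Pick the component count with the best mean held-out score
    over simple random folds."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    fold_of = {i: f % folds for f, i in enumerate(idx)}
    best_k, best_score = None, -math.inf
    for k in k_values:
        score = 0.0
        for f in range(folds):
            train = [data[i] for i in idx if fold_of[i] != f]
            held = [data[i] for i in idx if fold_of[i] == f]
            score += avg_loglik(fit_gmm_1d(train, k), held)
        if score / folds > best_score:
            best_k, best_score = k, score / folds
    return best_k
```

On data drawn as two well-separated groups, the held-out score for two components clearly beats a single component, so the procedure returns 2.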
Accurate detection of dysmorphic nuclei using dynamic programming and supervised classification
A vast array of pathologies is typified by the presence of nuclei with an abnormal morphology. Dysmorphic nuclear phenotypes feature dramatic size changes or foldings, but also entail much subtler deviations such as nuclear protrusions called blebs. Due to their unpredictable size, shape and intensity, dysmorphic nuclei are often not accurately detected in standard image analysis routines. To enable accurate detection of dysmorphic nuclei in confocal and widefield fluorescence microscopy images, we have developed an automated segmentation algorithm, called Blebbed Nuclei Detector (BleND), which relies on two-pass thresholding for initial nuclear contour detection, and an optimal path-finding algorithm, based on dynamic programming, for refining these contours. Using a robust error metric, we show that our method matches manual segmentation in terms of precision and outperforms state-of-the-art nuclear segmentation methods. Its high performance allowed for building and integrating a robust classifier that recognizes dysmorphic nuclei with an accuracy above 95%. The combined segmentation-classification routine is bound to facilitate nucleus-based diagnostics and enable real-time recognition of dysmorphic nuclei in intelligent microscopy workflows.
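The dynamic-programming contour refinement mentioned in this abstract can be illustrated with a generic minimal-cost path search over a cost matrix (columns as positions along an initial contour, rows as candidate offsets). The cost matrix, step constraint, and backtracking here are illustrative choices; BleND's actual energy function is not specified in the abstract:

```python
def min_cost_path(cost, max_jump=1):
    """Find the lowest-cost left-to-right path through a cost matrix,
    moving at most `max_jump` rows between adjacent columns.  Returns
    the chosen row index for each column."""
    rows, cols = len(cost), len(cost[0])
    acc = [row[:] for row in cost]              # accumulated cost
    back = [[0] * cols for _ in range(rows)]    # backpointers
    for c in range(1, cols):
        for r in range(rows):
            best, arg = float("inf"), r
            # Cheapest reachable predecessor in the previous column.
            for dr in range(-max_jump, max_jump + 1):
                pr = r + dr
                if 0 <= pr < rows and acc[pr][c - 1] < best:
                    best, arg = acc[pr][c - 1], pr
            acc[r][c] += best
            back[r][c] = arg
    # Backtrack from the cheapest cell in the last column.
    r = min(range(rows), key=lambda i: acc[i][cols - 1])
    path = [r]
    for c in range(cols - 1, 0, -1):
        r = back[r][c]
        path.append(r)
    return path[::-1]
```

For example, on cost = [[1, 9, 9], [9, 1, 9], [9, 9, 1]] the path follows the cheap diagonal, returning [0, 1, 2].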