Search CORE

18,640 research outputs found

Spatio-Temporal Surrogates for Interaction of a Jet with High Explosives: Part II -- Clustering Extremely High-Dimensional Grid-Based Data

Author: Franzman Juliette S.
Kamath Chandrika
Publication venue: 'Office of Scientific and Technical Information (OSTI)'
Publication date: 03/07/2023
Field of study

Building an accurate surrogate model for the spatio-temporal outputs of a computer simulation is a challenging task. A simple approach to improve the accuracy of the surrogate is to cluster the outputs based on similarity and build a separate surrogate model for each cluster. This clustering is relatively straightforward when the output at each time step is of moderate size. However, when the spatial domain is represented by a large number of grid points, numbering in the millions, the clustering of the data becomes more challenging. In this report, we consider output data from simulations of a jet interacting with high explosives. These data are available on spatial domains of different sizes, at grid points that vary in their spatial coordinates, and in a format that distributes the output across multiple files at each time step of the simulation. We first describe how we bring these data into a consistent format prior to clustering. Borrowing the idea of random projections from data mining, we reduce the dimension of our data by a factor of thousand, making it possible to use the iterative k-means method for clustering. We show how we can use the randomness of both the random projections, and the choice of initial centroids in k-means clustering, to determine the number of clusters in our data set. Our approach makes clustering of extremely high dimensional data tractable, generating meaningful cluster assignments for our problem, despite the approximation introduced in the random projections

arXiv.org e-Print Archive

SIDE : a web app for interactive visual data exploration with subjective feedback

Author: De Bie Tijl
Kang Bo
Lijffijt Jefrey
Puolamäki Kai
Publication venue
Publication date: 01/01/2016
Field of study

Ghent University Academic Bibliography

Innovation Pursuit: A New Approach to Subspace Clustering

Author: Atia George
Rahmani Mostafa
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 26/11/2017
Field of study

In subspace clustering, a group of data points belonging to a union of subspaces are assigned membership to their respective subspaces. This paper presents a new approach dubbed Innovation Pursuit (iPursuit) to the problem of subspace clustering using a new geometrical idea whereby subspaces are identified based on their relative novelties. We present two frameworks in which the idea of innovation pursuit is used to distinguish the subspaces. Underlying the first framework is an iterative method that finds the subspaces consecutively by solving a series of simple linear optimization problems, each searching for a direction of innovation in the span of the data potentially orthogonal to all subspaces except for the one to be identified in one step of the algorithm. A detailed mathematical analysis is provided establishing sufficient conditions for iPursuit to correctly cluster the data. The proposed approach can provably yield exact clustering even when the subspaces have significant intersections. It is shown that the complexity of the iterative approach scales only linearly in the number of data points and subspaces, and quadratically in the dimension of the subspaces. The second framework integrates iPursuit with spectral clustering to yield a new variant of spectral-clustering-based algorithms. The numerical simulations with both real and synthetic data demonstrate that iPursuit can often outperform the state-of-the-art subspace clustering algorithms, more so for subspaces with significant intersections, and that it significantly improves the state-of-the-art result for subspace-segmentation-based face clustering

arXiv.org e-Print Archive

Crossref

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)

Structural Variability from Noisy Tomographic Projections

Author: Andén Joakim
Singer Amit
Publication venue
Publication date: 07/02/2018
Field of study

In cryo-electron microscopy, the 3D electric potentials of an ensemble of molecules are projected along arbitrary viewing directions to yield noisy 2D images. The volume maps representing these potentials typically exhibit a great deal of structural variability, which is described by their 3D covariance matrix. Typically, this covariance matrix is approximately low-rank and can be used to cluster the volumes or estimate the intrinsic geometry of the conformation space. We formulate the estimation of this covariance matrix as a linear inverse problem, yielding a consistent least-squares estimator. For

n

images of size

N

-by-

N

pixels, we propose an algorithm for calculating this covariance estimator with computational complexity

\mathcal{O}(nN^4+\sqrt{\kappa}N^6 \log N)

, where the condition number

\kappa

is empirically in the range

10

200

. Its efficiency relies on the observation that the normal equations are equivalent to a deconvolution problem in 6D. This is then solved by the conjugate gradient method with an appropriate circulant preconditioner. The result is the first computationally efficient algorithm for consistent estimation of 3D covariance from noisy projections. It also compares favorably in runtime with respect to previously proposed non-consistent estimators. Motivated by the recent success of eigenvalue shrinkage procedures for high-dimensional covariance matrices, we introduce a shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We evaluate our methods on simulated datasets and achieve classification results comparable to state-of-the-art methods in shorter running time. We also present results on clustering volumes in an experimental dataset, illustrating the power of the proposed algorithm for practical determination of structural variability.Comment: 52 pages, 11 figure

arXiv.org e-Print Archive

Princeton University Open Access Repository

Crossref

Recovering the Optimal Solution by Dual Random Projection

Author: Jin Rong
Mahdavi Mehrdad
Yang Tianbao
Zhang Lijun
Zhu Shenghuo
Publication venue
Publication date: 21/02/2014
Field of study

Random projection has been widely used in data classification. It maps high-dimensional data into a low-dimensional subspace in order to reduce the computational cost in solving the related optimization problem. While previous studies are focused on analyzing the classification performance of using random projection, in this work, we consider the recovery problem, i.e., how to accurately recover the optimal solution to the original optimization problem in the high-dimensional space based on the solution learned from the subspace spanned by random projections. We present a simple algorithm, termed Dual Random Projection, that uses the dual solution of the low-dimensional optimization problem to recover the optimal solution to the original problem. Our theoretical analysis shows that with a high probability, the proposed algorithm is able to accurately recover the optimal solution to the original problem, provided that the data matrix is of low rank or can be well approximated by a low rank matrix.Comment: The 26th Annual Conference on Learning Theory (COLT 2013

arXiv.org e-Print Archive

CiteSeerX