Structural Variability from Noisy Tomographic Projections
In cryo-electron microscopy, the 3D electric potentials of an ensemble of
molecules are projected along arbitrary viewing directions to yield noisy 2D
images. The volume maps representing these potentials typically exhibit a great
deal of structural variability, which is described by their 3D covariance
matrix. Typically, this covariance matrix is approximately low-rank and can be
used to cluster the volumes or estimate the intrinsic geometry of the
conformation space. We formulate the estimation of this covariance matrix as a
linear inverse problem, yielding a consistent least-squares estimator. For
images of size $N$-by-$N$ pixels, we propose an algorithm for calculating this
covariance estimator with computational complexity
$\mathcal{O}(n N^4 + \sqrt{\kappa} N^6 \log N)$, where the condition number
$\kappa$ is empirically in the range $10$--$200$. Its efficiency relies on the
observation that the normal equations are equivalent to a deconvolution problem
in 6D. This is then solved by the conjugate gradient method with an appropriate
circulant preconditioner. The result is the first computationally efficient
algorithm for consistent estimation of 3D covariance from noisy projections. Its
runtime also compares favorably with that of previously proposed
non-consistent estimators. Motivated by the recent success of eigenvalue
shrinkage procedures for high-dimensional covariance matrices, we introduce a
shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We
evaluate our methods on simulated datasets and achieve classification results
comparable to state-of-the-art methods in shorter running time. We also present
results on clustering volumes in an experimental dataset, illustrating the
power of the proposed algorithm for practical determination of structural
variability.
Comment: 52 pages, 11 figures
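The solver idea in this abstract (conjugate gradient with a circulant preconditioner applied to a deconvolution-type system) can be illustrated on a much smaller problem. The sketch below is not the authors' 6D algorithm: it assumes a toy 1D symmetric Toeplitz system with kernel `rho ** k`, uses Strang's circulant approximation as the preconditioner, and relies on a naive DFT to stay dependency-free. All function names are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (kept dependency-free)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [
        sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def toeplitz_matvec(t, x):
    """Multiply the symmetric Toeplitz matrix T[i][j] = t[|i - j|] by x."""
    n = len(x)
    return [sum(t[abs(i - j)] * x[j] for j in range(n)) for i in range(n)]


def strang_eigenvalues(t):
    """Eigenvalues of Strang's circulant approximation to the Toeplitz matrix."""
    n = len(t)
    first_col = [t[k] if k <= n // 2 else t[n - k] for k in range(n)]
    return [v.real for v in dft(first_col)]


def circulant_solve(eigs, r):
    """Apply the inverse circulant by dividing in the Fourier domain."""
    r_hat = dft(r)
    z_hat = [a / lam for a, lam in zip(r_hat, eigs)]
    return [v.real for v in dft(z_hat, inverse=True)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(matvec, b, precond, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradient; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    r = list(b)
    z = precond(r)
    p = list(z)
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b))
    for it in range(1, max_iter + 1):
        ap = matvec(p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pj for xi, pj in zip(x, p)]
        r = [ri - alpha * aj for ri, aj in zip(r, ap)]
        if math.sqrt(dot(r, r)) <= tol * b_norm:
            return x, it
        z = precond(r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pj for zi, pj in zip(z, p)]
        rz = rz_new
    return x, max_iter


def demo(n=32, rho=0.5):
    """Compare plain CG with circulant-preconditioned CG on a toy system."""
    t = [rho ** k for k in range(n)]  # toy positive-definite Toeplitz kernel
    x_true = [math.sin(0.3 * i) for i in range(n)]
    b = toeplitz_matvec(t, x_true)
    eigs = strang_eigenvalues(t)

    def apply_t(v):
        return toeplitz_matvec(t, v)

    def apply_c_inv(r):
        return circulant_solve(eigs, r)

    _, it_plain = pcg(apply_t, b, list)  # identity preconditioner
    x_pre, it_pre = pcg(apply_t, b, apply_c_inv)
    err = max(abs(a - c) for a, c in zip(x_pre, x_true))
    return it_plain, it_pre, err
```

Calling `demo()` returns the iteration counts without and with the preconditioner, together with the maximum error of the preconditioned solution. A practical implementation would replace the naive DFT with an FFT, which is where the logarithmic factor in such complexity estimates usually comes from.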
Multitaper estimation on arbitrary domains
Multitaper estimators have enjoyed significant success in estimating spectral
densities from finite samples using as tapers Slepian functions defined on the
acquisition domain. Unfortunately, the numerical calculation of these Slepian
tapers is only tractable for certain symmetric domains, such as rectangles or
disks. In addition, no performance bounds are currently available for the mean
squared error of the spectral density estimate. This situation is inadequate
for applications such as cryo-electron microscopy, where noise models must be
estimated from irregular domains with small sample sizes. We show that the
multitaper estimator only depends on the linear space spanned by the tapers. As
a result, Slepian tapers may be replaced by proxy tapers spanning the same
subspace (validating the common practice of using partially converged solutions
to the Slepian eigenproblem as tapers). These proxies may consequently be
calculated using standard numerical algorithms for block diagonalization. We
also prove a set of performance bounds for multitaper estimators on arbitrary
domains. The method is demonstrated on synthetic and experimental datasets from
cryo-electron microscopy, where it reduces mean squared error by a factor of
two or more compared to traditional methods.
Comment: 28 pages, 11 figures
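The claim that the multitaper estimator depends only on the span of the tapers can be checked numerically in a simple 1D setting. The sketch below makes several assumptions not found in the abstract: it uses orthonormal sine tapers as a stand-in for Slepian tapers, weights all tapers equally, and builds the "proxy" tapers by rotating pairs of tapers, which changes the basis but not the subspace. All function names are hypothetical.

```python
import cmath
import math
import random


def sine_tapers(n, k):
    """k orthonormal sine tapers of length n (a stand-in for Slepian tapers)."""
    scale = math.sqrt(2.0 / (n + 1))
    return [
        [scale * math.sin(math.pi * (j + 1) * (t + 1) / (n + 1)) for t in range(n)]
        for j in range(k)
    ]


def mix_tapers(tapers, a, b, theta):
    """Rotate tapers a and b by angle theta: same span, different basis."""
    c, s = math.cos(theta), math.sin(theta)
    mixed = [list(w) for w in tapers]
    mixed[a] = [c * u - s * v for u, v in zip(tapers[a], tapers[b])]
    mixed[b] = [s * u + c * v for u, v in zip(tapers[a], tapers[b])]
    return mixed


def multitaper_psd(x, tapers, n_freqs=16):
    """Average of tapered periodograms at frequencies m / n_freqs cycles per sample."""
    estimate = []
    for m in range(n_freqs):
        total = 0.0
        for w in tapers:
            y = sum(
                wt * xt * cmath.exp(-2j * cmath.pi * m * t / n_freqs)
                for t, (wt, xt) in enumerate(zip(w, x))
            )
            total += abs(y) ** 2
        estimate.append(total / len(tapers))
    return estimate


def max_difference(n=64, k=4, seed=0):
    """Largest gap between estimates from the original and the proxy tapers."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    tapers = sine_tapers(n, k)
    proxies = mix_tapers(mix_tapers(tapers, 0, 1, 0.7), 2, 3, -1.2)
    s_ref = multitaper_psd(x, tapers)
    s_proxy = multitaper_psd(x, proxies)
    return max(abs(a - b) for a, b in zip(s_ref, s_proxy))
```

The two estimates agree up to floating-point rounding because an orthogonal change of basis within the taper subspace preserves the sum of squared magnitudes of the tapered Fourier coefficients.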
Efficient high-resolution refinement in cryo-EM with stochastic gradient descent
Electron cryomicroscopy (cryo-EM) is an imaging technique widely used in
structural biology to determine the three-dimensional structure of biological
molecules from noisy two-dimensional projections with unknown orientations. As
the typical pipeline involves processing large amounts of data, efficient
algorithms are crucial for fast and reliable results. The stochastic gradient
descent (SGD) algorithm has been used to speed up ab initio reconstruction,
which yields a first, low-resolution estimate of the volume representing the
molecule of interest. It has yet to be applied successfully in the
high-resolution regime, where expectation-maximization algorithms achieve
state-of-the-art results at a high computational cost. In
this article, we investigate the conditioning of the optimization problem and
show that the large condition number prevents the successful application of
gradient descent-based methods at high resolution. Our results include a
theoretical analysis of the condition number of the optimization problem in a
simplified setting where the individual projection directions are known, an
algorithm based on computing a diagonal preconditioner using Hutchinson's
diagonal estimator, and numerical experiments showing the improvement in the
convergence speed when using the estimated preconditioner with SGD. The
preconditioned SGD approach can potentially enable a simple and unified
approach to ab initio reconstruction and high-resolution refinement with faster
convergence speed and higher flexibility, and our results are a promising step
in this direction.
Comment: 22 pages, 7 figures
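The preconditioning idea can be sketched on a small quadratic problem. The example below is an illustration under stated assumptions, not the authors' cryo-EM method: it minimises `0.5 x^T A x - b^T x` for a hypothetical badly scaled 4-by-4 matrix, uses full-gradient descent in place of SGD to keep the comparison deterministic, and estimates the diagonal of `A` with Hutchinson-style Rademacher probes that only require matrix-vector products. All function names are hypothetical.

```python
import random


def matvec(a, x):
    """Dense matrix-vector product for a list-of-rows matrix."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def hutchinson_diagonal(apply_a, n, n_probes, rng):
    """Estimate diag(A) as the average of z * (A z) over Rademacher probes z."""
    acc = [0.0] * n
    for _ in range(n_probes):
        z = [rng.choice((-1.0, 1.0)) for _ in range(n)]
        az = apply_a(z)
        acc = [s + zi * azi for s, zi, azi in zip(acc, z, az)]
    return [s / n_probes for s in acc]


def gradient_descent(a, b, step_sizes, n_iter):
    """Minimise 0.5 x^T A x - b^T x with per-coordinate step sizes."""
    x = [0.0] * len(b)
    for _ in range(n_iter):
        grad = [axi - bi for axi, bi in zip(matvec(a, x), b)]
        x = [xi - h * g for xi, h, g in zip(x, step_sizes, grad)]
    return x


def max_error(x, x_true):
    return max(abs(u - v) for u, v in zip(x, x_true))


def demo(seed=0, n_probes=2000, n_iter=50):
    """Return the estimated diagonal and the errors without/with preconditioning."""
    scales = [1.0, 10.0, 100.0, 1000.0]  # badly scaled diagonal (assumed toy data)
    n = len(scales)
    a = [[scales[i] if i == j else 0.1 for j in range(n)] for i in range(n)]
    x_true = [1.0, -2.0, 3.0, -4.0]
    b = matvec(a, x_true)

    def apply_a(v):
        return matvec(a, v)

    d_hat = hutchinson_diagonal(apply_a, n, n_probes, random.Random(seed))
    # Plain descent: one global step size limited by the largest eigenvalue.
    x_plain = gradient_descent(a, b, [1.0 / 1001.0] * n, n_iter)
    # Preconditioned descent: each coordinate is scaled by its estimated diagonal.
    x_pre = gradient_descent(a, b, [1.0 / d for d in d_hat], n_iter)
    return d_hat, max_error(x_plain, x_true), max_error(x_pre, x_true)
```

With the same iteration budget, the plain iteration is held back by the single global step size, while the diagonally preconditioned iteration converges quickly because the rescaled problem is close to the identity.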
Matrix Denoising with Partial Noise Statistics: Optimal Singular Value Shrinkage of Spiked F-Matrices
We study the problem of estimating a large, low-rank matrix corrupted by
additive noise of unknown covariance, assuming one has access to additional
side information in the form of noise-only measurements. We study the
Whiten-Shrink-reColor (WSC) workflow, where a "noise covariance whitening"
transformation is applied to the observations, followed by appropriate singular
value shrinkage and a "noise covariance re-coloring" transformation. We show
that under the mean square error loss, a unique, asymptotically optimal
shrinkage nonlinearity exists for the WSC denoising workflow, and calculate it
in closed form. To this end, we calculate the asymptotic eigenvector rotation
of the random spiked F-matrix ensemble, a result which may be of independent
interest. With sufficiently many pure-noise measurements, our optimally-tuned
WSC denoising workflow outperforms, in mean square error, matrix denoising
algorithms based on optimal singular value shrinkage which do not make similar
use of noise-only side information; numerical experiments show that our
procedure's relative performance is particularly strong in challenging
statistical settings with high dimensionality and large degree of
heteroscedasticity.
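The three steps of the WSC workflow can be written compactly. The notation below is assumed for illustration and is not taken from the abstract: $Y$ is the observed matrix, $\hat{\Sigma}$ is the sample covariance computed from the noise-only measurements, $\sigma_i, u_i, v_i$ are the singular values and vectors of the whitened matrix, and $\eta$ is the shrinkage nonlinearity, whose optimal closed form is derived in the paper and is not reproduced here.

```latex
\begin{align*}
Y^{\mathrm{w}} &= \hat{\Sigma}^{-1/2}\, Y = \sum_{i} \sigma_i\, u_i v_i^{\top}
  && \text{(whiten, then take the SVD)} \\
\hat{X}^{\mathrm{w}} &= \sum_{i} \eta(\sigma_i)\, u_i v_i^{\top}
  && \text{(shrink the singular values)} \\
\hat{X} &= \hat{\Sigma}^{1/2}\, \hat{X}^{\mathrm{w}}
  && \text{(re-color)}
\end{align*}
```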