76,116 research outputs found
Structural Variability from Noisy Tomographic Projections
In cryo-electron microscopy, the 3D electric potentials of an ensemble of
molecules are projected along arbitrary viewing directions to yield noisy 2D
images. The volume maps representing these potentials typically exhibit a great
deal of structural variability, which is described by their 3D covariance
matrix. Typically, this covariance matrix is approximately low-rank and can be
used to cluster the volumes or estimate the intrinsic geometry of the
conformation space. We formulate the estimation of this covariance matrix as a
linear inverse problem, yielding a consistent least-squares estimator. For
images of size -by- pixels, we propose an algorithm for calculating this
covariance estimator with computational complexity
, where the condition number
is empirically in the range --. Its efficiency relies on the
observation that the normal equations are equivalent to a deconvolution problem
in 6D. This is then solved by the conjugate gradient method with an appropriate
circulant preconditioner. The result is the first computationally efficient
algorithm for consistent estimation of 3D covariance from noisy projections. It
also compares favorably in runtime with respect to previously proposed
non-consistent estimators. Motivated by the recent success of eigenvalue
shrinkage procedures for high-dimensional covariance matrices, we introduce a
shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We
evaluate our methods on simulated datasets and achieve classification results
comparable to state-of-the-art methods in shorter running time. We also present
results on clustering volumes in an experimental dataset, illustrating the
power of the proposed algorithm for practical determination of structural
variability.Comment: 52 pages, 11 figure
Estimating Time-Varying Effective Connectivity in High-Dimensional fMRI Data Using Regime-Switching Factor Models
Recent studies on analyzing dynamic brain connectivity rely on sliding-window
analysis or time-varying coefficient models which are unable to capture both
smooth and abrupt changes simultaneously. Emerging evidence suggests
state-related changes in brain connectivity where dependence structure
alternates between a finite number of latent states or regimes. Another
challenge is inference of full-brain networks with large number of nodes. We
employ a Markov-switching dynamic factor model in which the state-driven
time-varying connectivity regimes of high-dimensional fMRI data are
characterized by lower-dimensional common latent factors, following a
regime-switching process. It enables a reliable, data-adaptive estimation of
change-points of connectivity regimes and the massive dependencies associated
with each regime. We consider the switching VAR to quantity the dynamic
effective connectivity. We propose a three-step estimation procedure: (1)
extracting the factors using principal component analysis (PCA) and (2)
identifying dynamic connectivity states using the factor-based switching vector
autoregressive (VAR) models in a state-space formulation using Kalman filter
and expectation-maximization (EM) algorithm, and (3) constructing the
high-dimensional connectivity metrics for each state based on subspace
estimates. Simulation results show that our proposed estimator outperforms the
K-means clustering of time-windowed coefficients, providing more accurate
estimation of regime dynamics and connectivity metrics in high-dimensional
settings. Applications to analyzing resting-state fMRI data identify dynamic
changes in brain states during rest, and reveal distinct directed connectivity
patterns and modular organization in resting-state networks across different
states.Comment: 21 page
A novel traveling-wave-based method improved by unsupervised learning for fault location of power cables via sheath current monitoring
In order to improve the practice in maintenance of power cables, this paper proposes a novel traveling-wave-based fault location method improved by unsupervised learning. The improvement mainly lies in the identification of the arrival time of the traveling wave. The proposed approach consists of four steps: (1) The traveling wave associated with the sheath currents of the cables are grouped in a matrix; (2) the use of dimensionality reduction by t-SNE (t-distributed Stochastic Neighbor Embedding) to reconstruct the matrix features in a low dimension; (3) application of the DBSCAN (density-based spatial clustering of applications with noise) clustering to cluster the sample points by the closeness of the sample distribution; (4) the arrival time of the traveling wave can be identified by searching for the maximum slope point of the non-noise cluster with the fewest samples. Simulations and calculations have been carried out for both HV (high voltage) and MV (medium voltage) cables. Results indicate that the arrival time of the traveling wave can be identified for both HV cables and MV cables with/without noise, and the method is suitable with few random time errors of the recorded data. A lab-based experiment was carried out to validate the proposed method and helped to prove the effectiveness of the clustering and the fault location
Entropy-scaling search of massive biological data
Many datasets exhibit a well-defined structure that can be exploited to
design faster search tools, but it is not always clear when such acceleration
is possible. Here, we introduce a framework for similarity search based on
characterizing a dataset's entropy and fractal dimension. We prove that
searching scales in time with metric entropy (number of covering hyperspheres),
if the fractal dimension of the dataset is low, and scales in space with the
sum of metric entropy and information-theoretic entropy (randomness of the
data). Using these ideas, we present accelerated versions of standard tools,
with no loss in specificity and little loss in sensitivity, for use in three
domains---high-throughput drug screening (Ammolite, 150x speedup), metagenomics
(MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search
(esFragBag, 10x speedup of FragBag). Our framework can be used to achieve
"compressive omics," and the general theory can be readily applied to data
science problems outside of biology.Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 bo
- âŠ