    Structural Variability from Noisy Tomographic Projections

    In cryo-electron microscopy, the 3D electric potentials of an ensemble of molecules are projected along arbitrary viewing directions to yield noisy 2D images. The volume maps representing these potentials typically exhibit a great deal of structural variability, which is described by their 3D covariance matrix. Typically, this covariance matrix is approximately low-rank and can be used to cluster the volumes or estimate the intrinsic geometry of the conformation space. We formulate the estimation of this covariance matrix as a linear inverse problem, yielding a consistent least-squares estimator. For $n$ images of size $N$-by-$N$ pixels, we propose an algorithm for calculating this covariance estimator with computational complexity $\mathcal{O}(nN^4+\sqrt{\kappa}N^6 \log N)$, where the condition number $\kappa$ is empirically in the range $10$--$200$. Its efficiency relies on the observation that the normal equations are equivalent to a deconvolution problem in 6D. This is then solved by the conjugate gradient method with an appropriate circulant preconditioner. The result is the first computationally efficient algorithm for consistent estimation of 3D covariance from noisy projections. It also compares favorably in runtime with respect to previously proposed non-consistent estimators. Motivated by the recent success of eigenvalue shrinkage procedures for high-dimensional covariance matrices, we introduce a shrinkage procedure that improves accuracy at lower signal-to-noise ratios. We evaluate our methods on simulated datasets and achieve classification results comparable to state-of-the-art methods in shorter running time. We also present results on clustering volumes in an experimental dataset, illustrating the power of the proposed algorithm for practical determination of structural variability. Comment: 52 pages, 11 figures
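The abstract above solves its normal equations with conjugate gradients and a circulant preconditioner. As a rough illustration only, the sketch below applies the same idea to a 1D toy problem rather than the paper's 6D deconvolution: a small symmetric Toeplitz system preconditioned with a Strang-type circulant. The matrix entries, the function names, and the naive DFT (standing in for an FFT) are all assumptions made for this example, not the authors' implementation.

```python
import cmath
from math import sqrt

def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (stand-in for an FFT)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for k in range(n))
           for m in range(n)]
    return [v / n for v in out] if inverse else out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def toeplitz_matvec(t, x):
    """Multiply by the symmetric Toeplitz matrix whose first column is t."""
    n = len(x)
    return [sum(t[abs(i - j)] * x[j] for j in range(n)) for i in range(n)]

def strang_circulant(t):
    """First column of a Strang-type circulant approximation of the Toeplitz matrix."""
    n = len(t)
    return [t[k] if k <= n // 2 else t[n - k] for k in range(n)]

def apply_circulant_inverse(eigs, r):
    """Solve C z = r for circulant C, whose eigenvalues are the DFT of its first column."""
    r_hat = dft(r)
    z = dft([rh / e for rh, e in zip(r_hat, eigs)], inverse=True)
    return [v.real for v in z]

def pcg(t, b, tol=1e-10, max_iter=100):
    """Preconditioned conjugate gradients for the Toeplitz system T x = b."""
    eigs = dft(strang_circulant(t))
    x = [0.0] * len(b)
    r = list(b)
    z = apply_circulant_inverse(eigs, r)
    p = list(z)
    rz = dot(r, z)
    for it in range(1, max_iter + 1):
        Ap = toeplitz_matvec(t, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * p_i for xi, p_i in zip(x, p)]
        r = [ri - alpha * a for ri, a in zip(r, Ap)]
        if sqrt(dot(r, r)) < tol:
            return x, it
        z = apply_circulant_inverse(eigs, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * p_i for zi, p_i in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Hypothetical toy system: diagonally dominant, hence symmetric positive definite.
t = [4.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x, iters = pcg(t, b)
```

The preconditioner is cheap to apply because a circulant matrix is diagonalized by the Fourier transform, so each application costs two transforms and one elementwise division.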

    Estimating Time-Varying Effective Connectivity in High-Dimensional fMRI Data Using Regime-Switching Factor Models

    Recent studies analyzing dynamic brain connectivity rely on sliding-window analysis or time-varying coefficient models, which are unable to capture both smooth and abrupt changes simultaneously. Emerging evidence suggests state-related changes in brain connectivity, where the dependence structure alternates between a finite number of latent states or regimes. Another challenge is the inference of full-brain networks with a large number of nodes. We employ a Markov-switching dynamic factor model in which the state-driven time-varying connectivity regimes of high-dimensional fMRI data are characterized by lower-dimensional common latent factors, following a regime-switching process. It enables a reliable, data-adaptive estimation of the change-points of connectivity regimes and the massive dependencies associated with each regime. We consider the switching VAR to quantify the dynamic effective connectivity. We propose a three-step estimation procedure: (1) extracting the factors using principal component analysis (PCA), (2) identifying dynamic connectivity states using factor-based switching vector autoregressive (VAR) models in a state-space formulation, estimated with the Kalman filter and the expectation-maximization (EM) algorithm, and (3) constructing the high-dimensional connectivity metrics for each state based on subspace estimates. Simulation results show that our proposed estimator outperforms K-means clustering of time-windowed coefficients, providing more accurate estimation of regime dynamics and connectivity metrics in high-dimensional settings. Applications to resting-state fMRI data identify dynamic changes in brain states during rest, and reveal distinct directed connectivity patterns and modular organization in resting-state networks across different states. Comment: 21 pages
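To make the model class in the abstract above concrete, the following is a minimal generative sketch of a two-regime Markov-switching factor model. It assumes a single scalar factor with regime-dependent AR(1) dynamics and fixed loadings; the function name, parameter values, and noise levels are hypothetical. It only simulates data and does not implement the PCA, Kalman filter, or EM estimation steps.

```python
import random

def simulate_switching_factor_model(T, loadings, ar_coefs, stay_prob, seed=0):
    """Simulate observations driven by one latent factor whose AR(1)
    coefficient depends on a hidden two-state Markov chain."""
    rng = random.Random(seed)
    state, factor = 0, 0.0
    states, factors, observations = [], [], []
    for _ in range(T):
        # Regime transition: stay with probability stay_prob, otherwise switch.
        if rng.random() > stay_prob:
            state = 1 - state
        # Regime-dependent factor dynamics with unit-variance innovations.
        factor = ar_coefs[state] * factor + rng.gauss(0.0, 1.0)
        # Each observed channel loads on the common factor plus small noise.
        observations.append([q * factor + rng.gauss(0.0, 0.1) for q in loadings])
        states.append(state)
        factors.append(factor)
    return states, factors, observations

# Hypothetical settings: 4 observed channels, persistent regimes.
states, factors, observations = simulate_switching_factor_model(
    T=200, loadings=[1.0, 0.5, -0.8, 0.3], ar_coefs=[0.9, -0.5], stay_prob=0.95)
```

Because all channels share one factor, the dependence between channels changes whenever the hidden state switches, which is the kind of regime change the estimation procedure is meant to detect.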

    A novel traveling-wave-based method improved by unsupervised learning for fault location of power cables via sheath current monitoring

    To improve maintenance practice for power cables, this paper proposes a novel traveling-wave-based fault location method improved by unsupervised learning. The improvement lies mainly in identifying the arrival time of the traveling wave. The proposed approach consists of four steps: (1) the traveling waves associated with the sheath currents of the cables are grouped in a matrix; (2) dimensionality reduction by t-SNE (t-distributed Stochastic Neighbor Embedding) reconstructs the matrix features in a low dimension; (3) DBSCAN (density-based spatial clustering of applications with noise) clusters the sample points by the closeness of the sample distribution; (4) the arrival time of the traveling wave is identified by searching for the maximum slope point of the non-noise cluster with the fewest samples. Simulations and calculations have been carried out for both HV (high voltage) and MV (medium voltage) cables. Results indicate that the arrival time of the traveling wave can be identified for both HV and MV cables, with or without noise, and that the method remains suitable when the recorded data contain a few random time errors. A lab-based experiment was carried out to validate the proposed method and helped to demonstrate the effectiveness of the clustering and the fault location.
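Steps (3) and (4) of the abstract above can be sketched on toy data as follows. The sketch assumes that a 2D embedding with one point per time sample is already available (hand-made here as a stand-in for t-SNE output) and uses a small textbook DBSCAN written for this example. The signal, the embedding, and the `eps` and `min_pts` values are all hypothetical.

```python
from math import dist

NOISE = -1

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN: returns one cluster label per point, NOISE for outliers."""
    labels = [None] * len(points)

    def neighbors(i):
        return [j for j in range(len(points)) if dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster      # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs_j = neighbors(j)
            if len(nbrs_j) >= min_pts:   # core point: keep growing the cluster
                queue.extend(nbrs_j)
    return labels

def smallest_cluster(labels):
    """Label of the non-noise cluster with the fewest samples."""
    counts = {}
    for lab in labels:
        if lab != NOISE:
            counts[lab] = counts.get(lab, 0) + 1
    return min(counts, key=counts.get)

def arrival_index(signal, labels):
    """Index of the steepest sample-to-sample change within the smallest cluster."""
    target = smallest_cluster(labels)
    candidates = [i for i, lab in enumerate(labels) if lab == target and i > 0]
    return max(candidates, key=lambda i: abs(signal[i] - signal[i - 1]))

# Toy waveform: flat, a short wavefront (indices 10 to 12), then flat again.
signal = [0.0] * 10 + [0.2, 0.9, 1.0] + [1.0] * 10
# Hand-made embedding: pre-fault, wavefront, and post-fault samples form three groups.
embedding = ([(0.01 * i, 0.0) for i in range(10)]
             + [(5.0 + 0.01 * i, 5.0) for i in range(3)]
             + [(10.0 + 0.01 * i, 0.0) for i in range(10)])
embedding[5] = (50.0, 50.0)  # one isolated sample, expected to be labeled as noise

labels = dbscan(embedding, eps=0.5, min_pts=3)
arrival = arrival_index(signal, labels)
```

In this toy setup the wavefront samples form the smallest non-noise cluster, so the search for the maximum slope is restricted to a handful of samples instead of the whole record.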

    Entropy-scaling search of massive biological data

    Many datasets exhibit a well-defined structure that can be exploited to design faster search tools, but it is not always clear when such acceleration is possible. Here, we introduce a framework for similarity search based on characterizing a dataset's entropy and fractal dimension. We prove that searching scales in time with metric entropy (number of covering hyperspheres), if the fractal dimension of the dataset is low, and scales in space with the sum of metric entropy and information-theoretic entropy (randomness of the data). Using these ideas, we present accelerated versions of standard tools, with no loss in specificity and little loss in sensitivity, for use in three domains: high-throughput drug screening (Ammolite, 150x speedup), metagenomics (MICA, 3.5x speedup of DIAMOND [3,700x BLASTX]), and protein structure search (esFragBag, 10x speedup of FragBag). Our framework can be used to achieve "compressive omics," and the general theory can be readily applied to data science problems outside of biology. Comment: Including supplement: 41 pages, 6 figures, 4 tables, 1 box
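The covering-hypersphere idea in the abstract above can be illustrated with a two-stage search: a coarse pass over cluster centers, then a fine pass inside the clusters that survive. The sketch below is a generic version under simple assumptions (Euclidean points, a greedy cover, hypothetical radii and function names) and is not the code of Ammolite, MICA, or esFragBag.

```python
import random
from math import dist

def build_cover(points, r_c):
    """Greedy cover: a point joins the first center within r_c,
    otherwise it becomes the center of a new cluster."""
    clusters = []  # list of (center, members)
    for p in points:
        for center, members in clusters:
            if dist(p, center) <= r_c:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters

def search(clusters, r_c, query, r):
    """Return all points within distance r of query.
    Coarse step: by the triangle inequality, a cluster can only contain a hit
    if its center lies within r + r_c of the query.
    Fine step: check the members of the surviving clusters exactly."""
    hits = []
    for center, members in clusters:
        if dist(query, center) <= r + r_c:
            hits.extend(p for p in members if dist(query, p) <= r)
    return hits

# Hypothetical data: 200 random points in the unit square.
rng = random.Random(0)
points = [(rng.random(), rng.random()) for _ in range(200)]
cover_radius = 0.1
clusters = build_cover(points, cover_radius)

query, radius = (0.5, 0.5), 0.05
fast_hits = search(clusters, cover_radius, query, radius)
brute_hits = [p for p in points if dist(query, p) <= radius]
```

The coarse step touches one distance per cluster, so the work scales with the number of covering spheres plus the size of the few clusters near the query, while the triangle inequality ensures no true hit is skipped.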