    Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm

    In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering algorithm is widely used for unsupervised 2D classification of projection images of biological macromolecules. 3D ab initio reconstruction requires accurate unsupervised classification in order to separate molecular projections of distinct orientations. Because of background noise in single-particle images and uncertainty in molecular orientations, the traditional K-means algorithm may assign images to the wrong classes and produce classes whose membership varies widely in size. Overcoming these limitations requires further development of clustering algorithms for cryo-EM data analysis. We propose a novel unsupervised data clustering method that builds on the traditional K-means algorithm. By introducing an adaptive constraint term into the objective function, our algorithm not only avoids large variation in class sizes but also produces more accurate clustering. Applications of this approach to both simulated and experimental cryo-EM data demonstrate that our algorithm is a significantly improved alternative to the traditional K-means algorithm in single-particle cryo-EM analysis. Comment: 35 pages, 14 figures
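    To show where a size-related term can enter the K-means loop, here is a minimal Python sketch of Lloyd-style K-means with a hypothetical size penalty added to the assignment cost. The penalty form `lam * n_j / n`, the farthest-first initialisation, and the name `size_penalized_kmeans` are assumptions for illustration only. This is not the adaptive constraint term proposed in the paper, whose exact form the abstract does not give. NumPy is assumed.

```python
import numpy as np


def size_penalized_kmeans(X, k, lam=0.0, n_iter=100, seed=0):
    """Lloyd-style K-means with an illustrative (hypothetical) size penalty.

    Assignment cost for point i and class j: ||x_i - c_j||^2 + lam * n_j / n,
    where n_j is the size of class j from the previous iteration.
    lam = 0 gives ordinary K-means. This penalty is NOT the paper's
    adaptive constraint term; it only shows where such a term could enter.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Farthest-first initialisation: a random first center, then repeatedly
    # the point farthest from every center chosen so far.
    idx = [int(rng.integers(n))]
    for _ in range(1, k):
        d2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(d2.argmax()))
    centers = X[idx].astype(float)

    sizes = np.full(k, n / k)  # uniform sizes, so the penalty is inert at first
    labels = None
    for _ in range(n_iter):  # capped: penalized updates need not converge
        # Squared distance from every point to every center, shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = (d2 + lam * sizes[None, :] / n).argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        sizes = np.bincount(labels, minlength=k).astype(float)
        # Move each center to the mean of its members; keep it if the class is empty.
        for j in range(k):
            if sizes[j] > 0:
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers
```

    With `lam=0` the function reduces to ordinary K-means; a positive `lam` makes already-large classes more expensive to join, which is one simple way to discourage widely varying class sizes. The iteration cap matters because, unlike plain Lloyd updates, the penalized assignment is not guaranteed to settle.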

    Penalized Clustering of Large Scale Functional Data with Multiple Covariates

    In this article, we propose a penalized clustering method for large-scale data with multiple covariates through a functional data approach. In the proposed method, responses and covariates are linked through nonparametric multivariate functions (fixed effects), which are flexible enough to model a variety of function features, such as jump points, branching, and periodicity. Functional ANOVA is employed to further decompose the multivariate functions in a reproducing kernel Hilbert space and to provide the associated notions of main effects and interactions. Parsimonious random effects are used to capture various correlation structures. The mixed-effects models are nested within a general mixture model, which characterizes the heterogeneity of the functional data. We propose a penalized Henderson's likelihood approach for model fitting and design a rejection-controlled EM algorithm for estimation. Our method selects smoothing parameters through generalized cross-validation. Furthermore, Bayesian confidence intervals are used to measure clustering uncertainty. Simulation studies and real-data examples are presented to investigate the empirical performance of the proposed method. Open-source code is available in the R package MFDA.
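    Generalized cross-validation (GCV) picks a smoothing parameter $\lambda$ for a linear smoother $\hat{y} = A(\lambda)y$ by minimising $V(\lambda) = n\,\lVert (I - A(\lambda))y \rVert^2 / [\operatorname{tr}(I - A(\lambda))]^2$. The Python sketch below applies that criterion to a simple Whittaker (second-difference penalty) smoother, chosen for brevity as a stand-in; it is not the RKHS functional-ANOVA model implemented in MFDA. The function names and the toy data are hypothetical, and NumPy is assumed.

```python
import numpy as np


def whittaker_hat_matrix(n, lam):
    """Hat matrix A(lam) = (I + lam * D^T D)^{-1} of a Whittaker smoother."""
    D = np.diff(np.eye(n), n=2, axis=0)  # second-difference operator, shape (n-2, n)
    return np.linalg.inv(np.eye(n) + lam * D.T @ D)


def gcv_score(y, lam):
    """GCV(lam) = n * ||(I - A) y||^2 / tr(I - A)^2."""
    n = len(y)
    A = whittaker_hat_matrix(n, lam)
    resid = y - A @ y
    return n * float(resid @ resid) / (n - np.trace(A)) ** 2


def select_lambda(y, grid):
    """Return the grid value with the smallest GCV score, plus all scores."""
    scores = [gcv_score(y, lam) for lam in grid]
    return grid[int(np.argmin(scores))], scores


# Toy usage (hypothetical data): a noisy sine curve.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(x) + 0.2 * rng.standard_normal(60)
grid = [0.1, 1.0, 10.0, 100.0, 1000.0]
best, scores = select_lambda(y, grid)
```

    GCV needs only the residuals and the trace of the hat matrix, so it avoids refitting the model once per held-out point as leave-one-out cross-validation would.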

    Graph-based gene/protein prediction and clustering over uncertain medical databases

    Clustering of protein or gene data is now a popular topic in biomedical databases. In general, large sets of gene tags are clustered using computationally intensive techniques over distributed gene or protein data. Most traditional clustering techniques are based on subspace, hierarchical, or partitioning feature extraction. Various clustering techniques with different cluster measures have been proposed in the literature, but their performance is limited by spatial noise and uncertainty. In this paper, an improved graph-based clustering technique is proposed for generating efficient gene or protein clusters over uncertain and noisy data. The proposed graph-based visualization can effectively identify different types of genes or proteins along with their relational attributes. Experimental results show that the proposed graph model clusters complex gene or protein data more effectively than conventional clustering approaches.
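    One common way to realise graph-based clustering is to connect items whose feature vectors lie within a distance threshold and to report the connected components of that graph as clusters. The Python sketch below shows that generic construction only; the threshold `eps`, the use of Euclidean distance, and the function names are assumptions for illustration, not the specific graph model proposed in the paper.

```python
import math
from collections import deque


def threshold_graph(points, eps):
    """Adjacency lists linking every pair of points within distance eps."""
    n = len(points)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= eps:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def connected_components(adj):
    """Label each node with the index of its connected component (BFS)."""
    labels = [-1] * len(adj)
    current = 0
    for start in range(len(adj)):
        if labels[start] != -1:
            continue  # already reached from an earlier start node
        labels[start] = current
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if labels[v] == -1:
                    labels[v] = current
                    queue.append(v)
        current += 1
    return labels
```

    The pairwise loop is quadratic in the number of items; for large gene or protein sets a spatial index or a sparse nearest-neighbour graph would normally replace it.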

    Predictive Inference for Spatio-temporal Precipitation Data and Its Extremes

    Modelling of precipitation and its extremes is important for urban and agricultural planning. We present a method for producing spatial predictions and measures of uncertainty for spatio-temporal data that are heavy-tailed and substantially skewed, as often arises in measurements of many environmental processes, and we apply the method to precipitation data in south-west Western Australia. A generalised hyperbolic Bayesian hierarchical model is constructed for the intensity, frequency and duration of daily precipitation, including the extremes. Unlike models based on extreme value theory, which model only the maxima of finite-sized blocks or exceedances above a large threshold, the proposed model uses all the available data efficiently, and hence not only fits the extremes but also models the entire rainfall distribution. It captures spatial and temporal clustering, as well as spatially and temporally varying volatility and skewness. The model assumes that regional precipitation is driven by a latent process characterised by geographical and climatological covariates. Effects not fully described by the covariates are captured by spatial and temporal structure in the hierarchies. Inference is performed by MCMC using a Metropolis-Hastings algorithm and a spatial interpolation method, which provide a natural approach to estimating uncertainty. Similarly, both spatial and temporal predictions with uncertainty can be produced with the model. Comment: Under review at Journal of the American Statistical Association. 27 pages, 10 figures
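    The Metropolis-Hastings step at the core of such MCMC inference can be illustrated in one dimension. The Python sketch below is a generic random-walk Metropolis-Hastings sampler with a Gaussian proposal; the function name, the step size, and the standard-normal toy target in the usage lines are assumptions, and this is not the generalised hyperbolic hierarchical model or the authors' sampler.

```python
import math
import random


def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1-D unnormalised log-density."""
    rng = random.Random(seed)
    x = x0
    logp = log_target(x)
    samples = []
    accepted = 0
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)
        logp_prop = log_target(proposal)
        # The Gaussian proposal is symmetric, so the Hastings ratio reduces
        # to the target ratio; 1 - random() lies in (0, 1], so log is safe.
        if math.log(1.0 - rng.random()) < logp_prop - logp:
            x, logp = proposal, logp_prop
            accepted += 1
        samples.append(x)  # a rejected move repeats the current state
    return samples, accepted / n_samples


# Toy usage: sample a standard normal from its unnormalised log-density.
samples, rate = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
kept = samples[1000:]  # discard burn-in
```

    Because the acceptance test uses only a difference of log-densities, the target's normalising constant is never needed, which is what makes the method practical for hierarchical posteriors.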

    The 3D soft X-ray cluster-AGN cross-correlation function in the ROSAT NEP survey

    X-ray surveys facilitate investigations of the environment of AGNs. Deep Chandra observations revealed that the AGN source surface density rises near clusters of galaxies. The natural extension of these works is the measurement of the spatial clustering of AGNs around clusters and the investigation of relative biasing between active galactic nuclei and galaxies near clusters. The major aims of this work are to measure the correlation length of AGNs around clusters and the averaged clustering properties of a complete sample of AGNs in dense environments. We present the first measurement of the soft X-ray cluster-AGN cross-correlation function in redshift space using the data of the ROSAT-NEP survey. The survey covers $9\times9$ deg$^2$ around the North Ecliptic Pole, where 442 X-ray sources were detected and almost completely identified spectroscopically. We detected a clustering signal with $>3\sigma$ significance on scales $s < 50\,h_{70}^{-1}$ Mpc. We performed a classical maximum-likelihood power-law fit to the data and obtained a correlation length $s_0 = 8.7^{+1.2}_{-0.3}\,h_{70}^{-1}$ Mpc and a slope $\gamma = 1.7^{+0.2}_{-0.7}$ ($1\sigma$ errors). This is strong evidence that AGNs are good tracers of the large-scale structure of the Universe. Our data were compared with the results obtained by cross-correlating X-ray clusters and galaxies. We observe, with a large uncertainty, that the bias factor of AGNs is similar to that of galaxies. Comment: 4 pages, 2 figures, proceedings of the conference "At the edge of the Universe", Sintra, Portugal, October 2006. To be published in the Astronomical Society of the Pacific Conference Series (ASPCS)
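    Assuming the standard power-law form $\xi(s) = (s/s_0)^{-\gamma}$, the correlation length $s_0$ is the scale at which the correlation function equals one. The Python sketch below evaluates that model and recovers $s_0$ and $\gamma$ from noise-free values by ordinary least squares in log-log space. This least-squares fit is a simplified stand-in, not the maximum-likelihood estimator used in the paper, and the function names are hypothetical; $s$ is taken in $h_{70}^{-1}$ Mpc.

```python
import math


def xi_power_law(s, s0, gamma):
    """Power-law correlation function xi(s) = (s / s0) ** (-gamma)."""
    return (s / s0) ** (-gamma)


def fit_power_law_loglog(s_values, xi_values):
    """Least-squares line through (log s, log xi), mapped back to (s0, gamma).

    log xi = gamma * log s0 - gamma * log s, so slope = -gamma and
    intercept = gamma * log s0.
    """
    xs = [math.log(s) for s in s_values]
    ys = [math.log(v) for v in xi_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    gamma = -slope
    s0 = math.exp(intercept / gamma)
    return s0, gamma


# Noise-free check using the best-fit values quoted in the abstract.
s_grid = [2.0, 5.0, 10.0, 20.0, 40.0]
xi_grid = [xi_power_law(s, 8.7, 1.7) for s in s_grid]
s0_hat, gamma_hat = fit_power_law_loglog(s_grid, xi_grid)
```

    Real pair counts are noisy and include bins with few or zero pairs, where taking logarithms breaks down; that is one reason a likelihood-based fit on the counts is usually preferred over this log-log regression.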