Unsupervised cryo-EM data clustering through adaptively constrained K-means algorithm
In single-particle cryo-electron microscopy (cryo-EM), the K-means clustering
algorithm is widely used in unsupervised 2D classification of projection images
of biological macromolecules. 3D ab initio reconstruction requires accurate
unsupervised classification in order to separate molecular projections of
distinct orientations. Due to background noise in single-particle images and
uncertainty of molecular orientations, the traditional K-means clustering
algorithm may classify images into wrong classes and produce classes with a
large variation in membership. Overcoming these limitations requires further
development of clustering algorithms for cryo-EM data analysis. We propose a
novel unsupervised data clustering method building upon the traditional K-means
algorithm. By introducing an adaptive constraint term in the objective
function, our algorithm not only avoids a large variation in class sizes but
also produces more accurate data clustering. Applications of this approach to
both simulated and experimental cryo-EM data demonstrate that our algorithm is
a significantly improved alternative to the traditional K-means algorithm in
single-particle cryo-EM analysis. Comment: 35 pages, 14 figures
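
The idea of adding a constraint term to the K-means objective so that class sizes stay comparable can be illustrated with a minimal sketch. The abstract does not give the form of the adaptive constraint, so the penalty used here (`lam` times the running cluster size, added to the squared distance at assignment time) is an assumption made purely for illustration; the function name `size_penalized_kmeans` and all parameters are hypothetical and are not the authors' method.

```python
import numpy as np

def size_penalized_kmeans(X, k, lam=0.0, n_iter=20, seed=0):
    """K-means whose assignment cost adds lam * (running cluster size).

    The penalty form is an illustrative assumption, not the paper's term.
    With lam = 0 this reduces to ordinary Lloyd-style K-means.
    """
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        sizes = np.zeros(k)
        # Assign points one at a time so the penalty sees running counts.
        for i in rng.permutation(len(X)):
            d2 = ((centers - X[i]) ** 2).sum(axis=1)
            labels[i] = int(np.argmin(d2 + lam * sizes))
            sizes[labels[i]] += 1
        # Update step: move each non-empty cluster's center to its mean.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Usage on synthetic 2D points (not cryo-EM data).
X = np.random.default_rng(1).normal(size=(30, 2))
labels, centers = size_penalized_kmeans(X, k=3, lam=1e9)
print(np.bincount(labels, minlength=3))
```

When `lam` is very large the size term dominates and the classes are forced to near-equal membership; intermediate values trade distance against balance.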
Penalized Clustering of Large Scale Functional Data with Multiple Covariates
In this article, we propose a penalized clustering method for large scale
data with multiple covariates through a functional data approach. In the
proposed method, responses and covariates are linked together through
nonparametric multivariate functions (fixed effects), which have great
flexibility in modeling a variety of function features, such as jump points,
branching, and periodicity. Functional ANOVA is employed to further decompose
multivariate functions in a reproducing kernel Hilbert space and provide
associated notions of main effect and interaction. Parsimonious random effects
are used to capture various correlation structures. The mixed-effect models are
nested under a general mixture model, in which the heterogeneity of functional
data is characterized. We propose a penalized Henderson's likelihood approach
for model-fitting and design a rejection-controlled EM algorithm for the
estimation. Our method selects smoothing parameters through generalized
cross-validation. Furthermore, the Bayesian confidence intervals are used to
measure the clustering uncertainty. Simulation studies and real-data examples
are presented to investigate the empirical performance of the proposed method.
Open-source code is available in the R package MFDA.
Graph based gene/protein prediction and clustering over uncertain medical databases.
Clustering of protein or gene data is now a popular topic in biomedical databases. In general, large sets of gene tags are clustered using computationally intensive techniques over distributed gene or protein data. Most traditional clustering techniques are based on subspace, hierarchical and partitioning feature extraction. Various clustering techniques have been proposed in the literature with different cluster measures, but their performance is limited by spatial noise and uncertainty. In this paper, an improved graph-based clustering technique is proposed for the generation of efficient gene or protein clusters over uncertain and noisy data. The proposed graph-based visualization can effectively identify different types of genes or proteins along with relational attributes. Experimental results show that the proposed graph model clusters complex gene or protein data more effectively than conventional clustering approaches.
Predictive Inference for Spatio-temporal Precipitation Data and Its Extremes
Modelling of precipitation and its extremes is important for urban and
agriculture planning purposes. We present a method for producing spatial
predictions and measures of uncertainty for spatio-temporal data that is
heavy-tailed and subject to substantial skewness, as often arises in
measurements of many environmental processes, and we apply the method to
precipitation data in south-west Western Australia. A generalised hyperbolic
Bayesian hierarchical model is constructed for the intensity, frequency and
duration of daily precipitation, including the extremes. Unlike models based on
extreme value theory, which only model maxima of finite-sized blocks or
exceedances above a large threshold, the proposed model uses all the data
available efficiently, and hence not only fits the extremes but also models the
entire rainfall distribution. It captures spatial and temporal clustering, as
well as spatially and temporally varying volatility and skewness. The model
assumes that the regional precipitation is driven by a latent process
characterised by geographical and climatological covariates. Effects not fully
described by the covariates are captured by spatial and temporal structure in
the hierarchies. Inference is provided by MCMC using a Metropolis-Hastings
algorithm and a spatial interpolation method, which provide a natural approach
for estimating uncertainty. Similarly, both spatial and temporal predictions
with uncertainty can be produced with the model. Comment: Under review at
Journal of the American Statistical Association. 27 pages, 10 figures
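
As a generic illustration of how Metropolis-Hastings sampling yields uncertainty estimates, the following minimal sketch runs a random-walk sampler on a one-dimensional target and reads an interval off the draws. The target (a standard normal log-density), the function name `metropolis_hastings`, and the step size are assumptions chosen for illustration; the sketch does not implement the generalised hyperbolic hierarchical model described in the abstract.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples=5000, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a 1D unnormalised log-density."""
    rng = np.random.default_rng(seed)
    x = float(x0)
    logp = log_target(x)
    samples = np.empty(n_samples)
    for i in range(n_samples):
        # Symmetric Gaussian proposal centred on the current state.
        proposal = x + step * rng.normal()
        logp_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if np.log(rng.uniform()) < logp_new - logp:
            x, logp = proposal, logp_new
        samples[i] = x
    return samples

# Stand-in target: standard normal, log p(x) = -x^2 / 2 up to a constant.
samples = metropolis_hastings(lambda x: -0.5 * x**2, x0=0.0)
lo, hi = np.percentile(samples, [2.5, 97.5])
print(f"95% interval from the draws: [{lo:.2f}, {hi:.2f}]")
```

The interval is computed directly from the sampled values, which is the sense in which MCMC output provides uncertainty measures without a separate approximation step.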
The 3D soft X-ray cluster-AGN cross-correlation function in the ROSAT NEP survey
X-ray surveys facilitate investigations of the environment of AGNs. Deep
Chandra observations revealed that the AGN source surface density rises near
clusters of galaxies. The natural extension of these works is the measurement
of spatial clustering of AGNs around clusters and the investigation of relative
biasing between active galactic nuclei and galaxies near clusters. The major
aims of this work are to obtain a measurement of the correlation length of AGNs
around clusters and a measure of the averaged clustering properties of a
complete sample of AGNs in dense environments. We present the first measurement
of the soft X-ray cluster-AGN cross-correlation function in redshift space
using the data of the ROSAT-NEP survey. The survey covers 9x9 deg^2 around the
North Ecliptic Pole where 442 X-ray sources were detected and almost completely
spectroscopically identified. We detected a $>3\sigma$ significant clustering
signal on scales $s < 50\,h_{70}^{-1}$ Mpc. We performed a classical
maximum-likelihood power-law fit to the data and obtained a correlation length
$s_0 = 8.7^{+1.2}_{-0.3}\,h_{70}^{-1}$ Mpc and a slope
$\gamma = 1.7^{+0.2}_{-0.7}$ ($1\sigma$ errors). This is strong
evidence that AGNs are good tracers of the large scale structure of the
Universe. Our data were compared to the results obtained by cross-correlating
X-ray clusters and galaxies. We observe, with a large uncertainty, that the
bias factor of AGN is similar to that of galaxies. Comment: 4 pages, 2 figures,
proceedings of the Conference "At the edge of the Universe", Sintra, Portugal,
October 2006. To be published in the Astronomical Society of the Pacific
Conference Series (ASPCS)
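
For readers who want to evaluate the fitted model, a power-law correlation function is conventionally written as xi(s) = (s / s_0)^(-gamma). The sketch below assumes that standard form (the abstract names the parameters but does not write out the formula) and plugs in the best-fit values quoted above. The function name `power_law_xi` is hypothetical, and the quoted error bars are not propagated.

```python
import numpy as np

def power_law_xi(s, s0=8.7, gamma=1.7):
    """Power-law correlation function xi(s) = (s / s0) ** (-gamma).

    s and s0 are separations in h_70^-1 Mpc. The defaults are the best-fit
    values quoted in the abstract; the functional form is the conventional
    one and is assumed here.
    """
    return (np.asarray(s, dtype=float) / s0) ** (-gamma)

# Evaluate at a few separations below the 50 h_70^-1 Mpc detection scale.
for s in (5.0, 8.7, 20.0, 50.0):
    print(f"s = {s:5.1f}  xi = {float(power_law_xi(s)):.3f}")
```

By construction the function equals 1 at s = s_0, which is what defines the correlation length.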