Redshift Weights for Baryon Acoustic Oscillations : Application to Mock Galaxy Catalogs
Large redshift surveys capable of measuring the Baryon Acoustic Oscillation
(BAO) signal have proven to be an effective way of measuring the
distance-redshift relation in cosmology. Building off the work in Zhu et al.
(2015), we develop a technique to directly constrain the distance-redshift
relation from BAO measurements without splitting the sample into redshift bins.
We parametrize the distance-redshift relation, relative to a fiducial model, as
a quadratic expansion. We measure its coefficients and reconstruct the
distance-redshift relation from the expansion. We apply the redshift weighting
technique in Zhu et al. (2015) to the clustering of galaxies from 1000 QuickPM
(QPM) mock simulations after reconstruction and achieve a 0.75% measurement of
the angular diameter distance D_A and the same precision for the
Hubble parameter H. These QPM mock catalogs are designed to mimic
the clustering and noise level of the Baryon Oscillation Spectroscopic Survey
(BOSS) Data Release 12 (DR12). We compress the correlation functions in the
redshift direction onto a set of weighted correlation functions. These
estimators give unbiased D_A and H measurements at all redshifts within the
range of the combined sample. We demonstrate the effectiveness of redshift
weighting in improving the distance and Hubble parameter estimates. Instead of
measuring at a single 'effective' redshift as in traditional analyses, we
report our D_A and H measurements at all redshifts. The measured fractional
error of D_A ranges from 1.53% to 0.75% across the redshift range of the
sample, while that of H ranges from 0.75% to 2.45%.
Our measurements are consistent with a Fisher forecast to within 10% to 20%
depending on the pivot redshift. We further show the results are robust against
the choice of fiducial cosmologies, galaxy bias models, and RSD streaming
parameters. Comment: 13 pages, 8 figures, submitted to MNRAS
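The quadratic expansion of the distance-redshift relation described above can be sketched as follows. The fiducial cosmology, the pivot convention, and the expansion variable x are illustrative assumptions for this sketch, not the paper's exact parametrization.

```python
import numpy as np

def fiducial_distance(z, c_over_h0=2997.92):
    """Toy fiducial comoving distance in Mpc/h for a flat LCDM model
    (Omega_m = 0.31 assumed here for illustration)."""
    if z <= 0:
        return 0.0
    zs = np.linspace(0.0, z, 513)
    ez = np.sqrt(0.31 * (1.0 + zs) ** 3 + 0.69)  # E(z) = H(z)/H0
    return c_over_h0 * np.mean(1.0 / ez) * z      # simple quadrature

def expanded_distance(z, alphas, z_pivot=0.5):
    """Distance-redshift relation as a quadratic perturbation about the
    fiducial model: D(z) = D_fid(z) * (a0 + a1*x + a2*x^2), where x
    measures the fiducial distance relative to a pivot redshift."""
    a0, a1, a2 = alphas
    d_fid = fiducial_distance(z)
    x = d_fid / fiducial_distance(z_pivot) - 1.0
    return d_fid * (a0 + a1 * x + a2 * x ** 2)
```

With alphas = (1, 0, 0) the fiducial model is recovered exactly; fitting the three coefficients to the weighted correlation functions then reconstructs the full distance-redshift relation rather than a single effective-redshift distance.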
A Modified Overlapping Partitioning Clustering Algorithm for Categorical Data Clustering
Clustering enables the grouping of unlabeled data by partitioning it into clusters with similar patterns. Over the past decades, many clustering algorithms have been developed for various clustering problems. The overlapping partitioning clustering (OPC) algorithm, however, can only handle numerical data, and novel clustering algorithms have been studied extensively to overcome this limitation. By increasing the number of objects belonging to one cluster and the distance between cluster centers, this study aimed to cluster textual data without losing the algorithm's main functions. The study used the 20 Newsgroups dataset, which consists of approximately 20,000 textual documents. By introducing some modifications to the traditional algorithm, an acceptable level of homogeneity and completeness of clusters was achieved. Modifications were made to the pre-processing phase and the data representation, along with the methods that influence the primary function of the algorithm. The results were then evaluated and compared with the k-means algorithm on the training and test datasets. They indicate that the modified algorithm can successfully handle categorical data and produce satisfactory clusters.
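The overlapping assignment idea behind OPC can be illustrated with a minimal sketch (this is not the paper's modified algorithm): once cluster centers are chosen, each document is assigned to every center within a distance threshold, so clusters may share members. Documents here are toy bag-of-words vectors compared with cosine distance.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; 1.0 for zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def overlapping_assign(docs, centers, threshold):
    """Return, for each center, the indices of ALL docs within
    `threshold` -- a document may land in several clusters."""
    clusters = [[] for _ in centers]
    for i, d in enumerate(docs):
        for k, c in enumerate(centers):
            if cosine_distance(d, c) <= threshold:
                clusters[k].append(i)
    return clusters
```

Unlike k-means, which forces each document into exactly one cluster, the threshold-based assignment lets documents near several centers belong to each of them, which is the behavior the modified algorithm preserves for categorical data.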
GScluster: Network-weighted gene-set clustering analysis
Background: Gene-set analysis (GSA) has been commonly used to identify significantly altered pathways or functions from omics data. However, GSA often yields a long list of gene-sets, necessitating efficient post-processing for improved interpretation. Existing methods cluster the gene-sets based on the extent of their overlap to summarize GSA results without considering interactions between gene-sets. Results: Here, we present a novel network-weighted gene-set clustering method that incorporates both gene-set overlap and protein-protein interaction (PPI) networks. Three examples are demonstrated for microarray gene expression, GWAS summary, and RNA-sequencing data, to which different GSA methods were applied. These examples, as well as a global analysis, show that the proposed method increases the PPI densities and functional relevance of the resulting clusters. Additionally, distinct properties of gene-set distance measures were compared. The methods are implemented as an R/Shiny package, GScluster, which provides gene-set clustering and diverse functions for visualization of gene-sets and PPI networks. Conclusions: Network-weighted gene-set clustering provides functionally more relevant gene-set clusters and related network analysis.
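A hedged sketch of the network-weighting idea: a plain overlap distance between two gene-sets (Meet/Min here) is shrunk when many PPI edges connect the sets. The weighting form and the constant `alpha` are illustrative assumptions, not GScluster's exact formula.

```python
def meet_min_distance(set_a, set_b):
    """Overlap distance: 1 - |A & B| / min(|A|, |B|)."""
    if not set_a or not set_b:
        return 1.0
    return 1.0 - len(set_a & set_b) / min(len(set_a), len(set_b))

def ppi_weighted_distance(set_a, set_b, ppi_edges, alpha=0.5):
    """Reduce the overlap distance in proportion to the density of
    PPI edges running between the two gene-sets (alpha caps the
    maximum reduction; both choices are illustrative)."""
    cross = sum(1 for u, v in ppi_edges
                if (u in set_a and v in set_b) or (u in set_b and v in set_a))
    possible = len(set_a) * len(set_b)
    density = cross / possible if possible else 0.0
    return meet_min_distance(set_a, set_b) * (1.0 - alpha * density)
```

Two gene-sets with modest overlap but dense cross-set PPI links thus end up closer than overlap alone would suggest, which is what pulls functionally related sets into the same cluster.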
Cosmic Acceleration from Causal Backreaction with Recursive Nonlinearities
We revisit the causal backreaction paradigm, in which the need for Dark
Energy is eliminated via the generation of an apparent cosmic acceleration from
the causal flow of inhomogeneity information coming in towards each observer
from distant structure-forming regions. This second-generation formalism
incorporates "recursive nonlinearities": the process by which
already-established metric perturbations will then act to slow down all future
flows of inhomogeneity information. Here, the long-range effects of causal
backreaction are now damped, weakening its impact for models that were
previously best-fit cosmologies. Nevertheless, we find that causal backreaction
can be recovered as a replacement for Dark Energy via the adoption of larger
values for the dimensionless `strength' of the clustering evolution functions
being modeled -- a change justified by the hierarchical nature of clustering
and virialization in the universe, occurring on multiple cosmic length scales
simultaneously. With this, and with one new model parameter representing the
slowdown of clustering due to astrophysical feedback processes, an alternative
cosmic concordance can once again be achieved for a matter-only universe in
which the apparent acceleration is generated entirely by causal backreaction
effects. One drawback is a new degeneracy which broadens our predicted range
for the observed jerk parameter j, thus removing what had
appeared to be a clear signature for distinguishing causal backreaction from
Cosmological Constant ΛCDM. As for the long-term fate of the universe,
incorporating recursive nonlinearities appears to make the possibility of an
`eternal' acceleration due to causal backreaction far less likely; though this
does not take into account gravitational nonlinearities or the large-scale
breakdown of cosmological isotropy, effects not easily modeled within this
formalism. Comment: 53 pages, 7 figures, 3 tables. This paper is an advancement of
previous research on Causal Backreaction; the earlier work is available at
arXiv:1109.4686 and arXiv:1109.515
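For reference, the jerk parameter invoked above is conventionally defined from the third time derivative of the scale factor a(t):

```latex
j \equiv \frac{\dddot{a}/a}{H^{3}}, \qquad H \equiv \frac{\dot{a}}{a},
```

and flat ΛCDM predicts j = 1 at all redshifts, which is why a broadened predicted range for j erodes its power to discriminate causal backreaction from a Cosmological Constant.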
Nearest Neighbor Networks: clustering expression data based on gene neighborhoods
Background: The availability of microarrays measuring thousands of genes simultaneously across hundreds of biological conditions represents an opportunity to understand both individual biological pathways and the integrated workings of the cell. However, translating this amount of data into biological insight remains a daunting task. An important initial step in the analysis of microarray data is clustering of genes with similar behavior. A number of classical techniques are commonly used to perform this task, particularly hierarchical and K-means clustering, and many novel approaches have been suggested recently. While these approaches are useful, they are not without drawbacks; these methods can find clusters in purely random data, and even clusters enriched for biological functions can be skewed towards a small number of processes (e.g. ribosomes). Results: We developed Nearest Neighbor Networks (NNN), a graph-based algorithm to generate clusters of genes with similar expression profiles. This method produces clusters based on overlapping cliques within an interaction network generated from mutual nearest neighborhoods. This focus on nearest neighbors rather than on absolute distance measures allows us to capture clusters with high connectivity even when they are spatially separated, and requiring mutual nearest neighbors allows genes with no sufficiently similar partners to remain unclustered. We compared the clusters generated by NNN with those generated by eight other clustering methods. NNN was particularly successful at generating functionally coherent clusters with high precision, and these clusters generally represented a much broader selection of biological processes than those recovered by other methods. Conclusion: The Nearest Neighbor Networks algorithm is a valuable clustering method that effectively groups genes that are likely to be functionally related. It is particularly attractive due to its simplicity, its success in the analysis of large datasets, and its ability to span a wide range of biological functions with high precision.
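The mutual-nearest-neighbor construction at the heart of NNN can be sketched as follows: an edge joins genes i and j only when each lies in the other's k-nearest neighborhood, and groups are read off the resulting graph. (The published algorithm clusters overlapping cliques of this graph; taking connected components here is a simplification for illustration.)

```python
import math

def knn(idx, points, k):
    """Indices of the k nearest neighbors of points[idx] (excluding itself)."""
    order = sorted((j for j in range(len(points)) if j != idx),
                   key=lambda j: math.dist(points[idx], points[j]))
    return set(order[:k])

def mutual_nn_components(points, k):
    """Connected components of the mutual k-NN graph: an edge i-j exists
    only if i is among j's k nearest AND j is among i's k nearest."""
    n = len(points)
    neighbors = [knn(i, points, k) for i in range(n)]
    adj = {i: {j for j in neighbors[i] if i in neighbors[j]} for i in range(n)}
    seen, comps = set(), []
    for i in range(n):
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:                      # depth-first traversal
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps
```

Because edges require mutuality, a point whose nearest neighbors do not reciprocate stays isolated, which mirrors how NNN leaves genes with no sufficiently similar partners unclustered.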
A directed isoperimetric inequality with application to Bregman near neighbor lower bounds
Bregman divergences are a class of divergences parametrized by a
convex function φ and include well-known distance functions like the
squared Euclidean distance and the Kullback-Leibler divergence. There has
been extensive research on algorithms for problems like clustering and near
neighbor search with respect to Bregman divergences; in all cases, the
algorithms depend not just on the data size n and dimensionality d, but
also on a structure constant μ that depends solely on φ and can grow
without bound independently.
In this paper, we provide the first evidence that this dependence on μ
might be intrinsic. We focus on the problem of approximate near neighbor
search for Bregman divergences. We show that under the cell probe model,
any non-adaptive data structure (like locality-sensitive hashing) for
approximate near-neighbor search with a bounded number of probes must use
space that grows with the structure constant μ; no comparable dependence
appears in the best known space bounds for LSH.
Our new tool is a directed variant of the standard boolean noise operator. We
show that a generalization of the Bonami-Beckner hypercontractivity inequality
exists "in expectation" or upon restriction to certain subsets of the Hamming
cube, and that this is sufficient to prove the desired isoperimetric inequality
that we use in our data structure lower bound.
We also present a structural result reducing the Hamming cube to a Bregman
cube. This structure allows us to obtain lower bounds for problems under
Bregman divergences from their Hamming analogs. In particular, we get a
(weaker) lower bound for approximate near neighbor search for non-adaptive
data structures, and new cell probe lower bounds for a number of other near
neighbor questions in Bregman space. Comment: 27 pages
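The defining formula can be made concrete with a small sketch: for a strictly convex φ, D_φ(p, q) = φ(p) − φ(q) − ⟨∇φ(q), p − q⟩. Choosing φ(x) = ‖x‖² recovers squared Euclidean distance, and the negative-entropy φ recovers the (generalized) KL divergence; the function names are illustrative.

```python
import math

def bregman(phi, grad_phi, p, q):
    """D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (a - b) for g, a, b in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner

# phi(x) = ||x||^2  ->  squared Euclidean distance
phi_l2 = lambda x: sum(a * a for a in x)
grad_l2 = lambda x: [2 * a for a in x]

# phi(x) = sum x_i log x_i  ->  generalized KL divergence
phi_ent = lambda x: sum(a * math.log(a) for a in x)
grad_ent = lambda x: [math.log(a) + 1 for a in x]
```

Note the asymmetry: D_φ(p, q) ≠ D_φ(q, p) in general, which is one reason near neighbor search under Bregman divergences resists the symmetric-metric machinery behind standard LSH.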