Cosmological Constraints with Clustering-Based Redshifts
We demonstrate that observations lacking reliable redshift information, such
as photometric and radio continuum surveys, can produce robust measurements of
cosmological parameters when empowered by clustering-based redshift estimation.
This method infers the redshift distribution based on the spatial clustering of
sources, using cross-correlation with a reference dataset with known redshifts.
Applying this method to the existing SDSS photometric galaxies, and projecting
to future radio continuum surveys, we show that sources can be efficiently
divided into several redshift bins, increasing their ability to constrain
cosmological parameters. We forecast constraints on the dark-energy
equation-of-state and on local non-gaussianity parameters. We explore several
pertinent issues, including the tradeoff between including more sources versus
minimizing the overlap between bins, the shot-noise limitations on binning, and
the predicted performance of the method at high redshifts. Remarkably, we find
that, once this technique is implemented, constraints on dynamical dark energy
from the SDSS imaging catalog can be competitive with, or better than, those
from the spectroscopic BOSS survey and even future planned experiments.
Further, constraints on primordial non-Gaussianity from future large-sky
radio-continuum surveys can outperform those from the Planck CMB experiment,
and rival those from future spectroscopic galaxy surveys. The application of
this method thus holds tremendous promise for cosmology.
Comment: 7 pages, 3 figures, 2 tables; to be submitted to MNRA
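As a rough illustration of the clustering-based redshift idea described in this abstract, the sketch below turns cross-correlation amplitudes, measured against reference bins of known redshift, into a normalized redshift distribution. It is a simplified sketch under stated assumptions, not the authors' pipeline: the function name, the clipping of negative amplitudes, and the neglect of galaxy-bias evolution are all illustrative choices that do not come from the abstract.

```python
def normalize_redshift_distribution(z_edges, cross_corr_amplitudes):
    """Convert per-bin cross-correlation amplitudes into a normalized dN/dz.

    Assumption (not from the abstract): the amplitude measured against each
    reference redshift bin is taken to be proportional to the number of
    unknown-redshift sources in that bin, ignoring bias evolution.
    """
    if len(z_edges) != len(cross_corr_amplitudes) + 1:
        raise ValueError("need exactly one more bin edge than amplitudes")

    # Noise can drive an amplitude negative; clip to zero (illustrative choice).
    clipped = [max(a, 0.0) for a in cross_corr_amplitudes]

    # Bin widths in redshift.
    widths = [hi - lo for lo, hi in zip(z_edges[:-1], z_edges[1:])]

    # Normalize so that the sum of dN/dz times bin width equals 1.
    total = sum(a * w for a, w in zip(clipped, widths))
    if total <= 0:
        raise ValueError("no positive cross-correlation signal to normalize")
    return [a / total for a in clipped]


# Hypothetical amplitudes in three reference bins between z = 0 and z = 1.5.
edges = [0.0, 0.5, 1.0, 1.5]
dndz = normalize_redshift_distribution(edges, [0.02, 0.06, -0.01])
```

Here the third bin carries no weight because its amplitude is negative, and the remaining two bins share the distribution in a 1:3 ratio.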
Multi-view constrained clustering with an incomplete mapping between views
Multi-view learning algorithms typically assume a complete bipartite mapping
between the different views in order to exchange information during the
learning process. However, many applications provide only a partial mapping
between the views, creating a challenge for current methods. To address this
problem, we propose a multi-view algorithm based on constrained clustering that
can operate with an incomplete mapping. Given a set of pairwise constraints in
each view, our approach propagates these constraints using a local similarity
measure to those instances that can be mapped to the other views, allowing the
propagated constraints to be transferred across views via the partial mapping.
It uses co-EM to iteratively estimate the propagation within each view based on
the current clustering model, transfer the constraints across views, and then
update the clustering model. By alternating the learning process between views,
this approach produces a unified clustering model that is consistent with all
views. We show that this approach significantly improves clustering performance
over several other methods for transferring constraints and allows multi-view
clustering to be reliably applied when given a limited mapping between the
views. Our evaluation reveals that the propagated constraints have high
precision with respect to the true clusters in the data, explaining their
benefit to clustering performance in both single- and multi-view learning
scenarios.
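The propagation step described in this abstract can be pictured with a small sketch: a pairwise constraint between two instances is extended to nearby instances that do have a counterpart in the other view, weighted by local similarity. Everything specific here is an assumption made for illustration (the Gaussian similarity, the product weighting, the `min_weight` cutoff, and all names); the abstract does not specify the actual measure, and the co-EM estimation is not reproduced.

```python
import math


def gaussian_similarity(x, y, bandwidth=1.0):
    """Gaussian kernel similarity between two feature vectors (assumed measure)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def propagate_constraint(points, constraint, mapped_ids, bandwidth=1.0, min_weight=0.1):
    """Propagate one pairwise constraint to instances mapped to the other view.

    points      : dict of instance id -> feature vector
    constraint  : (i, j, kind), kind being "must-link" or "cannot-link"
    mapped_ids  : ids that have a counterpart in the other view
    Returns a list of (u, v, kind, weight) with weight >= min_weight.
    """
    i, j, kind = constraint
    propagated = []
    for u in sorted(mapped_ids):
        weight_u = gaussian_similarity(points[u], points[i], bandwidth)
        for v in sorted(mapped_ids):
            if u == v:
                continue
            weight_v = gaussian_similarity(points[v], points[j], bandwidth)
            # Hypothetical weighting: both endpoints must be close to the originals.
            weight = weight_u * weight_v
            if weight >= min_weight:
                propagated.append((u, v, kind, weight))
    return propagated


# Hypothetical data: "a" and "b" are constrained, only "a2" and "b2" are mapped.
points = {"a": (0.0, 0.0), "b": (5.0, 5.0), "a2": (0.1, 0.0), "b2": (5.0, 5.1)}
result = propagate_constraint(points, ("a", "b", "cannot-link"), {"a2", "b2"})
```

In this toy case the cannot-link constraint on (a, b) is carried over to (a2, b2) with a weight close to 1, and that propagated pair is what could then be transferred across the partial mapping.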
Clustering documents with active learning using Wikipedia
Wikipedia has been applied as a background knowledge base to various text mining problems, but very few attempts have been made to utilize it for document clustering. In this paper we propose to exploit the semantic knowledge in Wikipedia for clustering, enabling the automatic grouping of documents with similar themes. Although clustering is intrinsically unsupervised, recent research has shown that incorporating supervision improves clustering performance, even when limited supervision is provided. The approach presented in this paper applies supervision using active learning. We first utilize Wikipedia to create a concept-based representation of a text document, with each concept associated with a Wikipedia article. We then exploit the semantic relatedness between Wikipedia concepts to find pair-wise instance-level constraints for supervised clustering, guiding clustering towards the direction indicated by the constraints. We test our approach on three standard text document datasets. Empirical results show that our basic document representation strategy yields comparable performance to previous attempts, and that adding constraints improves clustering performance further by up to 20%.
Statistical Mechanics of Semi-Supervised Clustering in Sparse Graphs
We theoretically study semi-supervised clustering in sparse graphs in the
presence of pairwise constraints on the cluster assignments of nodes. We focus
on bi-cluster graphs, and study the impact of semi-supervision for varying
constraint density and overlap between the clusters. Recent results for
unsupervised clustering in sparse graphs indicate that there is a critical
ratio of within-cluster and between-cluster connectivities below which clusters
cannot be recovered with better than random accuracy. The goal of this paper is
to examine the impact of pairwise constraints on the clustering accuracy. Our
results suggest that the addition of constraints does not provide automatic
improvement over the unsupervised case. When the density of the constraints is
sufficiently small, their only impact is to shift the detection threshold while
preserving the criticality. Conversely, if the density of (hard) constraints is
above the percolation threshold, the criticality is suppressed and the
detection threshold disappears.
Comment: 8 pages, 4 figure
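For orientation, the unsupervised detection threshold mentioned in this abstract can be sketched for one commonly used setting. The code assumes a symmetric two-group sparse stochastic block model with edge probabilities c_in/N and c_out/N, for which the widely quoted detectability condition is |c_in - c_out| > 2 sqrt((c_in + c_out)/2). The abstract does not state its model or parameterization, so this form and the function name are assumptions; the shift of the threshold caused by pairwise constraints is the paper's result and is not reproduced here.

```python
import math


def is_detectable(c_in, c_out):
    """Check the unsupervised detectability condition for two equal-size groups.

    Assumed model: edges appear with probability c_in / N inside a group and
    c_out / N between groups, so the average degree is (c_in + c_out) / 2.
    """
    if c_in < 0 or c_out < 0:
        raise ValueError("connectivities must be non-negative")
    average_degree = (c_in + c_out) / 2.0
    return abs(c_in - c_out) > 2.0 * math.sqrt(average_degree)


# Same average degree (3), different within/between contrast.
strong_contrast = is_detectable(5.0, 1.0)
weak_contrast = is_detectable(4.0, 2.0)
```

With the average degree fixed at 3, the pair (5, 1) lies above the threshold and the pair (4, 2) lies below it, which matches the picture of a critical within-cluster to between-cluster ratio.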