51,494 research outputs found

    Ensemble Generation Methods and Cluster Ensemble Selection with Constraints

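    Cluster ensemble first generates a set of several different clustering members and then merges them into a more accurate consensus partition. It is generally agreed that, for a good cluster ensemble, the members should differ from one another while the quality of each member stays at an acceptable level. Many algorithms can be used to generate different base partitions. As with classifier ensembles, much research has focused on how diverse members are generated, for example by clustering different data subsets (random sampling) or different feature subsets (random projection). However, few studies have compared these two sampling approaches in terms of quality and diversity. In this thesis, we propose a new member-generation method based on random sampling, which fills in the cluster labels of samples left out of the subsample by finding their nearest-neighbor samples (abbreviated RS-NN). We compare it with the traditional K-means-based clu... Cluster ensemble first generates a large library of different clustering solutions and then combines them into a more accurate consensus clustering. It is commonly accepted that for cluster ensemble to work well the member partitions should be different from each other, and meanwhile the quality of each partition should remain at an acceptable level. Many different strategies have been used to gen... Degree: Master of Engineering. Department and major: School of Information Science and Technology, Pattern Recognition and Intelligent Systems. Student ID: 2322011115323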
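The abstract above describes RS-NN only in outline: cluster a random subsample, then give each left-out sample the label of its nearest sampled neighbor. The following is a minimal sketch of that idea under stated assumptions, not the thesis's implementation. The function names (`kmeans`, `rs_nn_member`), the sampling fraction, the Euclidean distance, and the toy two-blob data are all hypothetical choices made for illustration.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = tuple(sum(col) / len(members)
                                   for col in zip(*members))
    return labels

def rs_nn_member(data, k, sample_frac, rng):
    """One ensemble member (assumed reading of RS-NN): cluster a random
    subsample, then label each left-out point by its nearest sampled point."""
    n = len(data)
    sampled = rng.sample(range(n), max(k, int(sample_frac * n)))
    sub_labels = kmeans([data[i] for i in sampled], k, rng)
    label_of = dict(zip(sampled, sub_labels))
    full = []
    for i in range(n):
        if i in label_of:
            full.append(label_of[i])
        else:
            nearest = min(sampled, key=lambda j: sq_dist(data[i], data[j]))
            full.append(label_of[nearest])
    return full

# Hypothetical toy data: two well-separated 2-D blobs of 30 points each.
rng = random.Random(0)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(30)] + \
       [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(30)]

# Five members, each built from a different random subsample.
ensemble = [rs_nn_member(data, k=2, sample_frac=0.6, rng=rng) for _ in range(5)]
```

Each entry of `ensemble` is a full labeling of all 60 points, so the members can be passed to any consensus function. Label numbering is arbitrary per member, which is why consensus methods usually compare co-membership rather than raw labels.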

    Statistical Thermodynamics of Clustered Populations

    We present a thermodynamic theory for a generic population of M individuals distributed into N groups (clusters). We construct the ensemble of all distributions with fixed M and N, introduce a selection functional that embodies the physics that governs the population, and obtain the distribution that emerges in the scaling limit as the most probable among all distributions consistent with the given physics. We develop the thermodynamics of the ensemble and establish a rigorous mapping to thermodynamics. We treat the emergence of a so-called "giant component" as a formal phase transition and show that the criteria for its emergence are entirely analogous to the equilibrium conditions in molecular systems. We demonstrate the theory by an analytic model and confirm the predictions by Monte Carlo simulation. Comment: Minor edits to tex

    A Parameterized Galaxy Catalog Simulator for Testing Cluster Finding, Mass Estimation, and Photometric Redshift Estimation in Optical and Near-infrared Surveys

    We present a galaxy catalog simulator that converts N-body simulations with halo and subhalo catalogs into mock, multiband photometric catalogs. The simulator assigns galaxy properties to each subhalo in a way that reproduces the observed cluster galaxy halo occupation distribution, the radial and mass-dependent variation in fractions of blue galaxies, the luminosity functions in the cluster and the field, and the color-magnitude relation in clusters. Moreover, the evolution of these parameters is tuned to match existing observational constraints. Parameterizing an ensemble of cluster galaxy properties enables us to create mock catalogs with variations in those properties, which in turn allows us to quantify the sensitivity of cluster finding to current observational uncertainties in these properties. Field galaxies are sampled from existing multiband photometric surveys of similar depth. We present an application of the catalog simulator to characterize the selection function and contamination of a galaxy cluster finder that utilizes the cluster red sequence together with galaxy clustering on the sky. We estimate systematic uncertainties in the selection to be at the ≤15% level with current observational constraints on cluster galaxy populations and their evolution. We find the contamination in this cluster finder to be ~35% to redshift z ~ 0.6. In addition, we use the mock galaxy catalogs to test the optical mass indicator B_gc and a red-sequence redshift estimator. We measure the intrinsic scatter of the B_gc-mass relation to be approximately log normal with $\sigma_{\log_{10} M} \sim 0.25$ and we demonstrate photometric redshift accuracies for massive clusters at the ~3% level out to z ~ 0.7. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/98548/1/0004-637X_747_1_58.pd

    Analysis of Sunyaev-Zel'dovich Effect Mass-Observable Relations using South Pole Telescope Observations of an X-ray Selected Sample of Low Mass Galaxy Clusters and Groups

    (Abridged) We use 95, 150, and 220 GHz observations from the SPT to examine the SZE signatures of a sample of 46 X-ray selected groups and clusters drawn from ~6 deg^2 of the XMM-BCS. These systems extend to redshift z=1.02, have characteristic masses ~3x lower than clusters detected directly in the SPT data and probe the SZE signal to the lowest X-ray luminosities (>10^42 erg s^-1) yet. We develop an analysis tool that combines the SZE information for the full ensemble of X-ray-selected clusters. Using X-ray luminosity as a mass proxy, we extract selection-bias corrected constraints on the SZE significance-mass and Y_500-mass relations. The SZE significance-mass relation is in good agreement with an extrapolation of the relation obtained from high mass clusters. However, the fit to the Y_500-mass relation at low masses, while in good agreement with the extrapolation from high mass SPT clusters, is in tension at 2.8 sigma with the constraints from the Planck sample. We examine the tension with the Planck relation, discussing sample differences and biases that could contribute. We also present an analysis of the radio galaxy point source population in this ensemble of X-ray selected systems. We find 18 of our systems have 843 MHz SUMSS sources within 2 arcmin of the X-ray centre, and three of these are also detected at significance >4 by SPT. Of these three, two are associated with the group brightest cluster galaxies, and the third is likely an unassociated quasar candidate. We examine the impact of these point sources on our SZE scaling relation analyses and find no evidence of biases. We also examine the impact of dusty galaxies using constraints from the 220 GHz data. The stacked sample provides 2.8σ significant evidence of dusty galaxy flux, which would correspond to an average underestimate of the SPT Y_500 signal that is (17 ± 9) per cent in this sample of low mass systems. Comment: 15 pages, 7 figure

    Implications of multiple high-redshift galaxy clusters

    To date, 14 high-redshift (z>1.0) galaxy clusters with mass measurements have been observed, spectroscopically confirmed and reported in the literature. These objects should be exceedingly rare in the standard LCDM model. We conservatively approximate the selection functions of these clusters' parent surveys, and quantify the tension between the abundances of massive clusters as predicted by the standard LCDM model and the observed ones. We alleviate the tension by considering non-Gaussian primordial perturbations of the local type, characterized by the parameter fnl, and derive constraints on fnl arising from the mere existence of these clusters. At the 95% confidence level, fnl > 467 with cosmological parameters fixed to their most likely WMAP5 values, or fnl > 123 (at 95% confidence) if we marginalize over WMAP5 parameter priors. In combination with fnl constraints from Cosmic Microwave Background and halo bias, this determination implies a scale-dependence of fnl at approx. 3 sigma. Given the assumptions made in the analysis, we expect any future improvements to the modeling of the non-Gaussian mass function, survey volumes, or selection functions to increase the significance of fnl > 0 found here. In order to reconcile these massive, high-z clusters with fnl = 0, their masses would need to be systematically lowered by 1.5 sigma or the sigma8 parameter should be approx. 3 sigma higher than CMB (and large-scale structure) constraints. The existence of these objects is a puzzle: it either represents a challenge to the LCDM paradigm or indicates that the mass estimates of clusters are dramatically more uncertain than we think. Comment: 11 pages, 7 figures, modified to match published versio

    CLASH: Weak-Lensing Shear-and-Magnification Analysis of 20 Galaxy Clusters

    We present a joint shear-and-magnification weak-lensing analysis of a sample of 16 X-ray-regular and 4 high-magnification galaxy clusters at 0.19<z<0.69 selected from the Cluster Lensing And Supernova survey with Hubble (CLASH). Our analysis uses wide-field multi-color imaging, taken primarily with Suprime-Cam on the Subaru Telescope. From a stacked shear-only analysis of the X-ray-selected subsample, we detect the ensemble-averaged lensing signal with a total signal-to-noise ratio of ~25 in the radial range of 200 to 3500 kpc/h. The stacked tangential-shear signal is well described by a family of standard density profiles predicted for dark-matter-dominated halos in gravitational equilibrium, namely the Navarro-Frenk-White (NFW), truncated variants of NFW, and Einasto models. For the NFW model, we measure a mean concentration of $c_{200c}=4.01^{+0.35}_{-0.32}$ at $M_{200c}=1.34^{+0.10}_{-0.09}\times 10^{15}M_{\odot}$. We show this is in excellent agreement with Lambda cold-dark-matter (LCDM) predictions when the CLASH X-ray selection function and projection effects are taken into account. The best-fit Einasto shape parameter is $\alpha_E=0.191^{+0.071}_{-0.068}$, which is consistent with the NFW-equivalent Einasto parameter of $\sim 0.18$. We reconstruct projected mass density profiles of all CLASH clusters from a joint likelihood analysis of shear-and-magnification data, and measure cluster masses at several characteristic radii. We also derive an ensemble-averaged total projected mass profile of the X-ray-selected subsample by stacking their individual mass profiles. The stacked total mass profile, constrained by the shear+magnification data, is shown to be consistent with our shear-based halo-model predictions including the effects of surrounding large-scale structure as a two-halo term, establishing further consistency in the context of the LCDM model. Comment: Accepted by ApJ on 11 August 2014. Textual changes to improve clarity (e.g., Sec.3.2.2 "Number-count Depletion", Sec.4.3 "Shape Measurement", Sec.4.4 "Background Galaxy Selection"). Results and conclusions remain unchanged. For the public release of Subaru data, see http://archive.stsci.edu/prepds/clash

    EC3: Combining Clustering and Classification for Ensemble Learning

    Classification and clustering algorithms have each proved successful in different contexts. Both have their own advantages and limitations. For instance, although classification algorithms are more powerful than clustering methods in predicting class labels of objects, they do not perform well when there is a lack of sufficient manually labeled reliable data. On the other hand, although clustering algorithms do not produce label information for objects, they provide supplementary constraints (e.g., if two objects are clustered together, it is more likely that the same label is assigned to both of them) that one can leverage for label prediction of a set of unknown objects. Therefore, systematic utilization of both these types of algorithms together can lead to better prediction performance. In this paper, we propose a novel algorithm, called EC3, that merges classification and clustering together in order to support both binary and multi-class classification. EC3 is based on a principled combination of multiple classification and multiple clustering methods using an optimization function. We theoretically show the convexity and optimality of the problem and solve it by the block coordinate descent method. We additionally propose iEC3, a variant of EC3 that handles imbalanced training data. We perform an extensive experimental analysis by comparing EC3 and iEC3 with 14 baseline methods (7 well-known standalone classifiers, 5 ensemble classifiers, and 2 existing methods that merge classification and clustering) on 13 standard benchmark datasets. We show that our methods outperform other baselines for every single dataset, achieving up to 10% higher AUC. Moreover, our methods are faster (1.21 times faster than the best baseline) and more resilient to noise and class imbalance than the best baseline method. Comment: 14 pages, 7 figures, 11 table

    Looking for bimodal distributions in multi-fragmentation reactions

    The presence of a phase transition in a finite system can be deduced, together with its order, from the shape of the distribution of the order parameter. This issue has been extensively studied in multifragmentation experiments, with results that do not appear fully consistent. In this paper we discuss the effect of the statistical ensemble or sorting conditions on the shape of fragment distributions, and propose a new method, which can be easily implemented experimentally, to discriminate between different fragmentation scenarios. This method, based on a reweighting of the measured distribution to account for the experimental constraints linked to the energy deposit, is tested on different simple models, and appears to provide a powerful discrimination. Comment: 11 pages, 7 figure

    Probing dark energy with cluster counts and cosmic shear power spectra: including the full covariance

    (Abridged) Combining cosmic shear power spectra and cluster counts is a powerful way to improve cosmological parameter constraints and/or test inherent systematics. However, the two probe the same cosmic mass density field if they are drawn from the same survey region, and therefore the combination may be less powerful than first thought. We investigate the cross-covariance between the cosmic shear power spectra and the cluster counts based on the halo model approach, where the cross-covariance arises from the three-point correlations of the underlying mass density field. Fully taking into account the cross-covariance as well as non-Gaussian errors on the lensing power spectrum covariance, we find a significant cross-correlation between the lensing power spectrum signals at multipoles l~10^3 and the cluster counts containing halos with masses M>10^{14}Msun. Including the cross-covariance for the combined measurement degrades, and in some cases improves, the total signal-to-noise ratio by up to ±20% relative to when the two are independent. For cosmological parameter determination, the cross-covariance has a smaller effect as a result of working in a multi-dimensional parameter space, implying that the two observables can be considered independent to a good approximation. We also discuss how cluster count experiments using lensing-selected mass peaks could be more complementary to cosmic shear tomography than mass-selected cluster counts of the corresponding mass threshold. Using lensing-selected clusters with a realistic usable detection threshold (S/N~6 for a ground-based survey), the uncertainty on each dark energy parameter may be roughly halved by the combined experiments, relative to using the power spectra alone. Comment: 32 pages, 15 figures. Revised version, invited original contribution to gravitational lensing focus issue, New Journal of Physic
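The statement that a cross-covariance can either degrade or improve the combined signal-to-noise can be illustrated with a toy two-observable case, using the standard definition (S/N)^2 = d^T C^{-1} d. This is only a sketch with made-up numbers, not the paper's halo-model calculation. The function name `combined_snr`, the signal amplitudes, and the covariance values are all hypothetical.

```python
import math

def combined_snr(s1, s2, var1, var2, cov12):
    """S/N of a two-element data vector d = (s1, s2) with covariance
    C = [[var1, cov12], [cov12, var2]], using (S/N)^2 = d^T C^{-1} d."""
    det = var1 * var2 - cov12 ** 2
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Closed-form inverse of the 2x2 covariance applied to d on both sides.
    chi2 = (s1 * s1 * var2 - 2.0 * s1 * s2 * cov12 + s2 * s2 * var1) / det
    return math.sqrt(chi2)

# Hypothetical signals (3 and 4) with unit variance in each observable.
independent = combined_snr(3.0, 4.0, 1.0, 1.0, 0.0)   # no cross-covariance
positive = combined_snr(3.0, 4.0, 1.0, 1.0, 0.3)      # positively correlated
negative = combined_snr(3.0, 4.0, 1.0, 1.0, -0.3)     # anti-correlated

print(independent, positive, negative)
```

With same-sign signals, a positive cross-covariance lowers the combined S/N below the independent value and a negative one raises it. This toy case shows only that the sign of the correlation sets the direction of the change.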