    Robust Correlation Clustering

    In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is labeled either + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices and partition the remaining vertices V \ D into clusters so as to minimize the cost of the clustering, which is the number of + edges with end-points in different clusters plus the number of - edges with end-points in the same cluster. This generalizes the classical Correlation-Clustering problem, which is the special case m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets. In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (where our solution is O(1)-approximate w.r.t. the cost while discarding O(1) m points as outliers), and complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple: it first performs an LP-based pre-processing step to delete O(m) vertices, and subsequently runs a particular Correlation-Clustering algorithm, ACNAlg [Ailon et al., 2005], on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bi-criteria algorithms while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the Minimum-Multicut problem.
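
    The objective defined in this abstract can be illustrated with a minimal sketch. It only evaluates the cost of a given solution (a deleted set D plus a clustering of V \ D); it is not the paper's LP-based algorithm or ACNAlg. The function name `robust_cc_cost`, the edge-dictionary representation, and the four-vertex example are assumptions made here for illustration.

```python
def robust_cc_cost(labels, clusters, deleted=frozenset()):
    """Cost of a clustering after deleting the vertices in `deleted`.

    labels   : dict mapping an edge (u, v) to '+' or '-'   (hypothetical layout)
    clusters : dict mapping each surviving vertex to a cluster id
    deleted  : the outlier set D; edges touching D are ignored
    """
    cost = 0
    for (u, v), sign in labels.items():
        if u in deleted or v in deleted:
            continue  # edges incident to an outlier do not count
        same = clusters[u] == clusters[v]
        # A '+' edge is violated when cut; a '-' edge when kept together.
        if (sign == "+" and not same) or (sign == "-" and same):
            cost += 1
    return cost


# Illustrative complete graph on four vertices (made-up labels).
labels = {
    ("a", "b"): "+", ("a", "c"): "+", ("b", "c"): "-",
    ("a", "d"): "-", ("b", "d"): "-", ("c", "d"): "+",
}

# m = 0: cluster {a, b, c} and {d}.
cost_no_outliers = robust_cc_cost(labels, {"a": 0, "b": 0, "c": 0, "d": 1})

# m = 1: delete c, then cluster {a, b} and {d}.
cost_one_outlier = robust_cc_cost(labels, {"a": 0, "b": 0, "d": 1}, deleted={"c"})
```

    In this toy instance, vertex c is the only source of disagreement under the chosen clusterings, so deleting it lowers the cost, which is the kind of noise handling the outlier budget m is meant to capture.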

    The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: Analysis of potential systematics

    We analyze the density field of galaxies observed by the Sloan Digital Sky Survey (SDSS)-III Baryon Oscillation Spectroscopic Survey (BOSS) included in the SDSS Data Release Nine (DR9). DR9 includes spectroscopic redshifts for over 400,000 galaxies spread over a footprint of 3,275 deg^2. We identify, characterize, and mitigate the impact of sources of systematic uncertainty on large-scale clustering measurements, both for angular moments of the redshift-space correlation function and the spherically averaged power spectrum, P(k), in order to ensure that robust cosmological constraints will be obtained from these data. A correlation between the projected density of stars and the higher redshift (0.43 < z < 0.7) galaxy sample (the 'CMASS' sample) due to imaging systematics imparts a systematic error that is larger than the statistical error of the clustering measurements at scales s > 120 h^-1 Mpc or k < 0.01 h Mpc^-1. We find that these errors can be ameliorated by weighting galaxies based on their surface brightness and the local stellar density. We use mock galaxy catalogs that simulate the CMASS selection function to determine that randomly selecting galaxy redshifts in order to simulate the radial selection function of a random sample imparts the least systematic error on correlation function measurements and that this systematic error is negligible for the spherically averaged correlation function. The methods we recommend for the calculation of clustering measurements using the CMASS sample are adopted in companion papers that locate the position of the baryon acoustic oscillation feature (Anderson et al. 2012), constrain cosmological models using the full shape of the correlation function (Sanchez et al. 2012), and measure the rate of structure growth (Reid et al. 2012). (abridged) Comment: Matches version accepted by MNRAS. Clarifications and references have been added. See companion papers that share the "The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey:" titl

    Sharpest possible clustering bounds using robust random graph analysis

    Complex network theory crucially depends on the assumptions made about the degree distribution, while fitting degree distributions to network data is challenging, in particular for scale-free networks with power-law degrees. We present a robust assessment of complex networks that does not depend on the entire degree distribution, but only on its mean, range and dispersion: summary statistics that are easy to obtain for most real-world networks. By solving several semi-infinite linear programs, we obtain tight (the sharpest possible) bounds for correlation and clustering measures, for all networks with degree distributions that share the same summary statistics. We identify various extremal random graphs that attain these tight bounds as the graphs with specific three-point degree distributions. We leverage the tight bounds to obtain robust laws that explain how degree-degree correlations and local clustering evolve as a function of node degrees and network size. These robust laws indicate that power-law networks with diverging variance are among the most extreme networks in terms of correlation and clustering, building further theoretical foundation for widely reported scale-free network phenomena such as correlation and clustering decay.

    AMADA-Analysis of Multidimensional Astronomical Datasets

    We present AMADA, an interactive web application to analyse multidimensional datasets. The user uploads a simple ASCII file and AMADA performs a number of exploratory analyses together with contemporary visualization diagnostics. The package performs a hierarchical clustering in the parameter space, and the user can choose among linear, monotonic or non-linear correlation analysis. AMADA provides a number of clustering visualization diagnostics such as heatmaps, dendrograms, chord diagrams, and graphs. In addition, AMADA has the option to run a standard or robust principal components analysis, displaying the results as polar bar plots. The code is written in R and the web interface was created using the Shiny framework. The AMADA source code is freely available at https://goo.gl/KeSPue, and the Shiny app at http://goo.gl/UTnU7I. Comment: Accepted for publication in Astronomy & Computin

    Robust Inference with Clustered Data

    In this paper we survey methods to control for regression model error that is correlated within groups or clusters, but is uncorrelated across groups or clusters. Then failure to control for the clustering can lead to understatement of standard errors and overstatement of statistical significance, as emphasized most notably in empirical studies by Moulton (1990) and Bertrand, Duflo and Mullainathan (2004). We emphasize OLS estimation with statistical inference based on minimal assumptions regarding the error correlation process. Complications we consider include cluster-specific fixed effects, few clusters, multi-way clustering, more efficient feasible GLS estimation, and adaptation to nonlinear and instrumental variables estimators. Keywords: cluster robust, random effects, fixed effects, differences in differences, cluster bootstrap, few clusters, multi-way clusters.
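
    The understatement of standard errors mentioned in this abstract can be seen in a small numerical sketch. It assumes the standard sandwich form of the cluster-robust variance for a single-regressor OLS fit through the origin, with no small-sample correction; this is an illustration chosen here, not a method quoted from the survey, and the function name and toy data are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def ols_slope_with_robust_se(x, y, groups):
    """OLS slope (no intercept) with two sandwich standard errors.

    Returns (beta, cluster_se, white_se), where cluster_se sums the
    scores x_i * e_i within each cluster before squaring, and white_se
    treats every observation as its own cluster.
    """
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]

    # Accumulate the score x_i * e_i per cluster.
    cluster_scores = defaultdict(float)
    for xi, ei, g in zip(x, resid, groups):
        cluster_scores[g] += xi * ei

    var_cluster = sum(s * s for s in cluster_scores.values()) / sxx ** 2
    var_white = sum((xi * ei) ** 2 for xi, ei in zip(x, resid)) / sxx ** 2
    return beta, sqrt(var_cluster), sqrt(var_white)


# Toy data (made up): errors are identical within each of two clusters.
x = [1.0, 1.0, 1.0, 1.0]
y = [1.0, 1.0, -1.0, -1.0]
groups = [0, 0, 1, 1]

beta, cluster_se, white_se = ols_slope_with_robust_se(x, y, groups)
```

    With perfectly correlated errors inside each cluster, the cluster-robust standard error comes out larger than the one that ignores clustering, which is the direction of bias the abstract describes.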

    Fine timing synchronization based on modified expectation maximization clustering algorithm for OFDM systems

    A novel fine timing synchronization method based on the modified expectation-maximization (EM) clustering algorithm is proposed for orthogonal frequency-division multiplexing systems. Using the cross-correlation metrics of one preamble symbol, the cross-correlation peaks corresponding to the channel arriving paths are identified by the proposed modified EM clustering algorithm; the position of the first coherent cross-correlation peak is then chosen as the start of the frame. Computer simulations show that the proposed method is robust in multipath dispersive channels and achieves superior performance to existing techniques in terms of timing accuracy.