Robust Correlation Clustering

Author: Devvrit
Krishnaswamy Ravishankar
Rajaraman Nived
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques (APPROX/RANDOM 2019)
Publication date: 01/01/2019

In this paper, we introduce and study the Robust-Correlation-Clustering problem: given a graph G = (V,E) where every edge is either labeled + or - (denoting similar or dissimilar pairs of vertices), and a parameter m, the goal is to delete a set D of m vertices, and partition the remaining vertices V \ D into clusters to minimize the cost of the clustering, which is the sum of the number of + edges with end-points in different clusters and the number of - edges with end-points in the same cluster. This generalizes the classical Correlation-Clustering problem, which is the special case when m = 0. Correlation clustering is useful when we have (only) qualitative information about the similarity or dissimilarity of pairs of points, and Robust-Correlation-Clustering equips this model with the capability to handle noise in datasets. In this work, we present a constant-factor bi-criteria algorithm for Robust-Correlation-Clustering on complete graphs (where our solution is O(1)-approximate with respect to the cost while discarding O(1)·m points as outliers), and also complement this by showing that no finite approximation is possible if we do not violate the outlier budget. Our algorithm is very simple in that it first does a simple LP-based pre-processing to delete O(m) vertices, and subsequently runs a particular Correlation-Clustering algorithm ACNAlg [Ailon et al., 2005] on the residual instance. We then consider general graphs, and show (O(log n), O(log^2 n)) bi-criteria algorithms while also showing a hardness of alpha_MC on both the cost and the outlier violation, where alpha_MC is the lower bound for the Minimum-Multicut problem.
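The objective described in this abstract can be made concrete with a small sketch. The function name `robust_cc_cost`, the edge encoding, and the toy graph below are illustrative assumptions, not taken from the paper; the code only evaluates the cost of a given clustering after a given set of vertices is deleted, and does not implement the LP-based pre-processing or ACNAlg.

```python
def robust_cc_cost(edges, clusters, deleted=frozenset()):
    """Count disagreements of a clustering after deleting `deleted` vertices.

    edges:    dict mapping frozenset({u, v}) -> '+' or '-'   (assumed encoding)
    clusters: dict mapping each remaining vertex -> cluster id
    """
    cost = 0
    for pair, label in edges.items():
        u, v = tuple(pair)
        if u in deleted or v in deleted:
            continue  # edges touching a deleted vertex contribute nothing
        same = clusters[u] == clusters[v]
        # A '+' edge cut across clusters, or a '-' edge kept inside one, costs 1.
        if (label == '+' and not same) or (label == '-' and same):
            cost += 1
    return cost


# Hypothetical complete graph on four vertices; 'd' behaves like a noisy point.
edges = {
    frozenset({'a', 'b'}): '+',
    frozenset({'a', 'c'}): '+',
    frozenset({'b', 'c'}): '+',
    frozenset({'a', 'd'}): '+',
    frozenset({'b', 'd'}): '-',
    frozenset({'c', 'd'}): '-',
}

# m = 0: classical correlation clustering, with 'd' in its own cluster.
cost_no_deletion = robust_cc_cost(edges, {'a': 0, 'b': 0, 'c': 0, 'd': 1})

# m = 1: delete 'd' and cluster the remaining vertices together.
cost_with_deletion = robust_cc_cost(
    edges, {'a': 0, 'b': 0, 'c': 0}, deleted=frozenset({'d'})
)
```

In this toy instance, deleting the single inconsistent vertex removes the only disagreement, which is the kind of noise handling the abstract attributes to the robust variant.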

Dagstuhl Research Online Publication Server

The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: Analysis of potential systematics

Author: Abazajian
Adam S. Bolton
Adelman-McCarthy
Aihara
Alaina Shelden
Anderson
Antonio J. Cuesta
Antonio Montero Dorta
Ariel G. Sánchez
Ashley J. Ross
Audrey Simmons
Beth Reid
Blake
Blake
Blanton
Cameron K. McBride
Cole
Daniel J. Eisenstein
Daniel Oravetz
David A. Wake
David J. Schlegel
Dmitry Bizyaev
Donald P. Schneider
Eisenstein
Eisenstein
Elena Malanushenko
Eyal Kazin
Feldman
Francesco Montesano
Francisco Prada
Fukugita
Gong-bo Zhao
Gunn
Gunn
Gómez
Górski
Hamilton
Hamilton
Hamilton
Hong Guo
Howard Brewington
Idit Zehavi
Jean-Christophe Hamilton
John Parejko
Kaike Pan
Kaiser
Lado Samushia
Landy
Larson
Manera
Manera
Marc Manera
Mariana Vargas Magaña
Martin White
Masters
Michael A. Strauss
Michael Blanton
Molly E. C. Swanson
Montesano
Nikhil Padmanabhan
Norberg
Nuza
Padmanabhan
Percival
Reid
Reid
Reid
Rita Tojeiro
Robert C. Nichol
Ross
Ross
Samushia
Sanchez
Schlafly
Schlafly
Schlegel
Scoccimarro
Shirley Ho
Stephanie Snedden
Stephen Bailey
Stoughton
Strateva
Swanson
Swanson
Sylos
Tegmark
Tegmark
Thomas
Tinker
Tojeiro
Tojeiro
Viktor Malanushenko
Wake
White
Will J. Percival
Xiaoying Xu
Yahata
York
Zehavi
Zheng
Publication venue: 'Wiley'
Publication date: 01/01/2012

We analyze the density field of galaxies observed by the Sloan Digital Sky Survey (SDSS)-III Baryon Oscillation Spectroscopic Survey (BOSS) included in the SDSS Data Release Nine (DR9). DR9 includes spectroscopic redshifts for over 400,000 galaxies spread over a footprint of 3,275 deg^2. We identify, characterize, and mitigate the impact of sources of systematic uncertainty on large-scale clustering measurements, both for angular moments of the redshift-space correlation function and the spherically averaged power spectrum, P(k), in order to ensure that robust cosmological constraints will be obtained from these data. A correlation between the projected density of stars and the higher redshift (0.43 < z < 0.7) galaxy sample (the 'CMASS' sample) due to imaging systematics imparts a systematic error that is larger than the statistical error of the clustering measurements at scales s > 120 h^-1 Mpc or k < 0.01 h Mpc^-1. We find that these errors can be ameliorated by weighting galaxies based on their surface brightness and the local stellar density. We use mock galaxy catalogs that simulate the CMASS selection function to determine that randomly selecting galaxy redshifts in order to simulate the radial selection function of a random sample imparts the least systematic error on correlation function measurements and that this systematic error is negligible for the spherically averaged correlation function. The methods we recommend for the calculation of clustering measurements using the CMASS sample are adopted in companion papers that locate the position of the baryon acoustic oscillation feature (Anderson et al. 2012), constrain cosmological models using the full shape of the correlation function (Sanchez et al. 2012), and measure the rate of structure growth (Reid et al. 2012). (abridged) Comment: Matches version accepted by MNRAS. Clarifications and references have been added.
See companion papers that share the "The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey:" titl

arXiv.org e-Print Archive

HAL-IN2P3

Crossref

Open Research Online (The Open University)

Portsmouth University Research Portal (Pure)

University of St. Andrews - Pure

St Andrews Research Repository

Sharpest possible clustering bounds using robust random graph analysis

Author: Brugman Judith
Stegehuis Clara
van Leeuwaarden Johan S. H.
Publication venue: 'American Physical Society (APS)'
Publication date: 10/11/2022

Complex network theory crucially depends on the assumptions made about the degree distribution, while fitting degree distributions to network data is challenging, in particular for scale-free networks with power-law degrees. We present a robust assessment of complex networks that does not depend on the entire degree distribution, but only on its mean, range and dispersion: summary statistics that are easy to obtain for most real-world networks. By solving several semi-infinite linear programs, we obtain tight (the sharpest possible) bounds for correlation and clustering measures, for all networks with degree distributions that share the same summary statistics. We identify various extremal random graphs that attain these tight bounds as the graphs with specific three-point degree distributions. We leverage the tight bounds to obtain robust laws that explain how degree-degree correlations and local clustering evolve as a function of node degrees and network size. These robust laws indicate that power-law networks with diverging variance are among the most extreme networks in terms of correlation and clustering, building a further theoretical foundation for widely reported scale-free network phenomena such as correlation and clustering decay.

arXiv.org e-Print Archive

AMADA-Analysis of Multidimensional Astronomical Datasets

Author: Ciardi Benedetta
de Souza Rafael S.
Publication date: 01/01/2015

We present AMADA, an interactive web application to analyse multidimensional datasets. The user uploads a simple ASCII file and AMADA performs a number of exploratory analyses together with contemporary visualization diagnostics. The package performs a hierarchical clustering in the parameter space, and the user can choose among linear, monotonic or non-linear correlation analysis. AMADA provides a number of clustering visualization diagnostics such as heatmaps, dendrograms, chord diagrams, and graphs. In addition, AMADA has the option to run a standard or robust principal components analysis, displaying the results as polar bar plots. The code is written in R and the web interface was created using the Shiny framework. AMADA source-code is freely available at https://goo.gl/KeSPue, and the shiny-app at http://goo.gl/UTnU7I. Comment: Accepted for publication in Astronomy & Computin

arXiv.org e-Print Archive

Repository of the Academy's Library

ELTE Digital Institutional Repository (EDIT)

MPG.PuRe

Robust Inference with Clustered Data

Author: A. Colin Cameron
Douglas L. Miller

In this paper we survey methods to control for regression model error that is correlated within groups or clusters, but is uncorrelated across groups or clusters. Then failure to control for the clustering can lead to understatement of standard errors and overstatement of statistical significance, as emphasized most notably in empirical studies by Moulton (1990) and Bertrand, Duflo and Mullainathan (2004). We emphasize OLS estimation with statistical inference based on minimal assumptions regarding the error correlation process. Complications we consider include cluster-specific fixed effects, few clusters, multi-way clustering, more efficient feasible GLS estimation, and adaptation to nonlinear and instrumental variables estimators. Cluster robust, random effects, fixed effects, differences in differences, cluster bootstrap, few clusters, multi-way clusters.
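The understatement of standard errors mentioned in this abstract can be illustrated with a minimal sketch. It assumes the simplest possible setting (one regressor, no intercept, no finite-sample correction) and uses the standard sandwich form in which score contributions are summed within each cluster before squaring. The function name and the toy data are hypothetical and are not taken from the survey.

```python
import math
from collections import defaultdict


def ols_slope_with_cluster_se(x, y, groups):
    """OLS slope for y = beta * x + u, with two standard-error estimates.

    Returns (beta, cluster_robust_se, heteroskedasticity_robust_se).
    Assumes a single regressor without intercept; no small-sample adjustment.
    """
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx

    # Sum the scores x_i * residual_i within each cluster.
    cluster_scores = defaultdict(float)
    squared_scores = 0.0
    for xi, yi, g in zip(x, y, groups):
        score = xi * (yi - beta * xi)
        cluster_scores[g] += score
        squared_scores += score * score  # treats every observation as independent

    cluster_var = sum(s * s for s in cluster_scores.values()) / sxx ** 2
    robust_var = squared_scores / sxx ** 2
    return beta, math.sqrt(cluster_var), math.sqrt(robust_var)


# Toy data: residuals are perfectly correlated within each of two clusters.
x = [1.0, 1.0, 1.0, 1.0]
y = [2.0, 2.0, 0.0, 0.0]
groups = ['g1', 'g1', 'g2', 'g2']

beta, cluster_se, robust_se = ols_slope_with_cluster_se(x, y, groups)
```

With these numbers the cluster-robust standard error exceeds the one that ignores clustering, which is the direction of bias the abstract warns about when errors are positively correlated within clusters.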

Research Papers in Economics

Fine timing synchronization based on modified expectation maximization clustering algorithm for OFDM systems

Author: Wei Jibo
Xiao Pei
Zhang Lei
Zhang Xiaoying
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 10/06/2019

A novel fine timing synchronization method based on the modified expectation-maximization (EM) clustering algorithm is proposed for orthogonal frequency-division multiplexing systems. Using the cross-correlation metrics of one preamble symbol, the cross-correlation peaks corresponding to the channel arriving paths are identified by the proposed modified EM clustering algorithm; the position of the first coherent cross-correlation peak is then chosen as the start of the frame. Computer simulations show that the proposed method is robust in multipath dispersive channels and achieves superior performance to existing techniques in terms of timing accuracy.

University of Surrey

Enlighten

Surrey Research Insight