
    Unifying Sparsest Cut, Cluster Deletion, and Modularity Clustering Objectives with Correlation Clustering

    Graph clustering, or community detection, is the task of identifying groups of closely related objects in a large network. In this paper we introduce a new community-detection framework called LambdaCC that is based on a specially weighted version of correlation clustering. A key component of our methodology is a clustering resolution parameter, λ, which implicitly controls the size and structure of the clusters our framework forms. We show that, by increasing this parameter, our objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs. Our methodology unifies and generalizes a number of other important clustering quality functions, including modularity, sparsest cut, and cluster deletion, and places them all within the context of an optimization problem that has been well studied from the perspective of approximation algorithms. Our approach is particularly relevant in the regime of finding dense clusters, as it leads to a 2-approximation for the cluster deletion problem. We use our approach to cluster several graphs, including large collaboration networks and social networks.
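    The LambdaCC objective can be made concrete with a short cost function. This is a minimal sketch assuming the standard (unweighted) variant, in which each edge of the graph becomes a positive pair of weight 1 − λ and each non-adjacent pair becomes a negative pair of weight λ; the function name `lambdacc_cost` and its label-list input are hypothetical, not the authors' code.

```python
from itertools import combinations


def lambdacc_cost(num_nodes, edges, labels, lam):
    """Cost of a clustering under the standard (unweighted) LambdaCC objective.

    Assumption: every edge carries positive weight (1 - lam) and every
    non-adjacent pair carries negative weight lam. labels[i] is node i's cluster.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must lie strictly between 0 and 1")
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for i, j in combinations(range(num_nodes), 2):
        together = labels[i] == labels[j]
        adjacent = frozenset((i, j)) in edge_set
        if adjacent and not together:
            cost += 1.0 - lam  # cut edge: positive-pair disagreement
        elif not adjacent and together:
            cost += lam  # missing edge inside a cluster: negative-pair disagreement
    return cost
```

    On two triangles joined by a single edge, λ = 0.5 makes the two-triangle split cheapest (cost 0.5, versus 4.0 for one cluster and 3.5 for singletons), while λ = 0.1 makes the single cluster cheaper than the split (0.8 versus 0.9). Small λ favors coarse, sparse-cut-like clusterings; large λ favors dense pieces.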

    Robust Low-Rank Subspace Segmentation with Semidefinite Guarantees

    A recent line of research proposes using Spectral Clustering (SC) to segment (group) high-dimensional structured data, such as data lying (approximately) on subspaces or on low-dimensional manifolds. (Throughout the paper, segmentation, clustering, and grouping, and their verb forms, are used interchangeably. Following [liu2010robust], "subspace" denotes both linear and affine subspaces; as noted there, converting between the two is trivial.) By learning the affinity matrix in the form of a sparse reconstruction, techniques in this vein often considerably improve performance in subspace settings where traditional SC can fail. Despite this success, fundamental problems remain unsolved: the spectral properties of the learned affinity matrix cannot be gauged in advance, and an inelegant symmetrization step is often needed to post-process the affinity before it is passed to SC. We therefore advocate enforcing the symmetric positive semidefinite constraint explicitly during learning (Low-Rank Representation with Positive SemiDefinite constraint, or LRR-PSD), and show that it can in fact be solved efficiently by a dedicated scheme rather than by general-purpose SDP solvers, which usually scale poorly. We provide rigorous mathematical derivations showing that, in its canonical form, LRR-PSD is equivalent to the recently proposed Low-Rank Representation (LRR) scheme [liu2010robust], and hence offer theoretical and practical insights into both LRR-PSD and LRR that invite future research. In terms of computational cost, our proposal is at most comparable to LRR, and possibly cheaper. We validate our theoretical analysis and optimization scheme with experiments on both synthetic and real data sets.
    Comment: 10 pages, 4 figures. Accepted by ICDM Workshop on Optimization Based Methods for Emerging Data Mining Problems (OEDM), 2010. Main proof simplified and typos corrected. Experimental data slightly added.
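    In the noiseless canonical case, the LRR problem min ‖Z‖_* subject to X = XZ has a known closed-form minimizer, Z* = V_r V_rᵀ, where X = U_r Σ_r V_rᵀ is the skinny SVD of X. That matrix is already symmetric positive semidefinite, which gives some intuition for the LRR/LRR-PSD equivalence described in the abstract. The NumPy sketch below computes it; it illustrates only this noiseless case, is not the authors' LRR-PSD solver, and the function name is hypothetical.

```python
import numpy as np


def lrr_noiseless_closed_form(X, tol=1e-10):
    """Minimizer of ||Z||_* subject to X = X Z (noiseless LRR), via the skinny SVD.

    Columns of X are data points. Returns the n x n shape interaction matrix.
    """
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > tol * s.max()))  # numerical rank of X
    V_r = Vt[:r].T  # n x r matrix with orthonormal columns
    return V_r @ V_r.T  # symmetric PSD projection onto the row space of X
```

    For points drawn from independent subspaces and ordered by subspace, this Z* is block-diagonal, which is what makes it usable as an affinity matrix for SC without a separate symmetrization step.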

    Distribution, diversity and evolution of endogenous retroviruses in perissodactyl genomes

    The evolution of mammalian genomes has been shaped by interactions with endogenous retroviruses (ERVs). In this study, we investigated the distribution and diversity of ERVs in the mammalian order Perissodactyla, with a view to understanding their impact on the evolution of modern equids (family Equidae). We characterize the major ERV lineages in the horse genome in terms of their genomic distribution, ancestral genome organization and time of activity. Our results show that, subsequent to their ancestral divergence from rhinos and tapirs, equids acquired four novel ERV lineages. We show that two of these proliferated extensively in the lineage leading to modern horses, and one contains loci that are actively transcribed in specific tissues. In addition, we show that the white rhinoceros has resisted germline colonisation by retroviruses for over 54 million years, longer than any other extant mammalian species. The map of equine ERVs that we provide here will be of great utility to future studies aiming to investigate the potential functional roles of equine ERVs and their impact on equine evolution.

    The clustering of galaxies in the SDSS-III Baryon Oscillation Spectroscopic Survey: measuring D_A and H at z = 0.57 from the baryon acoustic peak in the Data Release 9 spectroscopic Galaxy sample

    We present measurements of the angular diameter distance to, and the Hubble parameter at, z = 0.57 from the measurement of the baryon acoustic peak in the correlation of galaxies from the Sloan Digital Sky Survey III Baryon Oscillation Spectroscopic Survey. Our analysis is based on a sample from Data Release 9 of 264 283 galaxies over 3275 square degrees in the redshift range 0.43 < z < 0.70. We use two different methods to provide a robust measurement of the acoustic peak position across and along the line of sight in order to measure the cosmological distance scale. We find D_A(0.57) = 1408 ± 45 Mpc and H(0.57) = 92.9 ± 7.8 km s−1 Mpc−1 for our fiducial value of the sound horizon. These results from the anisotropic fitting are fully consistent with the analysis of the spherically averaged acoustic peak position presented in Anderson et al. Our distance measurements are a close match to the predictions of the standard cosmological model featuring a cosmological constant and zero spatial curvature.
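    For orientation, the model predictions mentioned in the last sentence can be computed in a few lines, using H(z) = H0 √(Ωm(1+z)³ + ΩΛ) with ΩΛ = 1 − Ωm and D_A(z) = [c/(1+z)] ∫₀^z dz′/H(z′). This is a sketch assuming a flat ΛCDM model with radiation neglected; the parameter values used below (H0 = 70 km s−1 Mpc−1, Ωm = 0.3) are illustrative assumptions, not the paper's fiducial cosmology, and the function names are hypothetical.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def hubble(z, h0, omega_m):
    """H(z) in km/s/Mpc for a flat LambdaCDM model (radiation neglected)."""
    omega_lambda = 1.0 - omega_m  # zero spatial curvature
    return h0 * math.sqrt(omega_m * (1.0 + z) ** 3 + omega_lambda)


def angular_diameter_distance(z, h0, omega_m, steps=1000):
    """D_A(z) in Mpc: comoving distance c * integral of dz'/H(z'), divided by (1 + z)."""
    if steps % 2:
        raise ValueError("Simpson's rule needs an even number of steps")
    h = z / steps
    total = 0.0
    for k in range(steps + 1):
        # Composite Simpson weights: 1, 4, 2, 4, ..., 2, 4, 1
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        total += weight / hubble(k * h, h0, omega_m)
    comoving = C_KM_S * total * h / 3.0
    return comoving / (1.0 + z)
```

    With these assumed parameters the sketch gives H(0.57) ≈ 95 km s−1 Mpc−1 and D_A(0.57) ≈ 1350 Mpc. A like-for-like comparison with the measured values would also need the sound-horizon scaling mentioned above.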

    A SOCIO-COGNITIVE BASIS FOR STRATEGIC GROUPS: COGNITIVE DISSONANCE IN SWINE GENETICS

    Institutional and Behavioral Economics, Livestock Production/Industries

    Analysis of the giant genomes of Fritillaria (Liliaceae) indicates that a lack of DNA removal characterizes extreme expansions in genome size.

    This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
    Plants exhibit an extraordinary range of genome sizes, varying by > 2000-fold between the smallest and largest recorded values. In the absence of polyploidy, changes in the amount of repetitive DNA (transposable elements and tandem repeats) are primarily responsible for genome size differences between species. However, there is ongoing debate regarding the relative importance of amplification of repetitive DNA versus its deletion in governing genome size. Using data from 454 sequencing, we analysed the most repetitive fraction of some of the largest known genomes for diploid plant species, from members of Fritillaria. We show that genomic expansion has not resulted from the recent massive amplification of just a handful of repeat families, as has been shown in species with smaller genomes. Instead, the bulk of these immense genomes is composed of highly heterogeneous, relatively low-abundance repeat-derived DNA, supporting a scenario in which amplified repeats continually accumulate owing to infrequent DNA removal. Our results indicate that a lack of deletion and low turnover of repetitive DNA are major contributors to the evolution of extremely large genomes, and show that their size cannot simply be accounted for by the activity of a small number of high-abundance repeat families.
    This work was supported by the Natural Environment Research Council (grant no. NE/G017 24/1), the Czech Science Foundation (grant no. P501/12/G090), the AVCR (grant no. RVO:60077344) and a Beatriu de Pinos postdoctoral fellowship to J.P. (grant no. 2011-A-00292; Catalan Government-E.U. 7th F.P.).

    The association between county political inclination and obesity: Results from the 2012 presidential election in the United States.

    Objective: We examined whether stable county-level voter preferences were significantly associated with county-level obesity prevalence, using data from the 2012 US presidential election. County voting preference for the 2012 Republican Party presidential candidate was used as a proxy for voter endorsement of personal-responsibility approaches to reducing population obesity risk, versus approaches featuring government-sponsored, multi-sectoral efforts like those recommended by the Centers for Disease Control (CDC, 2009).
    Method: Cartographic visualization and spatial analysis were used to evaluate the geographic clustering of county obesity prevalence rates and of county-level support for the Republican Party candidate in the 2012 US presidential election. The spatial analysis informed the spatial econometric approach used to model the relationship of political preferences and other covariates with obesity prevalence.
    Results: After controlling for poverty rate, percentages of African American and Latino populations, educational attainment, and spatial autocorrelation in the error term, we found that higher county-level obesity prevalence rates were associated with higher levels of support for the 2012 Republican Party presidential candidate.
    Conclusion: Future public health efforts to understand and reduce obesity risk may benefit from increased surveillance of this and similar linkages between political preferences and health risks.
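    The abstract does not say which statistic was used to evaluate geographic clustering. Global Moran's I is one common measure of spatial autocorrelation, I = (n/W) · Σᵢ Σⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The sketch below computes it for a dense spatial weight matrix; it is an illustration under that assumption, not the authors' method, and the function name and toy weights are hypothetical.

```python
def morans_i(values, weights):
    """Global Moran's I for values[i] with spatial weights[i][j] (weights[i][i] = 0)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]  # deviations from the mean
    w_total = sum(sum(row) for row in weights)  # W: sum of all weights
    # Weighted cross-products of deviations between neighboring units
    num = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)
```

    Positive values indicate that similar values cluster among neighbors, and negative values indicate alternation. On a chain of four hypothetical counties with binary contiguity weights, the values 1, 2, 3, 4 give I = 1/3, while 1, 4, 1, 4 give I = −1.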