1,467 research outputs found

    Spectral Clustering with Imbalanced Data

    Full text link
    Spectral clustering is sensitive to how graphs are constructed from data particularly when proximal and imbalanced clusters are present. We show that Ratio-Cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced data since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced data. Our approach parameterizes a family of graphs, by adaptively modulating node degrees on a fixed node set, to yield a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach. We demonstrate the superiority of our method through unsupervised and semi-supervised experiments on synthetic and real data sets.Comment: 24 pages, 7 figures. arXiv admin note: substantial text overlap with arXiv:1302.513

    Clustering and Community Detection with Imbalanced Clusters

    Full text link
    Spectral clustering methods which are frequently used in clustering and community detection applications are sensitive to the specific graph constructions particularly when imbalanced clusters are present. We show that ratio cut (RCut) or normalized cut (NCut) objectives are not tailored to imbalanced cluster sizes since they tend to emphasize cut sizes over cut values. We propose a graph partitioning problem that seeks minimum cut partitions under minimum size constraints on partitions to deal with imbalanced cluster sizes. Our approach parameterizes a family of graphs by adaptively modulating node degrees on a fixed node set, yielding a set of parameter dependent cuts reflecting varying levels of imbalance. The solution to our problem is then obtained by optimizing over these parameters. We present rigorous limit cut analysis results to justify our approach and demonstrate the superiority of our method through experiments on synthetic and real datasets for data clustering, semi-supervised learning and community detection.Comment: Extended version of arXiv:1309.2303 with new applications. Accepted to IEEE TSIP

    Applications of incidence bounds in point covering problems

    Get PDF
    In the Line Cover problem a set of n points is given and the task is to cover the points using either the minimum number of lines or at most k lines. In Curve Cover, a generalization of Line Cover, the task is to cover the points using curves with d degrees of freedom. Another generalization is the Hyperplane Cover problem where points in d-dimensional space are to be covered by hyperplanes. All these problems have kernels of polynomial size, where the parameter is the minimum number of lines, curves, or hyperplanes needed. First we give a non-parameterized algorithm for both problems in O*(2^n) (where the O*(.) notation hides polynomial factors of n) time and polynomial space, beating a previous exponential-space result. Combining this with incidence bounds similar to the famous Szemeredi-Trotter bound, we present a Curve Cover algorithm with running time O*((Ck/log k)^((d-1)k)), where C is some constant. Our result improves the previous best times O*((k/1.35)^k) for Line Cover (where d=2), O*(k^(dk)) for general Curve Cover, as well as a few other bounds for covering points by parabolas or conics. We also present an algorithm for Hyperplane Cover in R^3 with running time O*((Ck^2/log^(1/5) k)^k), improving on the previous time of O*((k^2/1.3)^k).Comment: SoCG 201

    BSP-fields: An Exact Representation of Polygonal Objects by Differentiable Scalar Fields Based on Binary Space Partitioning

    Get PDF
    The problem considered in this work is to find a dimension independent algorithm for the generation of signed scalar fields exactly representing polygonal objects and satisfying the following requirements: the defining real function takes zero value exactly at the polygonal object boundary; no extra zero-value isosurfaces should be generated; C1 continuity of the function in the entire domain. The proposed algorithms are based on the binary space partitioning (BSP) of the object by the planes passing through the polygonal faces and are independent of the object genus, the number of disjoint components, and holes in the initial polygonal mesh. Several extensions to the basic algorithm are proposed to satisfy the selected optimization criteria. The generated BSP-fields allow for applying techniques of the function-based modeling to already existing legacy objects from CAD and computer animation areas, which is illustrated by several examples

    Ensemble Committees for Stock Return Classification and Prediction

    Full text link
    This paper considers a portfolio trading strategy formulated by algorithms in the field of machine learning. The profitability of the strategy is measured by the algorithm's capability to consistently and accurately identify stock indices with positive or negative returns, and to generate a preferred portfolio allocation on the basis of a learned model. Stocks are characterized by time series data sets consisting of technical variables that reflect market conditions in a previous time interval, which are utilized produce binary classification decisions in subsequent intervals. The learned model is constructed as a committee of random forest classifiers, a non-linear support vector machine classifier, a relevance vector machine classifier, and a constituent ensemble of k-nearest neighbors classifiers. The Global Industry Classification Standard (GICS) is used to explore the ensemble model's efficacy within the context of various fields of investment including Energy, Materials, Financials, and Information Technology. Data from 2006 to 2012, inclusive, are considered, which are chosen for providing a range of market circumstances for evaluating the model. The model is observed to achieve an accuracy of approximately 70% when predicting stock price returns three months in advance.Comment: 15 pages, 4 figures, Neukom Institute Computational Undergraduate Research prize - second plac
    corecore