
    Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

    In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples that drew on ideas from both areas: one concerns selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other concerns selecting good clusters or communities from a data graph (representing a social or information network). These examples may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.
    Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors, "Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201

    Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters

    A large body of work has been devoted to defining and identifying clusters or communities in social and information networks. We explore from a novel perspective several questions related to identifying meaningful communities in large social and information networks, and we come to several striking conclusions. We employ approximation algorithms for the graph partitioning problem to characterize as a function of size the statistical and structural properties of partitions of graphs that could plausibly be interpreted as communities. In particular, we define the network community profile plot, which characterizes the "best" possible community, according to the conductance measure, over a wide range of size scales. We study over 100 large real-world social and information networks. Our results suggest a significantly more refined picture of community structure in large networks than has been appreciated previously. In particular, we observe tight communities that are barely connected to the rest of the network at very small size scales, while communities at larger size scales gradually "blend into" the expander-like core of the network and thus become less "community-like." This behavior is not explained, even at a qualitative level, by any of the commonly used network generation models. Moreover, it is exactly the opposite of what one would expect based on intuition from expander graphs, low-dimensional or manifold-like graphs, and from small social networks that have served as testbeds of community detection algorithms. We have found that a generative graph model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community profile plot similar to what we observe in our network datasets.
    Comment: 66 pages, a much expanded version of our WWW 2008 paper.
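The conductance measure named in the abstract above can be illustrated with a minimal sketch. It assumes the standard definition (edges leaving a vertex set S, divided by the smaller of the total degree of S and of its complement); the function name `conductance` and the toy graph are illustrative and not taken from the paper.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph.

    `adj` maps each vertex to a list of its neighbours (assumed symmetric).
    Assumed definition: cut(S) / min(vol(S), vol(complement of S)).
    """
    inside = set(subset)
    # Edges with exactly one endpoint inside the subset.
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    # Volume = sum of degrees on each side.
    vol_inside = sum(len(adj[u]) for u in inside)
    vol_outside = sum(len(adj[u]) for u in adj if u not in inside)
    smaller = min(vol_inside, vol_outside)
    if smaller == 0:
        raise ValueError("conductance is undefined when one side has no edges")
    return cut / smaller


# Toy graph (hypothetical): two triangles joined by the single edge 2-3.
toy_graph = {
    0: [1, 2],
    1: [0, 2],
    2: [0, 1, 3],
    3: [2, 4, 5],
    4: [3, 5],
    5: [3, 4],
}
phi = conductance(toy_graph, {0, 1, 2})
print(phi)
```

A low value means the set is well separated from the rest of the graph. The network community profile plot described above records, for each size, the lowest conductance found among sets of that size.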

    Min st-Cut of a Planar Graph in O(n loglog n) Time

    Given a planar undirected n-vertex graph G with non-negative edge weights, we show how to compute, for given vertices s and t in G, a min st-cut in G in O(n loglog n) time and O(n) space. The previous best time bound was O(n log n).
    Comment: Added mainly details and corrections to the r-division section.

    Which Unbounded Protocol for Envy Free Cake Cutting is Better?

    A division of a cake among n people is envy free if everyone thinks they got the biggest piece. Note that people's tastes can differ. There is a discrete protocol for envy free division for n=3 that takes at most 5 cuts. For n=4 and beyond there is a protocol, but the number of cuts it takes is unbounded: the number of cuts depends on people's tastes, and given any number N, people's tastes can be set so that the algorithm takes over N cuts. There are three such algorithms known. Which is better? We have devised a way to measure the number of cuts even though it is unbounded. We use ordinals; therefore, a statement like "this protocol takes at most 2ω steps" makes sense. We analyse all three discrete algorithms for envy free cake cutting with this measure.
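The envy-freeness condition in the abstract above can be checked mechanically once a division is fixed. The sketch below assumes the usual formalization (no person values someone else's piece strictly more than their own) and a hypothetical table `values[i][j]` giving person i's value for the piece handed to person j. It only verifies a finished division and is not one of the protocols the paper analyses.

```python
def is_envy_free(values):
    """Return True if no person prefers another person's piece.

    values[i][j] is person i's (subjective) value for the piece given to
    person j; person i receives piece i. This layout is an assumption made
    for illustration.
    """
    n = len(values)
    return all(
        values[i][i] >= values[i][j]
        for i in range(n)
        for j in range(n)
    )


# Hypothetical valuations for three people; each row sums to 1.
fair_division = [
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]
# Person 0 values piece 1 more than their own piece 0.
envious_division = [
    [0.3, 0.5, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]
print(is_envy_free(fair_division), is_envy_free(envious_division))
```

Because each row reflects a different person's tastes, the same physical pieces can be envy free for one set of valuations and not for another.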

    Hinged Dissection of Polyominoes and Polyforms

    A hinged dissection of a set of polygons S is a collection of polygonal pieces hinged together at vertices that can be folded into any member of S. We present a hinged dissection of all edge-to-edge gluings of n congruent copies of a polygon P that join corresponding edges of P. This construction uses kn pieces, where k is the number of vertices of P. When P is a regular polygon, we show how to reduce the number of pieces to $\lceil k/2 \rceil (n-1)$. In particular, we consider polyominoes (made up of unit squares), polyiamonds (made up of equilateral triangles), and polyhexes (made up of regular hexagons). We also give a hinged dissection of all polyabolos (made up of right isosceles triangles), which do not fall under the general result mentioned above. Finally, we show that if P can be hinged into Q, then any edge-to-edge gluing of n congruent copies of P can be hinged into any edge-to-edge gluing of n congruent copies of Q.
    Comment: 27 pages, 39 figures. Accepted to Computational Geometry: Theory and Applications. v3 incorporates several comments by referees. v2 added many new results and a new coauthor (Frederickson).
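To make the two piece counts in the abstract above concrete, the following sketch simply evaluates the stated formulas, kn for the general construction and ceiling(k/2)*(n-1) for regular polygons. The function names are hypothetical, and the code does not construct any dissection.

```python
def general_piece_count(k, n):
    """Pieces used by the general construction: k vertices, n copies."""
    return k * n


def regular_piece_count(k, n):
    """Pieces used when P is a regular polygon: ceiling(k/2) * (n - 1)."""
    half_vertices_rounded_up = (k + 1) // 2  # integer ceiling of k/2
    return half_vertices_rounded_up * (n - 1)


# Pentominoes: n = 5 unit squares, each square has k = 4 vertices.
print(general_piece_count(4, 5), regular_piece_count(4, 5))
```

For pentominoes this gives 20 pieces under the general bound against 8 under the regular-polygon bound.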

    Better Tradeoffs for Exact Distance Oracles in Planar Graphs

    We present an $O(n^{1.5})$-space distance oracle for directed planar graphs that answers distance queries in $O(\log n)$ time. Our oracle both significantly simplifies and significantly improves the recent oracle of Cohen-Addad, Dahlgaard and Wulff-Nilsen [FOCS 2017], which uses $O(n^{5/3})$ space and answers queries in $O(\log n)$ time. We achieve this by designing an elegant and efficient point location data structure for Voronoi diagrams on planar graphs. We further show a smooth tradeoff between space and query time. For any $S \in [n, n^2]$, we show an oracle of size $S$ that answers queries in $\tilde O(\max\{1, n^{1.5}/S\})$ time. This new tradeoff is currently the best (up to polylogarithmic factors) for the entire range of $S$ and improves by polynomial factors over all the previously known tradeoffs for the range $S \in [n, n^{5/3}]$.

    Waste Makes Haste: Bounded Time Protocols for Envy-Free Cake Cutting with Free Disposal

    We consider the classic problem of envy-free division of a heterogeneous good ("cake") among several agents. It is known that, when the allotted pieces must be connected, the problem cannot be solved by a finite algorithm for 3 or more agents. The impossibility result, however, assumes that the entire cake must be allocated. In this paper we replace the entire-allocation requirement with a weaker \emph{partial-proportionality} requirement: the piece given to each agent must be worth to that agent at least a certain positive fraction of the entire cake value. We prove that this version of the problem is solvable in bounded time even when the pieces must be connected. We present simple, bounded-time envy-free cake-cutting algorithms for: (1) giving each of $n$ agents a connected piece with a positive value; (2) giving each of 3 agents a connected piece worth at least 1/3; (3) giving each of 4 agents a connected piece worth at least 1/7; (4) giving each of 4 agents a disconnected piece worth at least 1/4; (5) giving each of $n$ agents a disconnected piece worth at least $(1-\epsilon)/n$ for any positive $\epsilon$.
    Comment: The first version was presented at AAMAS 2015: http://dl.acm.org/citation.cfm?id=2773268 . The current version is substantially revised and extended.

    Intersection Graphs of Pseudosegments: Chordal Graphs

    We investigate which chordal graphs have a representation as intersection graphs of pseudosegments. On the positive side, we have a construction which shows that all chordal graphs that can be represented as intersection graphs of subpaths of a tree are pseudosegment intersection graphs. We then study the limits of representability. We describe a family of intersection graphs of substars of a star which is not representable as intersection graphs of pseudosegments. The degree of the substars in this example, however, has to get large. A more intricate analysis involving a Ramsey argument shows that even in the class of intersection graphs of substars of degree three of a star there are graphs that are not representable as intersection graphs of pseudosegments. Motivated by representability questions for chordal graphs, we consider how many combinatorially different k-segments, i.e., curves crossing k distinct lines, an arrangement of n pseudolines can host. We show that for fixed k this number is in O(n^2). This result is based on a k-zone theorem for arrangements of pseudolines that should be of independent interest.
    Comment: 20 pages, 13 figures.

    Borel circle squaring

    We give a completely constructive solution to Tarski's circle squaring problem. More generally, we prove a Borel version of an equidecomposition theorem due to Laczkovich. If $k \geq 1$ and $A, B \subseteq \mathbb{R}^k$ are bounded Borel sets with the same positive Lebesgue measure whose boundaries have upper Minkowski dimension less than $k$, then $A$ and $B$ are equidecomposable by translations using Borel pieces. This answers a question of Wagon. Our proof uses ideas from the study of flows in graphs, and a recent result of Gao, Jackson, Krohne, and Seward on special types of witnesses to the hyperfiniteness of free Borel actions of $\mathbb{Z}^d$.
    Comment: Minor typos corrected.