Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters
A large body of work has been devoted to defining and identifying clusters or
communities in social and information networks. We explore from a novel
perspective several questions related to identifying meaningful communities in
large social and information networks, and we come to several striking
conclusions. We employ approximation algorithms for the graph partitioning
problem to characterize as a function of size the statistical and structural
properties of partitions of graphs that could plausibly be interpreted as
communities. In particular, we define the network community profile plot, which
characterizes the "best" possible community--according to the conductance
measure--over a wide range of size scales. We study over 100 large real-world
social and information networks. Our results suggest a significantly more
refined picture of community structure in large networks than has been
appreciated previously. In particular, we observe tight communities that are
barely connected to the rest of the network at very small size scales; and
communities of larger size scales gradually "blend into" the expander-like core
of the network and thus become less "community-like." This behavior is not
explained, even at a qualitative level, by any of the commonly-used network
generation models. Moreover, it is exactly the opposite of what one would
expect based on intuition from expander graphs, low-dimensional or
manifold-like graphs, and from small social networks that have served as
testbeds of community detection algorithms. We have found that a generative
graph model, in which new edges are added via an iterative "forest fire"
burning process, is able to produce graphs exhibiting a network community
profile plot similar to what we observe in our network datasets.
Comment: 66 pages, a much expanded version of our WWW 2008 paper
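As a rough illustration of the conductance measure mentioned in the abstract above, the following minimal Python sketch computes the conductance of a vertex set in a small undirected graph. It assumes the common definition (number of cut edges divided by the smaller of the two side volumes), which may differ in normalization from the one used in the paper; the function name `conductance` and the toy graph are hypothetical.

```python
def conductance(adj, subset):
    """Conductance of `subset` in an undirected graph given as an adjacency dict.

    Assumed definition: (edges leaving subset) / min(vol(subset), vol(complement)),
    where the volume of a set is the sum of the degrees of its vertices.
    """
    subset = set(subset)
    # Each edge crossing the cut is counted once, from its endpoint inside subset.
    cut = sum(1 for u in subset for v in adj[u] if v not in subset)
    vol_in = sum(len(adj[u]) for u in subset)
    vol_out = sum(len(adj[u]) for u in adj if u not in subset)
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / denom


# Hypothetical toy graph: two triangles joined by the single edge 2-3.
adj = {
    0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
    3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
}
phi_triangle = conductance(adj, {0, 1, 2})  # one cut edge, volume 7 on each side
phi_single = conductance(adj, {0})          # both edges of vertex 0 leave the set
```

A low value, as for the triangle, marks a set that is well separated from the rest of the graph. A network community profile plot in the sense of the abstract would record the smallest such value found at each set size.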
Min st-Cut of a Planar Graph in O(n loglog n) Time
Given a planar undirected n-vertex graph G with non-negative edge weights, we
show how to compute, for given vertices s and t in G, a min st-cut in G in O(n
loglog n) time and O(n) space. The previous best time bound was O(n log n).
Comment: Added mainly details and corrections to the r-division section
Which Unbounded Protocol for Envy Free Cake Cutting is Better?
A division of a cake by n people is envy free if everyone thinks they got the
biggest piece. Note that people's tastes can differ. There is a discrete
protocol for envy free division for n=3 which takes at most 5 cuts. For n=4 and
beyond there is a protocol but the number of cuts it takes is unbounded. In
particular the number of cuts depends on people's tastes. Given any number N,
people's tastes can be set so that the algorithm takes over N cuts. There are
three such algorithms known. Which is better?
We have devised a way to measure the number of cuts even though it is
unbounded. We use ordinals; therefore, a statement like "this protocol takes at
most 2ω steps" makes sense. We analyse all three discrete algorithms for
envy free cake cutting with this measure.
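The envy-freeness condition stated at the start of this abstract can be sketched as a simple check. This is only an illustration under stated assumptions: the valuations are summarised in a hypothetical matrix `values`, where `values[i][j]` is how much agent i values the piece handed to agent j, and ties are allowed (each agent values their own piece at least as much as any other). It does not implement any of the cutting protocols discussed.

```python
def is_envy_free(values):
    """Return True if no agent strictly prefers another agent's piece.

    values[i][j] is the (hypothetical) value agent i assigns to the piece
    given to agent j; agent i receives piece i.
    """
    n = len(values)
    return all(values[i][i] >= values[i][j]
               for i in range(n) for j in range(n))


# Three agents with different tastes; each likes their own piece best.
fair = [
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

# Agent 0 values agent 1's piece more than their own, so agent 0 is envious.
unfair = [
    [0.2, 0.5, 0.3],
    [0.3, 0.4, 0.3],
    [0.2, 0.3, 0.5],
]

fair_ok = is_envy_free(fair)
unfair_ok = is_envy_free(unfair)
```

Because each row holds one agent's own valuations, the same pieces can be envy free for one set of tastes and not for another, which is why the number of cuts in the protocols above depends on the tastes.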
Hinged Dissection of Polyominoes and Polyforms
A hinged dissection of a set of polygons S is a collection of polygonal
pieces hinged together at vertices that can be folded into any member of S. We
present a hinged dissection of all edge-to-edge gluings of n congruent copies
of a polygon P that join corresponding edges of P. This construction uses kn
pieces, where k is the number of vertices of P. When P is a regular polygon, we
show how to reduce the number of pieces to ceiling(k/2)*(n-1). In particular,
we consider polyominoes (made up of unit squares), polyiamonds (made up of
equilateral triangles), and polyhexes (made up of regular hexagons). We also
give a hinged dissection of all polyabolos (made up of right isosceles
triangles), which do not fall under the general result mentioned above.
Finally, we show that if P can be hinged into Q, then any edge-to-edge gluing
of n congruent copies of P can be hinged into any edge-to-edge gluing of n
congruent copies of Q.
Comment: 27 pages, 39 figures. Accepted to Computational Geometry: Theory and
Applications. v3 incorporates several comments by referees. v2 added many new
results and a new coauthor (Frederickson)
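To make the two piece counts quoted in this abstract concrete, the short Python sketch below evaluates kn (general construction) and ceiling(k/2)*(n-1) (regular polygon P) for the three regular polyform families named there. The function names and the choice n = 5 are hypothetical and serve only as a worked example of the arithmetic.

```python
def general_pieces(k, n):
    """Piece count kn of the general construction (k = vertices of P, n = copies)."""
    return k * n


def regular_pieces(k, n):
    """Piece count ceiling(k/2)*(n-1) quoted for a regular polygon P."""
    return ((k + 1) // 2) * (n - 1)  # (k + 1) // 2 equals ceiling(k/2) for integers


# Number of vertices of the regular polygon underlying each polyform family.
families = {"polyiamonds": 3, "polyominoes": 4, "polyhexes": 6}

n = 5  # hypothetical number of congruent copies
counts = {name: (general_pieces(k, n), regular_pieces(k, n))
          for name, k in families.items()}
```

For n = 5 this gives 15 versus 8 pieces for polyiamonds, 20 versus 8 for polyominoes, and 30 versus 12 for polyhexes.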
Better Tradeoffs for Exact Distance Oracles in Planar Graphs
We present an -space distance oracle for directed planar graphs
that answers distance queries in time. Our oracle both
significantly simplifies and significantly improves the recent oracle of
Cohen-Addad, Dahlgaard and Wulff-Nilsen [FOCS 2017], which uses
-space and answers queries in time. We achieve this by
designing an elegant and efficient point location data structure for Voronoi
diagrams on planar graphs.
We further show a smooth tradeoff between space and query-time. For any , we show an oracle of size that answers queries in time. This new tradeoff is currently the best (up to
polylogarithmic factors) for the entire range of and improves by polynomial
factors over all the previously known tradeoffs for the range
Waste Makes Haste: Bounded Time Protocols for Envy-Free Cake Cutting with Free Disposal
We consider the classic problem of envy-free division of a heterogeneous good
("cake") among several agents. It is known that, when the allotted pieces must
be connected, the problem cannot be solved by a finite algorithm for 3 or more
agents. The impossibility result, however, assumes that the entire cake must be
allocated. In this paper we replace the entire-allocation requirement with a
weaker \emph{partial-proportionality} requirement: the piece given to each
agent must be worth for it at least a certain positive fraction of the entire
cake value. We prove that this version of the problem is solvable in bounded
time even when the pieces must be connected. We present simple, bounded-time
envy-free cake-cutting algorithms for: (1) giving each of agents a
connected piece with a positive value; (2) giving each of 3 agents a connected
piece worth at least 1/3; (3) giving each of 4 agents a connected piece worth
at least 1/7; (4) giving each of 4 agents a disconnected piece worth at least
1/4; (5) giving each of agents a disconnected piece worth at least
for any positive .
Comment: The first version was presented at AAMAS 2015:
http://dl.acm.org/citation.cfm?id=2773268 . The current version is
substantially revised and extended
Intersection Graphs of Pseudosegments: Chordal Graphs
We investigate which chordal graphs have a representation as intersection
graphs of pseudosegments. On the positive side, we have a construction which shows that
all chordal graphs that can be represented as intersection graph of subpaths on
a tree are pseudosegment intersection graphs. We then study the limits of
representability. We describe a family of intersection graphs of substars of a
star which is not representable as intersection graph of pseudosegments. The
degree of the substars in this example, however, has to get large. A more
intricate analysis involving a Ramsey argument shows that even in the class of
intersection graphs of substars of degree three of a star there are graphs that
are not representable as intersection graph of pseudosegments.
Motivated by representability questions for chordal graphs we consider how
many combinatorially different k-segments, i.e., curves crossing k distinct
lines, an arrangement of n pseudolines can host. We show that for fixed k this
number is in O(n^2). This result is based on a k-zone theorem for arrangements
of pseudolines that should be of independent interest.
Comment: 20 pages, 13 figures
Borel circle squaring
We give a completely constructive solution to Tarski's circle squaring
problem. More generally, we prove a Borel version of an equidecomposition
theorem due to Laczkovich. If and are
bounded Borel sets with the same positive Lebesgue measure whose boundaries
have upper Minkowski dimension less than , then and are
equidecomposable by translations using Borel pieces. This answers a question of
Wagon. Our proof uses ideas from the study of flows in graphs, and a recent
result of Gao, Jackson, Krohne, and Seward on special types of witnesses to the
hyperfiniteness of free Borel actions of .
Comment: Minor typos corrected