Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
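The column-selection task mentioned in this abstract (picking informative columns of a data matrix) is often illustrated with leverage-score sampling, a standard CUR-style heuristic. The sketch below is illustrative only and is not the chapter's actual method; the function name and parameters are hypothetical.

```python
import numpy as np

def leverage_column_select(A, k, c, seed=None):
    """Sample columns of A with probability proportional to their
    rank-k leverage scores (a CUR-style column-selection heuristic)."""
    rng = np.random.default_rng(seed)
    # Rows of Vt are the right singular vectors; the top k of them
    # span the dominant column structure of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]
    scores = (Vk ** 2).sum(axis=0) / k   # leverage scores, sum to 1
    picked = [j for j in range(A.shape[1])
              if rng.random() < min(1.0, c * scores[j])]
    return picked, scores

# Toy data: 20 samples, 6 features, with feature 0 dominating.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 6))
A[:, 0] *= 10.0
cols, scores = leverage_column_select(A, k=2, c=4)
```

Columns with large leverage scores (here, the inflated column 0) are kept with high probability, which is the statistical intuition behind this family of algorithms.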
Network Information Flow in Small World Networks
Recent results from statistical physics show that large classes of complex
networks, both man-made and of natural origin, are characterized by high
clustering properties yet strikingly short path lengths between pairs of nodes.
This class of networks is said to have a small-world topology. In the context
of communication networks, navigable small-world topologies, i.e. those which
admit efficient distributed routing algorithms, are deemed particularly
effective, for example in resource discovery tasks and peer-to-peer
applications. Breaking with the traditional approach to small-world topologies
that privileges graph parameters pertaining to connectivity, and intrigued by
the fundamental limits of communication in networks that exploit this type of
topology, we investigate the capacity of these networks from the perspective of
network information flow. Our contribution includes upper and lower bounds for
the capacity of standard and navigable small-world models, and the somewhat
surprising result that, with high probability, random rewiring does not alter
the capacity of a small-world network.
Comment: 23 pages, 8 figures, submitted to the IEEE Transactions on
Information Theory, November 200
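The random rewiring discussed in this abstract follows the familiar Watts-Strogatz small-world construction: start from a ring lattice and redirect each edge with some probability. The following is a minimal illustrative implementation under that assumption, not code from the paper.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice on n nodes, each linked to its k nearest neighbours
    (k even); each edge is rewired to a random endpoint with prob. p."""
    rnd = random.Random(seed)
    lattice = set()
    for u in range(n):
        for j in range(1, k // 2 + 1):
            lattice.add((u, (u + j) % n))
    result = set()
    for (u, v) in lattice:
        if rnd.random() < p:
            # Redirect the far endpoint, avoiding self-loops and
            # duplicates, so the edge count stays exactly n*k/2.
            w = rnd.randrange(n)
            while (w == u or (u, w) in result or (w, u) in result
                   or (u, w) in lattice or (w, u) in lattice):
                w = rnd.randrange(n)
            result.add((u, w))
        else:
            result.add((u, v))
    return result

g = watts_strogatz(20, 4, 0.1, seed=1)
```

Because rewiring preserves the number of edges, the construction isolates the effect of topology on quantities such as capacity, which is the comparison the abstract's result turns on.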
Distributed Minimum Cut Approximation
We study the problem of computing approximate minimum edge cuts by
distributed algorithms. We use a standard synchronous message passing model
where in each round, $O(\log n)$ bits can be transmitted over each edge
(a.k.a. the CONGEST model). We present a distributed algorithm that, for any
weighted graph and any $\epsilon \in (0,1)$, with high probability finds a cut
of size at most $O(\epsilon^{-1}\lambda)$ in $O(D) + \tilde{O}(n^{1/2+\epsilon})$
rounds, where $\lambda$ is the size of the minimum cut and $D$ is the network
diameter. This algorithm is based on a simple approach for analyzing random
edge sampling, which we call the random layering technique. In addition, we
also present another distributed algorithm, which is based on a centralized
algorithm due to Matula [SODA '93], that with high probability computes a cut
of size at most $(2+\epsilon)\lambda$ in $\tilde{O}((D+\sqrt{n})\epsilon^{-5})$
rounds for any $\epsilon > 0$.
The time complexities of both of these algorithms almost match the
$\tilde{\Omega}(D + \sqrt{n})$ lower bound of Das Sarma et al. [STOC '11], thus
leading to an answer to an open question raised by Elkin [SIGACT-News '04] and
Das Sarma et al. [STOC '11].
Furthermore, we also strengthen the lower bound of Das Sarma et al. by
extending it to unweighted graphs. We show that the same lower bound also holds
for unweighted multigraphs (or equivalently for weighted graphs in which
$O(w \log n)$ bits can be transmitted in each round over an edge of weight
$w$), even if the diameter is $D = O(\log n)$. For unweighted simple graphs, we
show that even for networks of diameter $\tilde{O}(\sqrt{n/\lambda})$, finding
an $\alpha$-approximate minimum cut in networks of edge connectivity $\lambda$
or computing an $\alpha$-approximation of the edge connectivity requires
$\tilde{\Omega}(D + \sqrt{n/(\alpha\lambda)})$ rounds.
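The random edge sampling at the heart of the random layering technique can be illustrated in a simplified, centralized form (this is not the distributed algorithm itself): keep each edge with probability p and look at the connected components of the sampled graph. How likely the graph stays connected depends on the minimum cut.

```python
import random

def components_after_sampling(n, edges, p, seed=None):
    """Keep each edge independently with probability p, then count the
    connected components of the sampled graph with union-find."""
    rnd = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    comps = n
    for (u, v) in edges:
        if rnd.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps

# Two 5-cliques joined by one bridge: the bridge is the minimum cut, so
# sparse sampling tends to disconnect the graph exactly there.
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges.append((0, 5))
```

Small cuts are the first to be lost under sampling, which is why the component structure of sampled subgraphs carries information about the minimum cut size.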
XOR-Sampling for Network Design with Correlated Stochastic Events
Many network optimization problems can be formulated as stochastic network
design problems in which edges are present or absent stochastically.
Furthermore, protective actions can guarantee that edges will remain present.
We consider the problem of finding the optimal protection strategy under a
budget limit in order to maximize some connectivity measurements of the
network. Previous approaches rely on the assumption that edges are independent.
In this paper, we consider a more realistic setting where multiple edges are
not independent due to natural disasters or regional events that make the
states of multiple edges stochastically correlated. We use Markov Random Fields
to model the correlation and define a new stochastic network design framework.
We provide a novel algorithm based on Sample Average Approximation (SAA)
coupled with a Gibbs or XOR sampler. The experimental results on real road
network data show that the policies produced by SAA with the XOR sampler have
higher quality and lower variance compared to SAA with the Gibbs sampler.
Comment: In Proceedings of the Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17). The first two authors contribute equally
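The Sample Average Approximation loop described in this abstract can be sketched on a toy instance. The sketch below is hypothetical and brute-forces the protection plan: draw scenarios from a correlated edge-failure model (a shared "storm" event), then pick the protection set whose sample-average connectivity is highest.

```python
import itertools
import random

def saa_protect(edges, scenarios, budget, s, t):
    """SAA sketch: choose exactly `budget` edges to protect so that the
    fraction of sampled scenarios in which s and t stay connected is
    maximal. Each scenario is the set of unprotected edges surviving."""
    def connected(live):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for a, b in live:
                for x, y in ((a, b), (b, a)):
                    if x == u and y not in seen:
                        seen.add(y)
                        stack.append(y)
        return t in seen

    best_plan, best_val = frozenset(), -1.0
    for plan in itertools.combinations(edges, budget):
        val = sum(connected(set(plan) | sc) for sc in scenarios)
        val /= len(scenarios)
        if val > best_val:
            best_plan, best_val = frozenset(plan), val
    return best_plan, best_val

# Diamond network 0-1-3 / 0-2-3; a regional "storm" correlates failures.
rnd = random.Random(0)
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
scenarios = []
for _ in range(200):
    storm = rnd.random() < 0.5        # shared event correlates all edges
    p_up = 0.3 if storm else 0.9
    scenarios.append({e for e in edges if rnd.random() < p_up})

plan, val = saa_protect(edges, scenarios, budget=2, s=0, t=3)
```

With budget 2 the optimizer protects a full 0-3 path, so every sampled scenario stays connected; the XOR versus Gibbs distinction in the paper concerns how the scenarios themselves are drawn from the Markov Random Field, which this toy model replaces with a single shared event.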
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory
There are several applications in computational biophysics which require the
optimization of discrete interacting states; e.g., amino acid titration states,
ligand oxidation states, or discrete rotamer angles. Such optimization can be
very time-consuming as it scales exponentially in the number of sites to be
optimized. In this paper, we describe a new polynomial-time algorithm for
optimization of discrete states in macromolecular systems. This algorithm was
adapted from image processing and uses techniques from discrete mathematics and
graph theory to restate the optimization problem in terms of "maximum
flow-minimum cut" graph analysis. The interaction energy graph, a graph in
which vertices (amino acids) and edges (interactions) are weighted with their
respective energies, is transformed into a flow network in which the value of
the minimum cut in the network equals the minimum free energy of the protein,
and the cut itself encodes the state that achieves the minimum free energy.
Because of its deterministic nature and polynomial-time performance, this
algorithm has the potential to allow the ionization states of larger
proteins to be discovered.
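The "maximum flow-minimum cut" reduction described here is standard for pairwise binary-state energies: each site becomes a node between source and sink, unary energies become terminal capacities, and non-negative couplings become inter-node capacities, so the minimum cut value equals the minimum energy. The sketch below uses toy energies and a plain Edmonds-Karp max flow; it is illustrative, not the authors' implementation.

```python
from collections import defaultdict, deque

def min_cut_labels(n, unary, pairwise):
    """Minimize E(x) = sum_i unary[i][x_i] + sum_(i,j,w) w*[x_i != x_j]
    over binary states x via an s-t min cut. Requires all w >= 0."""
    S, T = n, n + 1
    cap = defaultdict(lambda: defaultdict(float))
    for i, (e0, e1) in enumerate(unary):
        cap[S][i] += e1   # edge cut exactly when x_i = 1
        cap[i][T] += e0   # edge cut exactly when x_i = 0
    for (i, j, w) in pairwise:
        cap[i][j] += w    # cut once whenever x_i != x_j
        cap[j][i] += w
    flow = 0.0
    while True:
        parent = {S: None}                 # BFS for an augmenting path
        q = deque([S])
        while q and T not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if T not in parent:
            break
        path, v = [], T                    # augment along the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= b
            cap[v][u] += b
        flow += b
    reach, q = {S}, deque([S])             # source side of the min cut
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in reach:
                reach.add(v)
                q.append(v)
    labels = [0 if i in reach else 1 for i in range(n)]
    return labels, flow

# Toy 3-site system: (energy of state 0, energy of state 1) per site,
# plus couplings penalizing disagreement between neighbouring sites.
unary = [(0.0, 5.0), (4.0, 1.0), (3.0, 3.0)]
pairwise = [(0, 1, 2.0), (1, 2, 1.0)]
labels, flow = min_cut_labels(3, unary, pairwise)
```

The cut value returned as `flow` equals the minimum free energy of the toy system, and `labels` encodes the state that achieves it, mirroring the abstract's statement that the cut itself encodes the optimal configuration.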