Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
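The column-selection task mentioned in this abstract (picking informative columns of a data matrix) is often illustrated with leverage-score sampling, a standard CUR-style heuristic. The sketch below is illustrative only and is not the chapter's actual method; the function name and parameters are hypothetical.

```python
import numpy as np

def leverage_column_select(A, k, c, seed=None):
    """Sample columns of A with probability proportional to their
    rank-k leverage scores (a CUR-style column-selection heuristic)."""
    rng = np.random.default_rng(seed)
    # Rows of Vt are the right singular vectors; the top k of them
    # span the dominant column structure of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]
    scores = (Vk ** 2).sum(axis=0) / k   # leverage scores, sum to 1
    picked = [j for j in range(A.shape[1])
              if rng.random() < min(1.0, c * scores[j])]
    return picked, scores

# Toy data: 20 samples, 6 features, with feature 0 dominating.
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 6))
A[:, 0] *= 10.0
cols, scores = leverage_column_select(A, k=2, c=4)
```

Columns with large leverage scores (here, the inflated column 0) are kept with high probability, which is the statistical intuition behind this family of algorithms.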
Network Information Flow in Small World Networks
Recent results from statistical physics show that large classes of complex
networks, both man-made and of natural origin, are characterized by high
clustering properties yet strikingly short path lengths between pairs of nodes.
This class of networks is said to have a small-world topology. In the context
of communication networks, navigable small-world topologies, i.e. those which
admit efficient distributed routing algorithms, are deemed particularly
effective, for example in resource discovery tasks and peer-to-peer
applications. Breaking with the traditional approach to small-world topologies
that privileges graph parameters pertaining to connectivity, and intrigued by
the fundamental limits of communication in networks that exploit this type of
topology, we investigate the capacity of these networks from the perspective of
network information flow. Our contribution includes upper and lower bounds for
the capacity of standard and navigable small-world models, and the somewhat
surprising result that, with high probability, random rewiring does not alter
the capacity of a small-world network.
Comment: 23 pages, 8 figures, submitted to the IEEE Transactions on
Information Theory, November 200
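The random rewiring discussed in this abstract follows the familiar Watts-Strogatz small-world construction: start from a ring lattice and redirect each edge with some probability. The following is a minimal illustrative implementation under that assumption, not code from the paper.

```python
import random

def watts_strogatz(n, k, p, seed=None):
    """Ring lattice on n nodes, each linked to its k nearest neighbours
    (k even); each edge is rewired to a random endpoint with prob. p."""
    rnd = random.Random(seed)
    lattice = set()
    for u in range(n):
        for j in range(1, k // 2 + 1):
            lattice.add((u, (u + j) % n))
    result = set()
    for (u, v) in lattice:
        if rnd.random() < p:
            # Redirect the far endpoint, avoiding self-loops and
            # duplicates, so the edge count stays exactly n*k/2.
            w = rnd.randrange(n)
            while (w == u or (u, w) in result or (w, u) in result
                   or (u, w) in lattice or (w, u) in lattice):
                w = rnd.randrange(n)
            result.add((u, w))
        else:
            result.add((u, v))
    return result

g = watts_strogatz(20, 4, 0.1, seed=1)
```

Because rewiring preserves the number of edges, the construction isolates the effect of topology on quantities such as capacity, which is the comparison the abstract's result turns on.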
Distributed Minimum Cut Approximation
We study the problem of computing approximate minimum edge cuts by
distributed algorithms. We use a standard synchronous message passing model
where in each round, $O(\log n)$ bits can be transmitted over each edge
(a.k.a. the CONGEST model). We present a distributed algorithm that, for any
weighted graph and any $\epsilon \in (0,1)$, with high probability finds a cut
of size at most $O(\epsilon^{-1}\lambda)$ in $O(D) + \tilde{O}(n^{1/2+\epsilon})$
rounds, where $\lambda$ is the size of the minimum cut and $D$ is the network
diameter. This algorithm is based on a simple approach for analyzing random
edge sampling, which we call the random layering technique. In addition, we
also present another distributed algorithm, which is based on a centralized
algorithm due to Matula [SODA '93], that with high probability computes a cut
of size at most $(2+\epsilon)\lambda$ in $\tilde{O}((D+\sqrt{n})\epsilon^{-5})$
rounds for any $\epsilon > 0$.
The time complexities of both of these algorithms almost match the
$\tilde{\Omega}(D + \sqrt{n})$ lower bound of Das Sarma et al. [STOC '11], thus
leading to an answer to an open question raised by Elkin [SIGACT-News '04] and
Das Sarma et al. [STOC '11].
Furthermore, we also strengthen the lower bound of Das Sarma et al. by
extending it to unweighted graphs. We show that the same lower bound also holds
for unweighted multigraphs (or equivalently for weighted graphs in which
$O(w \log n)$ bits can be transmitted in each round over an edge of weight
$w$), even if the diameter is $D = O(\log n)$. For unweighted simple graphs, we
show that even for networks of diameter $\tilde{O}(\sqrt{n/\lambda})$, finding
an $\alpha$-approximate minimum cut in networks of edge connectivity $\lambda$
or computing an $\alpha$-approximation of the edge connectivity requires
$\tilde{\Omega}(D + \sqrt{n/(\alpha\lambda)})$ rounds.
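The random edge sampling at the heart of the random layering technique can be illustrated in a simplified, centralized form (this is not the distributed algorithm itself): keep each edge with probability p and look at the connected components of the sampled graph. How likely the graph stays connected depends on the minimum cut.

```python
import random

def components_after_sampling(n, edges, p, seed=None):
    """Keep each edge independently with probability p, then count the
    connected components of the sampled graph with union-find."""
    rnd = random.Random(seed)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    comps = n
    for (u, v) in edges:
        if rnd.random() < p:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                comps -= 1
    return comps

# Two 5-cliques joined by one bridge: the bridge is the minimum cut, so
# sparse sampling tends to disconnect the graph exactly there.
edges = [(i, j) for i in range(5) for j in range(i + 1, 5)]
edges += [(i, j) for i in range(5, 10) for j in range(i + 1, 10)]
edges.append((0, 5))
```

Small cuts are the first to be lost under sampling, which is why the component structure of sampled subgraphs carries information about the minimum cut size.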
XOR-Sampling for Network Design with Correlated Stochastic Events
Many network optimization problems can be formulated as stochastic network
design problems in which edges are present or absent stochastically.
Furthermore, protective actions can guarantee that edges will remain present.
We consider the problem of finding the optimal protection strategy under a
budget limit in order to maximize some connectivity measurements of the
network. Previous approaches rely on the assumption that edges are independent.
In this paper, we consider a more realistic setting where multiple edges are
not independent due to natural disasters or regional events that make the
states of multiple edges stochastically correlated. We use Markov Random Fields
to model the correlation and define a new stochastic network design framework.
We provide a novel algorithm based on Sample Average Approximation (SAA)
coupled with a Gibbs or XOR sampler. The experimental results on real road
network data show that the policies produced by SAA with the XOR sampler have
higher quality and lower variance compared to SAA with the Gibbs sampler.
Comment: In Proceedings of the Twenty-sixth International Joint Conference on
Artificial Intelligence (IJCAI-17). The first two authors contribute equally
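The Sample Average Approximation loop described in this abstract can be sketched on a toy instance. The sketch below is hypothetical and brute-forces the protection plan: draw scenarios from a correlated edge-failure model (a shared "storm" event), then pick the protection set whose sample-average connectivity is highest.

```python
import itertools
import random

def saa_protect(edges, scenarios, budget, s, t):
    """SAA sketch: choose exactly `budget` edges to protect so that the
    fraction of sampled scenarios in which s and t stay connected is
    maximal. Each scenario is the set of unprotected edges surviving."""
    def connected(live):
        seen, stack = {s}, [s]
        while stack:
            u = stack.pop()
            for a, b in live:
                for x, y in ((a, b), (b, a)):
                    if x == u and y not in seen:
                        seen.add(y)
                        stack.append(y)
        return t in seen

    best_plan, best_val = frozenset(), -1.0
    for plan in itertools.combinations(edges, budget):
        val = sum(connected(set(plan) | sc) for sc in scenarios)
        val /= len(scenarios)
        if val > best_val:
            best_plan, best_val = frozenset(plan), val
    return best_plan, best_val

# Diamond network 0-1-3 / 0-2-3; a regional "storm" correlates failures.
rnd = random.Random(0)
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
scenarios = []
for _ in range(200):
    storm = rnd.random() < 0.5        # shared event correlates all edges
    p_up = 0.3 if storm else 0.9
    scenarios.append({e for e in edges if rnd.random() < p_up})

plan, val = saa_protect(edges, scenarios, budget=2, s=0, t=3)
```

With budget 2 the optimizer protects a full 0-3 path, so every sampled scenario stays connected; the XOR versus Gibbs distinction in the paper concerns how the scenarios themselves are drawn from the Markov Random Field, which this toy model replaces with a single shared event.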
Energy Minimization of Discrete Protein Titration State Models Using Graph Theory
There are several applications in computational biophysics which require the
optimization of discrete interacting states; e.g., amino acid titration states,
ligand oxidation states, or discrete rotamer angles. Such optimization can be
very time-consuming as it scales exponentially in the number of sites to be
optimized. In this paper, we describe a new polynomial-time algorithm for
optimization of discrete states in macromolecular systems. This algorithm was
adapted from image processing and uses techniques from discrete mathematics and
graph theory to restate the optimization problem in terms of "maximum
flow-minimum cut" graph analysis. The interaction energy graph, a graph in
which vertices (amino acids) and edges (interactions) are weighted with their
respective energies, is transformed into a flow network in which the value of
the minimum cut in the network equals the minimum free energy of the protein,
and the cut itself encodes the state that achieves the minimum free energy.
Because of its deterministic nature and polynomial-time performance, this
algorithm has the potential to allow the ionization states of larger
proteins to be discovered.
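The "maximum flow-minimum cut" reduction described here is standard for pairwise binary-state energies: each site becomes a node between source and sink, unary energies become terminal capacities, and non-negative couplings become inter-node capacities, so the minimum cut value equals the minimum energy. The sketch below uses toy energies and a plain Edmonds-Karp max flow; it is illustrative, not the authors' implementation.

```python
from collections import defaultdict, deque

def min_cut_labels(n, unary, pairwise):
    """Minimize E(x) = sum_i unary[i][x_i] + sum_(i,j,w) w*[x_i != x_j]
    over binary states x via an s-t min cut. Requires all w >= 0."""
    S, T = n, n + 1
    cap = defaultdict(lambda: defaultdict(float))
    for i, (e0, e1) in enumerate(unary):
        cap[S][i] += e1   # edge cut exactly when x_i = 1
        cap[i][T] += e0   # edge cut exactly when x_i = 0
    for (i, j, w) in pairwise:
        cap[i][j] += w    # cut once whenever x_i != x_j
        cap[j][i] += w
    flow = 0.0
    while True:
        parent = {S: None}                 # BFS for an augmenting path
        q = deque([S])
        while q and T not in parent:
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if T not in parent:
            break
        path, v = [], T                    # augment along the path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= b
            cap[v][u] += b
        flow += b
    reach, q = {S}, deque([S])             # source side of the min cut
    while q:
        u = q.popleft()
        for v, c in cap[u].items():
            if c > 1e-12 and v not in reach:
                reach.add(v)
                q.append(v)
    labels = [0 if i in reach else 1 for i in range(n)]
    return labels, flow

# Toy 3-site system: (energy of state 0, energy of state 1) per site,
# plus couplings penalizing disagreement between neighbouring sites.
unary = [(0.0, 5.0), (4.0, 1.0), (3.0, 3.0)]
pairwise = [(0, 1, 2.0), (1, 2, 1.0)]
labels, flow = min_cut_labels(3, unary, pairwise)
```

The cut value returned as `flow` equals the minimum free energy of the toy system, and `labels` encodes the state that achieves it, mirroring the abstract's statement that the cut itself encodes the optimal configuration.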