Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs

Author: Assadi Sepehr
Bateni MohammadHossein
Bernstein Aaron
Mirrokni Vahab
Stein Cliff
Publication date: 27/12/2018

As massive graphs become more prevalent, there is a rapidly growing need for scalable algorithms that solve classical graph problems, such as maximum matching and minimum vertex cover, on large datasets. For massive inputs, several different computational models have been introduced, including the streaming model, the distributed communication model, and the massively parallel computation (MPC) model that is a common abstraction of MapReduce-style computation. In each model, algorithms are analyzed in terms of resources such as space used or rounds of communication needed, in addition to the more traditional approximation ratio. In this paper, we give a single unified approach that yields better approximation algorithms for matching and vertex cover in all these models. The highlights include:

* The first one pass, significantly-better-than-2-approximation for matching in random arrival streams that uses subquadratic space, namely a $(1.5+\epsilon)$-approximation streaming algorithm that uses $O(n^{1.5})$ space for constant $\epsilon > 0$.
* The first 2-round, better-than-2-approximation for matching in the MPC model that uses subquadratic space per machine, namely a $(1.5+\epsilon)$-approximation algorithm with $O(\sqrt{mn} + n)$ memory per machine for constant $\epsilon > 0$.

By building on our unified approach, we further develop parallel algorithms in the MPC model that give a $(1 + \epsilon)$-approximation to matching and an $O(1)$-approximation to vertex cover in only $O(\log\log{n})$ MPC rounds and $O(n/\mathrm{poly}\log(n))$ memory per machine. These results settle multiple open questions posed in the recent paper of Czumaj et al. [STOC 2018].
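For context on the "better-than-2" claims, the following is a minimal sketch of the classical baseline they are measured against, not the algorithm of the paper: a one-pass greedy maximal matching over an edge stream. A maximal matching has at least half the size of a maximum matching, and its matched endpoints form a vertex cover of at most twice the minimum size. The function name `greedy_matching` and the example graph are assumptions made for illustration.

```python
def greedy_matching(edge_stream):
    """One pass over the edges: keep an edge iff both endpoints are still free.

    Returns the matching and the set of matched vertices. The matched
    vertices form a vertex cover because any uncovered edge could have
    been added to the matching.
    """
    matched = set()
    matching = []
    for u, v in edge_stream:
        if u != v and u not in matched and v not in matched:
            matching.append((u, v))
            matched.update((u, v))
    return matching, matched


# Path a-b-c-d, with the middle edge arriving first (a bad order for greedy).
edges = [("b", "c"), ("a", "b"), ("c", "d")]
matching, cover = greedy_matching(edges)
print(matching)  # [('b', 'c')]  while the maximum matching has 2 edges
print(all(u in cover or v in cover for u, v in edges))  # True
```

The path example shows that the factor of 2 is tight for this baseline, which is the gap the abstract's $(1.5+\epsilon)$ results improve on.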

arXiv.org e-Print Archive

Crossref

Optical implementation of the Hopfield model

Author: Farhat Nabil H.
Park Eung
Prata Aluizio
Psaltis Demetri
Publication venue: Optical Society of America
Publication date: 01/01/1985

Optical implementation of content addressable associative memory based on the Hopfield model for neural networks and on the addition of nonlinear iterative feedback to a vector-matrix multiplier is described. Numerical and experimental results presented show that the approach is capable of introducing accuracy and robustness to optical processing while maintaining the traditional advantages of optics, namely, parallelism and massive interconnection capability. Moreover, a potentially useful link between neural processing and optics that can be of interest in pattern recognition and machine vision is established.
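The mechanism named in the abstract, a vector-matrix product followed by nonlinear (thresholded) feedback, can be sketched in software. The following is a minimal illustration of a Hopfield associative memory, assuming bipolar (+1/-1) patterns, outer-product storage with a zero diagonal, and synchronous updates; these choices and the names `train` and `recall` are assumptions for illustration and are not taken from the paper's optical setup.

```python
def train(patterns):
    """Build the interconnection matrix as a sum of outer products (zero diagonal)."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, probe, max_iters=10):
    """Iterate: multiply the state by the matrix, threshold, feed the result back."""
    state = list(probe)
    for _ in range(max_iters):
        new_state = [
            1 if sum(w_ij * s for w_ij, s in zip(row, state)) >= 0 else -1
            for row in w
        ]
        if new_state == state:  # reached a fixed point
            break
        state = new_state
    return state


p1 = [1, -1, 1, -1, 1, -1, 1, -1]
p2 = [1, 1, 1, 1, -1, -1, -1, -1]
w = train([p1, p2])

noisy = [-1] + p1[1:]          # p1 with its first element flipped
print(recall(w, noisy) == p1)  # True
```

Here the corrupted probe settles back onto the stored pattern, which is the content-addressable behaviour the abstract describes.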

CiteSeerX

Caltech Authors

Efficiently modeling neural networks on massively parallel computers

Author: Farber Robert M.

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors))). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed-forward neural networks, although this method is extendable to recurrent networks.
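The logarithmic cost of the global summation can be illustrated with a pairwise tree reduction. The sketch below simulates the processors sequentially in a list; it is an assumption about how such a summation is typically organized, not the compiler's actual mapping, and `tree_sum` is a hypothetical name.

```python
def tree_sum(partials):
    """Sum one partial value per (simulated) processor in ceil(log2(p)) steps.

    In each step, processor i adds the value held by processor i + stride;
    on a real machine all additions within a step happen in parallel.
    Expects a non-empty sequence.
    """
    values = list(partials)
    stride = 1
    steps = 0
    while stride < len(values):
        for i in range(0, len(values) - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        steps += 1
    return values[0], steps


total, steps = tree_sum(range(1, 9))  # 8 simulated processors
print(total, steps)  # 36 3
```

Doubling the number of processors adds only one more communication step, which is why this term grows sub-linearly.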

NASA Technical Reports Server

Robust Learning from Bites

Author: Christmann Andreas

Many robust statistical procedures have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets. Secondly, robust confidence intervals for the estimated parameters or robust predictions according to the fitted models are often unknown. Here, we propose a general method to overcome these problems of robust estimation in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. The method additionally offers distribution-free confidence intervals for the median of the predictions. The method is illustrated for two situations: robust estimation in linear regression and kernel logistic regression from statistical machine learning.
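The abstract does not spell out the procedure, so the following is only a rough sketch of the general idea it suggests: split a large data set into disjoint "bites" that each fit in memory, run a robust estimator on each bite independently (possibly on separate processors), and aggregate the per-bite results with a median. The splitting rule, the choice of the sample median as the per-bite estimator, and the names `split_into_bites` and `estimate_from_bites` are all assumptions for illustration.

```python
import statistics


def split_into_bites(data, n_bites):
    """Partition the data into n_bites disjoint subsets by striding."""
    return [data[i::n_bites] for i in range(n_bites)]


def estimate_from_bites(data, n_bites, estimator=statistics.median):
    """Apply the estimator to each bite, then take the median of the results."""
    per_bite = [estimator(bite) for bite in split_into_bites(data, n_bites)]
    return statistics.median(per_bite), per_bite


# 99 regular observations plus one gross outlier.
data = list(range(1, 100)) + [10**6]
combined, per_bite = estimate_from_bites(data, n_bites=5)
print(per_bite)  # [48.5, 49.5, 50.5, 51.5, 52.5]
print(combined)  # 50.5
```

Each bite can be processed without holding the full data set in memory, and the outlier barely moves the combined estimate.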

Research Papers in Economics

Round Compression for Parallel Matching Algorithms

Author: Czumaj Artur
Mitrović Slobodan
Mądry Aleksander
Onak Krzysztof
Sankowski Piotr
Łącki Jakub
Publication date: 01/01/2018

For over a decade now we have been witnessing the success of massive parallel computation (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or Spark. One of the reasons for their success is the fact that these frameworks are able to accurately capture the nature of large-scale computation. In particular, compared to the classic distributed algorithms or PRAM models, these frameworks allow for much more local computation. The fundamental question that arises in this context is though: can we leverage this additional power to obtain even faster parallel algorithms? A prominent example here is the maximum matching problem, one of the most classic graph problems. It is well known that in the PRAM model one can compute a 2-approximate maximum matching in $O(\log{n})$ rounds. However, the exact complexity of this problem in the MPC framework is still far from understood. Lattanzi et al. showed that if each machine has $n^{1+\Omega(1)}$ memory, this problem can also be solved $2$-approximately in a constant number of rounds. These techniques, as well as the approaches developed in the follow up work, seem though to get stuck in a fundamental way at roughly $O(\log{n})$ rounds once we enter the near-linear memory regime. It is thus entirely possible that in this regime, which captures in particular the case of sparse graph computations, the best MPC round complexity matches what one can already get in the PRAM model, without the need to take advantage of the extra local computation power. In this paper, we finally refute that perplexing possibility. That is, we break the above $O(\log n)$ round complexity bound even in the case of slightly sublinear memory per machine. In fact, our improvement here is almost exponential: we are able to deliver a $(2+\epsilon)$-approximation to maximum matching, for any fixed constant $\epsilon>0$, in $O((\log \log n)^2)$ rounds.

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

DSpace@MIT

Warwick Research Archives Portal Repository

Computing fuzzy rough approximations in large scale information systems

Author: Asfoor Hasan
Cornelis Chris
De Cock Martine
Srinivasan Rajagopalan
Teredesai Ankur
Tolentino Matthew
Vasudevan Gayathri
Verbiest Nele
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is, however, required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state-of-the-art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem.
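To make the cost concrete, here is a minimal single-machine sketch of fuzzy rough lower and upper approximations. It assumes one common set of choices that the abstract does not specify: a per-attribute similarity of 1 - |a(x) - a(y)| / range(a) combined by minimum, the Lukasiewicz implicator for the lower approximation, and the Lukasiewicz t-norm for the upper approximation. The names `similarity` and `lower_upper` are hypothetical, and this is not the MPI implementation of the paper.

```python
def similarity(x, y, ranges):
    """Gradual indiscernibility: 1 for identical objects, 0 for maximally distant."""
    return min(max(0.0, 1.0 - abs(a - b) / r) for a, b, r in zip(x, y, ranges))


def lower_upper(objects, labels, target, ranges):
    """Membership of each object in the lower/upper approximation of a class.

    Compares every object with every other object, so the cost is
    quadratic in the number of objects.
    """
    results = []
    for x in objects:
        lower, upper = 1.0, 0.0
        for y, label in zip(objects, labels):
            r = similarity(x, y, ranges)
            member = 1.0 if label == target else 0.0
            lower = min(lower, min(1.0, 1.0 - r + member))  # Lukasiewicz implicator
            upper = max(upper, max(0.0, r + member - 1.0))  # Lukasiewicz t-norm
        results.append((lower, upper))
    return results


objects = [(0.0,), (0.5,), (1.0,)]
labels = ["a", "a", "b"]
print(lower_upper(objects, labels, target="a", ranges=(1.0,)))
# [(1.0, 1.0), (0.5, 1.0), (0.0, 0.5)]
```

The nested loop over all object pairs is the quadratic step that makes runtime and memory the bottleneck on large datasets, and it is the natural part to distribute across processes.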

University of Washington: UW Tacoma Digital Commons

Crossref

Ghent University Academic Bibliography