172,679 research outputs found
Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs
As massive graphs become more prevalent, there is a rapidly growing need for
scalable algorithms that solve classical graph problems, such as maximum
matching and minimum vertex cover, on large datasets. For massive inputs,
several different computational models have been introduced, including the
streaming model, the distributed communication model, and the massively
parallel computation (MPC) model that is a common abstraction of
MapReduce-style computation. In each model, algorithms are analyzed in terms of
resources such as space used or rounds of communication needed, in addition to
the more traditional approximation ratio.
In this paper, we give a single unified approach that yields better
approximation algorithms for matching and vertex cover in all these models. The
highlights include:
* The first one pass, significantly-better-than-2-approximation for matching
in random arrival streams that uses subquadratic space, namely a
-approximation streaming algorithm that uses space
for constant .
* The first 2-round, better-than-2-approximation for matching in the MPC
model that uses subquadratic space per machine, namely a
-approximation algorithm with memory per
machine for constant .
By building on our unified approach, we further develop parallel algorithms
in the MPC model that give a -approximation to matching and an
-approximation to vertex cover in only MPC rounds and
memory per machine. These results settle multiple open
questions posed in the recent paper of Czumaj~et.al. [STOC 2018]
Optical implementation of the Hopfield model
Optical implementation of content addressable associative memory based on the Hopfield model for neural networks and on the addition of nonlinear iterative feedback to a vector-matrix multiplier is described. Numerical and experimental results presented show that the approach is capable of introducing accuracy and robustness to optical processing while maintaining the traditional advantages of optics, namely, parallelism and massive interconnection capability. Moreover a potentially useful link between neural processing and optics that can be of interest in pattern recognition and machine vision is established
Efficiently modeling neural networks on massively parallel computers
Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks
Robust Learning from Bites
Many robust statistical procedures have two drawbacks. Firstly, they are computer-intensive such that they can hardly be used for massive data sets. Secondly, robust confidence intervals for the estimated parameters or robust predictions according to the fitted models are often unknown. Here, we propose a general method to overcome these problems of robust estimation in the context of huge data sets. The method is scalable to the memory of the computer, can be distributed on several processors if available, and can help to reduce the computation time substantially. The method additionally offers distribution-free confidence intervals for the median of the predictions. The method is illustrated for two situations: robust estimation in linear regression and kernel logistic regression from statistical machine learning. --
Round Compression for Parallel Matching Algorithms
For over a decade now we have been witnessing the success of {\em massive
parallel computation} (MPC) frameworks, such as MapReduce, Hadoop, Dryad, or
Spark. One of the reasons for their success is the fact that these frameworks
are able to accurately capture the nature of large-scale computation. In
particular, compared to the classic distributed algorithms or PRAM models,
these frameworks allow for much more local computation. The fundamental
question that arises in this context is though: can we leverage this additional
power to obtain even faster parallel algorithms?
A prominent example here is the {\em maximum matching} problem---one of the
most classic graph problems. It is well known that in the PRAM model one can
compute a 2-approximate maximum matching in rounds. However, the
exact complexity of this problem in the MPC framework is still far from
understood. Lattanzi et al. showed that if each machine has
memory, this problem can also be solved -approximately in a constant number
of rounds. These techniques, as well as the approaches developed in the follow
up work, seem though to get stuck in a fundamental way at roughly
rounds once we enter the near-linear memory regime. It is thus entirely
possible that in this regime, which captures in particular the case of sparse
graph computations, the best MPC round complexity matches what one can already
get in the PRAM model, without the need to take advantage of the extra local
computation power.
In this paper, we finally refute that perplexing possibility. That is, we
break the above round complexity bound even in the case of {\em
slightly sublinear} memory per machine. In fact, our improvement here is {\em
almost exponential}: we are able to deliver a -approximation to
maximum matching, for any fixed constant , in
rounds
Computing fuzzy rough approximations in large scale information systems
Rough set theory is a popular and powerful machine learning tool. It is especially suitable for dealing with information systems that exhibit inconsistencies, i.e. objects that have the same values for the conditional attributes but a different value for the decision attribute. In line with the emerging granular computing paradigm, rough set theory groups objects together based on the indiscernibility of their attribute values. Fuzzy rough set theory extends rough set theory to data with continuous attributes, and detects degrees of inconsistency in the data. Key to this is turning the indiscernibility relation into a gradual relation, acknowledging that objects can be similar to a certain extent. In very large datasets with millions of objects, computing the gradual indiscernibility relation (or in other words, the soft granules) is very demanding, both in terms of runtime and in terms of memory. It is however required for the computation of the lower and upper approximations of concepts in the fuzzy rough set analysis pipeline. Current non-distributed implementations in R are limited by memory capacity. For example, we found that a state of the art non-distributed implementation in R could not handle 30,000 rows and 10 attributes on a node with 62GB of memory. This is clearly insufficient to scale fuzzy rough set analysis to massive datasets. In this paper we present a parallel and distributed solution based on Message Passing Interface (MPI) to compute fuzzy rough approximations in very large information systems. Our results show that our parallel approach scales with problem size to information systems with millions of objects. To the best of our knowledge, no other parallel and distributed solutions have been proposed so far in the literature for this problem
- …