977 research outputs found

    An Overview of Multi-Processor Approximate Message Passing

    Approximate message passing (AMP) is an algorithmic framework for solving linear inverse problems from noisy measurements, with exciting applications such as reconstructing images, audio, hyperspectral images, and various other signals, including those acquired in compressive signal acquisition systems. The growing prevalence of big data systems has increased interest in large-scale problems, which may involve huge measurement matrices that are unsuitable for conventional computing systems. To address the challenge of large-scale processing, multiprocessor (MP) versions of AMP have been developed. We provide an overview of two such MP-AMP variants. In row-MP-AMP, each computing node stores a subset of the rows of the matrix and processes the corresponding measurements. In column-MP-AMP, each node stores a subset of the columns and is solely responsible for reconstructing a portion of the signal. We discuss the pros and cons of both approaches, summarize recent research results for each, and explain when each one may be a viable approach. Highlighted aspects include recent results on state evolution for both MP-AMP algorithms and the use of data compression to reduce communication in the MP network.
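To make the row-wise split concrete: an AMP iteration repeatedly needs the product A^T z of the transposed measurement matrix with the residual. If A is stacked from row blocks A_1, ..., A_P with matching residual blocks z_1, ..., z_P, then A^T z = A_1^T z_1 + ... + A_P^T z_P, so each node can compute its partial product locally and a fusion node only sums P vectors of length N. The pure-Python sketch below checks this identity. The helper names (`matvec_T`, `split_rows`, `row_mp_AT_z`) and the contiguous equal-size split are illustrative assumptions, not the algorithm from the overview, and the sketch omits the denoiser, the Onsager correction, and any compression of the exchanged messages.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec_T(A: Matrix, z: Vector, n_cols: int) -> Vector:
    """Return A^T z for a row-major matrix A with n_cols columns."""
    out = [0.0] * n_cols
    for row, z_i in zip(A, z):
        for j, a_ij in enumerate(row):
            out[j] += a_ij * z_i
    return out


def split_rows(A: Matrix, z: Vector, n_nodes: int) -> List[Tuple[Matrix, Vector]]:
    """Hypothetical helper: give each node a contiguous block of rows and the matching measurements."""
    m = len(A)
    bounds = [p * m // n_nodes for p in range(n_nodes + 1)]
    return [(A[bounds[p]:bounds[p + 1]], z[bounds[p]:bounds[p + 1]]) for p in range(n_nodes)]


def row_mp_AT_z(A: Matrix, z: Vector, n_nodes: int) -> Vector:
    """Each node computes A_p^T z_p from its own rows; a fusion node sums the partial vectors."""
    n_cols = len(A[0])
    partials = [matvec_T(A_p, z_p, n_cols) for A_p, z_p in split_rows(A, z, n_nodes)]
    return [sum(column) for column in zip(*partials)]
```

For example, `row_mp_AT_z([[1, 2], [3, 4], [5, 6]], [1, 1, 1], 2)` returns `[9.0, 12.0]`, the same as the single-node product. Only length-N vectors travel to the fusion node, which is why compressing those messages is the natural lever for reducing communication.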

    Multiprocessor Approximate Message Passing with Column-Wise Partitioning

    Solving a large-scale regularized linear inverse problem using multiple processors is important in various real-world applications due to the limitations of individual processors and constraints on data-sharing policies. This paper focuses on the setting where the matrix is partitioned column-wise. We extend the algorithmic framework and the theoretical analysis of approximate message passing (AMP), an iterative algorithm for solving linear inverse problems; the asymptotic dynamics of AMP are characterized by state evolution (SE). In particular, we show that column-wise multiprocessor AMP (C-MP-AMP) obeys an SE under the same assumptions under which the SE for AMP holds. The SE results imply that (i) the SE of C-MP-AMP converges to a state that is no worse than that of AMP and (ii) the asymptotic dynamics of C-MP-AMP and AMP can be identical. Moreover, for a setting that is not covered by SE, numerical results show that damping can improve the convergence performance of C-MP-AMP. Comment: This document contains complete details of the previous version (i.e., arXiv:1701.02578v1), which was accepted for publication in ICASSP 201
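In the column-wise setting the roles flip: with A = [A_1, ..., A_P] split into column blocks and the signal split into matching pieces x_1, ..., x_P, the product A x equals A_1 x_1 + ... + A_P x_P, so each node needs only its own columns to contribute to the shared residual and updates only its own piece of the signal. The sketch below shows this identity together with a generic damped update x ← (1 − β) x_old + β x_new. The helper names are hypothetical, and the exact damping rule used in the paper is not given here, so the convex-combination form is an assumption.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Return A x for a row-major matrix A."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def split_cols(A: Matrix, x: Vector, n_nodes: int) -> List[Tuple[Matrix, Vector]]:
    """Hypothetical helper: give each node a contiguous block of columns and the matching signal entries."""
    n = len(x)
    bounds = [p * n // n_nodes for p in range(n_nodes + 1)]
    return [([row[bounds[p]:bounds[p + 1]] for row in A], x[bounds[p]:bounds[p + 1]])
            for p in range(n_nodes)]


def col_mp_Ax(A: Matrix, x: Vector, n_nodes: int) -> Vector:
    """Each node computes A_p x_p from its own columns; summing the partials gives A x."""
    partials = [matvec(A_p, x_p) for A_p, x_p in split_cols(A, x, n_nodes)]
    return [sum(entries) for entries in zip(*partials)]


def damped(x_old: Vector, x_new: Vector, beta: float) -> Vector:
    """Generic damping: beta = 1 keeps the new estimate, smaller beta mixes in the old one."""
    return [(1.0 - beta) * o + beta * n for o, n in zip(x_old, x_new)]
```

Here the exchanged messages are length-M partial products A_p x_p rather than length-N vectors, which is one reason the better choice between row- and column-wise partitioning depends on the shape of A.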

    A survey of parallel algorithms for fractal image compression

    This paper presents a short survey of the key research work that has been undertaken in applying parallel algorithms to fractal image compression. The interest in fractal image compression techniques stems from their ability to achieve high compression ratios while maintaining very high quality in the reconstructed image. The main drawback of this compression method is the very high computational cost of the encoding phase. Consequently, there has been significant interest in exploiting parallel computing architectures to speed up this phase while still maintaining the advantageous features of the approach. This paper presents a brief introduction to fractal image compression, including the iterated function system theory on which it is based, and then reviews the different techniques that have been, and can be, applied to parallelize the compression algorithm.
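The parallelization opportunity comes from the structure of the encoder: each range block is matched against every (downsampled) domain block independently, so the range blocks can be divided among workers with no communication beyond sharing the domain pool. The sketch below uses a 1D signal as a stand-in for an image, fits the usual contrast/brightness map r ≈ s·d + o by least squares, and omits the contractivity constraint |s| < 1 and the block isometries used in practice; all names are hypothetical. `ThreadPoolExecutor` only illustrates the decomposition (CPython threads do not speed up pure-Python arithmetic); a process pool or an MPI-style split would be needed for real speedup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple

Block = List[float]
Match = Tuple[int, float, float, float]  # (domain index, contrast s, offset o, squared error)


def fit_affine(domain: Block, range_block: Block) -> Tuple[float, float, float]:
    """Least-squares s, o minimizing sum((s*d + o - r)^2); returns (s, o, error)."""
    n = len(range_block)
    sum_d, sum_r = sum(domain), sum(range_block)
    sum_dd = sum(d * d for d in domain)
    sum_dr = sum(d * r for d, r in zip(domain, range_block))
    denom = n * sum_dd - sum_d * sum_d
    s = (n * sum_dr - sum_d * sum_r) / denom if denom != 0 else 0.0
    o = (sum_r - s * sum_d) / n
    err = sum((s * d + o - r) ** 2 for d, r in zip(domain, range_block))
    return s, o, err


def domain_pool(signal: Block, size: int) -> List[Block]:
    """Domain blocks of length 2*size, shrunk to `size` by averaging neighbouring pairs."""
    pool = []
    for start in range(0, len(signal) - 2 * size + 1, size):
        block = signal[start:start + 2 * size]
        pool.append([(block[2 * i] + block[2 * i + 1]) / 2 for i in range(size)])
    return pool


def encode_range(range_block: Block, pool: List[Block]) -> Match:
    """Exhaustive search over the domain pool: the costly step that dominates encoding."""
    best: Optional[Match] = None
    for idx, dom in enumerate(pool):
        s, o, err = fit_affine(dom, range_block)
        if best is None or err < best[3]:
            best = (idx, s, o, err)
    assert best is not None, "domain pool must not be empty"
    return best


def encode(signal: Block, size: int, workers: int = 1) -> List[Match]:
    """Encode every range block; range blocks are independent, so they can be split across workers."""
    assert len(signal) % size == 0 and len(signal) >= 2 * size
    pool = domain_pool(signal, size)
    ranges = [signal[i:i + size] for i in range(0, len(signal), size)]
    if workers == 1:
        return [encode_range(r, pool) for r in ranges]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(lambda r: encode_range(r, pool), ranges))
```

Because each worker runs exactly the same computation on its share of range blocks, the parallel encoding is identical to the serial one; only the wall-clock time of the search changes.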

    Energy efficient and latency aware adaptive compression in wireless sensor networks

    Wireless sensor networks are composed of a few to several thousand sensors deployed over an area or on specific objects to sense data and report that data back to a sink, either directly or through a series of hops across other sensor nodes. There are many applications for wireless sensor networks, including environment monitoring, wildlife tracking, security, structural health monitoring, troop tracking, and many others. The sensors communicate wirelessly and are typically very small and battery-powered. Wireless sensor networks are thus often constrained in bandwidth, processor speed, and power. Also, many wireless sensor network applications have a very low tolerance for latency and need to transmit data in real time. Data compression is a useful tool for minimizing the bandwidth and power required to transmit data from the sensor nodes to the sink; however, compression algorithms often add a significant amount of latency or require a great deal of additional processing. The following papers define and analyze multiple approaches for achieving effective compression while reducing latency and power consumption far below what would be required to process and transmit the data uncompressed. The algorithms target many different types of sensor applications, from lossless compression on a single sensor, to error-tolerant, collaborative compression across an entire network of sensors, to compression of XML data on sensors. Extensive analysis over many different real-life data sets and comparison with several existing compression methods show a significant contribution to efficient wireless sensor communication. --Abstract, page iv
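As a concrete example of the lightweight, low-latency lossless compression that suits a single sensor node (a generic textbook scheme, not one of the algorithms from the dissertation): consecutive readings usually change slowly, so each sample can be replaced by its difference from the previous one, mapped to a non-negative integer (zigzag encoding), and written as a variable-length integer with 7 payload bits per byte. Each reading is encoded as soon as it arrives, so no buffering latency is added. The function names are illustrative.

```python
from typing import List


def zigzag(n: int) -> int:
    """Map signed integers to unsigned ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    """Inverse of zigzag."""
    return z >> 1 if z % 2 == 0 else -((z + 1) >> 1)


def encode_readings(readings: List[int]) -> bytes:
    """Delta-encode integer readings and emit each delta as a little-endian base-128 varint."""
    out = bytearray()
    prev = 0
    for r in readings:
        z = zigzag(r - prev)
        prev = r
        while z >= 0x80:                 # more than 7 bits left: set the continuation bit
            out.append((z & 0x7F) | 0x80)
            z >>= 7
        out.append(z)
    return bytes(out)


def decode_readings(data: bytes) -> List[int]:
    """Reverse encode_readings, reconstructing the original readings exactly."""
    readings: List[int] = []
    prev, z, shift = 0, 0, 0
    for b in data:
        z |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:                            # last byte of this varint
            prev += unzigzag(z)
            readings.append(prev)
            z, shift = 0, 0
    return readings
```

For the six readings `[2150, 2151, 2149, 2149, 2155, 2140]` the encoder emits 7 bytes (2 for the first absolute value, 1 for each small delta), compared with 12 bytes for raw 16-bit samples.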

    Resiliency in numerical algorithm design for extreme scale simulations

    This work is based on the seminar titled ‘Resiliency in Numerical Algorithm Design for Extreme Scale Simulations’ held March 1–6, 2020, at Schloss Dagstuhl, which was attended by all the authors. Advanced supercomputing is characterized by very high computation speeds at the cost of an enormous amount of resources. A typical large-scale computation running for 48 h on a system consuming 20 MW, as predicted for exascale systems, would consume a million kWh, corresponding to about 100k Euro in energy cost for executing 10²³ floating-point operations. It is clearly unacceptable to lose the whole computation if any of the several million parallel processes fails during the execution. Moreover, if a single operation suffers from a bit-flip error, should the whole computation be declared invalid? What about the notion of reproducibility itself: should this core paradigm of science be revised and refined for results that are obtained by large-scale simulation? Naive versions of conventional resilience techniques will not scale to the exascale regime: with a main memory footprint of tens of petabytes, synchronously writing checkpoint data all the way to background storage at frequent intervals will create intolerable overheads in runtime and energy consumption. Forecasts show that the mean time between failures could be lower than the time to recover from such a checkpoint, so that large calculations at scale might not make any progress if robust alternatives are not investigated. More advanced resilience techniques must be devised. The key may lie in exploiting both advanced system features and specific application knowledge. Research will face two essential questions: (1) what are the reliability requirements for a particular computation, and (2) how do we best design the algorithms and software to meet these requirements?
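The figures quoted above are mutually consistent, as a quick check shows. The electricity price of roughly 0.10 EUR/kWh is an assumption implied by the quoted 100k Euro, not a number stated in the text.

```latex
\begin{align*}
E &= 20\,\mathrm{MW} \times 48\,\mathrm{h} = 960\,\mathrm{MWh} \approx 10^{6}\,\mathrm{kWh},\\
\text{cost} &\approx 10^{6}\,\mathrm{kWh} \times 0.10\,\mathrm{EUR/kWh} = 10^{5}\,\mathrm{EUR},\\
\text{sustained rate} &= \frac{10^{23}\,\mathrm{flop}}{48 \times 3600\,\mathrm{s}} \approx 5.8 \times 10^{17}\,\mathrm{flop/s}.
\end{align*}
```

The last line shows that the scenario corresponds to a machine sustaining roughly 0.58 exaflop/s for two days.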
While the analysis of use cases can help us understand the particular reliability requirements, the construction of remedies is currently wide open. One avenue would be to refine and improve on system- or application-level checkpointing and rollback strategies in case an error is detected. Developers might use fault notification interfaces and flexible runtime systems to respond to node failures in an application-dependent fashion. Novel numerical algorithms or more stochastic computational approaches may be required to meet accuracy requirements in the face of undetectable soft errors. These ideas constituted an essential topic of the seminar. The goal of this Dagstuhl Seminar was to bring together a diverse group of scientists with expertise in exascale computing to discuss novel ways to make applications resilient against detected and undetected faults. In particular, participants explored the role that algorithms and applications play in the holistic approach needed to tackle this challenge. This article gathers a broad range of perspectives on the role of algorithms, applications and systems in achieving resilience for extreme scale simulations. The ultimate goal is to spark novel ideas and encourage the development of concrete solutions for achieving such resilience holistically.
Peer Reviewed. Article signed by 36 authors: Emmanuel Agullo, Mirco Altenbernd, Hartwig Anzt, Leonardo Bautista-Gomez, Tommaso Benacchio, Luca Bonaventura, Hans-Joachim Bungartz, Sanjay Chatterjee, Florina M. Ciorba, Nathan DeBardeleben, Daniel Drzisga, Sebastian Eibl, Christian Engelmann, Wilfried N. Gansterer, Luc Giraud, Dominik Göddeke, Marco Heisig, Fabienne Jézéquel, Nils Kohl, Xiaoye Sherry Li, Romain Lion, Miriam Mehl, Paul Mycek, Michael Obersteiner, Enrique S. Quintana-Ortí, Francesco Rizzi, Ulrich Rüde, Martin Schulz, Fred Fung, Robert Speck, Linda Stals, Keita Teranishi, Samuel Thibault, Dominik Thönnes, Andreas Wagner and Barbara Wohlmuth. Postprint (author's final draft).
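The tension between checkpoint cost and mean time between failures (MTBF) can be quantified with Young's classical first-order approximation, which is not part of this article: with checkpoint cost C and MTBF M, the checkpoint interval τ ≈ √(2CM) minimizes the wasted fraction of time C/τ + τ/(2M), whose minimum is √(2C/M). The sketch below evaluates it; the example numbers are hypothetical. As M shrinks toward C the predicted waste reaches 100%, the no-progress regime the abstract warns about (the first-order model also loses accuracy there).

```python
import math


def young_interval(checkpoint_cost: float, mtbf: float) -> float:
    """Young's first-order optimal checkpoint interval, sqrt(2 * C * M)."""
    return math.sqrt(2.0 * checkpoint_cost * mtbf)


def waste_fraction(interval: float, checkpoint_cost: float, mtbf: float) -> float:
    """First-order fraction of time lost: checkpoint overhead C / tau plus re-executed
    work after failures, on average half an interval per failure (tau / (2M))."""
    return checkpoint_cost / interval + interval / (2.0 * mtbf)


C = 600.0  # hypothetical: 10 minutes to write one checkpoint
for M in (24 * 3600.0, 3600.0, 1200.0):  # hypothetical MTBF values in seconds
    tau = young_interval(C, M)
    print(f"MTBF {M:>8.0f} s: interval {tau:8.0f} s, waste {waste_fraction(tau, C, M):.0%}")
```

With a 10-minute checkpoint, an MTBF of one day costs about 12% of the run, one hour about 58%, and 20 minutes the entire run, which is why the article argues for alternatives to frequent synchronous checkpointing.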

    IST Austria Thesis

    The scalability of concurrent data structures and distributed algorithms strongly depends on reducing contention for shared resources and the costs of synchronization and communication. We show how such cost reductions can be attained by relaxing the strict consistency conditions required by sequential implementations. In the first part of the thesis, we consider relaxation in the context of concurrent data structures. Specifically, in data structures such as priority queues, imposing strong semantics renders scalability impossible, since a correct implementation of the remove operation should return only the element with the highest priority. Intuitively, concurrent remove operations all race for the same element. This bottleneck can be circumvented by relaxing the semantics of the affected data structure, thus allowing the removal of elements that are not required to have the highest priority. We prove that randomized implementations of relaxed data structures provide guarantees on the priority of the removed elements even under concurrency. Additionally, we show that in some cases relaxed data structures can be used to scale classical algorithms that are usually implemented with exact ones. In the second part, we study parallel variants of the stochastic gradient descent (SGD) algorithm, which distribute computation among multiple processors, thus reducing the running time. Unfortunately, for standard parallel SGD to succeed, each processor has to maintain a local copy of the necessary model parameter that is identical to the local copies of the other processors; the overheads from this perfect consistency in terms of communication and synchronization can negate the speedup gained by distributing the computation.
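One well-known randomized relaxed priority queue of the kind described in the first part is the MultiQueue: keys are spread over several independent heaps, and a remove samples two heaps and pops the smaller of their two minima, so concurrent removers rarely contend for the same heap. The sketch below is a sequential simulation (no locks or threads) and simplifies the usual design by sampling only non-empty heaps; whether the thesis analyses exactly this variant is not stated here, so treat it as an illustration of the idea.

```python
import heapq
import random
from typing import List, Optional


class MultiQueue:
    """Sequential simulation of a MultiQueue-style relaxed priority queue (smaller key = higher priority)."""

    def __init__(self, n_queues: int, seed: int = 0) -> None:
        self.heaps: List[List[int]] = [[] for _ in range(n_queues)]
        self.rng = random.Random(seed)

    def insert(self, key: int) -> None:
        # Insert into a uniformly random heap; concurrent inserters rarely pick the same one.
        heapq.heappush(self.rng.choice(self.heaps), key)

    def remove(self) -> Optional[int]:
        # Sample two non-empty heaps and pop the smaller of their minima ("two choices").
        candidates = [h for h in self.heaps if h]
        if not candidates:
            return None
        if len(candidates) == 1:
            return heapq.heappop(candidates[0])
        a, b = self.rng.sample(candidates, 2)
        return heapq.heappop(a if a[0] <= b[0] else b)
```

With `n_queues=1` the structure degenerates to an exact heap; adding heaps trades strict removal order (the removed key may not be the global minimum) for less contention, which is the relaxation the abstract describes.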
We show that the consistency conditions required by SGD can be relaxed, allowing the algorithm to tolerate quantized communication, asynchrony, or even crash faults, while its convergence remains asymptotically the same.
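Quantized communication of this kind is often implemented with unbiased stochastic rounding, as in QSGD-style schemes: each gradient coordinate is scaled by the vector norm, rounded up or down to one of a few levels at random with probabilities that preserve its expected value, and sent as a sign plus a small integer alongside the shared norm. The sketch below returns the dequantized values a receiver would reconstruct; it is not necessarily the scheme analysed in the thesis, and `levels` is an illustrative parameter.

```python
import math
import random
from typing import List


def quantize(v: List[float], levels: int, rng: random.Random) -> List[float]:
    """Unbiased stochastic quantization of v onto uniform steps of size ||v|| / levels."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0] * len(v)
    out = []
    for x in v:
        scaled = abs(x) / norm * levels          # lies in [0, levels]
        lower = math.floor(scaled)
        # Round up with probability equal to the fractional part, so E[level] == scaled.
        level = lower + (1 if rng.random() < scaled - lower else 0)
        out.append(math.copysign(norm * level / levels, x))
    return out
```

Because the expected quantized vector equals v, averaging quantized gradients still points in the true descent direction on average, at the price of extra variance that shrinks as `levels` grows.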