Time Lower Bounds for Parallel Network Computations
Directed Acyclic Graphs (DAGs) are a suitable way to describe computations, expressing precedence constraints among operations. Beyond representing the execution of an algorithm, a DAG can effectively represent the execution of a parallel network. This latter kind of DAG has a regular structure, consisting of the repetition over time of the original network; these common representations suggest a possible uniform approach to the study of algorithm execution and network emulation.
In both parallel computing and computational complexity, DAGs have been extensively employed to study algorithmic properties, such as lower bounds on the execution/emulation time of algorithms/networks, the minimum amount of memory needed to compute an algorithm, or the minimum I/O complexity of an algorithm given a certain number of fast memory cells. The techniques developed differ considerably in their assumptions; one of the most fundamental differences is that some of them allow recomputation of intermediate results, while others disallow it, requiring intermediate results to be stored in memory for later use.
In modern computations the trade-off between recomputing and storing data is important both in parallel and in local computations: in the former, recomputing the same data at several points of the network can increase the bandwidth and reduce the latency with which data are accessed; in the latter, recomputation can avoid the memory-access latency of reloading data, possibly loading fewer data or reusing data already present in memory.
So far, no universal technique exists that yields a tight lower bound for every execution of an algorithm or emulation of a network on every network; the known results derive from several different theorems. Moreover, there are many cases for which no tight result exists at all; among these are emulations of extensively studied networks, such as multidimensional arrays.
The first part of our thesis starts from this state of the art: we present a survey of several known lower bound techniques involving DAGs, followed by original theorems that clarify or settle open problems. In particular, our survey covers lower bound techniques for the execution of algorithms and the emulation of networks on parallel networks, showing their principles and their limits.
In the discussion we show relationships among the theorems, proving that none of them is better than the others in general: for each theorem there are counter-examples in which it gives better bounds than the others. We also exhibit examples where none of the considered techniques yields a tight bound. Moreover, we generalize some theorems originally devised for network emulations, adapting them to the execution of general DAGs on parallel networks, and show examples of their application. We also consider theorems for determining minimum I/O complexity, presenting their similarities to and differences from the emulation theorems.
One of the main results of the thesis is a new general technique that provides almost tight lower bounds (up to a logarithmic factor) for a class of network emulations that includes multidimensional arrays. This improves on the previously best known results, which leave a polynomial gap between the lower bound and the actual emulation time. Our theorem considers emulations with recomputation, giving results valid in the most general setting.
Finally, we consider the role of recomputation in performance, trying to understand when it gives a real advantage with respect to storing intermediate results in memory. In particular, we introduce the problem on simple networks, exhibiting a class of them in which recomputation cannot improve I/O performance, and conclude with butterfly DAGs, where recomputation can save a number of I/O accesses at least as large as the amount of fast memory available during the computation. The approach highlights the difficulty of exploiting recomputation in executions of algorithms whose DAG representation exhibits a high bisection bandwidth.
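To make the store-versus-recompute trade-off concrete, here is a minimal toy sketch in Python (not taken from the thesis; the unit-cost accounting is a hypothetical simplification): a shared DAG node is either stored and reloaded by its consumers, or recomputed once per consumer.

# Tiny DAG: inputs a, b; shared node s = f(a, b);
# consumers g(s) = s**2 and h(s) = s + 1.
def evaluate(recompute):
    cost = {"computes": 0, "loads": 0}
    a, b = 2, 3

    def f(x, y):
        cost["computes"] += 1
        return x + y

    if recompute:
        # Recompute s for each consumer: extra computes, no reloads.
        out = (f(a, b) ** 2, f(a, b) + 1)
    else:
        # Store s once, then pay one memory load per consumer.
        s = f(a, b)
        cost["loads"] += 2
        out = (s ** 2, s + 1)
    return out, cost

print(evaluate(recompute=False))  # ((25, 6), {'computes': 1, 'loads': 2})
print(evaluate(recompute=True))   # ((25, 6), {'computes': 2, 'loads': 0})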
Dynamic Control Flow in Large-Scale Machine Learning
Many recent machine learning models rely on fine-grained dynamic control flow
for training and inference. In particular, models based on recurrent neural
networks and on reinforcement learning depend on recurrence relations,
data-dependent conditional execution, and other features that call for dynamic
control flow. These applications benefit from the ability to make rapid
control-flow decisions across a set of computing devices in a distributed
system. For performance, scalability, and expressiveness, a machine learning
system must support dynamic control flow in distributed and heterogeneous
environments.
This paper presents a programming model for distributed machine learning that
supports dynamic control flow. We describe the design of the programming model,
and its implementation in TensorFlow, a distributed machine learning system.
Our approach extends the use of dataflow graphs to represent machine learning
models, offering several distinctive features. First, the branches of
conditionals and bodies of loops can be partitioned across many machines to run
on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs.
Second, programs written in our model support automatic differentiation and
distributed gradient computations, which are necessary for training machine
learning models that use control flow. Third, our choice of non-strict
semantics enables multiple loop iterations to execute in parallel across
machines, and to overlap compute and I/O operations.
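To illustrate the kind of in-graph dynamic control flow described above, here is a minimal sketch using TensorFlow's public tf.cond and tf.while_loop operators; the specific computation is invented for illustration.

import tensorflow as tf

x = tf.constant(3.0)

# Data-dependent conditional: only the taken branch executes.
y = tf.cond(x > 0, lambda: tf.square(x), lambda: -x)

# Data-dependent loop: keep doubling until the value exceeds 100.
(z,) = tf.while_loop(lambda v: v < 100.0, lambda v: (2.0 * v,), loop_vars=(y,))

# Automatic differentiation through control flow.
with tf.GradientTape() as tape:
    tape.watch(x)
    w = tf.cond(x > 0, lambda: tf.square(x), lambda: -x)

print(z.numpy(), tape.gradient(w, x).numpy())  # 144.0 6.0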
We have done our work in the context of TensorFlow, and it has been used
extensively in research and production. We evaluate it using several real-world
applications, and demonstrate its performance and scalability.
Comment: Appeared in EuroSys 2018. 14 pages, 16 figures.
Prospects and limitations of full-text index structures in genome analysis
The combination of incessant advances in sequencing technology, which produce large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, the trade-offs they impose, and their practical limitations are explained and compared.
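As a concrete, deliberately naive example of such an index structure, the sketch below builds a suffix array over a short DNA string and answers pattern queries by binary search; practical tools use linear-time construction and compressed variants such as the FM-index.

def suffix_array(text):
    # Naive O(n^2 log n) construction: sort all suffix start positions.
    return sorted(range(len(text)), key=lambda i: text[i:])

def occurrences(text, sa, pattern):
    # All start positions of `pattern`, via binary search on the array.
    m = len(pattern)

    def first_index(strictly_greater):
        lo, hi = 0, len(sa)
        while lo < hi:
            mid = (lo + hi) // 2
            prefix = text[sa[mid]:sa[mid] + m]
            if prefix < pattern or (strictly_greater and prefix == pattern):
                lo = mid + 1
            else:
                hi = mid
        return lo

    start, end = first_index(False), first_index(True)
    return sorted(sa[i] for i in range(start, end))

genome = "GATTACA"
sa = suffix_array(genome)
print(occurrences(genome, sa, "A"))   # [1, 4, 6]
print(occurrences(genome, sa, "TA"))  # [3]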
Real-Time Decoding for Fault-Tolerant Quantum Computing: Progress, Challenges and Outlook
Quantum computing is poised to solve practically useful problems which are
computationally intractable for classical supercomputers. However, the current
generation of quantum computers is limited by errors that may only partially
be mitigated by developing higher-quality qubits. Quantum error correction
(QEC) will thus be necessary to ensure fault tolerance. QEC protects the
logical information by cyclically measuring syndrome information about the
errors. An essential part of QEC is the decoder, which uses the syndrome to
compute the likely effect of the errors on the logical degrees of freedom and
provide a tentative correction. The decoder must be accurate, fast enough to
keep pace with the QEC cycle (e.g., on a microsecond timescale for
superconducting qubits), and integrated into the system with hard real-time guarantees to support
logical operations. As such, real-time decoding is essential to realize
fault-tolerant quantum computing and to achieve quantum advantage. In this
work, we highlight some of the key challenges facing the implementation of
real-time decoders, while providing a succinct summary of the progress to date.
Furthermore, we lay out our perspective on future development and provide
a possible roadmap for the field of real-time decoding in the next few years.
As quantum hardware is anticipated to scale up, this perspective article
will provide guidance for researchers, focusing on the most pressing issues
in real-time decoding and facilitating the development of solutions across
quantum and computer science.
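To make the decoder's role concrete, here is a minimal toy illustration (not one of the real-time decoders discussed here): a lookup-table decoder for the three-qubit repetition code, mapping each measured syndrome to its most likely single bit-flip correction.

import random

# 3-qubit repetition code against bit flips. Stabilizers Z1Z2 and Z2Z3
# are measured as parities of neighbouring bits; together they form
# the syndrome.
def syndrome(bits):
    return (bits[0] ^ bits[1], bits[1] ^ bits[2])

# Lookup-table decoder: syndrome -> most likely flipped qubit (or none).
DECODER = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def decode_and_correct(bits):
    flip = DECODER[syndrome(bits)]
    if flip is not None:
        bits[flip] ^= 1
    return bits

# One QEC cycle: encode logical 0, inject a random single bit flip, correct.
state = [0, 0, 0]
state[random.randrange(3)] ^= 1
assert decode_and_correct(state) == [0, 0, 0]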
Temporal clustering of social interactions trades-off disease spreading and knowledge diffusion
Non-pharmaceutical measures such as preventive quarantines, remote working,
school and workplace closures, lockdowns, etc. have shown effectiveness from
an epidemic control perspective; however, they also have significant negative
consequences for social life and relationships, work routines, and community
engagement. In particular, complex ideas, work and school collaborations,
innovative discoveries, and resilient norms formation and maintenance, which
often require face-to-face interactions of two or more parties to be developed
and synergistically coordinated, are particularly affected. In this study, we
propose an alternative hybrid solution that balances the slowdown of epidemic
diffusion with the preservation of face-to-face interactions. Our approach
involves a two-step partitioning of the population. First, we tune the level of
node clustering, creating "social bubbles" with increased contacts within each
bubble and fewer outside, while maintaining the average number of contacts in
each network. Second, we tune the level of temporal clustering by pairing, for
a certain time interval, nodes from specific social bubbles. Our results
demonstrate that a hybrid approach can achieve better trade-offs between
epidemic control and complex knowledge diffusion. The versatility of our model
enables tuning and refining clustering levels to optimally achieve the desired
trade-off, based on the potentially changing characteristics of a disease or
knowledge diffusion process.
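The abstract does not spell out the precise generative model; as one plausible instantiation of the first step (tuning node clustering while keeping the average number of contacts fixed), here is a stochastic-block-model sketch in networkx, where the hypothetical mixing parameter sets the fraction of a node's expected contacts that fall outside its bubble.

import networkx as nx

def bubble_network(n_bubbles=10, bubble_size=20, mean_degree=8.0, mixing=0.2):
    # "Social bubbles" as blocks of a stochastic block model. The
    # within- and between-bubble edge probabilities are chosen so the
    # expected degree stays at mean_degree as clustering is tuned.
    n = n_bubbles * bubble_size
    p_in = (1 - mixing) * mean_degree / (bubble_size - 1)
    p_out = mixing * mean_degree / (n - bubble_size)
    probs = [[p_in if i == j else p_out for j in range(n_bubbles)]
             for i in range(n_bubbles)]
    return nx.stochastic_block_model([bubble_size] * n_bubbles, probs, seed=1)

G = bubble_network(mixing=0.1)  # strongly clustered bubbles
H = bubble_network(mixing=0.9)  # weakly clustered, same expected degree
print(nx.average_clustering(G), nx.average_clustering(H))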