895 research outputs found
Automated problem scheduling and reduction of synchronization delay effects
It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable preformance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acrylic graphs, and (2) because such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs
Engineering Resilient Collective Adaptive Systems by Self-Stabilisation
Collective adaptive systems are an emerging class of networked computational
systems, particularly suited in application domains such as smart cities,
complex sensor networks, and the Internet of Things. These systems tend to
feature large scale, heterogeneity of communication model (including
opportunistic peer-to-peer wireless interaction), and require inherent
self-adaptiveness properties to address unforeseen changes in operating
conditions. In this context, it is extremely difficult (if not seemingly
intractable) to engineer reusable pieces of distributed behaviour so as to make
them provably correct and smoothly composable.
Building on the field calculus, a computational model (and associated
toolchain) capturing the notion of aggregate network-level computation, we
address this problem with an engineering methodology coupling formal theory and
computer simulation. On the one hand, functional properties are addressed by
identifying the largest-to-date field calculus fragment generating
self-stabilising behaviour, guaranteed to eventually attain a correct and
stable final state despite any transient perturbation in state or topology, and
including highly reusable building blocks for information spreading,
aggregation, and time evolution. On the other hand, dynamical properties are
addressed by simulation, empirically evaluating the different performances that
can be obtained by switching between implementations of building blocks with
provably equivalent functional properties. Overall, our methodology sheds light
on how to identify core building blocks of collective behaviour, and how to
select implementations that improve system performance while leaving overall
system function and resiliency properties unchanged.Comment: To appear on ACM Transactions on Modeling and Computer Simulatio
A Fast Compiler for NetKAT
High-level programming languages play a key role in a growing number of
networking platforms, streamlining application development and enabling precise
formal reasoning about network behavior. Unfortunately, current compilers only
handle "local" programs that specify behavior in terms of hop-by-hop forwarding
behavior, or modest extensions such as simple paths. To encode richer "global"
behaviors, programmers must add extra state -- something that is tricky to get
right and makes programs harder to write and maintain. Making matters worse,
existing compilers can take tens of minutes to generate the forwarding state
for the network, even on relatively small inputs. This forces programmers to
waste time working around performance issues or even revert to using
hardware-level APIs.
This paper presents a new compiler for the NetKAT language that handles rich
features including regular paths and virtual networks, and yet is several
orders of magnitude faster than previous compilers. The compiler uses symbolic
automata to calculate the extra state needed to implement "global" programs,
and an intermediate representation based on binary decision diagrams to
dramatically improve performance. We describe the design and implementation of
three essential compiler stages: from virtual programs (which specify behavior
in terms of virtual topologies) to global programs (which specify network-wide
behavior in terms of physical topologies), from global programs to local
programs (which specify behavior in terms of single-switch behavior), and from
local programs to hardware-level forwarding tables. We present results from
experiments on real-world benchmarks that quantify performance in terms of
compilation time and forwarding table size
Code Building Genetic Programming
In recent years the field of genetic programming has made significant
advances towards automatic programming. Research and development of
contemporary program synthesis methods, such as PushGP and Grammar Guided
Genetic Programming, can produce programs that solve problems typically
assigned in introductory academic settings. These problems focus on a narrow,
predetermined set of simple data structures, basic control flow patterns, and
primitive, non-overlapping data types (without, for example, inheritance or
composite types). Few, if any, genetic programming methods for program
synthesis have convincingly demonstrated the capability of synthesizing
programs that use arbitrary data types, data structures, and specifications
that are drawn from existing codebases. In this paper, we introduce Code
Building Genetic Programming (CBGP) as a framework within which this can be
done, by leveraging programming language features such as reflection and
first-class specifications. CBGP produces a computational graph that can be
executed or translated into source code of a host language. To demonstrate the
novel capabilities of CBGP, we present results on new benchmarks that use
non-primitive, polymorphic data types as well as some standard program
synthesis benchmarks.Comment: Proceedings of the 2020 Genetic and Evolutionary Computation
Conference, Genetic Programming Trac
Throughput constrained parallelism reduction in cyclo-static dataflow applications
International audienceThis paper deals with semantics-preserving parallelism reduction methods for cyclo-static dataflow applications. Parallelism reduction is the process of equivalent actors fusioning. The principal objectives of parallelism reduction are to decrease the memory footprint of an application and to increase its execution performance. We focus on parallelism reduction methodologies constrained by application throughput. A generic parallelism reduction methodology is introduced. Experimental results are provided for asserting the performance of the proposed method
Integrated testing and verification system for research flight software design document
The NASA Langley Research Center is developing the MUST (Multipurpose User-oriented Software Technology) program to cut the cost of producing research flight software through a system of software support tools. The HAL/S language is the primary subject of the design. Boeing Computer Services Company (BCS) has designed an integrated verification and testing capability as part of MUST. Documentation, verification and test options are provided with special attention on real time, multiprocessing issues. The needs of the entire software production cycle have been considered, with effective management and reduced lifecycle costs as foremost goals. Capabilities have been included in the design for static detection of data flow anomalies involving communicating concurrent processes. Some types of ill formed process synchronization and deadlock also are detected statically
Packet Transactions: High-level Programming for Line-Rate Switches
Many algorithms for congestion control, scheduling, network measurement,
active queue management, security, and load balancing require custom processing
of packets as they traverse the data plane of a network switch. To run at line
rate, these data-plane algorithms must be in hardware. With today's switch
hardware, algorithms cannot be changed, nor new algorithms installed, after a
switch has been built.
This paper shows how to program data-plane algorithms in a high-level
language and compile those programs into low-level microcode that can run on
emerging programmable line-rate switching chipsets. The key challenge is that
these algorithms create and modify algorithmic state. The key idea to achieve
line-rate programmability for stateful algorithms is the notion of a packet
transaction : a sequential code block that is atomic and isolated from other
such code blocks. We have developed this idea in Domino, a C-like imperative
language to express data-plane algorithms. We show with many examples that
Domino provides a convenient and natural way to express sophisticated
data-plane algorithms, and show that these algorithms can be run at line rate
with modest estimated die-area overhead.Comment: 16 page
- …