31,417 research outputs found
Constructs and evaluation strategies for intelligent speculative parallelism - armageddon revisited
This report addresses speculative parallelism (the assignment of spare processing resources to tasks which are not known to be strictly required for the successful completion of a computation) at the user and application level. At this level, the execution of a program is seen
as a (dynamic) tree —a graph, in general. A solution for a problem is a traversal of this graph from the initial state to a node known to be the answer. Speculative parallelism then represents the assignment of resources to múltiple branches of this graph even if they are not positively known to be on the path to a solution. In highly non-deterministic programs the branching factor can be very high and a naive assignment will very soon use up all the
resources. This report presents work assignment strategies other than the usual depth-first and breadth-first. Instead, best-first strategies are used. Since their definition is application-dependent, the application language contains primitives that allow the user (or application programmer) to a) indícate when intelligent OR-parallelism should be used; b) provide the functions that define "best," and c) indícate when to use them.
An abstract architecture enables those primitives to perform the search in a "speculative" way, using several processors, synchronizing them, killing the siblings of the path leading to the answer, etc. The user is freed from worrying about these interactions. Several search strategies are proposed and their implementation issues are addressed. "Armageddon," a global pruning method, is introduced, together with both a software and a hardware implementation for it. The concepts exposed are applicable to áreas of Artificial Intelligence such as extensive expert systems, planning, game playing, and in general to large search problems. The proposed strategies, although
showing promise, have not been evaluated by simulation or experimentation
Access to vectors in multi-module memories
The poor bandwidth obtained from memory when conflicts arise in the modules or in the interconnection network degrades the performance of computers. Address transformation schemes, such as interleaving, skewing and linear transformations, have been proposed to achieve conflict-free access for streams with constant stride. However, this is achieved only for some strides. In this paper, we summarize a mechanism to request the elements in an out-of-order way which allows to achieve
conflict-free access for a larger number of strides. We study the cases of a single vector processor and of a vector multiprocessor system. For this latter case, we propose a synchronous mode of accessing memory that can be applied in SIMD machines or in MIMD systems with decoupled access and execution.Peer ReviewedPostprint (published version
The Parallel Complexity of Growth Models
This paper investigates the parallel complexity of several non-equilibrium
growth models. Invasion percolation, Eden growth, ballistic deposition and
solid-on-solid growth are all seemingly highly sequential processes that yield
self-similar or self-affine random clusters. Nonetheless, we present fast
parallel randomized algorithms for generating these clusters. The running times
of the algorithms scale as , where is the system size, and the
number of processors required scale as a polynomial in . The algorithms are
based on fast parallel procedures for finding minimum weight paths; they
illuminate the close connection between growth models and self-avoiding paths
in random environments. In addition to their potential practical value, our
algorithms serve to classify these growth models as less complex than other
growth models, such as diffusion-limited aggregation, for which fast parallel
algorithms probably do not exist.Comment: 20 pages, latex, submitted to J. Stat. Phys., UNH-TR94-0
Conflict-free strides for vectors in matched memories
Address transformation schemes, such as skewing and linear transformations, have been proposed to achieve conflict-free access to one family of strides in vector processors with matched memories. The paper extends these schemes to achieve this conflict-free access for several families. The basic idea is to perform an out-of-order access to vectors of fixed length, equal to that of the vector registers of the processor. The hardware required is similar to that for the access in order.Peer ReviewedPostprint (author's final draft
(Tissue) P Systems with Vesicles of Multisets
We consider tissue P systems working on vesicles of multisets with the very
simple operations of insertion, deletion, and substitution of single objects.
With the whole multiset being enclosed in a vesicle, sending it to a target
cell can be indicated in those simple rules working on the multiset. As
derivation modes we consider the sequential mode, where exactly one rule is
applied in a derivation step, and the set maximal mode, where in each
derivation step a non-extendable set of rules is applied. With the set maximal
mode, computational completeness can already be obtained with tissue P systems
having a tree structure, whereas tissue P systems even with an arbitrary
communication structure are not computationally complete when working in the
sequential mode. Adding polarizations (-1, 0, 1 are sufficient) allows for
obtaining computational completeness even for tissue P systems working in the
sequential mode.Comment: In Proceedings AFL 2017, arXiv:1708.0622
(Tissue) P Systems with Vesicles of Multisets
We consider tissue P systems working on vesicles of multisets with the very
simple operations of insertion, deletion, and substitution of single objects.
With the whole multiset being enclosed in a vesicle, sending it to a target
cell can be indicated in those simple rules working on the multiset. As
derivation modes we consider the sequential mode, where exactly one rule is
applied in a derivation step, and the set maximal mode, where in each
derivation step a non-extendable set of rules is applied. With the set maximal
mode, computational completeness can already be obtained with tissue P systems
having a tree structure, whereas tissue P systems even with an arbitrary
communication structure are not computationally complete when working in the
sequential mode. Adding polarizations (-1, 0, 1 are sufficient) allows for
obtaining computational completeness even for tissue P systems working in the
sequential mode.Comment: In Proceedings AFL 2017, arXiv:1708.0622
Microgrid - The microthreaded many-core architecture
Traditional processors use the von Neumann execution model, some other
processors in the past have used the dataflow execution model. A combination of
von Neuman model and dataflow model is also tried in the past and the resultant
model is referred as hybrid dataflow execution model. We describe a hybrid
dataflow model known as the microthreading. It provides constructs for
creation, synchronization and communication between threads in an intermediate
language. The microthreading model is an abstract programming and machine model
for many-core architecture. A particular instance of this model is named as the
microthreaded architecture or the Microgrid. This architecture implements all
the concurrency constructs of the microthreading model in the hardware with the
management of these constructs in the hardware.Comment: 30 pages, 16 figure
Transparent code authentication at the processor level
The authors present a lightweight authentication mechanism that verifies the authenticity of code and thereby addresses the virus and malicious code problems at the hardware level eliminating the need for trusted extensions in the operating system. The technique proposed tightly integrates the authentication mechanism into the processor core. The authentication latency is hidden behind the memory access latency, thereby allowing seamless on-the-fly authentication of instructions. In addition, the proposed authentication method supports seamless encryption of code (and static data). Consequently, while providing the software users with assurance for authenticity of programs executing on their hardware, the proposed technique also protects the software manufacturers’ intellectual property through encryption. The performance analysis shows that, under mild assumptions, the presented technique introduces negligible overhead for even moderate cache sizes
- …