    An Introduction to DF-Threads and their Execution Model

    Current computing systems mostly pursue performance, programmability, energy efficiency, and resiliency by essentially replicating the uni-core execution model n times in parallel on a multi/many-core system. This choice has heavily conditioned the way both software and hardware are designed today. However, the concept of dataflow is as old as computer architecture itself: ``initiating an activity in the presence of the data it needs to perform its function'' [J. Dennis]. Dataflow was historically first explored at the instruction level, and its major practical result is today's superscalar processors, which implement a form of ``restricted dataflow'' at that level. In this paper, we illustrate the idea of using the dataflow concept to define novel thread types that we call Data-Flow-Threads, or DF-Threads. We target several advantages that have not yet been fully explored: i) isolating computations so that communication patterns can be managed more efficiently by a not-so-complex architecture; ii) the possibility of re-executing a thread when faults affecting its resources are detected; iii) a minimalistic low-level API that allows compilers and programmers to map their parallel codes, and architects to implement more efficient and scalable systems. The semantics of DF-Threads is tightly connected to their execution model, which is illustrated here. Several other efforts have pursued similar goals, from the introduction of macro-dataflow to the more recent DF-Codelets and the OCR project. In our case, we aim at a more complete model with the above advantages, in particular including a way of managing mutable shared state by relying on transactional-memory semantics. Our initial experiments show how to map some simple kernels and illustrate the scalability potential on a futuristic 1k-core many-core architecture.
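    The core of the execution model described above — a thread becomes runnable only once all the data it consumes has arrived — can be sketched in a few lines. The following is a minimal, hypothetical illustration (the names `schedule`, `write`, and the frame/count mechanics are assumptions for this sketch, not the paper's actual API): each DF-Thread carries a per-thread input frame and a synchronization count, writes into the frame decrement the count, and the thread fires when the count reaches zero.

    ```python
    from collections import deque

    class DFThread:
        """A thread that fires only when all of its inputs have arrived."""
        def __init__(self, func, n_inputs):
            self.func = func
            self.frame = [None] * n_inputs   # per-thread input frame
            self.pending = n_inputs          # synchronization count

    class Runtime:
        """Toy single-node scheduler; illustrative only."""
        def __init__(self):
            self.ready = deque()
            self.results = []

        def schedule(self, func, n_inputs):
            # Hypothetical analogue of creating a DF-Thread.
            t = DFThread(func, n_inputs)
            if n_inputs == 0:
                self.ready.append(t)
            return t

        def write(self, thread, slot, value):
            # Deliver one input; the thread becomes runnable at count zero.
            thread.frame[slot] = value
            thread.pending -= 1
            if thread.pending == 0:
                self.ready.append(thread)

        def run(self):
            while self.ready:
                t = self.ready.popleft()
                self.results.append(t.func(*t.frame))

    rt = Runtime()
    adder = rt.schedule(lambda a, b: a + b, 2)  # fires after 2 writes
    rt.write(adder, 0, 3)
    rt.write(adder, 1, 4)
    rt.run()
    print(rt.results)  # [7]
    ```

    Because a thread touches only its own frame until it fires, re-executing it after a detected fault (advantage ii above) amounts to re-running `func` on the same frame, with no shared state to roll back.
    
    
    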

    A VLSI Architecture for Hierarchical Motion Estimation


    BLOCK PLACEMENT WITH A BOLTZMANN MACHINE

    The Boltzmann Machine is a neural model based on the same principles as Simulated Annealing; it reaches good solutions, reduces the computational requirements, and is well suited for a low-cost, massively parallel hardware implementation. In this paper we present a connectionist approach to the problem of block placement in the plane to minimize wire length, based on its formalization in terms of the Boltzmann Machine. We detail the procedure for building the Boltzmann Machine by formulating the placement problem as a constrained quadratic assignment problem and by defining an equivalent 0-1 programming problem. The key features of the proposed model are: 1) a high degree of parallelism in the algorithm, 2) high-quality, often near-optimal results, and 3) support for a large variety of constraints such as arbitrary block shape, flexible aspect ratio, and rotations/reflections. Experimental results on different problem instances confirm the method as an effective alternative to other deterministic and statistical techniques.
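    The reduction described above — placement as a constrained quadratic assignment problem, encoded as a 0-1 program whose energy a Boltzmann-style annealer minimizes — can be illustrated on a toy instance. This is a minimal sketch under assumed parameters (the penalty weight, cooling schedule, and the two-block cost vector are inventions for illustration, not the paper's formulation): neuron `x[2*b+s] = 1` means block `b` occupies slot `s`, quadratic penalty terms enforce the one-block-per-slot constraints, and neuron flips are accepted with Boltzmann probability `exp(-dE/T)`.

    ```python
    import math
    import random

    random.seed(0)

    # Toy placement: assign 2 blocks to 2 slots. The cost vector favours
    # block 0 in slot 0 and block 1 in slot 1 (shorter wire length).
    N = 4
    cost = [0.0, 1.0, 1.0, 0.0]
    PENALTY = 4.0  # assumed weight for the assignment constraints

    def energy(x):
        """0-1 programming objective: wire cost plus constraint penalties."""
        e = sum(c * xi for c, xi in zip(cost, x))
        for b in range(2):            # each block in exactly one slot
            e += PENALTY * (x[2 * b] + x[2 * b + 1] - 1) ** 2
        for s in range(2):            # each slot holds exactly one block
            e += PENALTY * (x[s] + x[2 + s] - 1) ** 2
        return e

    x = [random.randint(0, 1) for _ in range(N)]
    T = 5.0
    while T > 0.05:                   # assumed geometric cooling schedule
        for _ in range(20):
            i = random.randrange(N)
            before = energy(x)
            x[i] ^= 1                 # propose flipping one neuron
            dE = energy(x) - before
            if dE > 0 and random.random() >= math.exp(-dE / T):
                x[i] ^= 1             # reject the uphill move
        T *= 0.9

    print(x, energy(x))
    ```

    In a real Boltzmann Machine each neuron evaluates its local energy difference from connection weights alone, so all flips can proceed in parallel — the source of the high degree of parallelism claimed as feature 1.
    
    
    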