285 research outputs found

    Scalability of broadcast performance in wireless network-on-chip

    Get PDF
    Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip communication requirements of processors with hundreds or thousands of cores. The main reason is that the performance of such networks drops as the number of cores grows, especially in the presence of multicast and broadcast traffic. This not only limits the scalability of current multiprocessor architectures, but also sets a performance wall that prevents the development of architectures that generate moderate-to-high levels of multicast. In this paper, a Wireless Network-on-Chip (WNoC) where all cores share a single broadband channel is presented. Such design is conceived to provide low latency and ordered delivery for multicast/broadcast traffic, in an attempt to complement a wireline NoC that will transport the rest of communication flows. To assess the feasibility of this approach, the network performance of WNoC is analyzed as a function of the system size and the channel capacity, and then compared to that of wireline NoCs with embedded multicast support. Based on this evaluation, preliminary results on the potential performance of the proposed hybrid scheme are provided, together with guidelines for the design of MAC protocols for WNoC.Peer ReviewedPostprint (published version

    A scheduling algorithm for multiport memory minimization in datapath synthesis

    Full text link
    Abstract- In this paper, we present a new scheduling algorithms that generates area-efficient register transfer level datapaths with multiport memories. The proposed scheduling algorithm assigns an operation to a specific control step such that maximal sharing of functional units can be achieved with minimal number of memory ports, while satisfying given constraints. We propose a measure of multiport memory cost, MAV (Multiple Access Variable) which is defined as a variable accessed at several control steps, and overall memory cost is reduced by equally distributing the MAVs throughout all the control steps. When compared with previous approaches for several benchmarks available from the literature, the proposed algorithm generates the datapaths with less memory modules and interconnection structures by reflecting the memory cost in the scheduling process

    Parallelization of Stochastic Evolution for Cell Placement

    Get PDF
    VLSI physical design and the problems related to it such as placement, channel routing, etc, carry inherent complexities that are best dealt with iterative heuristics. However the major drawback of these iterative heuristics has been the large runtime involved in reaching acceptable solutions especially when optimizing for multiple objectives. Among the acceleration techniques proposed, parallelization is one promising method. Distributed memory multiprocessor systems and shared memory multiprocessor systems have gained considerable attention in recent years of research. This idea of parallel computing has attracted both the researchers and manufacturers who are targeting to reduce the time to market. Our objective is to exploit the benefits of parallel computing for a time consuming placement problem in VLSI. Finding the best solution for the placement of n modules is a hard problem. Thus the enumerative search techniques, specially those which employ the brute force, are unaccepted for the circuits in which n (number of modules) is large. Constructive and Iterative heuristics play the key role in this scenario and hence are frequently used. We will use Stochastic Evolution for finding the optimal solution to the above mentioned placement problem where the major task in our objective will be the parallelization of Stochastic Evolution using different parallelization techniques and the comparison between these different parallelized versions based on the results achieved. The parallelization will be carried out using MPI (Message Passing Interface) on a distributed memory multiprocessor system and conclusion will be based on the results achieved that are expected to show speedup nearly equal to linear speedup when run over increasing number of processors

    FPGA Implementation of Data Flow Graphs for Digital Signal Processing Applications

    Get PDF
    A rapid growth in digital signal processing applications has increased the requirement for high-speed digital systems. Multiprocessor systems are the best choice for these applications. A prior sequence of operations should be applied to the operations that described the nature of these applications before hardware implementation is produced. These operations should be scheduled and hardware allocated. This paper proposes a new scheduling technique for digital signal processing (DSP) applications has been represented by data flow graphs (DFGs). In addition, hardware allocation is implemented in the form of embedded system. A proposed scheduling technique also achieves the optimal scheduling of a DFG at design time. The optimality criteria considered in this algorithm are the maximum throughput within the available hardware resources. The maximum throughput is achieved by arranging the DFG nodes according to their inter-related data dependencies. Then, two nodes can be clustered into one compound task to reduce the overall execution time by minimizing the number of tasks to be executed that minimizing the number of cycles to execute them. Then each task is presented in form of instruction to be executed in the hardware system. A hardware system is composed of one or multiple homogenous pipelined processing elements and it is designed to meet the maximum-rate schedule.  Two implementations are proposed of the system architecture according to the number of the processing elements, namely:  the serial system and the parallel system. The serial system comprises one processing element where all tasks are processed sequentially, whilst the parallel system has four processing elements to execute tasks concurrently. These systems consist mainly of seven units: central shared memory, state table, multiway function unit buffer, execution array, processing element/s, instruction buffer and the address generation unit. The hardware components were built on an FPGA chip using Verilog HDL. In synthesis results, the parallel system has better system performance by 25.5% than the serial system. While the serial system requires smaller area size, which described by the number of slice registers and the number of the slice lookup tables (LUTs) than the parallel one. The relationship between the number of instructions that are executed in both systems, and the system area and the system performance that presented by system frequency, are studied. By increasing memories size in both systems, the system performance isn’t affected as in a serial system, and it is slightly decreased as the parallel system by 1.5% to 4.5%. In terms of the systems area, both serial system area and parallel system area are increased and in some cases are doubled. The proposed scheduling technique is shown to outperform the retaining technique, which we have chosen to compare with.  The serial system has better performance by 19.3% higher system frequency than a retiming technique. And the parallel system also outperforms the retaining technique by 51.2% higher system frequency in synthesis results

    Parallelization of Stochastic Evolution for Cell Placement

    Get PDF
    VLSI physical design and the problems related to it such as placement, channel routing, etc, carry inherent complexities that are best dealt with iterative heuristics. However the major drawback of these iterative heuristics has been the large runtime involved in reaching acceptable solutions especially when optimizing for multiple objectives. Among the acceleration techniques proposed, parallelization is one promising method. Distributed memory multiprocessor systems and shared memory multiprocessor systems have gained considerable attention in recent years of research. This idea of parallel computing has attracted both the researchers and manufacturers who are targeting to reduce the time to market. Our objective is to exploit the benefits of parallel computing for a time consuming placement problem in VLSI. Finding the best solution for the placement of n modules is a hard problem. Thus the enumerative search techniques, specially those which employ the brute force, are unaccepted for the circuits in which n (number of modules) is large. Constructive and Iterative heuristics play the key role in this scenario and hence are frequently used. We will use Stochastic Evolution for finding the optimal solution to the above mentioned placement problem where the major task in our objective will be the parallelization of Stochastic Evolution using different parallelization techniques and the comparison between these different parallelized versions based on the results achieved. The parallelization will be carried out using MPI (Message Passing Interface) on a distributed memory multiprocessor system and conclusion will be based on the results achieved that are expected to show speedup nearly equal to linear speedup when run over increasing number of processors

    Parallelization of Stochastic Evolution for Cell Placement

    Get PDF
    VLSI physical design and the problems related to it such as placement, channel routing, etc, carry inherent complexities that are best dealt with iterative heuristics. However the major drawback of these iterative heuristics has been the large runtime involved in reaching acceptable solutions especially when optimizing for multiple objectives. Among the acceleration techniques proposed, parallelization is one promising method. Distributed memory multiprocessor systems and shared memory multiprocessor systems have gained considerable attention in recent years of research. This idea of parallel computing has attracted both the researchers and manufacturers who are targeting to reduce the time to market. Our objective is to exploit the benefits of parallel computing for a time consuming placement problem in VLSI. Finding the best solution for the placement of n modules is a hard problem. Thus the enumerative search techniques, specially those which employ the brute force, are unaccepted for the circuits in which n (number of modules) is large. Constructive and Iterative heuristics play the key role in this scenario and hence are frequently used. We will use Stochastic Evolution for finding the optimal solution to the above mentioned placement problem where the major task in our objective will be the parallelization of Stochastic Evolution using different parallelization techniques and the comparison between these different parallelized versions based on the results achieved. The parallelization will be carried out using MPI (Message Passing Interface) on a distributed memory multiprocessor system and conclusion will be based on the results achieved that are expected to show speedup nearly equal to linear speedup when run over increasing number of processors

    C-MOS array design techniques: SUMC multiprocessor system study

    Get PDF
    The current capabilities of LSI techniques for speed and reliability, plus the possibilities of assembling large configurations of LSI logic and storage elements, have demanded the study of multiprocessors and multiprocessing techniques, problems, and potentialities. Evaluated are three previous systems studies for a space ultrareliable modular computer multiprocessing system, and a new multiprocessing system is proposed that is flexibly configured with up to four central processors, four 1/0 processors, and 16 main memory units, plus auxiliary memory and peripheral devices. This multiprocessor system features a multilevel interrupt, qualified S/360 compatibility for ground-based generation of programs, virtual memory management of a storage hierarchy through 1/0 processors, and multiport access to multiple and shared memory units

    Problems related to the integration of fault tolerant aircraft electronic systems

    Get PDF
    Problems related to the design of the hardware for an integrated aircraft electronic system are considered. Taxonomies of concurrent systems are reviewed and a new taxonomy is proposed. An informal methodology intended to identify feasible regions of the taxonomic design space is described. Specific tools are recommended for use in the methodology. Based on the methodology, a preliminary strawman integrated fault tolerant aircraft electronic system is proposed. Next, problems related to the programming and control of inegrated aircraft electronic systems are discussed. Issues of system resource management, including the scheduling and allocation of real time periodic tasks in a multiprocessor environment, are treated in detail. The role of software design in integrated fault tolerant aircraft electronic systems is discussed. Conclusions and recommendations for further work are included
    corecore