919 research outputs found

    BRAHMS: Novel middleware for integrated systems computation

    Get PDF
    Biological computational modellers are becoming increasingly interested in building large, eclectic models, including components on many different computational substrates, both biological and non-biological. At the same time, the rise of the philosophy of embodied modelling is generating a need to deploy biological models as controllers for robots in real-world environments. Finally, robotics engineers are beginning to find value in seconding biomimetic control strategies for use on practical robots. Together with the ubiquitous desire to make good on past software development effort, these trends are throwing up new challenges of intellectual and technological integration (for example across scales, across disciplines, and even across time) - challenges that are unmet by existing software frameworks. Here, we outline these challenges in detail, and go on to describe a newly developed software framework, BRAHMS. that meets them. BRAHMS is a tool for integrating computational process modules into a viable, computable system: its generality and flexibility facilitate integration across barriers, such as those described above, in a coherent and effective way. We go on to describe several cases where BRAHMS has been successfully deployed in practical situations. We also show excellent performance in comparison with a monolithic development approach. Additional benefits of developing in the framework include source code self-documentation, automatic coarse-grained parallelisation, cross-language integration, data logging, performance monitoring, and will include dynamic load-balancing and 'pause and continue' execution. BRAHMS is built on the nascent, and similarly general purpose, model markup language, SystemML. This will, in future, also facilitate repeatability and accountability (same answers ten years from now), transparent automatic software distribution, and interfacing with other SystemML tools. (C) 2009 Elsevier Ltd. All rights reserved

    Quarc: an architecture for efficient on-chip communication

    Get PDF
    The exponential downscaling of the feature size has enforced a paradigm shift from computation-based design to communication-based design in system on chip development. Buses, the traditional communication architecture in systems on chip, are incapable of addressing the increasing bandwidth requirements of future large systems. Networks on chip have emerged as an interconnection architecture offering unique solutions to the technological and design issues related to communication in future systems on chip. The transition from buses as a shared medium to networks on chip as a segmented medium has given rise to new challenges in system on chip realm. By leveraging the shared nature of the communication medium, buses have been highly efficient in delivering multicast communication. The segmented nature of networks, however, inhibits the multicast messages to be delivered as efficiently by networks on chip. Relying on extensive research on multicast communication in parallel computers, several network on chip architectures have offered mechanisms to perform the operation, while conforming to resource constraints of the network on chip paradigm. Multicast communication in majority of these networks on chip is implemented by establishing a connection between source and all multicast destinations before the message transmission commences. Establishing the connections incurs an overhead and, therefore, is not desirable; in particular in latency sensitive services such as cache coherence. To address high performance multicast communication, this research presents Quarc, a novel network on chip architecture. The Quarc architecture targets an area-efficient, low power, high performance implementation. The thesis covers a detailed representation of the building blocks of the architecture, including topology, router and network interface. The cost and performance comparison of the Quarc architecture against other network on chip architectures reveals that the Quarc architecture is a highly efficient architecture. Moreover, the thesis introduces novel performance models of complex traffic patterns, including multicast and quality of service-aware communication

    Physical parameter-aware Networks-on-Chip design

    Get PDF
    PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable and power-efficient communication fabric for chip multiprocessors (CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine both the performance and the reliability of such systems, with a significant power demand that is expected to increase due to developments in both technology and architecture. In terms of architecture, an important trend in many-core systems architecture is to increase the number of cores on a chip while reducing their individual complexity. This trend increases communication power relative to computation power. Moreover, technology-wise, power-hungry wires are dominating logic as power consumers as technology scales down. For these reasons, the design of future very large scale integration (VLSI) systems is moving from being computation-centric to communication-centric. On the other hand, chip’s physical parameters integrity, especially power and thermal integrity, is crucial for reliable VLSI systems. However, guaranteeing this integrity is becoming increasingly difficult with the higher scale of integration due to increased power density and operating frequencies that result in continuously increasing temperature and voltage drops in the chip. This is a challenge that may prevent further shrinking of devices. Thus, tackling the challenge of power and thermal integrity of future many-core systems at only one level of abstraction, the chip and package design for example, is no longer sufficient to ensure the integrity of physical parameters. New designtime and run-time strategies may need to work together at different levels of abstraction, such as package, application, network, to provide the required physical parameter integrity for these large systems. This necessitates strategies that work at the level of the on-chip network with its rising power budget. This thesis proposes models, techniques and architectures to improve power and thermal integrity of Network-on-Chip (NoC)-based many-core systems. The thesis is composed of two major parts: i) minimization and modelling of power supply variations to improve power integrity; and ii) dynamic thermal adaptation to improve thermal integrity. This thesis makes four major contributions. The first is a computational model of on-chip power supply variations in NoCs. The proposed model embeds a power delivery model, an NoC activity simulator and a power model. The model is verified with SPICE simulation and employed to analyse power supply variations in synthetic and real NoC workloads. Novel observations regarding power supply noise correlation with different traffic patterns and routing algorithms are found. The second is a new application mapping strategy aiming vii to minimize power supply noise in NoCs. This is achieved by defining a new metric, switching activity density, and employing a force-based objective function that results in minimizing switching density. Significant reductions in power supply noise (PSN) are achieved with a low energy penalty. This reduction in PSN also results in a better link timing accuracy. The third contribution is a new dynamic thermal-adaptive routing strategy to effectively diffuse heat from the NoC-based threedimensional (3D) CMPs, using a dynamic programming (DP)-based distributed control architecture. Moreover, a new approach for efficient extension of two-dimensional (2D) partially-adaptive routing algorithms to 3D is presented. This approach improves three-dimensional networkon- chip (3D NoC) routing adaptivity while ensuring deadlock-freeness. Finally, the proposed thermal-adaptive routing is implemented in field-programmable gate array (FPGA), and implementation challenges, for both thermal sensing and the dynamic control architecture are addressed. The proposed routing implementation is evaluated in terms of both functionality and performance. The methodologies and architectures proposed in this thesis open a new direction for improving the power and thermal integrity of future NoC-based 2D and 3D many-core architectures

    High-Level Synthesis Hardware Design for FPGA-Based Accelerators: Models, Methodologies, and Frameworks

    Get PDF
    Hardware accelerators based on field programmable gate array (FPGA) and system on chip (SoC) devices have gained attention in recent years. One of the main reasons is that these devices contain reconfigurable logic, which makes them feasible for boosting the performance of applications. High-level synthesis (HLS) tools facilitate the creation of FPGA code from a high level of abstraction using different directives to obtain an optimized hardware design based on performance metrics. However, the complexity of the design space depends on different factors such as the number of directives used in the source code, the available resources in the device, and the clock frequency. Design space exploration (DSE) techniques comprise the evaluation of multiple implementations with different combinations of directives to obtain a design with a good compromise between different metrics. This paper presents a survey of models, methodologies, and frameworks proposed for metric estimation, FPGA-based DSE, and power consumption estimation on FPGA/SoC. The main features, limitations, and trade-offs of these approaches are described. We also present the integration of existing models and frameworks in diverse research areas and identify the different challenges to be addressed

    Dynamic partial reconfiguration for pipelined digital systems— A Case study using a color space conversion engine

    Get PDF
    In digital hardware design, reconfigurable devices such as Field Programmable Gate Arrays (FPGAs) allow for a unique feature called partial reconfiguration PR). This refers to the reprogramming of a subset of the reconfigurable logic during active operation. PR allows multiple hardware blocks to be consolidated into a single partition, which can be reprogrammed at run-time as desired. This may reduce the logic circuit (and silicon area) requirements and greatly extend functionality. Furthermore, dynamic partial reconfiguration (DPR) refers to PR that does not halt the system during reprogramming. This allows for configuration to overlap with normal processing, potentially achieving better system performance than a static(halting) PR implementation. This work has investigated the advantages and trade-offs of DPR as applied to an existing color space conversion(CSC) engine provided by Hewlett-Packard (HP). Two versions were created: a single-pipeline engine, which can only overlap tasks in specific sequences; and a dual-pipeline engine, which can overlap any consecutive tasks. These were implemented in a Virtex-6 FPGA. Data communication occurs over the PCI Express (PCIe) interface. Test results show improvements in execution speed and resource utilization, though some are minor due to intrinsic characteristics of the CSC engine pipeline. The dual-pipeline version outperformed the single-pipeline in most test cases. Therefore, future work will focus on multiple-pipeline architectures

    Integrated support for Adaptivity and Fault-tolerance in MPSoCs

    Get PDF
    The technology improvement and the adoption of more and more complex applications in consumer electronics are forcing a rapid increase in the complexity of multiprocessor systems on chip (MPSoCs). Following this trend, MPSoCs are becoming increasingly dynamic and adaptive, for several reasons. One of these is that applications are getting intrinsically dynamic. Another reason is that the workload on emerging MPSoCs cannot be predicted because modern systems are open to new incoming applications at run-time. A third reason which calls for adaptivity is the decreasing component reliability associated with technology scaling. Components below the 32-nm node are more inclined to temporal or even permanent faults. In case of a malfunctioning system component, the rest of the system is supposed to take over its tasks. Thus, the system adaptivity goal shall influence several de- sign decisions, that have been listed below: 1) The applications should be specified such that system adaptivity can be easily supported. To this end, we consider Polyhedral Process Networks (PPNs) as model of computation to specify applications. PPNs are composed by concurrent and autonomous processes that communicate between each other using bounded FIFO channels. Moreover, in PPNs the control is completely distributed, as well as the memories. This represents a good match with the emerging MPSoC architectures, in which processing elements and memories are usually distributed. Most importantly, the simple operational semantics of PPNs allows for an easy adoption of system adaptivity mechanisms. 2) The hardware platform should guarantee the flexibility that adaptivity mechanisms require. Networks-on-Chip (NoCs) are emerging communication infrastructures for MPSoCs that, among many other advantages, allow for system adaptivity. This is because NoCs are generic, since the same platformcan be used to run different applications, or to run the same application with different mapping of processes. However, there is a mismatch between the generic structure of the NoCs and the semantics of the PPN model. Therefore, in this thesis we investigate and propose several communication approaches to overcome this mismatch. 3) The system must be able to change the process mapping at run-time, using process migration. To this end, a process migration mechanism has been proposed and evaluated. This mechanism takes into account specific requirements of the embedded domain such as predictability and efficiency. To face the problem of graceful degradation of the system, we enriched the MADNESS NoC platform by adding fault tolerance support at both software and hardware level. The proposed process migration mechanism can be exploited to cope with permanent faults by migrating the processes running on the faulty processing element. A fast heuristic is used to determine the new mapping of the processes to tiles. The experimental results prove that the overhead in terms of execution time, due to the execution time of the remapping heuristic, together with the actual process migration, is almost negligible compared to the execution time of the whole application. This means that the proposed approach allows the system to change its performance metrics and to react to faults without a substantial impact on the user experience

    Integrated support for Adaptivity and Fault-tolerance in MPSoCs

    Get PDF
    The technology improvement and the adoption of more and more complex applications in consumer electronics are forcing a rapid increase in the complexity of multiprocessor systems on chip (MPSoCs). Following this trend, MPSoCs are becoming increasingly dynamic and adaptive, for several reasons. One of these is that applications are getting intrinsically dynamic. Another reason is that the workload on emerging MPSoCs cannot be predicted because modern systems are open to new incoming applications at run-time. A third reason which calls for adaptivity is the decreasing component reliability associated with technology scaling. Components below the 32-nm node are more inclined to temporal or even permanent faults. In case of a malfunctioning system component, the rest of the system is supposed to take over its tasks. Thus, the system adaptivity goal shall influence several de- sign decisions, that have been listed below: 1) The applications should be specified such that system adaptivity can be easily supported. To this end, we consider Polyhedral Process Networks (PPNs) as model of computation to specify applications. PPNs are composed by concurrent and autonomous processes that communicate between each other using bounded FIFO channels. Moreover, in PPNs the control is completely distributed, as well as the memories. This represents a good match with the emerging MPSoC architectures, in which processing elements and memories are usually distributed. Most importantly, the simple operational semantics of PPNs allows for an easy adoption of system adaptivity mechanisms. 2) The hardware platform should guarantee the flexibility that adaptivity mechanisms require. Networks-on-Chip (NoCs) are emerging communication infrastructures for MPSoCs that, among many other advantages, allow for system adaptivity. This is because NoCs are generic, since the same platformcan be used to run different applications, or to run the same application with different mapping of processes. However, there is a mismatch between the generic structure of the NoCs and the semantics of the PPN model. Therefore, in this thesis we investigate and propose several communication approaches to overcome this mismatch. 3) The system must be able to change the process mapping at run-time, using process migration. To this end, a process migration mechanism has been proposed and evaluated. This mechanism takes into account specific requirements of the embedded domain such as predictability and efficiency. To face the problem of graceful degradation of the system, we enriched the MADNESS NoC platform by adding fault tolerance support at both software and hardware level. The proposed process migration mechanism can be exploited to cope with permanent faults by migrating the processes running on the faulty processing element. A fast heuristic is used to determine the new mapping of the processes to tiles. The experimental results prove that the overhead in terms of execution time, due to the execution time of the remapping heuristic, together with the actual process migration, is almost negligible compared to the execution time of the whole application. This means that the proposed approach allows the system to change its performance metrics and to react to faults without a substantial impact on the user experience
    • …
    corecore