28 research outputs found
Circuit simulation using distributed waveform relaxation techniques
Simulation plays an important role in the design of integrated circuits. Due to high costs and large delays involved in their fabrication, simulation is commonly used to verify functionality and to predict performance before fabrication. This thesis describes analysis, implementation and performance evaluation of a distributed memory parallel waveform relaxation technique for the electrical circuit simulation of MOS VLSI circuits. The waveform relaxation technique exhibits inherent parallelism due to the partitioning of a circuit into a number of sub-circuits. These subcircuits can be concurrently simulated on parallel processors. Different forms of parallelism in the direct method and the waveform relaxation technique are studied. An analysis of single queue and distributed queue approaches to implement parallel waveform relaxation on distributed memory machines is performed and their performance implications are studied. The distributed queue approach selected for exploiting the coarse grain parallelism across sub-circuits is described. Parallel waveform relaxation programs based on Gauss-Seidel and Gauss-Jacobi techniques are implemented using a network of eight Transputers. Static and dynamic load balancing strategies are studied. A dynamic load balancing algorithm is developed and implemented. Results of parallel implementation are analyzed to identify sources of bottlenecks. This thesis has demonstrated the applicability of a low cost distributed memory multi-computer system for simulation of MOS VLSI circuits. Speed-up measurements prove that a five times improvement in the speed of calculations can be achieved using a full window parallel Gauss-Jacobi waveform relaxation algorithm. Analysis of overheads shows that load imbalance is the major source of overhead and that the fraction of the computation which must be performed sequentially is very low. Communication overhead depends on the nature of the parallel architecture and the design of communication mechanisms. The run-time environment (parallel processing framework) developed in this research exploits features of the Transputer architecture to reduce the effect of the communication overhead by effectively overlapping computation with communications, and running communications processes at a higher priority. This research will contribute to the development of low cost, high performance workstations for computer-aided design and analysis of VLSI circuits
Recommended from our members
Logic, parallelism and semantic networks : the binary predicate execution model
This thesis develops the Binary Predicate Execution Model; a distributed, massively-parallel system for semantic networks and knowledge bases that is built on a subset of first-order predicate logic. The use of logic gives the model an easily-understood programming paradigm and a well-defined semantics of execution. When expressed in binary predicates, a simple graphical interpretation can be used. All program facts are represented in an assertion graph. Each vertex is associated with a term appearing in a fact and the edges are labeled with the predicate names. Similar graphs are also associated with each rule body and the query. Finding all possible solutions corresponds to finding all possible matches between the query graph and the assertion graph. Invoking a rule corresponds to substituting the graph of its body constrained by the dependencies between its arguments. This can be implemented in a parallel, message-passing fashion where the assertion graph vertices are active processing elements which asynchronously exchange messages identifying different parts of the query that remain to be matched and containing any binding information from previous matching required to accomplish this. The model is data-driven since every message can be immediately processed without the need for any centralized control or centralized memory. By restricting how functional terms can occur, distributed data structures and remote data look-ups for unification are eliminated. Thus, the model's performance on increasingly larger problems scales-up given increasingly larger machines in most cases. Architectural support for the model is investigated and simulation results of a relatively simple software implementation are reported. This suggests performance on the order of 10^5 logical inferences per second for 256 processing elements in an n-cube configuration. Further research directions, including that of increasing efficiency, are discussed
Physical parameter-aware Networks-on-Chip design
PhD ThesisNetworks-on-Chip (NoCs) have been proposed as a scalable, reliable
and power-efficient communication fabric for chip multiprocessors
(CMPs) and multiprocessor systems-on-chip (MPSoCs). NoCs determine
both the performance and the reliability of such systems, with a
significant power demand that is expected to increase due to developments
in both technology and architecture. In terms of architecture, an
important trend in many-core systems architecture is to increase the
number of cores on a chip while reducing their individual complexity.
This trend increases communication power relative to computation
power. Moreover, technology-wise, power-hungry wires are dominating
logic as power consumers as technology scales down. For these
reasons, the design of future very large scale integration (VLSI) systems
is moving from being computation-centric to communication-centric.
On the other hand, chip’s physical parameters integrity, especially
power and thermal integrity, is crucial for reliable VLSI systems. However,
guaranteeing this integrity is becoming increasingly difficult with
the higher scale of integration due to increased power density and operating
frequencies that result in continuously increasing temperature
and voltage drops in the chip. This is a challenge that may prevent
further shrinking of devices. Thus, tackling the challenge of power
and thermal integrity of future many-core systems at only one level
of abstraction, the chip and package design for example, is no longer
sufficient to ensure the integrity of physical parameters. New designtime
and run-time strategies may need to work together at different
levels of abstraction, such as package, application, network, to provide
the required physical parameter integrity for these large systems. This
necessitates strategies that work at the level of the on-chip network
with its rising power budget.
This thesis proposes models, techniques and architectures to improve
power and thermal integrity of Network-on-Chip (NoC)-based
many-core systems. The thesis is composed of two major parts: i)
minimization and modelling of power supply variations to improve
power integrity; and ii) dynamic thermal adaptation to improve thermal
integrity. This thesis makes four major contributions. The first is
a computational model of on-chip power supply variations in NoCs.
The proposed model embeds a power delivery model, an NoC activity
simulator and a power model. The model is verified with SPICE simulation
and employed to analyse power supply variations in synthetic
and real NoC workloads. Novel observations regarding power supply
noise correlation with different traffic patterns and routing algorithms
are found. The second is a new application mapping strategy aiming
vii
to minimize power supply noise in NoCs. This is achieved by defining
a new metric, switching activity density, and employing a force-based
objective function that results in minimizing switching density. Significant
reductions in power supply noise (PSN) are achieved with a low
energy penalty. This reduction in PSN also results in a better link timing
accuracy. The third contribution is a new dynamic thermal-adaptive
routing strategy to effectively diffuse heat from the NoC-based threedimensional
(3D) CMPs, using a dynamic programming (DP)-based distributed
control architecture. Moreover, a new approach for efficient extension
of two-dimensional (2D) partially-adaptive routing algorithms
to 3D is presented. This approach improves three-dimensional networkon-
chip (3D NoC) routing adaptivity while ensuring deadlock-freeness.
Finally, the proposed thermal-adaptive routing is implemented in
field-programmable gate array (FPGA), and implementation challenges,
for both thermal sensing and the dynamic control architecture are addressed.
The proposed routing implementation is evaluated in terms
of both functionality and performance.
The methodologies and architectures proposed in this thesis open a
new direction for improving the power and thermal integrity of future
NoC-based 2D and 3D many-core architectures
Run-time management for future MPSoC platforms
In recent years, we are witnessing the dawning of the Multi-Processor Systemon- Chip (MPSoC) era. In essence, this era is triggered by the need to handle more complex applications, while reducing overall cost of embedded (handheld) devices. This cost will mainly be determined by the cost of the hardware platform and the cost of designing applications for that platform. The cost of a hardware platform will partly depend on its production volume. In turn, this means that ??exible, (easily) programmable multi-purpose platforms will exhibit a lower cost. A multi-purpose platform not only requires ??exibility, but should also combine a high performance with a low power consumption. To this end, MPSoC devices integrate computer architectural properties of various computing domains. Just like large-scale parallel and distributed systems, they contain multiple heterogeneous processing elements interconnected by a scalable, network-like structure. This helps in achieving scalable high performance. As in most mobile or portable embedded systems, there is a need for low-power operation and real-time behavior. The cost of designing applications is equally important. Indeed, the actual value of future MPSoC devices is not contained within the embedded multiprocessor IC, but in their capability to provide the user of the device with an amount of services or experiences. So from an application viewpoint, MPSoCs are designed to ef??ciently process multimedia content in applications like video players, video conferencing, 3D gaming, augmented reality, etc. Such applications typically require a lot of processing power and a signi??cant amount of memory. To keep up with ever evolving user needs and with new application standards appearing at a fast pace, MPSoC platforms need to be be easily programmable. Application scalability, i.e. the ability to use just enough platform resources according to the user requirements and with respect to the device capabilities is also an important factor. Hence scalability, ??exibility, real-time behavior, a high performance, a low power consumption and, ??nally, programmability are key components in realizing the success of MPSoC platforms. The run-time manager is logically located between the application layer en the platform layer. It has a crucial role in realizing these MPSoC requirements. As it abstracts the platform hardware, it improves platform programmability. By deciding on resource assignment at run-time and based on the performance requirements of the user, the needs of the application and the capabilities of the platform, it contributes to ??exibility, scalability and to low power operation. As it has an arbiter function between different applications, it enables real-time behavior. This thesis details the key components of such an MPSoC run-time manager and provides a proof-of-concept implementation. These key components include application quality management algorithms linked to MPSoC resource management mechanisms and policies, adapted to the provided MPSoC platform services. First, we describe the role, the responsibilities and the boundary conditions of an MPSoC run-time manager in a generic way. This includes a de??nition of the multiprocessor run-time management design space, a description of the run-time manager design trade-offs and a brief discussion on how these trade-offs affect the key MPSoC requirements. This design space de??nition and the trade-offs are illustrated based on ongoing research and on existing commercial and academic multiprocessor run-time management solutions. Consequently, we introduce a fast and ef??cient resource allocation heuristic that considers FPGA fabric properties such as fragmentation. In addition, this thesis introduces a novel task assignment algorithm for handling soft IP cores denoted as hierarchical con??guration. Hierarchical con??guration managed by the run-time manager enables easier application design and increases the run-time spatial mapping freedom. In turn, this improves the performance of the resource assignment algorithm. Furthermore, we introduce run-time task migration components. We detail a new run-time task migration policy closely coupled to the run-time resource assignment algorithm. In addition to detailing a design-environment supported mechanism that enables moving tasks between an ISP and ??ne-grained recon??gurable hardware, we also propose two novel task migration mechanisms tailored to the Network-on-Chip environment. Finally, we propose a novel mechanism for task migration initiation, based on reusing debug registers in modern embedded microprocessors. We propose a reactive on-chip communication management mechanism. We show that by exploiting an injection rate control mechanism it is possible to provide a communication management system capable of providing a soft (reactive) QoS in a NoC. We introduce a novel, platform independent run-time algorithm to perform quality management, i.e. to select an application quality operating point at run-time based on the user requirements and the available platform resources, as reported by the resource manager. This contribution also proposes a novel way to manage the interaction between the quality manager and the resource manager. In order to have a the realistic, reproducible and ??exible run-time manager testbench with respect to applications with multiple quality levels and implementation tradev offs, we have created an input data generation tool denoted Pareto Surfaces For Free (PSFF). The the PSFF tool is, to the best of our knowledge, the ??rst tool that generates multiple realistic application operating points either based on pro??ling information of a real-life application or based on a designer-controlled random generator. Finally, we provide a proof-of-concept demonstrator that combines these concepts and shows how these mechanisms and policies can operate for real-life situations. In addition, we show that the proposed solutions can be integrated into existing platform operating systems
The connection machine
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1988.Bibliography: leaves 134-157.by William Daniel Hillis.Ph.D