388 research outputs found
Parallel simulation techniques for telecommunication network modelling
In this thesis, we consider the application of parallel simulation to the performance modelling of telecommunication networks. A largely automated approach was first explored using a parallelizing compiler to speed up the simulation of simple models of circuit-switched networks. This yielded reasonable results for relatively little effort compared with other approaches. However, more complex simulation models of packet- and cell-based telecommunication networks, requiring the use of discrete event techniques, need an alternative approach. A critical review of parallel discrete event simulation indicated that a distributed model components approach using conservative or optimistic synchronization would be worth exploring. Experiments were therefore conducted using simulation models of queuing networks and Asynchronous Transfer Mode (ATM) networks to explore the potential speed-up possible using this approach. Specifically, it is shown that these techniques can be used successfully to speed-up the execution of useful telecommunication network simulations. A detailed investigation has demonstrated that conservative synchronization performs very well for applications with good look ahead properties and sufficient message traffic density and, given such properties, will significantly outperform optimistic synchronization. Optimistic synchronization, however, gives reasonable speed-up for models with a wider range of such properties and can be optimized for speed-up and memory usage at run time. Thus, it is confirmed as being more generally applicable particularly as model development is somewhat easier than for conservative synchronization. This has to be balanced against the more difficult task of developing and debugging an optimistic synchronization kernel and the application models
The exploitation of parallelism on shared memory multiprocessors
PhD ThesisWith the arrival of many general purpose shared memory multiple processor
(multiprocessor) computers into the commercial arena during the mid-1980's, a
rift has opened between the raw processing power offered by the emerging
hardware and the relative inability of its operating software to effectively deliver
this power to potential users. This rift stems from the fact that, currently, no
computational model with the capability to elegantly express parallel activity is
mature enough to be universally accepted, and used as the basis for programming
languages to exploit the parallelism that multiprocessors offer. To add to this,
there is a lack of software tools to assist programmers in the processes of designing
and debugging parallel programs.
Although much research has been done in the field of programming languages,
no undisputed candidate for the most appropriate language for programming
shared memory multiprocessors has yet been found. This thesis examines why this
state of affairs has arisen and proposes programming language constructs,
together with a programming methodology and environment, to close the ever
widening hardware to software gap.
The novel programming constructs described in this thesis are intended for use
in imperative languages even though they make use of the synchronisation
inherent in the dataflow model by using the semantics of single assignment when
operating on shared data, so giving rise to the term shared values. As there are
several distinct parallel programming paradigms, matching flavours of shared
value are developed to permit the concise expression of these paradigms.The Science and Engineering Research Council
A Theoretical Approach Involving Recurrence Resolution, Dependence Cycle Statement Ordering and Subroutine Transformation for the Exploitation of Parallelism in Sequential Code.
To exploit parallelism in Fortran code, this dissertation consists of a study of the following three issues: (1) recurrence resolution in Do-loops for vector processing, (2) dependence cycle statement ordering in Do-loops for parallel processing, and (3) sub-routine parallelization. For recurrence resolution, the major findings include: (1) the node splitting algorithm cannot be used directly to break an essential antidependence link, of which the source variable that results in antidependence is itself the sink variable of another true dependence so a correction method is proposed, (2) a sink variable renaming technique is capable of breaking an antidependence and/or output-dependence link, (3) for recurrences formed by only true dependences, a dynamic dependence concept and the derived technique are powerful, and (4) by integrating related techniques, an algorithm for resolving a general multistatement recurrence is developed. The performance of a parallel loop is determined by the level of parallelism and the time delay due to interprocessor communication and synchronization. For a dependence cycle of a single parallel loop executed in a general synchronization mode, the parallelism exposed varies with the alignment of statements. Statements are reordered on the basis of execution-time of the loop as estimated at compile-time. An improved timing formula and a derived statement ordering algorithm are proposed. Further extension of this algorithm to multiple perfectly nested Do-loops with simple global dependence cycle is also presented. The subroutine is a potential source for parallel processing. Several problems must be solved for subroutine parallelization: (1) the precedence of parallel executions of subroutines, (2) identification of the optimum execution mode for each subroutine and (3) the restructuring of a serial program. A five-step approach to parallelize called subroutines for a calling subroutine is proposed: (1) computation of control dependence, (2) approximation of the global effects of subroutines, (3) analysis of data dependence, (4) identification of execution mode, and (5) restructuring of calling and called subroutines. Application of these five steps in a recursive manner to different levels of calling subroutines in a program addresses the parallelization of subroutines
Automated Debugging Methodology for FPGA-based Systems
Electronic devices make up a vital part of our lives. These are seen from mobiles, laptops, computers, home automation, etc. to name a few. The modern designs constitute billions of transistors. However, with this evolution, ensuring that the devices fulfill the designer’s expectation under variable conditions has also become a great challenge. This requires a lot of design time and effort. Whenever an error is encountered, the process is re-started. Hence, it is desired to minimize the number of spins required to achieve an error-free product, as each spin results in loss of time and effort.
Software-based simulation systems present the main technique to ensure the verification of the design before fabrication. However, few design errors (bugs) are likely to escape the simulation process. Such bugs subsequently appear during the post-silicon phase. Finding such bugs is time-consuming due to inherent invisibility of the hardware. Instead of software simulation of the design in the pre-silicon phase, post-silicon techniques permit the designers to verify the functionality through the physical implementations of the design. The main benefit of the methodology is that the implemented design in the post-silicon phase runs many order-of-magnitude faster than its counterpart in pre-silicon. This allows the designers to validate their design more exhaustively.
This thesis presents five main contributions to enable a fast and automated debugging solution for reconfigurable hardware. During the research work, we used an obstacle avoidance system for robotic vehicles as a use case to illustrate how to apply the proposed debugging solution in practical environments.
The first contribution presents a debugging system capable of providing a lossless trace of debugging data which permits a cycle-accurate replay. This methodology ensures capturing permanent as well as intermittent errors in the implemented design. The contribution also describes a solution to enhance hardware observability. It is proposed to utilize processor-configurable concentration networks, employ debug data compression to transmit the data more efficiently, and partially reconfiguring the debugging system at run-time to save the time required for design re-compilation as well as preserve the timing closure.
The second contribution presents a solution for communication-centric designs. Furthermore, solutions for designs with multi-clock domains are also discussed.
The third contribution presents a priority-based signal selection methodology to identify the signals which can be more helpful during the debugging process. A connectivity generation tool is also presented which can map the identified signals to the debugging system.
The fourth contribution presents an automated error detection solution which can help in capturing the permanent as well as intermittent errors without continuous monitoring of debugging data. The proposed solution works for designs even in the absence of golden reference.
The fifth contribution proposes to use artificial intelligence for post-silicon debugging. We presented a novel idea of using a recurrent neural network for debugging when a golden reference is present for training the network. Furthermore, the idea was also extended to designs where golden reference is not present
QoS-Driven Reconfigurable Parallel Computing for NoC-Based Clustered MPSoCs
Reconfigurable parallel computing is required to provide high-performance embedded computing, hide hardware complexity, boost software development, and manage multiple workloads when multiple applications are running simultaneously on the emerging network-on-chip (NoC)-based multiprocessor systems-on-chip (MPSoCs) platforms. In these type of systems, the overall system performance may be affected due to congestion, and therefore parallel programming stacks must be assisted by quality-of-service (QoS) support to meet application requirements and to deal with application dynamism. In this paper, we present a hardware-software QoS-driven reconfigurable parallel computing framework, i.e., the NoC services, the runtime QoS middleware API and our ocMPI library and its tracing support which has been tailored for a distributed-shared memory ARM clustered NoC-based MPSoC platform. The experimental results show the efficiency of our software stack under a broad range of parallel kernels and benchmarks, in terms of low-latency interprocess communication, good application scalability, and most important, they demonstrate the ability to enable runtime reconfiguration to manage workloads in message-passing parallel applications
Recommended from our members
Logical partitioning of parallel system simulations
Simulation has been a fundamental tool to prototype, hypothesize, and evaluate
new ideas to continue improving system performance. However, increasing levels
of processor parallelism and heterogeneity have introduced additional
constraints when evaluating new designs. The work embodied in this dissertation
explores how to leverage novel ideas in simulator partitioning to improve
simulator speed and flexibility for simulating these new types of systems.
The contribution of this work includes the introduction of optimistic
partitioned simulation to improve parallelization, and the introduction of
warped partitioned simulation for improved flexibility. These ideas are refined
and demonstrated through the use of prototypes to demonstrate their benefits
compared to state-of-the-art approaches. By leveraging partitioning in a
structured manner, it is possible to design simulators that better address the
open challenges of parallel and heterogeneous systems design.Electrical and Computer Engineerin
Synchronization of tasks in multiprocessor systems-on-chip
Tese de mestrado integrado. Engenharia Electrotécnica e de Computadores. Faculdade de Engenharia. Universidade do Porto. 201
Recommended from our members
Deterministic, Mutable, and Distributed Record-Replay for Operating Systems and Database Systems
Application record and replay is the ability to record application execution and replay it at a later time. Record-replay has many use cases including diagnosing and debugging applications by capturing and reproducing hard to find bugs, providing transparent application fault tolerance by maintaining a live replica of a running program, and offline instrumentation that would be too costly to run in a production environment. Different record-replay systems may offer different levels of replay faithfulness, the strongest level being deterministic replay which guarantees an identical reenactment of the original execution. Such a guarantee requires capturing all sources of nondeterminism during the recording phase. In the general case, such record-replay systems can dramatically hinder application performance, rendering them unpractical in certain application domains. Furthermore, various use cases are incompatible with strictly replaying the original execution. For example, in a primary-secondary database scenario, the secondary database would be unable to serve additional traffic while being replicated. No record-replay system fit all use cases.
This dissertation shows how to make deterministic record-replay fast and efficient, how broadening replay semantics can enable powerful new use cases, and how choosing the right level of abstraction for record-replay can support distributed and heterogeneous database replication with little effort.
We explore four record-replay systems with different semantics enabling different use cases. We first present Scribe, an OS-level deterministic record-replay mechanism that support multi-process applications on multi-core systems. One of the main challenge is to record the interaction of threads running on different CPU cores in an efficient manner. Scribe introduces two new lightweight OS mechanisms, rendezvous point and sync points, to efficiently record nondeterministic interactions such as related system calls, signals, and shared memory accesses. Scribe allows the capture and replication of hard to find bugs to facilitate debugging and serves as a solid foundation for our two following systems.
We then present RacePro, a process race detection system to improve software correctness. Process races occur when multiple processes access shared operating system resources, such as files, without proper synchronization. Detecting process races is difficult due to the elusive nature of these bugs, and the heterogeneity of frameworks involved in such bugs. RacePro is the first tool to detect such process races. RacePro records application executions in deployed systems, allowing offline race detection by analyzing the previously recorded log. RacePro then replays the application execution and forces the manifestation of detected races to check their effect on the application. Upon failure, RacePro reports potentially harmful races to developers.
Third, we present Dora, a mutable record-replay system which allows a recorded execution of an application to be replayed with a modified version of the application. Mutable record-replay provides a number of benefits for reproducing, diagnosing, and fixing software bugs. Given a recording and a modified application, finding a mutable replay is challenging, and undecidable in the general case. Despite the difficulty of the problem, we show a very simple but effective algorithm to search for suitable replays.
Lastly, we present Synapse, a heterogeneous database replication system designed for Web applications. Web applications are increasingly built using a service-oriented architecture that integrates services powered by a variety of databases. Often, the same data, needed by multiple services, must be replicated across different databases and kept in sync. Unfortunately, these databases use vendor specific data replication engines which are not compatible with each other. To solve this challenge, Synapse operates at the application level to access a unified data representation through object relational mappers. Additionally, Synapse leverages application semantics to replicate data with good consistency semantics using mechanisms similar to Scribe
- …