Memory models for heterogeneous systems
Heterogeneous systems, in which a CPU and an accelerator can execute together while sharing memory, are becoming popular in several computing sectors. Nowadays, programmers can split their computation into multiple specialised threads that can take advantage of each specialised component. FPGAs are popular accelerators with configurable logic for various tasks, and hardware manufacturers are developing platforms with tightly integrated multicore CPUs and FPGAs. In such tightly integrated platforms, the CPU threads and the FPGA threads access shared memory locations in a fine-grained manner. However, architectural optimisations can cause different cores to observe instructions out of order, and programmers must account for these reorderings to write correct programs.
Memory models can aid in reasoning about these complex systems since they can be used to explore guarantees regarding the systems' behaviours. These models are helpful for low-level programmers, compiler writers, and designers of analysis tools. Memory models are specified according to two main paradigms: operational and axiomatic. An operational model is an abstract representation of the actual machine, described by states that represent idealised components such as buffers and queues, and the legal transitions between these states. Axiomatic models define relations between memory accesses to constrain the allowed and disallowed behaviours.
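The operational paradigm can be made concrete with a minimal sketch in C. It assumes a generic TSO-style machine with one FIFO store buffer per thread. This is a hypothetical illustration, not the CPU/FPGA model from this dissertation, and the names `State`, `explore` and `run_sb_litmus` are invented.

- **States** hold the idealised components: memory, store buffers and program counters.
- **Transitions** are "execute the next instruction" and "flush the oldest buffered store".

Exhaustive depth-first exploration of the classic store-buffering litmus test then shows that the outcome r0 = r1 = 0 is reachable. In that outcome, each core observes the other's store out of order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sketch: a tiny operational model with one FIFO store buffer
   per thread (TSO style). This is NOT the CPU/FPGA model of the dissertation. */

enum { X = 0, Y = 1, NTHREADS = 2, MAXBUF = 2 };

typedef struct {
    int mem[2];                        /* shared memory: x and y */
    int pc[NTHREADS];                  /* next instruction per thread (0..2) */
    int reg[NTHREADS];                 /* r0 for thread 0, r1 for thread 1 */
    int buf_addr[NTHREADS][MAXBUF];    /* pending stores: addresses */
    int buf_val[NTHREADS][MAXBUF];     /* pending stores: values */
    int buf_len[NTHREADS];
} State;

/* outcomes[r0][r1] becomes true when that final register pair is reachable */
static bool outcomes[2][2];

/* Store-buffering (SB) litmus test:
   T0: x = 1; r0 = y      T1: y = 1; r1 = x */
static void step_thread(State *s, int t) {
    int own = (t == 0) ? X : Y;        /* location this thread writes */
    int other = (t == 0) ? Y : X;      /* location this thread reads */
    if (s->pc[t] == 0) {               /* store: enters the store buffer */
        int n = s->buf_len[t]++;
        assert(n < MAXBUF);
        s->buf_addr[t][n] = own;
        s->buf_val[t][n] = 1;
    } else {                           /* load: newest own buffered value wins */
        int v = s->mem[other];
        for (int i = 0; i < s->buf_len[t]; i++)
            if (s->buf_addr[t][i] == other) v = s->buf_val[t][i];
        s->reg[t] = v;
    }
    s->pc[t]++;
}

static void flush_head(State *s, int t) {     /* oldest entry reaches memory */
    s->mem[s->buf_addr[t][0]] = s->buf_val[t][0];
    for (int i = 1; i < s->buf_len[t]; i++) {
        s->buf_addr[t][i - 1] = s->buf_addr[t][i];
        s->buf_val[t][i - 1] = s->buf_val[t][i];
    }
    s->buf_len[t]--;
}

/* Depth-first exploration of every legal transition sequence. */
static void explore(State s) {
    bool moved = false;
    for (int t = 0; t < NTHREADS; t++) {
        if (s.pc[t] < 2) { State n = s; step_thread(&n, t); explore(n); moved = true; }
        if (s.buf_len[t] > 0) { State n = s; flush_head(&n, t); explore(n); moved = true; }
    }
    if (!moved) outcomes[s.reg[0]][s.reg[1]] = true;   /* final state reached */
}

void run_sb_litmus(void) {
    State init;
    memset(&init, 0, sizeof init);
    memset(outcomes, 0, sizeof outcomes);
    explore(init);
}
```

After `run_sb_litmus()`, all four register pairs are marked reachable, including `outcomes[0][0]`. An axiomatic model would instead accept or reject each candidate execution by checking constraints on relations between its memory accesses.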
This dissertation makes the following main contributions: an operational model of a CPU/FPGA system, an axiomatic one and an exploration of simulation techniques for operational models. The operational model is implemented in C and validated using all the behaviours described in the available documentation. We will see how the ambiguities from the documentation can be clarified by running tests on the hardware and consulting with the designers. Finally, to demonstrate the model's utility, we reason about a producer/consumer buffer implemented across the CPU and the FPGA.
Simulating axiomatic models can be orders of magnitude faster than simulating operational models. For this reason, we also provide an axiomatic version of the memory model. This model allows us to generate small concurrent programs to reveal whether a specific memory model behaviour can occur. However, synthesising a single test for the FPGA requires significant time and prevents us from directly running many tests. To overcome this issue, we develop a soft-core processor that allows us to quickly run large numbers of such tests and gain higher confidence in the accuracy of our models.
The simulation of the operational model faces a path-explosion problem that limits the exploration of large models. Observing that program analysis tools tackle a similar path-explosion problem, we investigate the idea of reducing the decision problem of "whether a given memory model allows a given behaviour" to the decision problem of "whether a given C program is safe", which can be handled by a variety of off-the-shelf tools. Using this approach, we can simulate our model more deeply and gain more confidence in its accuracy.
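The reduction can be illustrated with a minimal sketch. It assumes the store-buffering litmus test and a one-entry store buffer per thread; this is a hypothetical encoding, not the dissertation's.

- The scheduler's choices come from `nondet_choice()`.
- A path that picks a disabled transition simply ends, playing the role of an assumption.
- The question "is r0 = r1 = 0 allowed?" becomes the safety of the final check.

A bounded model checker such as CBMC treats a body-less function whose name starts with `nondet_` as returning an arbitrary value. So that this sketch runs stand-alone, `nondet_choice()` instead reads from a bit vector, and `sb_program_is_safe()` enumerates every vector by brute force.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of "does the store-buffer model allow r0 == r1 == 0
   in the SB litmus test?" as "is this C program safe?". */

static unsigned choice_bits;     /* stands in for the verifier's nondeterminism */

static unsigned nondet_choice(void) {        /* returns a value in 0..3 */
    unsigned c = choice_bits & 3u;
    choice_bits >>= 2;
    return c;
}

/* Returns false if this path violates "not (r0 == 0 && r1 == 0)";
   returns true if it satisfies it or is infeasible (an 'assume(false)' path). */
static bool sb_program(unsigned bits) {
    int x = 0, y = 0, r0 = 0, r1 = 0;
    int pc0 = 0, pc1 = 0;
    bool buf0 = false, buf1 = false;   /* one-entry buffers: x=1 / y=1 pending */
    choice_bits = bits;
    for (int step = 0; step < 6; step++) {
        switch (nondet_choice()) {
        case 0:                                  /* thread 0 runs next instruction */
            if (pc0 == 0) buf0 = true;           /* x = 1 goes into the buffer */
            else if (pc0 == 1) r0 = y;           /* r0 = y (y never in T0's buffer) */
            else return true;                    /* disabled: path infeasible */
            pc0++;
            break;
        case 1:                                  /* thread 1 runs next instruction */
            if (pc1 == 0) buf1 = true;           /* y = 1 goes into the buffer */
            else if (pc1 == 1) r1 = x;           /* r1 = x */
            else return true;
            pc1++;
            break;
        case 2:                                  /* flush thread 0's buffer */
            if (!buf0) return true;
            x = 1; buf0 = false;
            break;
        default:                                 /* flush thread 1's buffer */
            if (!buf1) return true;
            y = 1; buf1 = false;
            break;
        }
    }
    /* six enabled steps means every instruction ran and both buffers drained */
    assert(pc0 == 2 && pc1 == 2 && !buf0 && !buf1);
    return !(r0 == 0 && r1 == 0);                /* the property being checked */
}

/* Safe iff no sequence of nondeterministic choices violates the property. */
bool sb_program_is_safe(void) {
    for (unsigned bits = 0; bits < (1u << 12); bits++)
        if (!sb_program(bits)) return false;
    return true;
}
```

Here `sb_program_is_safe()` returns false, which corresponds to the model allowing the behaviour. A verifier would report the violating choice sequence as a counterexample trace.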
Technical Report: Feedback-Based Generation of Hardware Characteristics
In large, complex server-like computer systems, it is difficult to characterise hardware usage in the early stages of system development. Often, the applications that will run on the platform are not ready when the platform is deployed, which postpones the measurement of metrics. In our study we seek answers to two questions: (1) Can we use a feedback-based control system to create a characteristics model of a real production system? (2) Can such a model be accurate enough to be run in place of the production application to detect changes in characteristics? The model we have created runs a signalling application, similar to the production application, together with a PID regulator that generates L1 and L2 cache misses to the same extent as the production system. Our measurements indicate that we have managed to mimic a similar environment with respect to cache characteristics. Additionally, we have applied the model to a software update for a production system and used it to detect changes in characteristics. This was later verified on the complete production system, which in this study is a large-scale telecommunication system with a substantial market share.
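The feedback loop can be sketched as a discrete PID regulator, under some loudly stated assumptions:

- The "plant" `measure_misses()` is a made-up linear stand-in for reading hardware cache-miss counters.
- The gains (0.5, 0.5, 0.1) were chosen only so that this toy loop converges.
- The names `Pid`, `pid_update` and `regulate` are hypothetical.

The regulator raises or lowers the intensity of a hypothetical miss generator until the measured miss rate matches a target taken from the production system.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical sketch of a PID regulator driving a synthetic cache-miss
   generator towards a target miss rate. The plant is invented. */

typedef struct {
    double kp, ki, kd;     /* controller gains */
    double integral;       /* running sum of errors */
    double prev_error;     /* error from the previous sample */
} Pid;

static double pid_update(Pid *c, double target, double measured) {
    double error = target - measured;
    c->integral += error;
    double derivative = error - c->prev_error;
    c->prev_error = error;
    return c->kp * error + c->ki * c->integral + c->kd * derivative;
}

/* Assumed plant: misses per sampling interval grow linearly with the
   generator's intensity (e.g., memory operations issued per interval). */
static double measure_misses(double intensity) {
    return 0.5 * intensity;
}

/* Runs the loop for `steps` samples and returns the last measured rate. */
double regulate(double target, int steps) {
    assert(steps > 0);
    Pid c = { .kp = 0.5, .ki = 0.5, .kd = 0.1, .integral = 0.0, .prev_error = 0.0 };
    double intensity = 0.0, measured = 0.0;
    for (int i = 0; i < steps; i++) {
        measured = measure_misses(intensity);          /* sample the "counters" */
        intensity = pid_update(&c, target, measured);  /* adjust the generator */
    }
    return measured;
}
```

With these gains, `regulate(10.0, 200)` settles at the target. A real set-up would sample hardware performance counters and would need its gains tuned against the actual system.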
Model Checking and Model-Based Testing : Improving Their Feasibility by Lazy Techniques, Parallelization, and Other Optimizations
This thesis focuses on the lightweight formal method of model-based testing for checking safety properties, and derives a new and more feasible approach.
For liveness properties, dynamic testing is impossible, so feasibility is increased by specializing in an important class of properties, livelock freedom, and deriving a more feasible model checking algorithm for it.
All of these improvements are substantiated by experiments.
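Livelock freedom can be stated as follows: no reachable state lies on a cycle made only of internal (tau) transitions. The sketch below checks this on a small explicit transition system, using plain reachability followed by depth-first cycle detection over the tau edges. It is a textbook formulation for illustration, not the thesis's optimised algorithm, and the `Lts` representation and function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical sketch: textbook livelock check on an explicit LTS. */

#define MAX_STATES 16

typedef struct {
    int n;                                  /* number of states; 0 is initial */
    bool visible[MAX_STATES][MAX_STATES];   /* observable transition s -> t */
    bool tau[MAX_STATES][MAX_STATES];       /* internal transition s -> t */
} Lts;

/* Marks every state reachable from s through any transition. */
static void reach(const Lts *m, int s, bool *seen) {
    seen[s] = true;
    for (int t = 0; t < m->n; t++)
        if ((m->visible[s][t] || m->tau[s][t]) && !seen[t]) reach(m, t, seen);
}

/* color: 0 = unvisited, 1 = on the DFS stack, 2 = finished */
static bool tau_cycle_from(const Lts *m, int s, int *color) {
    color[s] = 1;
    for (int t = 0; t < m->n; t++) {
        if (!m->tau[s][t]) continue;
        if (color[t] == 1) return true;                /* back edge: tau cycle */
        if (color[t] == 0 && tau_cycle_from(m, t, color)) return true;
    }
    color[s] = 2;
    return false;
}

/* Livelock-free iff no reachable state starts a tau-only cycle. */
bool livelock_free(const Lts *m) {
    bool seen[MAX_STATES] = { false };
    int color[MAX_STATES] = { 0 };
    reach(m, 0, seen);
    for (int s = 0; s < m->n; s++)
        if (seen[s] && color[s] == 0 && tau_cycle_from(m, s, color)) return false;
    return true;
}
```

A tau cycle in an unreachable part of the system is ignored, because only states marked by `reach` are used as starting points.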
Computer-Aided Design of Secure Software for Embedded Systems
The vast majority of distributed embedded systems face security risks. Applications may end up poorly protected partly because of methodological gaps in the engineering development process. More specifically, methodologies targeting formal verification may lack support for certain phases of the development process. In particular, system modeling frameworks may be complex to use or may not address security at all. Moreover, verification methodologies do not usually address testing, since formal verification and testing are considered mutually exclusive stages. Nevertheless, we believe that platform testing can be applied to ensure that properties formally verified in a model are actually preserved in the real system. Our contribution is made in the scope of a model-driven methodology that, in particular, targets secure-by-design embedded systems. The methodology is an iterative process that aims to cover several phases of the engineering development process and that relies on existing security analysis techniques. Still evolving, the methodology is mainly defined via a high-level SysML profile named Avatar. The contribution specifically consists of extending Avatar to model security concerns and of formally defining a model transformation to a verification framework. This contribution makes it possible to conduct proofs of authenticity and confidentiality. We illustrate how a cryptographic protocol is partially secured by applying several stages of the methodology. In addition, we describe how security testing was conducted on an embedded prototype platform as part of an automotive project.
Performance Optimization Strategies for Transactional Memory Applications
This thesis presents tools for Transactional Memory (TM) applications that cover multiple TM systems (software, hardware, and hybrid TM) and use information from all layers of the TM software stack. To develop these new methods and strategies for optimizing TM applications, the thesis addresses a number of challenges in extracting static information, information about run-time behavior, and expert-level knowledge.
Systematic energy characterization of CMP/SMT processor systems via automated micro-benchmarks
Microprocessor-based systems today are composed of multi-core, multi-threaded processors with complex cache hierarchies and gigabytes of main memory. Accurate characterization of such a system is increasingly cumbersome and error prone, whether it is done through predictive pre-silicon modeling or through diagnostic, measurement-based post-silicon analysis. This is especially true of energy-related characterization studies. In this paper, we take the position that automated micro-benchmarks, generated with particular objectives in mind, hold the key to obtaining accurate energy-related characterization. We first present a flexible micro-benchmark generation framework (MicroProbe) that is used to probe complex multi-core/multi-threaded systems with a variety and range of energy-related queries in mind. We then present experimental results centered around an IBM POWER7 CMP/SMT system. These results demonstrate how the systematically generated micro-benchmarks can be used to answer three specific queries: (a) How can application-specific (and, if needed, phase-specific) power consumption be projected, with component-wise breakdowns? (b) How can energy-per-instruction (EPI) values be measured for the target machine? (c) How can the worst-case (maximum) power consumption be bounded in order to determine safe but practical (i.e., affordable) packaging or cooling solutions? The solution approaches to all three problems are new. Analysis based on hardware measurement shows superior power projection accuracy, with error margins of less than 2.3% across SPEC CPU2006. It also shows superior max-power stressing capability: a 10.7% increase in processor power over the very worst-case power seen during the execution of SPEC CPU2006 applications.
Peer reviewed. Postprint (author's final draft).
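For query (b), one common way to obtain EPI from a steady-state micro-benchmark is to divide the power it draws above an idle baseline by its instruction throughput. The function below is a hypothetical sketch of that arithmetic only; the abstract does not describe MicroProbe's exact procedure, and the function name is invented. For example, 40 W above idle at 2 × 10^10 instructions per second gives 2 nJ per instruction.

```c
#include <assert.h>

/* Hypothetical sketch, not MicroProbe's actual procedure: energy per
   instruction as power above an idle baseline divided by throughput. */
double energy_per_instruction(double active_watts, double idle_watts,
                              double instructions, double seconds) {
    assert(seconds > 0.0 && instructions > 0.0 && active_watts >= idle_watts);
    double ips = instructions / seconds;          /* instructions per second */
    return (active_watts - idle_watts) / ips;     /* joules per instruction */
}
```

Subtracting the idle baseline attributes only the micro-benchmark's incremental power to its instructions. Whether static power should also be apportioned is a separate modeling choice.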
Automatic generation of synthetic workloads for multicore systems
When designing a computer system, benchmark programs are run on cycle-accurate performance/power simulators and HDL-level simulators. The aims are to evaluate novel architectural enhancements, explore the design space, understand the worst-case power characteristics of various designs, and find performance bottlenecks. This research effort is directed towards automatically generating synthetic benchmarks to tackle three design challenges. 1) For most simulation purposes, full runs of modern real-world parallel applications, such as those in the PARSEC and SPLASH suites, cannot be used. They take weeks of machine time on cycle-accurate and HDL-level simulators, incurring a prohibitively large time cost. 2) Some of these real-world applications are intellectual property and cannot be shared with processor vendors for design studies. 3) The most significant problem in the design stage is the complexity of fixing the maximum power consumption of a multicore design, called the Thermal Design Power (TDP). To set this maximum power consumption at the optimal point, designers typically hand-craft code snippets called power viruses. However, manually writing such maximum-power code snippets is very tedious.
All of these challenges have led to a resurgence of synthetic benchmarks in the recent past, as they offer a promising solution to each of them. During the design stage of a multicore system, a framework that automatically generates system-level synthetic benchmarks would greatly simplify the design process and lead to more confident design decisions. The key idea behind such an adaptable benchmark synthesis framework has two parts. First, identify the characteristics of real-world parallel applications that affect performance and power consumption. Second, create synthetic executable programs by varying the values of these characteristics. Firstly, such a framework can generate miniaturized synthetic clones of large target (current and future) parallel applications. Architects can then use these clones with slow low-level simulation models (e.g., RTL models in VHDL/Verilog) and tailor designs to the targeted applications. Secondly, these synthetic benchmark clones can be distributed to architects and designers even when the original applications are intellectual property and not publicly available. Lastly, such a framework can automatically create maximum-power code snippets that help in fixing the TDP, heat sinks, cooling system, and other power-related features of the system.
The workload cloning framework built using the proposed synthetic benchmark generation methodology is first evaluated against existing cloning methodologies for single-core systems, and it outperforms them. It generates miniaturized clones of the CPU2006 and ImplantBench workloads with an average performance error of only 2.9%, while providing up to five orders of magnitude of simulation speedup. The correlation coefficients for predicting sensitivity to design changes are 0.95 for performance and 0.98 for power consumption. The framework is also evaluated by cloning parallel applications from the PARSEC benchmark suite implemented with Pthreads and OpenMP. The average error is 4.87% in predicting performance and 2.73% in predicting power consumption, and the correlation coefficient for predicting sensitivity to design changes is 0.92 for performance. The efficacy of the framework for power virus generation is evaluated on the SPARC, Alpha and x86 ISAs, using both full-system simulators and real hardware. The results show that the power viruses generated for single-core systems consume 14-41% more power than MPrime on the SPARC ISA. Similarly, the power viruses generated for multicore systems consume 45-98%, 40-89% and 41-56% more power than PARSEC workloads, multiple copies of MPrime, and multithreaded SPECjbb, respectively.
Electrical and Computer Engineering
Measuring program similarity for efficient benchmarking and performance analysis of computer systems
Computer benchmarking involves running a set of benchmark programs to measure the performance of a computer system. Modern benchmarks are developed from real applications. As applications become more complex, modern benchmarks run for a very long time. These benchmarks are also used for performance evaluation in the early design phase of microprocessors. Because of the size of benchmarks and the increasing complexity of microprocessor design, the effort required for performance evaluation has increased significantly. This dissertation proposes methodologies to reduce the effort of benchmarking and performance evaluation of computer systems. Identifying a set of programs that can be used in the process of benchmarking can be very challenging. A solution to this problem can start by measuring the similarity between programs, capturing the diversity of their behavior before they are considered for benchmarking. The aim of this methodology is to identify redundancy in the set of benchmarks and find a subset of representative benchmarks with the least possible loss of information. This dissertation proposes the use of program characteristics that capture the performance behavior of programs, and it identifies representative benchmarks applicable over a wide range of system configurations. The use of benchmark subsetting has not been restricted to academic research. The SPEC CPU subcommittee recently used similarity between benchmark candidates, measured on program behavior characteristics, as one of its criteria for selecting the SPEC CPU2006 benchmarks. Similarity information can also be used to predict the performance of an application when it is difficult to port the application to different platforms. This is a common problem when customers want to buy the best computer system for their application.
The performance of a customer's application on a particular system can be predicted from two inputs: the performance scores of the standard benchmarks on that system and the similarity between the application and the benchmarks. Similarity between programs is quantified by the distance between them in the space of the measured characteristics. It is used to predict the performance of a new application from the performance scores of its neighbors in the workload space.
Electrical and Computer Engineering