291 research outputs found

    Virtual Cycle-accurate Hardware and Software Co-simulation Platform for Cellular IoT

    Modern embedded development flows often depend on FPGA boards for pre-ASIC system verification. The purpose of this project is instead to explore Electronic System Level (ESL) hardware-software co-simulation, using the ARM SoC Designer tool to create a virtual prototype of a cellular IoT modem, and then to assess the benefits of including such a methodology in the early development cycle. The virtual system is developed and executed entirely on a host computer, without the need for additional hardware. The virtual prototype hardware is based on ARM-verified, cycle-accurate C++ models generated from RTL hardware descriptions, pre-synthesis SystemC hardware accelerator models for high-level synthesis (HLS), and behavioural models which implement the ARM Cycle-accurate Simulation Interface (CASI). The micro-controller of the virtual system, which is based on an ARM Cortex-M processor, is capable of executing instructions from a memory module. This report documents the virtual prototype implementation and compares both the software performance and cycle-accuracy of various virtual micro-controller configurations to a commercial reference development board. By altering factors such as memory latencies and bus interconnect subsystem arbitration in co-simulations, the software cycle-count performance of the development board could be reproduced within a 5% error margin, at the cost of approximately 266 times slower execution speed. Furthermore, the validity of two HLS pre-synthesis hardware models is investigated, and they are shown to be functionally accurate within three clock cycles of individual block latency compared to post-synthesis FPGA implementations. The final virtual prototype system consisted of the micro-controller and two cellular IoT hardware accelerators. 
The system runs a FreeRTOS 9.0.0 port, executing a multi-threaded program at an average clock cycle simulation frequency of 10.6 kHz.

    Designing and simulating embedded computer systems virtually. Cellular internet of things (IoT) is a new technology that will enable the interconnection of everything: from street lights and parking meters to your gas or water meter at home, wireless cellular networks will allow information to be shared between devices. However, for these systems to provide any useful data, they need to include a computer chip with a system to manage the communication itself, enabling the connection to a cellular network and the actual transmission and reception of data. Such a chip is called an embedded chip or embedded system. Traditionally, the design and verification of a digital embedded system, that is to say a system which has both hardware and software components, had to be done in two steps. The first step consists of designing all the hardware, testing it, integrating it and producing it physically on silicon in order to verify the intended functionality of all the components. The second step consists of taking the hardware that has been developed and designing the software: a program which must execute in complete compliance with that hardware. This poses two main issues. First, the software engineers cannot properly begin their work until the hardware is finished, which makes the process very long. Second, once the hardware has been printed on silicon, it is very hard to make changes to accommodate late alterations to the system requirements, which are quite likely for a tailor-made, application-specific system such as a cellular IoT chip. 
A widespread technology currently used to mitigate these drawbacks of embedded design is the field-programmable gate array (FPGA) development board, which often contains a micro-controller (with a processor and some memories) and a gate array connected to it. The FPGA part consists of a lattice of digital logic gates which can be programmed to interconnect and represent the functionality of the hardware being designed. The processor can thus execute software instructions placed in the memories, and the hardware being developed can be programmed into the gate array in order to integrate and verify a full hardware and software system. Nevertheless, these boards are expensive and limit the design to the hardware components available commercially in the different off-the-shelf models, e.g. a specific processor which might not be the desired one. Now imagine there is a way to design hardware components such as processors in the traditional way, but where the implemented hardware can then be integrated with software without the need to print a physical silicon chip specifically for this purpose. That would be extremely convenient and would save lots of time, would it not? Fortunately, this is already possible thanks to Electronic System Level (ESL) design, which is a collection of techniques that make it possible to design, simulate and partially verify a digital chip, all on any ordinary laptop or desktop computer. Moreover, some ESL tools, such as the one investigated in this project, even allow you to simulate program code written specifically for this hardware; this is known as virtual hardware-software co-simulation. The reliability of simulation must, however, be considered when it is compared to a traditional two-step methodology or to FPGA board usage for verifying a full system. 
This is because a virtual hardware simulation can have several degrees of accuracy, depending on the specificity of the component models that make up the virtual prototype of the digital system. Therefore, in order to use co-simulation techniques with a high degree of confidence for verification, the highest degree of accuracy should be employed if possible, to guarantee that what is being simulated will match the reality of a silicon implementation. The clock cycle-accurate level is one of the most accurate system simulation methods available, and it consists of representing the digital states of all hardware components, such as signals and registers, in a cycle-by-cycle manner. By using the ARM SoC Designer ESL tool, we have co-designed and co-simulated several micro-controllers on a detailed, cycle-accurate level and confirmed their behaviour by comparing them to a physical reference target development board. Finally, a more complex virtual prototype of a cellular IoT system was also simulated, including a micro-controller running a real-time operating system (RTOS), hardware accelerators and serial data interfacing. Parts of this virtual prototype were compared to an FPGA board to evaluate the pros and cons of incorporating virtual system simulation into the development cycle and to determine to what extent ESL methods can substitute for traditional verification techniques. The ease of interchanging hardware, simplicity of development, simulation speed and the level of debug capabilities available when developing in a virtual environment are some of the aspects of ARM SoC Designer discussed in this thesis. A more in-depth description of the methodology and results can be found in the report titled "Virtual Cycle-accurate Hardware and Software Co-simulation Platform for Cellular IoT".

    A Co-Processor Approach for Efficient Java Execution in Embedded Systems

    This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. 
The computational performance in particular is evaluated thoroughly, and the results show that REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed both to verify the results and to broaden the spectrum of the tests.

    Automatic performance optimisation of component-based enterprise systems via redundancy

    Component technologies, such as J2EE and .NET, have been extensively adopted for building complex enterprise applications. These technologies help address complex functionality and flexibility problems and reduce development and maintenance costs. Nonetheless, current component technologies provide little support for predicting and controlling the emerging performance of software systems that are assembled from distinct components. Static component testing and tuning procedures provide insufficient performance guarantees for components deployed and run in diverse assemblies, under unpredictable workloads and on different platforms. Often, there is no single component implementation or deployment configuration that can yield optimal performance in all possible conditions under which a component may run. Manually optimising and adapting complex applications to changes in their running environment is a costly and error-prone management task. The thesis presents a solution for automatically optimising the performance of component-based enterprise systems. The proposed approach is based on the alternate usage of multiple component variants with equivalent functional characteristics, each one optimised for a different execution environment. A management framework automatically administers the available redundant variants and adapts the system to external changes. The framework uses runtime monitoring data to detect performance anomalies and significant variations in the application's execution environment. It automatically adapts the application so as to use the optimal component configuration under the current running conditions. An automatic clustering mechanism analyses monitoring data and infers information on the components' performance characteristics. System administrators use decision policies to state high-level performance goals and configure system management processes. 
A framework prototype has been implemented and tested for automatically managing a J2EE application. The results obtained demonstrate the framework's capability to successfully manage a software system without human intervention. The management overhead induced during normal system execution and through management operations indicates the framework's feasibility.

    TOWARDS GENERIC SYSTEM OBSERVATION MANAGEMENT

    One of the biggest challenges in computer science is to produce correct computer systems. One way of ensuring system correctness is to use formal modelling and validation techniques during system design. This approach is compulsory for critical systems but difficult and expensive for most computer systems. The alternative is to observe and analyze a system's behavior during execution, after it has been developed. In this thesis, I present my research on system observation. I describe my contributions on generic observation mechanisms, on the use of observations for debugging nondeterministic systems, and on the definition of an open, flexible and reproducible management of observations.

    Proceedings of the 5th International Workshop on Reconfigurable Communication-centric Systems on Chip 2010 - ReCoSoC'10 - May 17-19, 2010 Karlsruhe, Germany. (KIT Scientific Reports ; 7551)

    ReCoSoC is intended to be a periodic annual meeting to expose and discuss gathered expertise as well as state of the art research around SoC related topics through plenary invited papers and posters. The workshop aims to provide a prospective view of tomorrow's challenges in the multibillion transistor era, taking into account the emerging techniques and architectures exploring the synergy between flexible on-chip communication and system reconfigurability.

    Just-in-time Hardware generation for abstracted reconfigurable computing

    This thesis addresses the use of reconfigurable hardware in computing platforms, in order to harness the performance benefits of dedicated hardware whilst maintaining the flexibility associated with software. Although the reconfigurable computing concept is not new, the low-level nature of the supporting tools normally used, together with the consequent limited level of abstraction and resultant lack of backwards compatibility, has prevented the widespread adoption of this technology. In addition, bandwidth and architectural limitations have seriously constrained the potential improvements in performance. A review of existing approaches and tool flows is conducted to highlight the current problems being faced in this field. The objective of the work presented in this thesis is to introduce a radically new approach to reconfigurable computing tool flows. The runtime-based tool flow introduces complete abstraction between the application developer and the underlying hardware. This new technique eliminates the ease of use and backwards compatibility issues that have plagued the reconfigurable computing concept, and could pave the way for viable mainstream reconfigurable computing platforms. An easy-to-use, cycle-accurate behavioural modelling system is also presented, which was used extensively during the early exploration of new concepts and architectures. Some performance improvements produced by the new reconfigurable computing tool flow, when applied to both a MIPS-based embedded platform and the Cray XD1, are also presented. These results are then analyzed, and the hardware and software factors affecting the performance increases that were obtained are discussed, together with potential techniques that could be used to further increase the performance of the system. 
Lastly, a heterogeneous computing concept is proposed, in which a computer system containing multiple types of computational resource is envisaged, each type having its own strengths and weaknesses (e.g. DSPs, CPUs, FPGAs). A revolutionary new method of fully exploiting the potential of such a system, whilst maintaining scalability, backwards compatibility, and ease of use, is also presented.

    Doctor of Philosophy

    Stochastic methods, dense free-form mapping, atlas construction, and total variation are examples of advanced image processing techniques which are robust but computationally demanding. These algorithms often require a large amount of computational power as well as massive memory bandwidth. These requirements used to be fulfilled only by supercomputers. The development of heterogeneous parallel subsystems and computation-specialized devices such as Graphic Processing Units (GPUs) has brought the requisite power to commodity hardware, opening up opportunities for scientists to experiment and evaluate the influence of these techniques on their research and practical applications. However, harnessing the processing power from modern hardware is challenging. The differences between multicore parallel processing systems and conventional models are significant, often requiring algorithms and data structures to be redesigned significantly for efficiency. It also demands in-depth knowledge about modern hardware architectures to optimize these implementations, sometimes on a per-architecture basis. The goal of this dissertation is to introduce a solution for this problem based on a 3D image processing framework, using high performance APIs at the core level to utilize parallel processing power of the GPUs. The design of the framework facilitates an efficient application development process, which does not require scientists to have extensive knowledge about GPU systems, and encourages them to harness this power to solve their computationally challenging problems. 
To present the development of this framework, four main problems are described, and the solutions are discussed and evaluated: (1) essential components of a general 3D image processing library: data structures and algorithms, as well as how to implement these building blocks on the GPU architecture for optimal performance; (2) an implementation of unbiased atlas construction algorithms, an illustration of how to solve a highly complex and computationally expensive algorithm using this framework; (3) an extension of the framework to account for geometry descriptors to solve registration challenges with large scale shape changes and high intensity-contrast differences; and (4) an out-of-core streaming model, which enables developers to implement multi-image processing techniques on commodity hardware.