1,589 research outputs found

    CSP design model and tool support

    The CSP paradigm is known as a powerful concept for designing and analysing the architectural and behavioural parts of concurrent software. Although the theory of CSP is mainly useful to mathematicians, the programming language occam, which is derived from CSP, is useful in engineering practice. Nowadays, occam/CSP concepts can be used in almost every object-oriented programming language. This paper describes a tree-based description model and a prototype tool that raise the use of occam/CSP concepts to the design level and generate code in Java, C, C++, and machine-readable CSP at the implementation level. The tree-based description model can be used to browse through the generated source code. The tool is a kind of browser that can assist modern workbenches (like Borland Builder, Microsoft Visual C++ and 20-SIM) with coding concurrency. It guides the user through the design trajectory using support messages and several semantic and syntax rule checks. The machine-readable CSP can be read by FDR, enabling more advanced analysis of the design. Early experiments with the prototype tool show that the browser concept, combined with the tree-based description model, offers a user-friendly way to create a design using the CSP concepts and benefits. The design tool is available from our URL, http://www.rt.el.utwente.nl/javapp

    Implementing microinstruction folding on the BlueJ Java optimized processor

    This paper presents work on implementing microinstruction folding on the BlueJEP, a Java Embedded Processor written entirely in Bluespec SystemVerilog. The folding model is introduced and its implementation is described. The implementation was tested on a Xilinx FPGA, and measurements were taken through simulation.

    A Co-Processor Approach for Efficient Java Execution in Embedded Systems

    This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. 
The computational performance in particular is evaluated thoroughly, and the results show that REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed both to verify the results and to broaden the spectrum of the tests.

    Reification: A Process to Configure Java Realtime Processors

    Real-time systems impose stringent requirements on both the processor and the software application. The primary concerns are speed and the predictability of execution times. In all real-time applications the developer must identify and calculate the worst-case execution time (WCET) of their software. In almost all cases the complexity of the processor design affects the WCET analysis. Design features that affect this analysis include caches and instruction pipelining. With both, the time taken for a particular instruction can vary depending on cache and pipeline contents. When calculating the WCET the developer must ignore the speed advantages of these enhancements and use the normal instruction timings. This investigation is about a Java processor targeted to run within an FPGA environment (Java soft chip) supporting Java real-time applications. The investigation focuses on a simple processor design that allows simple analysis of WCET. The processor design has no cache and no instruction pipeline enhancements yet achieves higher performance than existing designs with these enhancements. The investigation centers on a process that translates Java byte codes and folds these translated codes into a modified Harvard Micro Controller (HMC). The modifications include better alignment with the application code and take advantage of the FPGA's parallel capability. A prototyped ontology is used where the top level categories defined by Sowa are expanded to support the process. The proposed HMC and process are used to produce investigation results. Performance testing using the Sobel edge detection algorithm is used to compare the results with the only Java processor claiming real-time abilities.

    Bio-inspired call-stack reconstruction for performance analysis

    The correlation of performance bottlenecks and their associated source code has become a cornerstone of performance analysis. It helps explain why the efficiency of an application falls behind the computer's peak performance and ultimately enables optimizations of the code. To this end, performance analysis tools collect the processor call-stack and then combine this information with measurements to allow the analyst to comprehend the application behavior. Some tools modify the call-stack during run-time to diminish the collection expense, but at the cost of non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterpart. To do so, we capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several in-production applications seen for the first time, describing them finely and optimizing them with tiny modifications based on the analyses. We thankfully acknowledge Mathis Bode for giving us access to the Arts CF binaries, and Miguel Castrillo and Kim Serradell for their valuable insight regarding Nemo. We would like to thank Forschungszentrum Jülich for the computation time on their Blue Gene/Q system. This research has been partially funded by the CICYT under contracts No. TIN2012-34557 and TIN2015-65316-P.

    Acceleration of stereo-matching on multi-core CPU and GPU

    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism-enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa research project. This research project focuses on the conception of a new clothes-folding robot with real-time and high-resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, with speed-ups of 12x and 176x for multi-core CPU and GPU respectively, compared with a non-SIMD single-thread CPU. To analyse the origin of the speed-up and gain a deeper understanding of the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested on four different hardware architectures.

    An asynchronous java processor for smart card.

    Yu Chun-Pong. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 60-61). Abstracts in English and Chinese.
    Contents:
    Abstract (English and Chinese); Acknowledgements; Table of contents; List of Tables; List of Figures
    Chapter 1. Introduction: 1.1 Asynchronous design; 1.2 Java processor for contactless smart card [3]; 1.3 Motivation
    Chapter 2. Asynchronous circuit design techniques: 2.1 Overview; 2.2 Handshake protocol; 2.3 Asynchronous pipeline; 2.4 Asynchronous control elements
    Chapter 3. Asynchronous Java Processor: 3.1 Instruction Set; 3.2 Architecture of the java processor; 3.3 Basic building blocks of the java processor; 3.4 Token flow
    Chapter 4. Results and Discussion: 4.1 Simulation Results of test programs; 4.2 Experimental result; 4.3 Future work
    Chapter 5. Conclusion
    Appendix: Chip micrograph for the java processor core; Pin assignment of the java processor; Schematic of the java processor; Schematic of the decoder; Schematic of the Stage2 of the java processor; Schematic of the stack; Schematic of the block of the local variables; Schematic of the 16-bit self-timed adder; The schematic and the layout of the memory cell
    Reference