Search CORE

1,589 research outputs found

CSP design model and tool support

Author: Bakkers A.W.P.
Broenink J.F.
Hilderink G.H.
Vervoort W.A.
Volkerink H.J.
Publication venue: IOS Press
Publication date: 01/01/2000
Field of study

The CSP paradigm is known as a powerful concept for designing and analysing the architectural and behavioural parts of concurrent software. Although the theory of CSP is useful for mathematicians, the programming language occam has been derived from CSP that is useful for any engineering practice. Nowadays, the concept of occam/CSP can be used for almost every object-oriented programming language. This paper describes a tree-based description model and prototype tool that elevates the use of occam/CSP concepts at the design level and performs code generation to Java, C, C++, and machine-readable CSP for the level of implementation. The tree-based description model can be used to browse through the generated source code. The tool is a kind of browser that is able to assist modern workbenches (like Borland Builder, Microsoft Visual C++ and 20-SIM) with coding concurrency. The tool will guide the user through the design trajectory using support messages and several semantic and syntax rule checks. The machine-readable CSP can be read by FDR, enabling more advanced analysis on the design. Early experiments with the prototype tool show that the browser concept, combined with the tree-based description model, enables a user-friendly way to create a design using the CSP concepts and benefits. The design tool is available from our URL, http://www.rt.el.utwente.nl/javapp

CiteSeerX

University of Twente Research Information

Implementing microinstruction folding on the BlueJ Java optimized processor

Author: Gruian Flavius
Westmijze Mark
Publication venue: 'Lund University Library'
Publication date: 01/01/2008
Field of study

This paper present the work on implementing microinstruction folding on the BlueJEP. The BlueJep is a Java Embedded Processor written entirely in Bluespec SystemVerilog. The folding model is introduced and how it is implemented. The implementation was tested on a Xilinx FPGA and measurements were taken through simulation

Lund University Publications

A Co-Processor Approach for Efficient Java Execution in Embedded Systems

Author: Säntti Tero
Publication venue: Turku Centre for Computer Science
Publication date: 10/11/2008
Field of study

This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. Especially the computational performance is evaluated thoroughly, and the results show that the REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed to both verify the results and broaden the spectrum of the tests.Siirretty Doriast

UTUPub

Reification: A Process to Configure Java Realtime Processors

Author: Heath John Huddleston
Publication venue: The Aquila Digital Community
Publication date: 01/12/2012
Field of study

Real-time systems require stringent requirements both on the processor and the software application. The primary concern is speed and the predictability of execution times. In all real-time applications the developer must identify and calculate the worst case execution times (WCET) of their software. In almost all cases the processor design complexity impacts the analysis when calculating the WCET. Design features which impact this analysis include cache and instruction pipelining. With both cache and pipelining the time taken for a particular instruction can vary depending on cache and pipeline contents. When calculating the WCET the developer must ignore the speed advantages from these enhancements and use the normal instruction timings. This investigation is about a Java processor targeted to run within an FPGA environment (Java soft chip) supporting Java real-time applications. The investigation focuses on a simple processor design that allows simple analysis of WCET. The processor design has no cache and no instruction pipeline enhancements yet achieves higher performance than existing designs with these enhancements. The investigation centers on a process that translates Java byte codes and folds these translated codes into a modified Harvard Micro Controller (HMC). The modifications include better alignment with the application code and take advantage of the FPGA’s parallel capability. A prototyped ontology is used where the top level categories defined by Sowa are expanded to support the process. The proposed HMC and process are used to produce investigation results. Performance testing using the Sobel edge detection algorithm is used to compare the results with the only Java processor claiming real-time abilities

Aquila Digital Community

Bio-inspired call-stack reconstruction for performance analysis

Author: Giménez Lucas Judit
González Juan
Labarta Mancho Jesús José
Llort German
Servat Harald
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016
Field of study

The correlation of performance bottlenecks and their associated source code has become a cornerstone of performance analysis. It allows understanding why the efficiency of an application falls behind the computer's peak performance and enabling optimizations on the code ultimately. To this end, performance analysis tools collect the processor call-stack and then combine this information with measurements to allow the analyst comprehend the application behavior. Some tools modify the call-stack during run-time to diminish the collection expense but at the cost of resulting in non-portable solutions. In this paper, we present a novel portable approach to associate performance issues with their source code counterpart. To address it, we capture a reduced segment of the call-stack (up to three levels) and then process the segments using an algorithm inspired by multi-sequence alignment techniques. The results of our approach are easily mapped to detailed performance views, enabling the analyst to unveil the application behavior and its corresponding region of code. To demonstrate the usefulness of our approach, we have applied the algorithm to several first-time seen in-production applications to describe them finely, and optimize them by using tiny modifications based on the analyses.We thankfully acknowledge Mathis Bode for giving us access to the Arts CF binaries, and Miguel Castrillo and Kim Serradell for their valuable insight regarding Nemo. We would like to thank Forschungszentrum Jülich for the computation time on their Blue Gene/Q system. This research has been partially funded by the CICYT under contracts No. TIN2012-34557 and TIN2015-65316-P.Peer ReviewedPostprint (author's final draft

Crossref

UPCommons. Portal del coneixement obert de la UPC

Juelich Shared Electronic Resources

Creating portable and efficient packet processing applications

Author: A Korobeynikov
AV Aho
B Wun
CW Fraser
D Bernstein
EA Lee
EJ Johnson
Fulvio Risso
G Memik
J Carlstrom
J Wagner
JA Fisher
JL Hennessy
L Ciminiera
L George
M Baldi
M Baldi
MK Chen
N Shah
Olivier Morandi
P Briggs
Paolo Veglia
Pierluigi Rolando
R Cytron
R Ennals
R Morris
Silvio Valenti
SS Muchnick
T Lindholm
Z Budimlic
Publication venue: Springer
Publication date: 01/01/2011
Field of study

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Recommended from our members

JavaFlow : a Java DataFlow Machine

Author: Ascott Robert John
Publication venue
Publication date: 10/02/2015
Field of study

textThe JavaFlow, a Java DataFlow Machine is a machine design concept implementing a Java Virtual Machine aimed at addressing technology roadmap issues along with the ability to effectively utilize and manage very large numbers of processing cores. Specific design challenges addressed include: design complexity through a common set of repeatable structures; low power by featuring unused circuits and ability to power off sections of the chip; clock propagation and wire limits by using locality to bring data to processing elements and a Globally Asynchronous Locally Synchronous (GALS) design; and reliability by allowing portions of the design to be bypassed in case of failures. A Data Flow Architecture is used with multiple heterogeneous networks to connect processing elements capable of executing a single Java ByteCode instruction. Whole methods are cached in this DataFlow fabric, and the networks plus distributed intelligence are used for their management and execution. A mesh network is used for the DataFlow transfers; two ordered networks are used for management and control flow mapping; and multiple high speed rings are used to access the storage subsystem and a controlling General Purpose Processor (GPP). Analysis of benchmarks demonstrates the potential for this design concept. The design process was initiated by analyzing SPEC JVM benchmarks which identified a small number methods contributing to a significant percentage of the overall ByteCode operations. Additional analysis established static instruction mixes to prioritize the types of processing elements used in the DataFlow Fabric. The overall objective of the machine is to provide multi-threading performance for Java Methods deployed to this DataFlow fabric. With advances in technology it is envisioned that from 1,000 to 10,000 cores/instructions could be deployed and managed using this structure. This size of DataFlow fabric would allow all the key methods from the SPEC benchmarks to be resident. A baseline configuration is defined with a compressed dataflow structure and then compared to multiple configurations of instruction assignments and clock relationships. Using a series of methods from the SPEC benchmark running independently, IPC (Instructions per Cycle) performance of the sparsely populated heterogeneous structure is 40% of the baseline. The average ratio of instructions to required nodes is 3.5. Innovative solutions to the loading and management of Java methods along with the translation from control flow to DataFlow structure are demonstrated.Electrical and Computer Engineerin

Texas ScholarWorks

Acceleration of stereo-matching on multi-core CPU and GPU

Author: Cockshott Paul
Oehler Susanne
Tian Xu
Publication venue
Publication date: 01/01/2014
Field of study

This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, maintaining 12x and 176x speed-up respectively for multi-core CPU and GPU, compared with non-SIMD singlethread CPU. To analyse the origin of the speed-up and gain deeper understanding about the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested for four different hardware architectures

CiteSeerX

Enlighten

An asynchronous java processor for smart card.

Author
Publication venue
Publication date: 01/01/2003
Field of study

Yu Chun-Pong.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 60-61).Abstracts in English and Chinese.Abstract of this thesis entitled: --- p.i摘要 --- p.iiiAcknowledgements --- p.ivTable of contents --- p.vList of Tables --- p.viList of Figures --- p.viiChapter Chapter 1 --- Introduction --- p.1Chapter 1.1 --- Asynchronous design --- p.1Chapter 1.2 --- Java processor for contactless smart card [3] --- p.2Chapter 1.3 --- Motivation --- p.3Chapter Chapter 2 --- Asynchronous circuit design techniques --- p.5Chapter 2.1 --- Overview --- p.5Chapter 2.2 --- Handshake protocol --- p.5Chapter 2.3 --- Asynchronous pipeline --- p.7Chapter 2.4 --- Asynchronous control elements --- p.9Chapter Chapter 3 --- Asynchronous Java Processor --- p.15Chapter 3.1 --- Instruction Set --- p.15Chapter 3.2 --- Architecture of the java processor --- p.17Chapter 3.3 --- Basic building blocks of the java processor --- p.22Chapter 3.4 --- Token flow --- p.32Chapter Chapter 4 --- Results and Discussion --- p.37Chapter 4.1 --- Simulation Results of test programs --- p.37Chapter 4.2 --- Experimental result --- p.41Chapter 4.3 --- Future work --- p.42Chapter Chapter 5 --- Conclusion --- p.45Appendix --- p.47Chip micrograph for the java processor core --- p.47Pin assignment of the java processor --- p.48Schematic of the java processor --- p.52Schematic of the decoder --- p.54Schematic of the Stage2 of the java processor --- p.55Schematic of the stack --- p.56Schematic of the block of the local variables --- p.57Schematic of the 16-bit self-timed adder --- p.58The schematic and the layout of the memory cell --- p.59Reference --- p.6

CUHK Digital Repository