CycleCounter: an Efficient and Accurate UltraSPARC III CPU Simulation Module

Author: Strazdins Peter
Publication date: 01/01/2005

This paper presents a novel technique for cycle-accurate simulation of the Central Processing Unit (CPU) of a modern superscalar processor, the UltraSPARC III Cu. The technique is based on adding a module to an existing fetch-decode-execute style of CPU simulator, rather than the traditional method of fully implementing the CPU pipeline and microarchitecture. The module's main functions are the simulation of instruction grouping, register interlocks and the store buffer. It has a simple table-driven implementation, which permits easy modification for exploring microarchitectural variations. The technique results in a 15-30% loss of simulation speed, compared with a slowdown of 10× or more when the detailed microarchitecture is fully implemented. The accuracy of the technique is validated against an actual UltraSPARC III Cu processor, and it achieves high levels of accuracy in cases of interest.
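As a rough illustration of the table-driven idea described in this abstract, the following minimal sketch counts cycles for a short instruction sequence by modelling instruction grouping and register interlocks. Everything in it is an assumption made for illustration: the opcode table, the latencies, the group width of 2, and the grouping rules are hypothetical and are not taken from the paper or from the UltraSPARC III Cu. The store buffer is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple

# Hypothetical table: opcode -> (functional unit, result latency in cycles).
# Illustrative values only, not UltraSPARC III Cu data.
OP_TABLE: Dict[str, Tuple[str, int]] = {
    "add": ("alu", 1),
    "mul": ("mul", 4),
    "load": ("mem", 2),
}
GROUP_WIDTH = 2  # assumed maximum number of instructions issued per cycle


@dataclass
class Instr:
    op: str
    dst: str
    srcs: Tuple[str, ...]


def count_cycles(program: List[Instr]) -> Tuple[List[int], int]:
    """Return each instruction's issue cycle and the cycle at which the
    latest result becomes available."""
    ready: Dict[str, int] = {}      # register -> cycle its value is ready
    issue: List[int] = []
    cycle = 0                       # issue cycle of the current group
    used_units: Set[str] = set()    # functional units taken in this group
    group_dsts: Set[str] = set()    # registers written in this group
    slots = 0                       # instructions already in this group

    for ins in program:
        unit, latency = OP_TABLE[ins.op]
        # Grouping: start a new group if this one is full, the unit is
        # taken, or the instruction reads a register written in the group.
        if (slots == GROUP_WIDTH or unit in used_units
                or any(s in group_dsts for s in ins.srcs)):
            cycle += 1
            used_units, group_dsts, slots = set(), set(), 0
        # Interlock: stall until every source register is ready.
        operands_ready = max((ready.get(s, 0) for s in ins.srcs), default=0)
        if operands_ready > cycle:
            cycle = operands_ready
            used_units, group_dsts, slots = set(), set(), 0
        used_units.add(unit)
        group_dsts.add(ins.dst)
        slots += 1
        ready[ins.dst] = cycle + latency
        issue.append(cycle)
    return issue, max(ready.values(), default=0)


program = [
    Instr("load", "r1", ("r0",)),
    Instr("add", "r2", ("r3", "r4")),
    Instr("mul", "r5", ("r1", "r2")),
    Instr("add", "r6", ("r5", "r2")),
]
issue_cycles, total = count_cycles(program)
print(issue_cycles, total)
```

Because the timing rules live in `OP_TABLE` and `GROUP_WIDTH`, a microarchitectural variation can be tried by editing data rather than simulator logic, which is the property the abstract attributes to a table-driven design.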

The Australian National University

Streaming Ray Tracer on GPU

Author: Dvořák Jakub
Publication venue: Vysoké učení technické v Brně. Fakulta informačních technologií
Publication date: 01/01/2008

Current consumer GPUs can easily be used as high-performance stream processors and are therefore an attractive platform for implementing ray tracing. In the first part of this thesis I briefly present the principles of ray tracing and the methods used to accelerate it, the programmable pipeline of modern GPUs and examples of its use, stream processing in general, and the interfaces that allow a GPU to be used as a stream processor. In the second part I describe my GPU ray tracer implementation, the algorithms it uses, and the experiments I performed with it.
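To make the stream-processing view of ray tracing concrete, here is a minimal CPU-side sketch in Python. It is not the thesis implementation: the function names, the scene, and the choice of a single sphere are assumptions for illustration. The point is that each ray is processed by the same kernel, independently of every other ray, which is what makes the workload map onto a stream processor.

```python
import math
from typing import List, Optional, Tuple

Vec = Tuple[float, float, float]


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def intersect_sphere(origin: Vec, direction: Vec,
                     center: Vec, radius: float) -> Optional[float]:
    """Distance along a unit-length ray to the nearest hit in front of the
    origin, or None if the ray misses the sphere."""
    oc = (origin[0] - center[0], origin[1] - center[1], origin[2] - center[2])
    b = dot(oc, direction)
    c = dot(oc, oc) - radius * radius
    disc = b * b - c              # discriminant of t^2 + 2bt + c = 0
    if disc < 0.0:
        return None               # no real root: the ray misses
    t = -b - math.sqrt(disc)      # nearer of the two roots
    return t if t > 0.0 else None


def kernel(ray: Tuple[Vec, Vec]) -> Optional[float]:
    """Per-ray kernel: uses only its own input, so rays can run in parallel."""
    origin, direction = ray
    # Hypothetical scene: one sphere of radius 1 centred at z = 5.
    return intersect_sphere(origin, direction, (0.0, 0.0, 5.0), 1.0)


rays: List[Tuple[Vec, Vec]] = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0)),  # aimed at the sphere
    ((0.0, 0.0, 0.0), (0.0, 1.0, 0.0)),  # aimed away from it
]
hits = list(map(kernel, rays))  # the "stream": one kernel call per ray
print(hits)
```

On a GPU the `map` step would be replaced by a shader or compute kernel launched over all rays at once; the per-ray arithmetic stays the same.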

Digital library of Brno University of Technology

National Repository of Grey Literature

Inviwo -- A Visualization System with Usage Abstraction Levels

Author: Englund Rickard
Falk Martin
Hotz Ingrid
Jönsson Daniel
Kottravel Sathish
Ropinski Timo
Steneteg Peter
Sundén Erik
Ynnerman Anders
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/10/2019

The complexity of today's visualization applications demands specific visualization systems tailored for the development of these applications. Frequently, such systems utilize levels of abstraction to improve the application development process, for instance by providing a data flow network editor. Unfortunately, these abstractions result in several issues, which need to be circumvented through an abstraction-centered system design. Often, a high level of abstraction hides low-level details, which makes it difficult to directly access the underlying computing platform, even though such access is important for achieving optimal performance. Therefore, we propose a layer structure developed for modern and sustainable visualization systems that allows developers to interact with all contained abstraction levels. We refer to these interaction capabilities as usage abstraction levels, since we target application developers with various levels of experience. We formulate the requirements for such a system, derive the desired architecture, and present how the concepts have been realized, by way of example, within the Inviwo visualization system. Furthermore, we address several specific challenges that arise during the realization of such a layered architecture, such as communication between different computing platforms, performance-centered encapsulation, and layer-independent development through support for cross-layer documentation and debugging capabilities.

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

A High Performance Fuzzy Logic Architecture for UAV Decision Making

Author: Campbell Duncan
Lees Michael
Narayan Pritesh
Walker Rodney
Wu Paul
Publication venue: ACTA Press
Publication date: 01/01/2006

The majority of Unmanned Aerial Vehicles (UAVs) in operation today are not truly autonomous, but are instead reliant on a remote human pilot. A high degree of autonomy can provide many advantages in terms of cost, operational resources and safety. However, one of the challenges involved in achieving autonomy is that of replicating the reasoning and decision-making capabilities of a human pilot. One candidate method for providing this decision-making capability is fuzzy logic. In this role, the fuzzy system must satisfy real-time constraints, process large quantities of data and relate to large knowledge bases. Consequently, there is a need for a generic, high-performance fuzzy computation platform for UAV applications. Based on Lees' [1] original work, a high-performance fuzzy processing architecture, implemented in Field Programmable Gate Arrays (FPGAs), has been developed and is shown to outperform existing fuzzy processors.
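For readers unfamiliar with fuzzy inference, the following is a minimal software sketch of the kind of computation a fuzzy processor performs: fuzzify the inputs, evaluate rules with `min` as the AND operator, and defuzzify with a weighted average. The inputs, membership ramps, rules, and output values are all hypothetical and are not taken from the paper, whose architecture is implemented in FPGA hardware rather than in Python.

```python
from typing import List, Tuple


def clamp01(x: float) -> float:
    """Limit a membership degree to the range [0, 1]."""
    return max(0.0, min(1.0, x))


def decide_turn(distance_m: float, speed_ms: float) -> float:
    """Return a turn command in degrees from two crisp inputs.
    All ramps and rule outputs below are assumed values."""
    # Fuzzification with simple linear ramps.
    near = clamp01((100.0 - distance_m) / 100.0)
    far = 1.0 - near
    fast = clamp01(speed_ms / 20.0)
    slow = 1.0 - fast

    # Rule base: (firing strength, crisp output in degrees).
    rules: List[Tuple[float, float]] = [
        (min(near, fast), 30.0),  # IF near AND fast THEN turn hard
        (min(near, slow), 15.0),  # IF near AND slow THEN turn gently
        (far, 0.0),               # IF far THEN hold course
    ]

    # Defuzzification: average of rule outputs weighted by firing strength.
    total = sum(strength for strength, _ in rules)
    if total == 0.0:
        return 0.0
    return sum(strength * out for strength, out in rules) / total


print(decide_turn(50.0, 10.0))
```

Each rule is evaluated independently of the others, which is one reason rule bases of this shape lend themselves to parallel hardware.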

Queensland University of Technology ePrints Archive

Runtime Optimizations for Prediction with Tree-Based Models

Author: Asadi Nima
de Vries Arjen P.
Lin Jimmy
Publication date: 01/01/2013

Tree-based models have proven to be an effective solution for web ranking as well as other problems in diverse domains. This paper focuses on optimizing the runtime performance of applying such models to make predictions, given an already-trained model. Although tree-based models are conceptually very simple, most implementations do not efficiently utilize modern superscalar processor architectures. By laying out data structures in memory in a more cache-conscious fashion, removing branches from the execution flow using a technique called predication, and micro-batching predictions using a technique called vectorization, we are able to better exploit modern processor architectures and significantly improve the speed of tree-based models over hard-coded if-else blocks. Our work contributes to the exploration of architecture-conscious runtime implementations of machine learning algorithms.
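The three ideas named in this abstract can be sketched on a toy example. The code below is an illustration under stated assumptions, not the authors' implementation: it assumes a complete binary tree stored in flat arrays in breadth-first order (children of node `i` at `2*i + 1` and `2*i + 2`), and the tree values are made up. In Python the interpreter still branches internally, so this only shows the index arithmetic; the performance benefit described in the paper would come from a compiled implementation.

```python
from typing import List, Sequence

# Hypothetical depth-2 tree in flat, breadth-first arrays (array layout).
FEATURES = [0, 1, 1]           # feature tested at each internal node
THRESHOLDS = [0.5, 0.3, 0.7]   # go right when x[feature] > threshold
LEAVES = [10.0, 20.0, 30.0, 40.0]
DEPTH = 2
FIRST_LEAF = (1 << DEPTH) - 1  # number of internal nodes


def predict_branchy(x: Sequence[float]) -> float:
    """Baseline: explicit if-else at every level."""
    i = 0
    for _ in range(DEPTH):
        if x[FEATURES[i]] > THRESHOLDS[i]:
            i = 2 * i + 2
        else:
            i = 2 * i + 1
    return LEAVES[i - FIRST_LEAF]


def predict_predicated(x: Sequence[float]) -> float:
    """Predication: the comparison result (0 or 1) is added to the index,
    so no control-flow decision is needed to pick the child."""
    i = 0
    for _ in range(DEPTH):
        i = 2 * i + 1 + int(x[FEATURES[i]] > THRESHOLDS[i])
    return LEAVES[i - FIRST_LEAF]


def predict_batch(batch: List[Sequence[float]]) -> List[float]:
    """Micro-batching: advance every instance one level at a time."""
    idx = [0] * len(batch)
    for _ in range(DEPTH):
        idx = [2 * i + 1 + int(x[FEATURES[i]] > THRESHOLDS[i])
               for i, x in zip(idx, batch)]
    return [LEAVES[i - FIRST_LEAF] for i in idx]


batch = [[0.2, 0.9], [0.8, 0.1], [0.8, 0.9], [0.1, 0.1]]
print(predict_batch(batch))
```

Processing a whole batch level by level keeps several independent memory lookups in flight at once, which is the motivation the abstract gives for micro-batching.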

arXiv.org e-Print Archive

CWI's Institutional Repository