Inviwo -- A Visualization System with Usage Abstraction Levels

Author: Englund Rickard
Falk Martin
Hotz Ingrid
Jönsson Daniel
Kottravel Sathish
Ropinski Timo
Steneteg Peter
Sundén Erik
Ynnerman Anders
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 10/10/2019

The complexity of today's visualization applications demands specific visualization systems tailored for the development of these applications. Frequently, such systems utilize levels of abstraction to improve the application development process, for instance by providing a data flow network editor. Unfortunately, these abstractions result in several issues, which need to be circumvented through an abstraction-centered system design. Often, a high level of abstraction hides low-level details, which makes it difficult to directly access the underlying computing platform, even though such access is important for achieving optimal performance. Therefore, we propose a layer structure developed for modern and sustainable visualization systems that allows developers to interact with all contained abstraction levels. We refer to these interaction capabilities as usage abstraction levels, since we target application developers with various levels of experience. We formulate the requirements for such a system, derive the desired architecture, and present how the concepts have been realized, by way of example, within the Inviwo visualization system. Furthermore, we address several specific challenges that arise during the realization of such a layered architecture, such as communication between different computing platforms, performance-centered encapsulation, and layer-independent development supported by cross-layer documentation and debugging capabilities.

arXiv.org e-Print Archive

Publikationer från Linköpings universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Recommended from our members

A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI

Author: Alon E
Asanović K
Avizienis R
Bailey S
Blagojević M
Chen PH
Chiu PF
Flatresse P
Jevtić R
Keller B
Kwak J
Le HP
Lee Y
Nikolić B
Puggelli A
Richards B
Sutardja N
Waterman A
Zimmer B
Publication venue: eScholarship, University of California
Publication date: 01/04/2016

This work demonstrates a RISC-V vector microprocessor implemented in 28 nm FDSOI with fully integrated simultaneous-switching switched-capacitor DC-DC (SC DC-DC) converters and adaptive clocking that generates four on-chip voltages between 0.45 and 1 V using only 1.0 V core and 1.8 V IO voltage inputs. The converters achieve high efficiency at the system level by switching simultaneously to avoid charge-sharing losses and by using an adaptive clock to maximize performance for the resulting voltage ripple. Details about the implementation of the DC-DC switches, DC-DC controller, and adaptive clock are provided, and the sources of conversion loss are analyzed based on measured results. This system pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20 ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80%-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices.

eScholarship - University of California

Fast signal processing

Author: Rychlý Ivo
Publication venue: Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií
Publication date: 01/01/2015

The increasing amount of data in modern image processing requires a new approach to writing algorithms. The biggest obstacle to successfully speeding up an algorithm is parallelization and subsequent optimization. Architectures like CUDA and OpenCL, with modified programming languages and interfaces, help to overcome this obstacle and bring parallel computing to a broader audience. In this paper I look at the basics of image processing and at how parallelizing algorithms can speed up image processing.

Digital library of Brno University of Technology

National Repository of Grey Literature

Graphics-processing-unit-based acceleration of electromagnetic transients simulation.

Author: Debnath Jayanta K.
Fung Wai-Keung
Gole Aniruddha M.
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/10/2016

This paper presents a novel parallelization approach to speed up electromagnetic transients (EMT) simulation using GPU-based computing. It extends earlier published works in the area by exploiting additional parallelism to accelerate EMT simulation. A 2D-parallel matrix-vector multiplication is used that is faster than previous 1D methods. The paper also implements a simpler GPU-specific sparsity technique to further speed up the simulations, as available CPU-based sparse techniques are not suitable for GPUs. Additionally, as an extension to previous works, the paper demonstrates modelling of a power electronic subsystem. A low-granularity system, i.e. one with a large cluster of busses connected to others with a few transmission lines, is considered, as is a high-granularity system, in which a small cluster of busses is connected to other clusters, thereby requiring more interconnecting transmission lines. Computation times for GPU-based computing are compared with the computation times for sequential implementations on the CPU. The paper shows two surprising differences between GPU simulation and CPU simulation. Firstly, the inclusion of sparsity makes only minor reductions in the GPU-based simulation time. Secondly, excessive granularity, even though it appears to increase the number of parallel computable subsystems, significantly slows down the GPU-based simulation.

Open Access Institutional Repository at Robert Gordon University

Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing

Author: Zhao Zhou
Publication venue: LSU Digital Commons
Publication date: 13/09/2018

Implementing quality computing within a limited power budget is the key to moving very large scale integration (VLSI) chip design forward. This work introduces various techniques of low-power VLSI design used for state-of-the-art computing. From the viewpoint of power supply, conventional on-chip voltage regulators based on analog blocks bring a large overhead in both power and area to computational chips. Motivated by this, a digital switchable-pin method that dynamically regulates power at low circuit cost is proposed so that computation executes with a stable voltage supply. For the multiplier, one of the most widely used and time-consuming arithmetic units, operation in the logarithmic domain shows advantageous performance compared to operation in the binary domain in terms of computation latency, power, and area. However, the conversion error it introduces reduces the reliability of the subsequent computation (e.g., multiplication and division). In this work, a fast calibration method that suppresses the conversion error, and its VLSI implementation, are proposed. The proposed logarithmic converter can be supplied by DC power to achieve fast conversion, or by clocked power to reduce the power dissipated during conversion. Moving beyond traditional computation methods and widely used static logic, a neuron-like cell is also studied in this work. Using multiple-input floating-gate (MIFG) metal-oxide-semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. Compared to a static-logic-based ALU, the proposed ALU can reduce the switching power and has a strong driven-in capability due to its coupling capacitors. In addition, recent neural computations bring serious challenges to digital VLSI implementation due to the heavy load of matrix multiplications and non-linear functions. An analog VLSI design that is compatible with an external digital environment is proposed for long short-term memory (LSTM) networks. The entire analog-based network computes much faster and has higher energy efficiency than the digital one.

Louisiana State University

Application of technology developed for flight simulation at NASA Langley Research Center

Author: Cleveland Jeff I., II

In order to meet the stringent time-critical requirements for real-time man-in-the-loop flight simulation, computer processing operations, including mathematical model computation and data input/output to the simulators, must be deterministic and be completed in as short a time as possible. Personnel at NASA's Langley Research Center are currently developing the use of supercomputers for simulation mathematical model computation for real-time simulation. This, coupled with the use of an open systems software architecture, will advance the state of the art in real-time flight simulation.

NASA Technical Reports Server

VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

Author: Ardakani Arash
Gross Warren J.
Hanyu Takahiro
Leduc-Primeau François
Onizawa Naoya
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

The hardware implementation of deep neural networks (DNNs) has recently received tremendous attention: many applications in fact require high-speed operations that suit a hardware implementation. However, numerous elements and complex interconnections are usually required, leading to a large area occupation and copious power consumption. Stochastic computing has shown promising results for low-power area-efficient hardware implementations, even though existing stochastic algorithms require long streams that cause long latencies. In this paper, we propose an integer form of stochastic computation and introduce some elementary circuits. We then propose an efficient implementation of a DNN based on integral stochastic computing. The proposed architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62% average reductions in area and latency compared to the best reported architecture in the literature. We also synthesize the circuits in a 65 nm CMOS technology and we show that the proposed integral stochastic architecture results in up to 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate. Due to the fault-tolerant nature of stochastic architectures, we also consider a quasi-synchronous implementation which yields a 33% reduction in energy consumption w.r.t. the binary radix implementation without any compromise on performance. Comment: 11 pages, 12 figures.
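As background to the abstract's remark that stochastic algorithms need long streams, the sketch below shows conventional unipolar stochastic multiplication, not the integral form the paper proposes. A value in [0, 1] is encoded as the probability of a 1 in a random bit stream, and a bitwise AND of two independent streams estimates the product. The function names, the seeded software generator, and the stream lengths are assumptions chosen for illustration only.

```python
import random


def to_stream(p, length, rng):
    """Encode probability p in [0, 1] as a random bit stream of the given length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a, b, length, seed=0):
    """Estimate a * b by AND-ing two independent unipolar bit streams.

    Illustrative software model only: hardware would use an AND gate fed by
    stream generators rather than Python's random module.
    """
    rng = random.Random(seed)
    stream_a = to_stream(a, length, rng)
    stream_b = to_stream(b, length, rng)
    # P(bit_a AND bit_b = 1) = a * b when the streams are independent.
    anded = [x & y for x, y in zip(stream_a, stream_b)]
    return sum(anded) / length
```

The estimate's standard deviation shrinks only with the square root of the stream length, which is the latency cost the abstract refers to: a call such as `sc_multiply(0.5, 0.4, 100_000)` lands close to 0.2, while a few hundred bits give a much noisier result.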

arXiv.org e-Print Archive

HAL-Université de Bretagne Occidentale

PolyPublie

Design and implementation of DA FIR filter for bio-inspired computing architecture

Author: Ahmed Mohammed Riyaz
Kounte Manjunath R.
Prashanth B. U. V.
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/04/2021

This paper describes the system construction of a distributed arithmetic (DA) finite impulse response (FIR) filter, based on an architecture with tightly coupled co-processor-based data processing units. The DA-based FIR filter is implemented on an FPGA, using a series of look-up-table (LUT) accesses to emulate multiply-and-accumulate operations. The very high speed integrated circuit hardware description language (VHDL) is used to implement the proposed filter, and the design is verified using simulation. The paper discusses two optimization algorithms, and the resulting optimizations are incorporated into the LUT layer and architecture extractions. The proposed method offers an optimized design in the form of an average reduction in the number of LUTs, a reduction in populated slices, and gate minimization for the DA FIR filter. This research points toward the development of bio-inspired computing architectures that avoid logically intensive operations while obtaining the desired specifications with respect to performance, timing, and reliability.
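The idea of emulating multiply-and-accumulate with LUT accesses can be illustrated with a small software model of textbook distributed arithmetic. It is not the paper's VHDL design or its optimizations, and the function names, integer coefficients, and 8-bit two's-complement sample width are assumptions. The LUT stores every possible sum of coefficients, and each bit plane of the input window selects one entry, which is then shifted and accumulated.

```python
def build_lut(coeffs):
    """Precompute the sum of coefficients selected by every possible bit pattern."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(lut, samples, bits):
    """Compute sum(coeffs[k] * samples[k]) using only LUT reads, shifts, and adds.

    Each sample must fit in `bits`-bit two's complement.
    """
    acc = 0
    for j in range(bits):
        # Gather bit j of every sample into one LUT address.
        addr = 0
        for k, x in enumerate(samples):
            addr |= ((x >> j) & 1) << k
        term = lut[addr] << j
        # The most significant bit carries negative weight in two's complement.
        acc += -term if j == bits - 1 else term
    return acc


def da_fir(coeffs, signal, bits=8):
    """Filter `signal` with a direct-form FIR whose multiplies are replaced by DA."""
    lut = build_lut(coeffs)
    window = [0] * len(coeffs)
    out = []
    for x in signal:
        window = [x] + window[:-1]  # window[k] holds x[n - k]
        out.append(da_inner_product(lut, window, bits))
    return out
```

The LUT grows as 2^N with the number of taps N, which is why practical designs split long filters into several smaller tables. In this sketch, `da_fir([1, -2, 3], [5, -3, 7, 0, -8])` matches the direct convolution sum without performing any multiplication at run time.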

ZENODO

Institute of Advanced Engineering and Science

A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems

Author: Mittal Sparsh
Publication date: 01/01/2014

Recent technological advances have greatly improved the performance and features of embedded systems. With the number of mobile devices alone now nearly equal to the population of Earth, embedded systems have truly become ubiquitous. These trends, however, have also made the task of managing their power consumption extremely challenging. In recent years, several techniques have been proposed to address this issue. In this paper, we survey the techniques for managing the power consumption of embedded systems. We discuss the need for power management and classify the techniques along several important parameters to highlight their similarities and differences. This paper is intended to help researchers and application developers gain insight into the workings of power management techniques and design even more efficient high-performance embedded systems of tomorrow.

arXiv.org e-Print Archive

Crossref