6,598 research outputs found
Inviwo -- A Visualization System with Usage Abstraction Levels
The complexity of today's visualization applications demands specific
visualization systems tailored for the development of these applications.
Frequently, such systems utilize levels of abstraction to improve the
application development process, for instance by providing a data flow network
editor. Unfortunately, these abstractions result in several issues, which need
to be circumvented through an abstraction-centered system design. Often, a high
level of abstraction hides low level details, which makes it difficult to
directly access the underlying computing platform, which would be important to
achieve an optimal performance. Therefore, we propose a layer structure
developed for modern and sustainable visualization systems allowing developers
to interact with all contained abstraction levels. We refer to this interaction
capabilities as usage abstraction levels, since we target application
developers with various levels of experience. We formulate the requirements for
such a system, derive the desired architecture, and present how the concepts
have been exemplary realized within the Inviwo visualization system.
Furthermore, we address several specific challenges that arise during the
realization of such a layered architecture, such as communication between
different computing platforms, performance centered encapsulation, as well as
layer-independent development by supporting cross layer documentation and
debugging capabilities
Recommended from our members
A RISC-V Vector Processor With Simultaneous-Switching Switched-Capacitor DC-DC Converters in 28 nm FDSOI
This work demonstrates a RISC-V vector microprocessor implemented in 28 nm FDSOI with fully integrated simultaneous-switching switched-capacitor DC-DC (SC DC-DC) converters and adaptive clocking that generates four on-chip voltages between 0.45 and 1 V using only 1.0 V core and 1.8 V IO voltage inputs. The converters achieve high efficiency at the system level by switching simultaneously to avoid charge-sharing losses and by using an adaptive clock to maximize performance for the resulting voltage ripple. Details about the implementation of the DC-DC switches, DC-DC controller, and adaptive clock are provided, and the sources of conversion loss are analyzed based on measured results. This system pushes the capabilities of dynamic voltage scaling by enabling fast transitions (20 ns), simple packaging (no off-chip passives), low area overhead (16%), high conversion efficiency (80%-86%), and high energy efficiency (26.2 DP GFLOPS/W) for mobile devices
Fast signal processing
Zvětšující se množství dat v moderním zpracování obrazu vyžaduje nový postupy v psaní algoritmů. Největší překážkou pro úspěšné zrychlení algoritmu je paralelizace a následná optimalizace. Programy jako CUDA a OpenCL s modifikovaným programovacím jazykem a rozhraním pomáhají s tímto problémem a otevírají paralelní zpracování širšímu okruhu lidí. V této práci zabývám základy zpracování obrazu a tomu jak paralelizace algoritmů může urychlit zpracování obrazu.An increasing amount of data in modern image processing requires a new approach in algorithms. The biggest obstacle for successful speed up of an algorithm is parallelization and subsequent optimization. Architectures like CUDA and OpenCL with modified programing languages and interfaces help to overcome this obstacle and bring parallel computing to a broader audience. In this paper I take a look at basics of image processing and how parallelization can speed up the algorithms in image processing.
Graphics-processing-unit-based acceleration of electromagnetic transients simulation.
This paper presents a novel parallelization approach to speedup EMT simulation, using GPU-based computing. This paper extends earlier published works in the area, by exploiting additional parallelism to accelerate EMT simulation. A 2D-parallel matrix-vector multiplication is used that is faster than previous 1D-methods. Also this paper implements a simpler GPU-specific sparsity technique to further speed up the simulations as available CPU-based sparse techniques are not suitable for GPUs. Additionally, as an extension to previous works, this paper demonstrates modelling of a power electronic subsystem. A low granularity system, i.e. one with a large cluster of busses connected to others with a few transmission lines is considered, as is also a high granularity where a small cluster of busses is connected to other clusters thereby requiring more interconnecting transmission lines. Computation times for GPU-based computing are compared with the computation times for sequential implementations on the CPU. The paper shows two surprising differences of GPU simulation in comparison with CPU simulation. Firstly, the inclusion of sparsity only makes minor reductions in the GPU-based simulation time. Secondly excessive granularity, even though it appears to increase the number of parallel computable subsystems, significantly slows down the GPU-based simulation
Techniques of Energy-Efficient VLSI Chip Design for High-Performance Computing
How to implement quality computing with the limited power budget is the key factor to move very large scale integration (VLSI) chip design forward. This work introduces various techniques of low power VLSI design used for state of art computing. From the viewpoint of power supply, conventional in-chip voltage regulators based on analog blocks bring the large overhead of both power and area to computational chips. Motivated by this, a digital based switchable pin method to dynamically regulate power at low circuit cost has been proposed to make computing to be executed with a stable voltage supply. For one of the widely used and time consuming arithmetic units, multiplier, its operation in logarithmic domain shows an advantageous performance compared to that in binary domain considering computation latency, power and area. However, the introduced conversion error reduces the reliability of the following computation (e.g. multiplication and division.). In this work, a fast calibration method suppressing the conversion error and its VLSI implementation are proposed. The proposed logarithmic converter can be supplied by dc power to achieve fast conversion and clocked power to reduce the power dissipated during conversion. Going out of traditional computation methods and widely used static logic, neuron-like cell is also studied in this work. Using multiple input floating gate (MIFG) metal-oxide semiconductor field-effect transistor (MOSFET) based logic, a 32-bit, 16-operation arithmetic logic unit (ALU) with zipped decoding and a feedback loop is designed. The proposed ALU can reduce the switching power and has a strong driven-in capability due to coupling capacitors compared to static logic based ALU. Besides, recent neural computations bring serious challenges to digital VLSI implementation due to overload matrix multiplications and non-linear functions. An analog VLSI design which is compatible to external digital environment is proposed for the network of long short-term memory (LSTM). The entire analog based network computes much faster and has higher energy efficiency than the digital one
Application of technology developed for flight simulation at NASA. Langley Research Center
In order to meet the stringent time-critical requirements for real-time man-in-the-loop flight simulation, computer processing operations including mathematical model computation and data input/output to the simulators must be deterministic and be completed in as short a time as possible. Personnel at NASA's Langley Research Center are currently developing the use of supercomputers for simulation mathematical model computation for real-time simulation. This, coupled with the use of an open systems software architecture, will advance the state-of-the-art in real-time flight simulation
VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing
The hardware implementation of deep neural networks (DNNs) has recently
received tremendous attention: many applications in fact require high-speed
operations that suit a hardware implementation. However, numerous elements and
complex interconnections are usually required, leading to a large area
occupation and copious power consumption. Stochastic computing has shown
promising results for low-power area-efficient hardware implementations, even
though existing stochastic algorithms require long streams that cause long
latencies. In this paper, we propose an integer form of stochastic computation
and introduce some elementary circuits. We then propose an efficient
implementation of a DNN based on integral stochastic computing. The proposed
architecture has been implemented on a Virtex7 FPGA, resulting in 45% and 62%
average reductions in area and latency compared to the best reported
architecture in literature. We also synthesize the circuits in a 65 nm CMOS
technology and we show that the proposed integral stochastic architecture
results in up to 21% reduction in energy consumption compared to the binary
radix implementation at the same misclassification rate. Due to fault-tolerant
nature of stochastic architectures, we also consider a quasi-synchronous
implementation which yields 33% reduction in energy consumption w.r.t. the
binary radix implementation without any compromise on performance.Comment: 11 pages, 12 figure
Design and implementation of DA FIR filter for bio-inspired computing architecture
This paper elucidates the system construct of DA-FIR filter optimized for design of distributed arithmetic (DA) finite impulse response (FIR) filter and is based on architecture with tightly coupled co-processor based data processing units. With a series of look-up-table (LUT) accesses in order to emulate multiply and accumulate operations the constructed DA based FIR filter is implemented on FPGA. The very high speed integrated circuit hardware description language (VHDL) is used implement the proposed filter and the design is verified using simulation. This paper discusses two optimization algorithms and resulting optimizations are incorporated into LUT layer and architecture extractions. The proposed method offers an optimized design in the form of offers average miminimizations of the number of LUT, reduction in populated slices and gate minimization for DA-finite impulse response filter. This research paves a direction towards development of bio inspired computing architectures developed without logically intensive operations, obtaining the desired specifications with respect to performance, timing, and reliability
A Survey of Techniques For Improving Energy Efficiency in Embedded Computing Systems
Recent technological advances have greatly improved the performance and
features of embedded systems. With the number of just mobile devices now
reaching nearly equal to the population of earth, embedded systems have truly
become ubiquitous. These trends, however, have also made the task of managing
their power consumption extremely challenging. In recent years, several
techniques have been proposed to address this issue. In this paper, we survey
the techniques for managing power consumption of embedded systems. We discuss
the need of power management and provide a classification of the techniques on
several important parameters to highlight their similarities and differences.
This paper is intended to help the researchers and application-developers in
gaining insights into the working of power management techniques and designing
even more efficient high-performance embedded systems of tomorrow
- …