    Graphical user interface tools


    Cluster-based interactive volume rendering with Simian

    Technical report. Commodity-based computer clusters offer a cost-effective alternative to traditional large-scale, tightly coupled computers as a means to provide high-performance computational and visualization services. The Center for the Simulation of Accidental Fires and Explosions (C-SAFE) at the University of Utah employs such a cluster, and we have begun to experiment with cluster-based visualization services. In particular, we seek to develop an interactive volume rendering tool for navigating and visualizing large-scale scientific datasets. Using Simian, an OpenGL volume renderer, we examine two approaches to cluster-based interactive volume rendering: (1) a "cluster-aware" version of the application that makes explicit use of remote nodes through a message-passing interface, and (2) the unmodified application running atop the Chromium clustered rendering framework. This paper provides a detailed comparison of the two approaches by carefully considering the key issues that arise when parallelizing Simian. These issues include the richness of user interaction; the distribution of volumetric datasets and proxy geometry; and the degree of interactivity provided by the image rendering and compositing schemes. The results of each approach when visualizing two large-scale C-SAFE datasets are given, and we discuss the relative advantages and disadvantages that were considered when developing our cluster-based interactive volume rendering application.

    Late-bound code generation

    Each time a function or method is invoked during the execution of a program, a stream of instructions is issued to some underlying hardware platform. But exactly what underlying hardware, and which instructions, is usually left implicit. However, in certain situations it becomes important to control these decisions. For example, particular problems can only be solved in real-time when scheduled on specialised accelerators, such as graphics coprocessors or computing clusters. We introduce a novel operator for hygienically reifying the behaviour of a runtime function instance as a syntactic fragment, in a language which may in general differ from the source function definition. Translation and optimisation are performed by recursively invoked, dynamically dispatched code generators. Side-effecting operations are permitted, and their ordering is preserved. We compare our operator with other techniques for pragmatic control, observing that the use of our operator supports lifting arbitrary mutable objects, and neither requires rewriting sections of the source program in a multi-level language, nor interferes with the interface to individual software components. Due to its lack of interference at the abstraction level at which software is composed, we believe that our approach poses a significantly lower barrier to practical adoption than current methods. The practical efficacy of our operator is demonstrated by using it to offload the user interface rendering of a smartphone application to an FPGA coprocessor, including both statically and procedurally defined user interface components. The generated pipeline is an application-specific, statically scheduled processor-per-primitive rendering pipeline, suitable for place-and-route style optimisation. To demonstrate the compatibility of our operator with existing languages, we show how it may be defined within the Python programming language.
We introduce a transformation for weakening mutable to immutable named bindings, termed let-weakening, to solve the problem of propagating information pertaining to named variables between modular code generating units.

    Quantization-Aware NN Layers with High-throughput FPGA Implementation for Edge AI

    Over the past few years, several applications have been extensively exploiting the advantages of deep learning, in particular when using convolutional neural networks (CNNs). The intrinsic flexibility of such models makes them widely adopted in a variety of practical applications, from medical to industrial. In this latter scenario, however, using consumer Personal Computer (PC) hardware is not always suitable for the potentially harsh conditions of the working environment and the strict timing that industrial applications typically have. Therefore, the design of custom FPGA (Field Programmable Gate Array) solutions for network inference is gaining massive attention from researchers and companies alike. In this paper, we propose a family of network architectures composed of three kinds of custom layers working with integer arithmetic with a customizable precision (down to just two bits). Such layers are designed to be effectively trained on classical GPUs (Graphics Processing Units) and then synthesized to FPGA hardware for real-time inference. The idea is to provide a trainable quantization layer, called Requantizer, acting both as a non-linear activation for neurons and a value rescaler to match the desired bit precision. This way, the training is not only quantization-aware, but also capable of estimating the optimal scaling coefficients to accommodate both the non-linear nature of the activations and the constraints imposed by the limited precision. In the experimental section, we test the performance of this kind of model while working both on classical PC hardware and a case-study implementation of a signal peak detection device running on a real FPGA. We employ TensorFlow Lite for training and comparison, and use Xilinx FPGAs and Vivado for synthesis and implementation.
The results show an accuracy of the quantized networks close to the floating point version, without the need for representative data for calibration as in other approaches, and performance that is better than dedicated peak detection algorithms. The FPGA implementation is able to run in real time at a rate of four gigapixels per second with moderate hardware resources, while achieving a sustained efficiency of 0.5 TOPS/W (tera operations per second per watt), in line with custom integrated hardware accelerators.
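The abstract above describes the Requantizer only at a high level: a layer that rescales values and acts as a non-linear activation so the output fits a chosen bit precision. A minimal sketch of that general idea follows. It is not the authors' layer: the function name `requantize`, the fixed (untrained) `scale` argument, and the choice of an unsigned clamp range are all assumptions made for illustration.

```python
def requantize(values, scale, bits):
    """Rescale values and clamp them to an unsigned `bits`-bit integer range.

    Assumption: the clamp to [0, 2**bits - 1] stands in for the non-linear
    activation, and `scale` stands in for a coefficient that would be
    learned during training in a real quantization-aware layer.
    """
    max_level = (1 << bits) - 1  # largest representable level, e.g. 3 for 2 bits
    out = []
    for v in values:
        q = int(round(v * scale))                # rescale, then round to an integer
        out.append(min(max(q, 0), max_level))    # clamp acts like a saturating ReLU
    return out


if __name__ == "__main__":
    # Hypothetical accumulator values squeezed into 2-bit outputs.
    print(requantize([-5, 0, 7, 100], scale=0.1, bits=2))
```

With two bits, every output falls in the range 0 to 3: negative inputs saturate at 0 and large inputs saturate at 3, which is the combined rescale-and-activate behaviour the abstract attributes to the layer.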

    The Comparison of three 3D graphics raster processors and the design of another

    There are a number of 3D graphics accelerator architectures on the market today. One of the largest issues concerning the design of a 3D accelerator is that of affordability for the home user while still delivering good performance. Three such architectures were analyzed: the Heresy architecture defined by Chiueh [2], the Talisman architecture defined by Torborg [7], and the Tayra architecture's specification by White [9]. Portions of these three architectures were used to create a new architecture taking advantage of as many of their features as possible. The advantage of chunking will be analyzed, along with the advantages of a single cycle z-buffering algorithm. It was found that Fast Phong Shading is not suitable for implementation in this pipeline, and that the clipping algorithm should be eliminated in favor of a scissoring algorithm.

    TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference

    TensorDash is a hardware-level technique for enabling data-parallel MAC units to take advantage of sparsity in their input operand streams. When used to compose a hardware accelerator for deep learning, TensorDash can speed up the training process while also increasing energy efficiency. TensorDash combines a low-cost, sparse input operand interconnect comprising an 8-input multiplexer per multiplier input, with an area-efficient hardware scheduler. While the interconnect allows a very limited set of movements per operand, the scheduler can effectively extract sparsity when it is present in the activations, weights or gradients of neural networks. Over a wide set of models covering various applications, TensorDash accelerates the training process by $1.95\times$ while being $1.89\times$ more energy-efficient, or $1.6\times$ more energy-efficient when taking on-chip and off-chip memory accesses into account. While TensorDash works with any datatype, we demonstrate it with both single-precision floating-point units and bfloat16.
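To make the benefit of operand sparsity concrete, the sketch below counts how many multiply-accumulate (MAC) operations a dot product actually needs when pairs containing a zero are skipped. This is a software illustration of the general principle only: it does not model TensorDash's multiplexer interconnect or hardware scheduler, and the function name `sparse_dot` is hypothetical.

```python
def sparse_dot(a, b):
    """Dot product that skips any operand pair containing a zero.

    Returns the result together with the number of MACs performed, so the
    saving from sparsity can be compared against len(a) dense MACs.
    """
    total = 0.0
    macs = 0
    for x, y in zip(a, b):
        if x == 0 or y == 0:
            continue          # ineffectual pair: the product would be zero
        total += x * y
        macs += 1
    return total, macs


if __name__ == "__main__":
    activations = [1, 0, 2, 0]
    weights = [3, 4, 0, 5]
    result, macs = sparse_dot(activations, weights)
    print(result, macs, "of", len(activations), "dense MACs")
```

In this toy case only one of the four operand pairs contributes to the result. Hardware that can reorder operands to fill the idle multiplier slots would turn such skipped pairs into a speedup.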

    Mental vision: a computer graphics platform for virtual reality, science and education

    Despite the large number of computer graphics frameworks and solutions available for virtual reality, it is still difficult to find one that fits the many constraints of research and educational contexts at the same time. Advanced functionality and user-friendliness, rendering speed and portability, or scalability and image quality are opposing characteristics rarely found in a single approach. Furthermore, the use of virtual reality specific devices like CAVEs or wearable systems is limited by their cost and accessibility, as most of these innovations are reserved to institutions and specialists able to afford and manage them through strong background knowledge in programming. Finally, computer graphics and virtual reality are complex and difficult subjects to learn, due to the heterogeneity of notions a developer needs to practice before attempting to implement a full virtual environment. In this thesis we describe our contributions to these topics, assembled in what we call the Mental Vision platform. Mental Vision is a framework composed of three main entities. First, a teaching/research oriented graphics engine, simplifying access to 2D/3D real-time rendering on mobile devices, personal computers and CAVE systems. Second, a series of pedagogical modules to introduce and practice computer graphics and virtual reality techniques. Third, two advanced VR systems: a wearable, lightweight and hands-free mixed reality setup, and a four-sided CAVE built from off-the-shelf hardware. In this dissertation we explain our conceptual, architectural and technical approach, pointing out how we managed to create a robust and coherent solution that reduces the complexity of cross-platform and multi-device 3D rendering while simultaneously answering the often contradictory needs of researchers and students in computer graphics and virtual reality.
A series of case studies evaluates how Mental Vision concretely satisfies these needs and achieves its goals on in vitro benchmarks and in vivo scientific and educational projects.