
    Hardware Accelerators for Animated Ray Tracing

    Future graphics processors are likely to incorporate hardware accelerators for real-time ray tracing, in order to render increasingly complex lighting effects in interactive applications. However, ray tracing poses difficulties when drawing scenes with dynamic content, such as animated characters and objects. In dynamic scenes, the spatial data structures used to accelerate ray tracing are invalidated on each animation frame and need to be rapidly updated. Tree update is a complex subtask in its own right, and becomes highly expensive in complex scenes. Both ray tracing and tree update are highly memory-intensive tasks, and rendering systems are increasingly bandwidth-limited, so research on accelerator hardware has focused on architectural techniques to optimize away off-chip memory traffic. Dynamic scene support is further complicated by the recent introduction of compressed trees, which use low-precision numbers for storage and computation. Such compression reduces both the arithmetic and memory bandwidth cost of ray tracing, but adds to the complexity of tree update. This thesis proposes methods to cope with dynamic scenes in hardware-accelerated ray tracing, with a focus on reducing traffic to external memory. Firstly, a hardware architecture is designed for linear bounding volume hierarchy construction, an algorithm which is a basic building block in most state-of-the-art software tree builders. The algorithm is rearranged into a streaming form which reduces traffic to one-third of that of software implementations of the same algorithm. Secondly, an algorithm is proposed for compressing bounding volume hierarchies in a streaming manner as they are output from a hardware builder, instead of performing compression as a post-processing pass. As a result, with the proposed method, compression reduces the overall cost of tree update rather than increasing it. The last main contribution of this thesis is an evaluation of shallow bounding volume hierarchies, common in software ray tracing, for use in hardware pipelines. These are found to be more energy-efficient than binary hierarchies. The results in this thesis both confirm that dynamic scene support may become a bottleneck in real-time ray tracing, and add to the state of the art on tree update in terms of energy efficiency, as well as the complexity of scenes that can be handled in real time on resource-constrained platforms.
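
    A minimal sketch of the Morton-code sorting step that underlies linear BVH construction is given below; the 30-bit code, the helper names, and the std::sort call are illustrative assumptions for a software reading of the algorithm, not the streaming hardware pipeline proposed in the thesis.

```cpp
// Minimal sketch of the Morton-code step used by linear BVH (LBVH) builders:
// quantize each primitive centroid to a 10-bit grid per axis, interleave the
// bits into a 30-bit Morton code, and sort primitives by that code so that
// spatially nearby primitives become neighbours in memory.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Vec3 { float x, y, z; };

// Spread the lower 10 bits of v so that two zero bits follow each bit.
static uint32_t expandBits(uint32_t v) {
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point with coordinates normalized to [0,1].
static uint32_t morton3D(Vec3 p) {
    auto clamp01 = [](float f) { return std::min(std::max(f, 0.0f), 1.0f); };
    uint32_t x = static_cast<uint32_t>(clamp01(p.x) * 1023.0f);
    uint32_t y = static_cast<uint32_t>(clamp01(p.y) * 1023.0f);
    uint32_t z = static_cast<uint32_t>(clamp01(p.z) * 1023.0f);
    return (expandBits(x) << 2) | (expandBits(y) << 1) | expandBits(z);
}

// Sort primitive indices by the Morton code of their (normalized) centroids.
std::vector<uint32_t> mortonOrder(const std::vector<Vec3>& centroids) {
    std::vector<uint32_t> order(centroids.size());
    for (uint32_t i = 0; i < order.size(); ++i) order[i] = i;
    std::vector<uint32_t> codes(centroids.size());
    for (size_t i = 0; i < centroids.size(); ++i) codes[i] = morton3D(centroids[i]);
    std::sort(order.begin(), order.end(),
              [&](uint32_t a, uint32_t b) { return codes[a] < codes[b]; });
    return order;
}
```

    Once primitives are in Morton order, spatially nearby primitives sit next to each other in memory, which is what lets a builder emit the hierarchy in a streaming pass over the sorted list.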

    Hardware Accelerated Compression of LIDAR Data Using FPGA Devices

    Airborne Light Detection and Ranging (LIDAR) has become a mainstream technology for terrain data acquisition and mapping. The high sampling density of LIDAR enables the acquisition of fine terrain detail, but on the other hand it results in a vast amount of gathered data, which requires huge storage space as well as substantial processing effort. The data are usually stored in the LAS format, which has become the de facto standard for LIDAR data storage and exchange. In this paper, a hardware-accelerated compression of LIDAR data is presented. The compression and decompression of LIDAR data are performed by a dedicated FPGA-based circuit interfaced to the host computer via a general-purpose PCI-E bus. The hardware compressor consists of three modules: a LIDAR data predictor, a variable-length coder, and an arithmetic coder. Hardware compression is considerably faster than software compression, while it also alleviates the processor load.
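
    As a rough software illustration of the predictor module, the sketch below applies previous-point delta prediction to LAS-style point records; the field layout, the fixed initial predictor state, and the function names are assumptions made for the example and do not reproduce the paper's FPGA predictor or its entropy coders.

```cpp
// Minimal sketch of the prediction stage of a LIDAR point compressor:
// predict each coordinate from the previous point and emit small residuals,
// which a variable-length or arithmetic coder can then encode compactly.
#include <cstdint>
#include <vector>

struct LasPoint { int32_t x, y, z; uint16_t intensity; };

struct Residual { int32_t dx, dy, dz; int32_t dIntensity; };

std::vector<Residual> predictDeltas(const std::vector<LasPoint>& points) {
    std::vector<Residual> out;
    out.reserve(points.size());
    LasPoint prev{0, 0, 0, 0};                 // fixed initial predictor state
    for (const LasPoint& p : points) {
        out.push_back({p.x - prev.x, p.y - prev.y, p.z - prev.z,
                       static_cast<int32_t>(p.intensity) - prev.intensity});
        prev = p;                              // previous point predicts the next
    }
    return out;
}
```

    Because consecutive LIDAR returns are strongly correlated, the residuals cluster around zero, which is what makes the subsequent entropy-coding stages effective.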

    GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra

    We propose GraphMineSuite (GMS): the first benchmarking suite for graph mining that facilitates evaluating and constructing high-performance graph mining algorithms. First, GMS comes with a benchmark specification based on an extensive literature review, prescribing representative problems, algorithms, and datasets. Second, GMS offers a carefully designed software platform for seamless testing of different fine-grained elements of graph mining algorithms, such as graph representations or algorithm subroutines. The platform includes parallel implementations of more than 40 baselines, and it facilitates developing complex and fast mining algorithms. High modularity is achieved by harnessing set algebra operations such as set intersection and difference, which enables breaking complex graph mining algorithms into simple building blocks that can be experimented with separately. GMS is complemented by a broad concurrency analysis, for portability of performance insights, and by a novel performance metric to assess the throughput of graph mining algorithms, enabling more insightful evaluation. As use cases, we harness GMS to rapidly redesign and accelerate state-of-the-art baselines of core graph mining problems: degeneracy reordering (by up to >2x), maximal clique listing (by up to >9x), k-clique listing (by 1.1x), and subgraph isomorphism (by up to 2.5x), also obtaining better theoretical performance bounds.
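
    The set-algebra style of building blocks described above can be illustrated with triangle counting expressed as a sum of neighbourhood-set intersections; the sorted-adjacency-list representation and function names below are assumptions made for the sketch, not the GMS API.

```cpp
// Triangle counting built from a single set-algebra primitive: the size of the
// intersection of two sorted neighbour lists.
#include <cstdint>
#include <vector>

using Graph = std::vector<std::vector<uint32_t>>;  // sorted adjacency lists

// |N(u) ∩ N(v)| on sorted neighbour lists: the core set-algebra primitive.
static size_t intersectionSize(const std::vector<uint32_t>& a,
                               const std::vector<uint32_t>& b) {
    size_t i = 0, j = 0, count = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i] < b[j]) ++i;
        else if (a[i] > b[j]) ++j;
        else { ++count; ++i; ++j; }
    }
    return count;
}

// Each triangle {u, v, w} is found once per edge (u, v) with u < v, i.e.
// three times in total, so the accumulated sum is divided by three.
size_t countTriangles(const Graph& g) {
    size_t total = 0;
    for (uint32_t u = 0; u < g.size(); ++u)
        for (uint32_t v : g[u])
            if (u < v) total += intersectionSize(g[u], g[v]);
    return total / 3;
}
```

    More complex kernels such as k-clique listing or maximal clique enumeration follow the same pattern, composing intersection and difference operations over candidate sets, which is what makes the building blocks separately tunable.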

    Survey of FPGA applications in the period 2000 – 2015 (Technical Report)

    Romoth J, Porrmann M, Rückert U. Survey of FPGA applications in the period 2000 – 2015 (Technical Report); 2017. Since their introduction, FPGAs have found their way into more and more fields of application. Their key advantage is the combination of software-like flexibility with the performance otherwise common to hardware. Nevertheless, every application field imposes particular requirements on the computational architecture used. This paper provides an overview of the different topics FPGAs have been used for in the last 15 years of research, and of why they have been chosen over other processing units such as CPUs.

    Towards Predictive Rendering in Virtual Reality

    Generating predictive images, i.e., images representing radiometrically correct renditions of reality, has been a longstanding problem in computer graphics. The exactness of such images is extremely important for Virtual Reality applications like Virtual Prototyping, where users need to make decisions impacting large investments based on the simulated images. Unfortunately, the generation of predictive imagery is still an unsolved problem due to manifold reasons, especially if real-time restrictions apply. First, existing scenes used for rendering are not modeled accurately enough to create predictive images. Second, even with huge computational effort, existing rendering algorithms are not able to produce radiometrically correct images. Third, current display devices need to convert rendered images into some low-dimensional color space, which prohibits the display of radiometrically correct images. Overcoming these limitations is the focus of current state-of-the-art research. This thesis also contributes to this task. First, it briefly introduces the necessary background and identifies the steps required for real-time predictive image generation. Then, existing techniques targeting these steps are presented and their limitations are pointed out. To solve some of the remaining problems, novel techniques are proposed. They cover various steps in the predictive image generation process, ranging from accurate scene modeling over efficient data representation to high-quality, real-time rendering. A special focus of this thesis lies on the real-time generation of predictive images using bidirectional texture functions (BTFs), i.e., very accurate representations for spatially varying surface materials. The techniques proposed by this thesis enable efficient handling of BTFs by compressing the huge amount of data contained in this material representation, applying them to geometric surfaces using texture and BTF synthesis techniques, and rendering BTF-covered objects in real time. Further approaches proposed in this thesis target the inclusion of real-time global illumination effects or more efficient rendering using novel level-of-detail representations for geometric objects. Finally, this thesis assesses the rendering quality achievable with BTF materials, indicating a significant increase in realism but also confirming that problems remain to be solved before truly predictive image generation is achieved.
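
    As a rough illustration of rendering from a compressed BTF, the sketch below reconstructs a reflectance sample from a PCA-style factorization of the texel-by-angle data matrix; the layout, component count, and names are assumptions made for the example rather than the compression scheme actually proposed in the thesis.

```cpp
// Minimal sketch of evaluating a PCA-compressed bidirectional texture function
// (BTF): the full texel x (view, light) data matrix is approximated by a small
// number of basis vectors, and a reflectance value is reconstructed on the fly
// as a weighted sum of those basis vectors plus the mean.
#include <cstddef>
#include <vector>

struct CompressedBTF {
    size_t numTexels;            // spatial samples
    size_t numAngles;            // discretized (view, light) direction pairs
    size_t numComponents;        // retained PCA components (<< numAngles)
    std::vector<float> weights;  // numTexels   x numComponents (row-major)
    std::vector<float> basis;    // numComponents x numAngles   (row-major)
    std::vector<float> mean;     // numAngles: mean angular response
};

// Reconstruct the reflectance of one texel for one (view, light) sample.
float evalBTF(const CompressedBTF& btf, size_t texel, size_t angle) {
    float value = btf.mean[angle];
    for (size_t c = 0; c < btf.numComponents; ++c)
        value += btf.weights[texel * btf.numComponents + c] *
                 btf.basis[c * btf.numAngles + angle];
    return value;
}
```

    The compression gain comes from storing only the small per-texel weight vectors and the shared basis instead of the full measured data matrix, at the cost of a few multiply-adds per shading sample.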

    Design and Evaluation of Compression, Classification and Localization Schemes for Various IoT Applications

    Nowadays we are surrounded by a huge number of objects able to communicate, to read information such as temperature, light or humidity, and to infer new information by exchanging data. These objects are not limited to high-tech devices with high capabilities, such as desktop PCs, laptops, and new-generation mobile phones (i.e., smartphones), but also include commonly used objects, such as ID cards, driver licenses, clocks, etc., that can be made smart by allowing them to communicate. Thus, the analog world of just a few years ago is becoming the digital world of the Internet of Things (IoT), where the information from a single object can be retrieved from the Internet. The IoT paradigm opens several architectural challenges, including self-organization, self-management, and self-deployment of the smart objects, as well as the problem of how to minimize the usage of the limited resources of each device. The concept of IoT covers many communication paradigms, such as WiFi, Radio Frequency Identification (RFID), and Wireless Sensor Networks (WSNs). Each paradigm can be thought of as an IoT island where each device can communicate directly with other devices. The thesis is divided into sections in order to cover each problem mentioned above. The first step is to understand the possibility of inferring new knowledge from the devices deployed in a scenario. For this reason, the research focuses on the Semantic Web (Web 3.0), to assign a semantic meaning to each thing inside the architecture. The semantic concept alone is not sufficient to infer new information from the gathered data; in fact, it is necessary to organize the data in a hierarchical form defined by an Ontology. Through the exploitation of the Ontology, it is possible to apply semantic reasoning engines to infer new knowledge about the network. The second step of the dissertation deals with minimizing the usage of every node in a WSN. The main purpose of each node is to collect environmental data and to exchange them with other nodes. To minimize battery consumption, it is necessary to limit radio usage. Therefore, we implemented Razor, a new lightweight algorithm which is expected to improve data compression and classification by leveraging the advantages offered by data mining methods for optimizing communications and by enhancing information transmission to simplify data classification. Data compression is performed by studying the well-known Vector Quantization (VQ) theory in order to create the codebooks necessary for signal compression. At the same time, it is required to give a semantic meaning to unknown signals. In this way, the codebook is able not only to compress the signals, but also to classify unknown signals, as sketched below. Razor is compared with both state-of-the-art compression and signal classification techniques for WSNs. The third part of the thesis covers the concept of smart objects applied to robotics research. A critical issue is how a robot can localize and retrieve smart objects in a real scenario without any prior knowledge. In order to achieve this objective, it is possible to exploit the smart object concept and localize the objects through RSSI measurements. After the localization phase, the robot can exploit its own camera to retrieve the objects. Several filtering algorithms are developed in order to mitigate the multi-path issue due to the wireless communication channel and to achieve a better distance estimation through the RSSI measurements.
    The last part of the dissertation deals with the design and development of a Cognitive Network (CN) testbed using off-the-shelf devices. The device type was chosen considering cost, usability, configurability, mobility, and the possibility to modify the Operating System (OS) source code. Thus, the best choice was to select devices based on the Linux kernel, such as Android OS devices. The ability to modify the Operating System is required to extract the TCP/IP protocol stack parameters for the CN paradigm. It is necessary to monitor the network status in real time and to modify the critical parameters in order to improve performance metrics such as bandwidth consumption, number of hops needed to exchange the data, and throughput.
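
    A minimal sketch of codebook-based Vector Quantization, covering both the compression use (nearest-codeword index) and the classification use (lowest quantization error across per-class codebooks) mentioned above, is given below; the codebook structure and function names are assumptions made for the example and do not reproduce the Razor algorithm itself.

```cpp
// Vector Quantization for sensor signals: a window of samples is replaced by
// the index of its nearest codeword, and the per-class codebook with the
// smallest quantization error labels an unknown signal.
#include <cstddef>
#include <limits>
#include <vector>

using Vector = std::vector<float>;

struct Codebook {
    std::vector<Vector> codewords;  // trained offline, e.g. with k-means / LBG
};

static float squaredDistance(const Vector& a, const Vector& b) {
    float d = 0.0f;
    for (size_t i = 0; i < a.size(); ++i) {
        float diff = a[i] - b[i];
        d += diff * diff;
    }
    return d;
}

// Compression: transmit only the index of the nearest codeword.
size_t quantize(const Codebook& cb, const Vector& window, float* outError) {
    size_t best = 0;
    float bestErr = std::numeric_limits<float>::max();
    for (size_t i = 0; i < cb.codewords.size(); ++i) {
        float err = squaredDistance(cb.codewords[i], window);
        if (err < bestErr) { bestErr = err; best = i; }
    }
    if (outError) *outError = bestErr;
    return best;
}

// Classification: the class whose codebook reproduces the window with the
// lowest quantization error is chosen as the label.
size_t classify(const std::vector<Codebook>& classCodebooks, const Vector& window) {
    size_t bestClass = 0;
    float bestErr = std::numeric_limits<float>::max();
    for (size_t c = 0; c < classCodebooks.size(); ++c) {
        float err = 0.0f;
        quantize(classCodebooks[c], window, &err);
        if (err < bestErr) { bestErr = err; bestClass = c; }
    }
    return bestClass;
}
```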

    Energy-precision tradeoffs in the graphics pipeline

    The energy consumption of a graphics processing unit (GPU) is an important factor in its design, whether for a server, desktop, or mobile device. Mobile products, such as smart phones, tablets, and laptop computers, rely on batteries to function; the lower the power demand on these batteries, the longer they last before needing to be recharged. GPUs used in servers and desktops, while not dependent on a battery for operation, are still limited by the efficiency of power supplies and heat dissipation techniques. In this dissertation, I propose to lower the energy consumption of GPUs by reducing the precision of floating-point arithmetic in the graphics pipeline and of the data sent and stored on- and off-chip. The key idea behind this work is twofold: energy can be saved through a systematic and targeted reduction in the number of bits 1) computed and 2) communicated. Reducing the number of bits computed will necessarily reduce either the precision or the range of a floating-point number. I focus on saving energy by reducing precision, which can exploit the over-provisioning of bits in many stages of the graphics pipeline. Reducing the number of bits communicated takes several forms. First, I propose enhancements to existing compression schemes for off-chip buffers to save bandwidth. I also suggest a simple extension that exploits unused bits in reduced-precision data undergoing compression. Finally, I present techniques for saving energy in the on-chip communication of reduced-precision data. By designing and simulating variable-precision arithmetic circuits with promising energy versus precision characteristics and tradeoffs, I have developed an energy model for GPUs. Using this model and my techniques, I have shown that significant savings (up to 70% in computation in the vertex and pixel shader stages) are possible by reducing the precision of the arithmetic. Further, my compression approaches have enabled improvements of 1.26x over past work, and a general-purpose compressor design has achieved bandwidth savings of 34%, 87%, and 65% for color, depth, and geometry data, respectively, which is competitive with past work. Lastly, an initial exploration of signal gating unused lines in on-chip buses has suggested savings of 13-48% for the tested applications' traffic from a multiprocessor's register file to its L1 cache.
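
    To illustrate the idea of trimming over-provisioned bits, the sketch below models reduced-precision arithmetic in software by rounding away low-order mantissa bits of an IEEE-754 single-precision value; the rounding scheme and bit counts are assumptions made for the example, not the variable-precision circuits designed in the dissertation.

```cpp
// Model reduced-precision floating point by keeping only `mantissaBits` of the
// 23-bit single-precision mantissa and rounding to nearest. The discarded bits
// are those that would not need to be computed or communicated. Edge cases
// (infinities, NaNs, exponent overflow on rounding) are ignored in this sketch.
#include <cstdint>
#include <cstring>

float reducePrecision(float value, int mantissaBits) {
    uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));    // type-pun safely
    const int drop = 23 - mantissaBits;          // low-order bits to discard
    if (drop <= 0) return value;                 // nothing to remove
    const uint32_t round = 1u << (drop - 1);     // round-to-nearest offset
    bits = (bits + round) & ~((1u << drop) - 1u);
    float out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}
```

    Running pipeline stages through such a model makes it possible to observe how image quality degrades as precision is lowered, and hence how many bits each stage actually needs.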