237 research outputs found

    Efficient mapping of hierarchical trees on coarse-grain reconfigurable architectures

    Full text link
    Reconfigurable architectures have become increasingly important in recent years. In this paper we present an approach to the problem of executing 3D graphics interactive applications onto these architectures. The hierarchical trees are usually implemented to reduce the data processed, thereby diminishing the execution time. We have developed a mapping scheme that parallelizes the tree execution onto a SIMD reconfigurable architecture. This mapping scheme considerably reduces the time penalty caused by the possibility of executing different tree nodes in SIMD fashion. We have developed a technique that achieves an efficient hierarchical tree execution taking decisions at execution time. It also promotes the possibility of data coherence in order to reduce the execution time. The experimental results show high performance and efficient resource utilization on tested applications

    Doctor of Philosophy

    Get PDF
    dissertationThe embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low-energy, high-performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICS) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, which range from application algorithms to architecture and ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a a Compiler to generate execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve energy-performance-area characteristics. The above process repeats automatically until the user-specified constraints are achieved. This removes or alleviates the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that represents a significant improvement in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs. The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by coscheduling the often conflicting constraints of data access, data movement, and computation through a flexible interconnect. This represents a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition and wireless telephony. While CoGenE is well suited to applications that exhibit a streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches

    Mobile graphics: SIGGRAPH Asia 2017 course

    Get PDF
    Peer ReviewedPostprint (published version

    Hardware Accelerators for Animated Ray Tracing

    Get PDF
    Future graphics processors are likely to incorporate hardware accelerators for real-time ray tracing, in order to render increasingly complex lighting effects in interactive applications. However, ray tracing poses difficulties when drawing scenes with dynamic content, such as animated characters and objects. In dynamic scenes, the spatial datastructures used to accelerate ray tracing are invalidated on each animation frame, and need to be rapidly updated. Tree update is a complex subtask in its own right, and becomes highly expensive in complex scenes. Both ray tracing and tree update are highly memory-intensive tasks, and rendering systems are increasingly bandwidth-limited, so research on accelerator hardware has focused on architectural techniques to optimize away off-chip memory traffic. Dynamic scene support is further complicated by the recent introduction of compressed trees, which use low-precision numbers for storage and computation. Such compression reduces both the arithmetic and memory bandwidth cost of ray tracing, but adds to the complexity of tree update.This thesis proposes methods to cope with dynamic scenes in hardware-accelerated ray tracing, with focus on reducing traffic to external memory. Firstly, a hardware architecture is designed for linear bounding volume hierarchy construction, an algorithm which is a basic building block in most state-of-the-art software tree builders. The algorithm is rearranged into a streaming form which reduces traffic to one-third of software implementations of the same algorithm. Secondly, an algorithm is proposed for compressing bounding volume hierarchies in a streaming manner as they are output from a hardware builder, instead of performing compression as a postprocessing pass. As a result, with the proposed method, compression reduces the overall cost of tree update rather than increasing it. The last main contribution of this thesis is an evaluation of shallow bounding volume hierarchies, common in software ray tracing, for use in hardware pipelines. These are found to be more energy-efficient than binary hierarchies. The results in this thesis both confirm that dynamic scene support may become a bottleneck in real time ray tracing, and add to the state of the art on tree update in terms of energy-efficiency, as well as the complexity of scenes that can be handled in real time on resource-constrained platforms

    Doctor of Philosophy in Computer Science

    Get PDF
    dissertationRay tracing is becoming more widely adopted in offline rendering systems due to its natural support for high quality lighting. Since quality is also a concern in most real time systems, we believe ray tracing would be a welcome change in the real time world, but is avoided due to insufficient performance. Since power consumption is one of the primary factors limiting the increase of processor performance, it must be addressed as a foremost concern in any future ray tracing system designs. This will require cooperating advances in both algorithms and architecture. In this dissertation I study ray tracing system designs from a data movement perspective, targeting the various memory resources that are the primary consumer of power on a modern processor. The result is high performance, low energy ray tracing architectures

    Algorithm/Architecture Co-Exploration of Visual Computing: Overview and Future Perspectives

    Get PDF
    Concurrently exploring both algorithmic and architectural optimizations is a new design paradigm. This survey paper addresses the latest research and future perspectives on the simultaneous development of video coding, processing, and computing algorithms with emerging platforms that have multiple cores and reconfigurable architecture. As the algorithms in forthcoming visual systems become increasingly complex, many applications must have different profiles with different levels of performance. Hence, with expectations that the visual experience in the future will become continuously better, it is critical that advanced platforms provide higher performance, better flexibility, and lower power consumption. To achieve these goals, algorithm and architecture co-design is significant for characterizing the algorithmic complexity used to optimize targeted architecture. This paper shows that seamless weaving of the development of previously autonomous visual computing algorithms and multicore or reconfigurable architectures will unavoidably become the leading trend in the future of video technology

    Dynamic task scheduling and binding for many-core systems through stream rewriting

    Get PDF
    This thesis proposes a novel model of computation, called stream rewriting, for the specification and implementation of highly concurrent applications. Basically, the active tasks of an application and their dependencies are encoded as a token stream, which is iteratively modified by a set of rewriting rules at runtime. In order to estimate the performance and scalability of stream rewriting, a large number of experiments have been evaluated on many-core systems and the task management has been implemented in software and hardware.In dieser Dissertation wurde Stream Rewriting als eine neue Methode entwickelt, um Anwendungen mit einer großen Anzahl von dynamischen Tasks zu beschreiben und effizient zur Laufzeit verwalten zu können. Dabei werden die aktiven Tasks in einem Datenstrom verpackt, der zur Laufzeit durch wiederholtes Suchen und Ersetzen umgeschrieben wird. Um die Performance und Skalierbarkeit zu bestimmen, wurde eine Vielzahl von Experimenten mit Many-Core-Systemen durchgeführt und die Verwaltung von Tasks über Stream Rewriting in Software und Hardware implementiert

    Realtime ray tracing and interactive global illumination

    Get PDF
    One of the most sought-for goals in computer graphics is to generate "realism in real time". i.e. the generation of realistically looking images at realtime frame rates. Today, virtually all approaches towards realtime rendering use graphics hardware, which is based almost exclusively on triangle rasterization. Unfortunately, though this technology has seen tremendous progress over the last few years, for many applications it is currently reaching its limits in both model complexity, supported features, and achievable realism. An alternative to triangle rasterizations is the ray tracing algorithm, which is well-known for its higher flexibility, its generally higher achievable realism, and its superior scalability in both model size and compute power. However, ray tracing is also computationally demanding and thus so far is used almost exclusively for high-quality offline rendering tasks. This dissertation focuses on the question why ray tracing is likely to soon play a larger role for interactive applications, and how this scenario can be reached. To this end, we discuss the RTRT/OpenRT realtime ray tracing system, a software based ray tracing system that achieves interactive to realtime frame rates on todays commodity CPUs. In particular, we discuss the overall system design, the efficient implementation of the core ray tracing algorithms, techniques for handling dynamic scenes, an efficient parallelization framework, and an OpenGL-like low-level API. Taken together, these techniques form a complete realtime rendering engine that supports massively complex scenes, highley realistic and physically correct shading, and even physically based lighting simulation at interactive rates. In the last part of this thesis we then discuss the implications and potential of realtime ray tracing on global illumination, and how the availability of this new technology can be leveraged to finally achieve interactive global illumination - the physically correct simulation of light transport at interactive rates.Eines der wichtigsten Ziele der Computer-Graphik ist die Generierung von "Realismus in Echtzeit\u27; — die Erzeugung von realistisch wirkenden, computer- generierten Bildern in Echtzeit. Heutige Echtzeit-Graphikanwendungen werden derzeit zum überwiegenden Teil mit schneller Graphik-Hardware realisiert, welche zum aktuellen Stand der Technik fast ausschliesslich auf dem Dreiecksrasterisierungsalgorithmus basiert. Obwohl diese Rasterisierungstechnologie in den letzten Jahren zunehmend beeindruckende Fortschritte gemacht hat, stößt sie heutzutage zusehends an ihre Grenzen, speziell im Hinblick auf Modellkomplexität, unterstützte Beleuchtungseffekte, und erreichbaren Realismus. Eine Alternative zur Dreiecksrasterisierung ist das "Ray-Tracing\u27; (Stahl-Rückverfolgung), welches weithin bekannt ist für seine höhere Flexibilität, seinen im Großen und Ganzen höheren erreichbaren Realismus, und seine bessere Skalierbarkeit sowohl in Szenengröße als auch in Rechner-Kapazitäten. Allerdings ist Ray-Tracing ebenso bekannt für seinen hohen Rechenbedarf, und wird daher heutzutage fast ausschließlich für die hochqualitative, nichtinteraktive Bildsynthese benutzt. Diese Dissertation behandelt die Gründe warum Ray-Tracing in näherer Zukunft voraussichtlich eine größere Rolle für interaktive Graphikanwendungen spielen wird, und untersucht, wie dieses Szenario des Echtzeit Ray-Tracing erreicht werden kann. Hierfür stellen wir das RTRT/OpenRT Echtzeit Ray-Tracing System vor, ein software-basiertes Ray-Tracing System, welches es erlaubt, interaktive Performanz auf heutigen Standard-PC-Prozessoren zu erreichen. Speziell diskutieren wir das grundlegende System-Design, die effiziente Implementierung der Kern-Algorithmen, Techniken zur Unterstützung von dynamischen Szenen, ein effizientes Parallelisierungs-Framework, und eine OpenGL-ähnliche Anwendungsschnittstelle. In ihrer Gesamtheit formen diese Techniken ein komplettes Echtzeit-Rendering-System, welches es erlaubt, extrem komplexe Szenen, hochgradig realistische und physikalisch korrekte Effekte, und sogar physikalisch-basierte Beleuchtungssimulation interaktiv zu berechnen. Im letzten Teil der Dissertation behandeln wir dann die Implikationen und das Potential, welches Echtzeit Ray-Tracing für die Globale Beleuchtungssimulation bietet, und wie die Verfügbarkeit dieser neuen Technologie benutzt werden kann, um letztendlich auch Globale Belechtung — die physikalisch korrekte Simulation des Lichttransports — interaktiv zu berechnen

    Viability of Numerical Full-Wave Techniques in Telecommunication Channel Modelling

    Get PDF
    In telecommunication channel modelling the wavelength is small compared to the physical features of interest, therefore deterministic ray tracing techniques provide solutions that are more efficient, faster and still within time constraints than current numerical full-wave techniques. Solving fundamental Maxwell's equations is at the core of computational electrodynamics and best suited for modelling electrical field interactions with physical objects where characteristic dimensions of a computing domain is on the order of a few wavelengths in size. However, extreme communication speeds, wireless access points closer to the user and smaller pico and femto cells will require increased accuracy in predicting and planning wireless signals, testing the accuracy limits of the ray tracing methods. The increased computing capabilities and the demand for better characterization of communication channels that span smaller geographical areas make numerical full-wave techniques attractive alternative even for larger problems. The paper surveys ways of overcoming excessive time requirements of numerical full-wave techniques while providing acceptable channel modelling accuracy for the smallest radio cells and possibly wider. We identify several research paths that could lead to improved channel modelling, including numerical algorithm adaptations for large-scale problems, alternative finite-difference approaches, such as meshless methods, and dedicated parallel hardware, possibly as a realization of a dataflow machine
    corecore