66 research outputs found

    Dynamic task scheduling and binding for many-core systems through stream rewriting

    Get PDF
    This thesis proposes a novel model of computation, called stream rewriting, for the specification and implementation of highly concurrent applications. Basically, the active tasks of an application and their dependencies are encoded as a token stream, which is iteratively modified by a set of rewriting rules at runtime. In order to estimate the performance and scalability of stream rewriting, a large number of experiments have been evaluated on many-core systems and the task management has been implemented in software and hardware.In dieser Dissertation wurde Stream Rewriting als eine neue Methode entwickelt, um Anwendungen mit einer großen Anzahl von dynamischen Tasks zu beschreiben und effizient zur Laufzeit verwalten zu können. Dabei werden die aktiven Tasks in einem Datenstrom verpackt, der zur Laufzeit durch wiederholtes Suchen und Ersetzen umgeschrieben wird. Um die Performance und Skalierbarkeit zu bestimmen, wurde eine Vielzahl von Experimenten mit Many-Core-Systemen durchgeführt und die Verwaltung von Tasks über Stream Rewriting in Software und Hardware implementiert

    Application-Directed DVFS using Multiple Clock Domains on Graphics Hardware

    Get PDF
    As handheld devices have become increasingly popular, powerful programmable graphics hardware for mobile and handheld devices has been deployed. While many resources on mobile devices are limited, the predominant problem for mobile devices is their limited battery power. Several techniques have been proposed to increase the energy efficiency of mobile applications and improve battery life. In this thesis, we propose a new dynamic voltage and frequency scaling (DVFS) on Graphics Processing Units (GPU). In most cases, cues within the graphics appli- cation can be used to predict portions of a GPU that will be used or unused when the application is run. We partition the GPU into six clock domains that can be clocked at different rates. Specifically, each domain it has its own voltage and frequency set- ting based on its predicted workload to save energy without reducing applications frame rates. In addition, we propose an signature-based algorithm for predicting the workload offered to our six clock domains by a given application to decide voltage and frequency settings. We conduct experiments and compare the results of our new signature based workload prediction algorithm with some other traditional interval based workload prediction algorithms. Our results show that our signature-based prediction can save 30-50% energy without afecting application frame rates

    Improving GPU performance : reducing memory conflicts and latency

    Get PDF

    Improving GPU performance : reducing memory conflicts and latency

    Get PDF

    Mobile graphics: SIGGRAPH Asia 2017 course

    Get PDF
    Peer ReviewedPostprint (published version

    Programmable Image-Based Light Capture for Previsualization

    Get PDF
    Previsualization is a class of techniques for creating approximate previews of a movie sequence in order to visualize a scene prior to shooting it on the set. Often these techniques are used to convey the artistic direction of the story in terms of cinematic elements, such as camera movement, angle, lighting, dialogue, and character motion. Essentially, a movie director uses previsualization (previs) to convey movie visuals as he sees them in his minds-eye . Traditional methods for previs include hand-drawn sketches, Storyboards, scaled models, and photographs, which are created by artists to convey how a scene or character might look or move. A recent trend has been to use 3D graphics applications such as video game engines to perform previs, which is called 3D previs. This type of previs is generally used prior to shooting a scene in order to choreograph camera or character movements. To visualize a scene while being recorded on-set, directors and cinematographers use a technique called On-set previs, which provides a real-time view with little to no processing. Other types of previs, such as Technical previs, emphasize accurately capturing scene properties but lack any interactive manipulation and are usually employed by visual effects crews and not for cinematographers or directors. This dissertation\u27s focus is on creating a new method for interactive visualization that will automatically capture the on-set lighting and provide interactive manipulation of cinematic elements to facilitate the movie maker\u27s artistic expression, validate cinematic choices, and provide guidance to production crews. Our method will overcome the drawbacks of the all previous previs methods by combining photorealistic rendering with accurately captured scene details, which is interactively displayed on a mobile capture and rendering platform. This dissertation describes a new hardware and software previs framework that enables interactive visualization of on-set post-production elements. A three-tiered framework, which is the main contribution of this dissertation is; 1) a novel programmable camera architecture that provides programmability to low-level features and a visual programming interface, 2) new algorithms that analyzes and decomposes the scene photometrically, and 3) a previs interface that leverages the previous to perform interactive rendering and manipulation of the photometric and computer generated elements. For this dissertation we implemented a programmable camera with a novel visual programming interface. We developed the photometric theory and implementation of our novel relighting technique called Symmetric lighting, which can be used to relight a scene with multiple illuminants with respect to color, intensity and location on our programmable camera. We analyzed the performance of Symmetric lighting on synthetic and real scenes to evaluate the benefits and limitations with respect to the reflectance composition of the scene and the number and color of lights within the scene. We found that, since our method is based on a Lambertian reflectance assumption, our method works well under this assumption but that scenes with high amounts of specular reflections can have higher errors in terms of relighting accuracy and additional steps are required to mitigate this limitation. Also, scenes which contain lights whose colors are a too similar can lead to degenerate cases in terms of relighting. Despite these limitations, an important contribution of our work is that Symmetric lighting can also be leveraged as a solution for performing multi-illuminant white balancing and light color estimation within a scene with multiple illuminants without limits on the color range or number of lights. We compared our method to other white balance methods and show that our method is superior when at least one of the light colors is known a priori

    Parallel For Loops on Heterogeneous Resources

    Get PDF
    In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil\u27s ParallelFor construct and our novel PINA scheduler which can efficiently load balance work onto multiple GPUs and CPUs simultaneously

    PERFORMANCE EVALUATION OF MEMORY AND COMPUTATIONALLY BOUND CHEMISTRY APPLICATIONS ON STREAMING GPGPUS AND MULTI-CORE X86 CPUS

    Get PDF
    In recent years, multi-core processors have come to dominate the field in desktop and high performance computing. Graphics processors traditionally used in CAD, video games, and other 3-d applications, have become more programmable and are now suitable for general purpose computing. This thesis explores multi-core processors and GPU performance and limitations in two computational chemistry applications: a memory bound component of ab-initio modeling and a computationally bound Monte Carlo simulation. For the applications presented in this thesis, exploiting multiple processors is done using a variety of tools and languages including OpenMP and MKL. Brook+ and the Compute Abstraction Layer streaming environments are used to accelerate applications on AMD GPUs. This thesis gives qualitative assertions about these languages and tools regarding ease of use and optimization in addition to quantitative analyses of performance. GPUs can yield modest performance improvements with little effort in some applications and even larger speedups with simple optimizations
    corecore