161 research outputs found

    Evaluation of genetic improvement tools for improvement of non-functional properties of software

    Genetic improvement (GI) improves both functional properties of software, such as bug repair, and non-functional properties, such as execution time, energy consumption, or source code size. There are studies summarising and comparing GI tools for improving functional properties of software; however, there is no such study for the improvement of non-functional properties using GI. Therefore, this research aims to survey and report on the existing GI tools for improvement of non-functional properties of software. We conducted a literature review of available GI tools, and ran multiple experiments on the open-source tools we found to examine their usability. We applied a cross-testing strategy to check whether the available tools can work on different programs. Overall, we found 63 GI papers that use a GI tool to improve non-functional properties of software, of which 31 are accompanied by open-source code. We were able to successfully run eight GI tools, and found that ultimately only two, Gin and PyGGI, can be readily applied to new general software

    Specialization Opportunities in Graphical Workloads

    Computer games are complex, performance-critical graphical applications which require specialized GPU hardware. For this reason, GPU drivers often include many heuristics to help optimize throughput. Recently, however, new APIs have emerged which sacrifice many heuristics for lower-level hardware control and more predictable driver behavior. This shifts the burden for many optimizations from GPU driver developers to game programmers, but also provides numerous opportunities to exploit application-specific knowledge. This paper examines different opportunities for specializing GPU code and reducing redundant data transfers. Static analysis of commercial games shows that 5-18% of GPU code is specializable by pruning dead data elements or moving portions to different graphics pipeline stages. In some games, up to 97% of the programs' data inputs of a particular type, namely uniform variables, are unused, as well as up to 62% of those in the GPU internal vertex-fragment interface. This shows potential for improving memory usage and communication overheads. In some test scenarios, removing dead uniform data can lead to 6x performance improvements. We also explore the upper limits of specialization if all dynamic inputs are constant at run-time. For instance, if uniform inputs are constant, up to 44% of instructions can be eliminated in some games, with a further 14% becoming constant-foldable at compile time. Analysis of run-time traces reveals that 48-91% of uniform inputs are constant in real games, so values close to the upper limit may be achieved in practice

    Implementation of digital pheromones in PSO accelerated by commodity Graphics Hardware

    In this paper, a model for Graphics Processing Unit (GPU) implementation of Particle Swarm Optimization (PSO) using digital pheromones to coordinate swarms within n-dimensional design spaces is presented. Previous work by the authors demonstrated the capability of digital pheromones within PSO for searching n-dimensional design spaces with improved accuracy, efficiency and reliability in both serial and parallel computing environments using traditional CPUs. Modern GPUs have been shown to outperform CPUs in floating-point operation throughput, owing to their inherently data-parallel architecture and higher bandwidth. The recent advent of programmable graphics hardware has further provided a suitable platform for scientific computing, particularly in the field of design optimization. However, the data-parallel architecture of GPUs requires a specialized formulation to leverage its computational capabilities. When the objective function computations are appropriately formulated for GPUs, it is theorized that the solution efficiency (speed) can be significantly increased while maintaining solution accuracy. The development of this method is presented in this paper, together with tests on a number of multi-modal unconstrained problems
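
    For orientation, the following is a minimal sketch of a plain global-best PSO loop on the CPU. It is not the authors' method: the digital-pheromone coordination and the GPU formulation are omitted because the abstract gives no detail on them. The function names `pso` and `sphere`, the parameter values, and the test objective are all illustrative assumptions.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iters=200,
        w=0.729, c1=1.49445, c2=1.49445, seed=0):
    """Plain global-best PSO (no pheromones, no GPU); minimizes `objective`."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move and clamp to the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def sphere(x):
    """Simple unimodal test objective (assumed here; minimum 0 at the origin)."""
    return sum(v * v for v in x)


if __name__ == "__main__":
    best, value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
    print(best, value)
```

    The objective evaluations inside the particle loop are independent of one another, which is the part a data-parallel formulation would be expected to target.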

    Platform Independent Real-Time X3D Shaders and their Applications in Bioinformatics Visualization

    Since the introduction of programmable Graphics Processing Units (GPUs) and procedural shaders, hardware vendors have each developed their own real-time shading language standard. None of these shading languages is fully platform independent. Although real-time programmable shader technology can be built into 3D applications on a single system, this platform dependence keeps it out of 3D Internet applications. The primary purpose of this dissertation is to design a framework for translating different shader formats to platform-independent shaders and embedding them into the eXtensible 3D (X3D) scene for 3D web applications. This framework includes a back-end core shader converter, which translates shaders among different shading languages through a middle XML layer. Also included is a shader library containing a basic set of shaders that developers can load and add shaders to. This framework will then be applied to some applications in Biomolecular Visualization

    Novel Methodologies for Predictable CPU-To-GPU Command Offloading

    There is increasing industrial and academic interest in a more predictable characterization of real-time tasks on high-performance heterogeneous embedded platforms, where a host system offloads parallel workloads to an integrated accelerator, such as a General-Purpose Graphics Processing Unit (GP-GPU). In this paper, we analyze an important aspect that has not yet been considered in the real-time literature, and that may significantly affect real-time performance if not properly treated, i.e., the time spent by the CPU submitting GP-GPU operations. We will show that the impact of CPU-to-GPU kernel submissions may indeed be relevant for typical real-time workloads, and that it should be properly factored in when deriving an integrated schedulability analysis for the considered platforms. This is the case when an application is composed of many small and consecutive GPU compute/copy operations. While existing techniques mitigate this issue by batching kernel calls into a reduced number of persistent kernel invocations, in this work we present and evaluate three other approaches that are made possible by recently released versions of the NVIDIA CUDA GP-GPU API, and by Vulkan, a novel open standard GPU API that allows improved control of GPU command submissions. We will show that this added control may significantly improve application performance and predictability due to a substantial reduction in CPU-to-GPU driver interactions, making Vulkan an interesting candidate for becoming the state-of-the-art API for heterogeneous real-time systems. Our findings are evaluated on a latest-generation NVIDIA Jetson AGX Xavier embedded board, executing typical workloads involving Deep Neural Networks of parameterized complexity

    Real-time rendering and simulation of trees and snow

    Tree models created by an industry-used package are exported and their structure extracted in order to procedurally regenerate the geometric mesh, addressing the limitations of the application's standard output. The structure, once extracted, is used to fully generate a high-quality skeleton for the tree, individually representing each section in every branch to give the greatest achievable freedom of deformation and animation. Around the generated skeleton, a new geometric mesh is wrapped using a single, continuous surface, removing intersection-based render artefacts. Surface smoothing and enhanced detail are added to the model dynamically using the GPU-enhanced tessellation engine. A real-time snow accumulation system is developed to generate snow cover on a dynamic, animated scene. Occlusion techniques are used to project snow-accumulating faces and map exposed areas to applied accumulation maps in the form of dynamic textures. Accumulation maps are fixed to applied surfaces, allowing moving objects to maintain accumulated snow cover. Mesh generation is performed dynamically during the rendering pass using surface offsetting and tessellation to enhance required detail

    Genetic Improvement of Software: a Comprehensive Survey

    Genetic improvement (GI) uses automated search to find improved versions of existing software. We present a comprehensive survey of this nascent field of research with a focus on the core papers in the area published between 1995 and 2015. We identified core publications including empirical studies, 96% of which use evolutionary algorithms (genetic programming in particular). Although we can trace the foundations of GI back to the origins of computer science itself, our analysis reveals a significant upsurge in activity since 2012. GI has resulted in dramatic performance improvements for a diverse set of properties such as execution time, energy and memory consumption, as well as results for fixing and extending existing system functionality. Moreover, we present examples of research work that lies on the boundary between GI and other areas, such as program transformation, approximate computing, and software repair, with the intention of encouraging further exchange of ideas between researchers in these fields
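
    As a toy illustration of "automated search to find improved versions of existing software", the sketch below applies a deterministic first-improvement local search over line-deletion edits, keeping a variant only if it still passes a small test suite. It is not the API of any GI tool and uses local search rather than genetic programming; `PROGRAM`, `passes_tests`, and `improve` are hypothetical names, and the property being improved (source size in lines) is an assumption chosen for brevity.

```python
# Toy target program, held as a list of source lines so edits are easy to apply.
PROGRAM = [
    "def total(xs):",
    "    s = 0",
    "    unused = len(xs) * 2",
    "    for x in xs:",
    "        s += x",
    "        s += 0",
    "    return s",
]


def passes_tests(lines):
    """Return True if the variant compiles and passes every test case."""
    env = {}
    try:
        exec("\n".join(lines), env)
        f = env["total"]
        return f([1, 2, 3]) == 6 and f([]) == 0 and f([-1, 1]) == 0
    except Exception:
        # Syntax errors, missing names, runtime errors: the variant is rejected.
        return False


def improve(lines):
    """First-improvement local search over single-line deletions."""
    best = list(lines)
    improved = True
    while improved:
        improved = False
        for i in range(len(best)):
            candidate = best[:i] + best[i + 1:]
            if passes_tests(candidate):
                best = candidate  # Shorter and still correct: accept the edit.
                improved = True
                break
    return best


if __name__ == "__main__":
    print("\n".join(improve(PROGRAM)))
```

    The test suite acts as the functional oracle, so the search can only be trusted as far as the tests cover the intended behaviour.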

    Increasing the performance and realism of procedurally generated buildings

    As multimedia such as games and movies grow, so does the need for content. Textures, 3D models, expansive terrain, sound effects, and other data must be generated to support and enrich these multimedia productions. As this need for content continues to grow, two critical problems emerge: the cost of hiring artists to create the content becomes extremely large, as does the amount of memory needed to store and manipulate the content. To combat these issues, procedural content generation, or content generated algorithmically rather than by an artist, has been introduced. Algorithmically generating content allows for rapid creation of large amounts of certain classes of content with little human effort; further, this content can be represented extremely compactly, often by exposing only a handful of parameters. In the realm of 3D building generation, split grammars have proven useful for generating a wide variety of buildings while being relatively intuitive. These split grammars have been used to generate entire cities full of detailed buildings with a fairly small number of rules. Split grammars have two important areas which can be expanded upon: first, writing an appropriate grammar can require a significant amount of work and knowledge, especially when the grammar must follow a certain building style while providing a high degree of variation. Second, applying these grammars to produce a building can be slow, often requiring an offline pregeneration phase which negates the size benefits of the grammar's compactness. For the first problem, we propose a data mining approach to refining preexisting grammars, wherein a user can specify buildings which they prefer, and from these preferences a set of rules will be generated that will guide future building generation. We will show that the generated rules have a high degree of accuracy when used to predict whether a user will like or dislike a building, often in the upper 90% range. For the second problem, we provide two areas of improvement: a preprocessing step which parses a split grammar to make it easier and more efficient to apply the grammar without loss of generality, and a scheme that allows the execution of a grammar entirely within a geometry shader on a modern graphics processing unit (GPU) such that building generation can take advantage of the parallelization found on modern graphics cards. We will show that this second improvement can provide a speed-up of between 3 and 10 times over a purely CPU-based approach, with further speed benefits possible depending on the nature of the grammars