
    Specialization Opportunities in Graphical Workloads

    Computer games are complex, performance-critical graphical applications which require specialized GPU hardware. For this reason, GPU drivers often include many heuristics to help optimize throughput. Recently, however, new APIs are emerging which sacrifice many heuristics for lower-level hardware control and more predictable driver behavior. This shifts the burden for many optimizations from GPU driver developers to game programmers, but also provides numerous opportunities to exploit application-specific knowledge.

    This paper examines different opportunities for specializing GPU code and reducing redundant data transfers. Static analysis of commercial games shows that 5-18% of GPU code is specializable by pruning dead data elements or moving portions to different graphics pipeline stages. In some games, up to 97% of the programs’ data inputs of a particular type, namely uniform variables, are unused, as well as up to 62% of those in the GPU internal vertex-fragment interface. This shows potential for improving memory usage and communication overheads. In some test scenarios, removing dead uniform data can lead to 6x performance improvements.

    We also explore the upper limits of specialization if all dynamic inputs are constant at run time. For instance, if uniform inputs are constant, up to 44% of instructions can be eliminated in some games, with a further 14% becoming constant-foldable at compile time. Analysis of run-time traces reveals that 48-91% of uniform inputs are constant in real games, so values close to the upper limit may be achieved in practice.
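
    To make the idea concrete, the hypothetical C++ sketch below mimics the kind of specialization the paper measures: a shading routine whose uniform block contains dead elements and a value that traces show to be constant, next to a specialized variant where the constant has been folded in and the dead elements pruned. All names and numbers here are illustrative assumptions, not taken from the paper.

        #include <cmath>
        #include <cstdio>

        // Hypothetical per-draw uniform block: several elements are dead (never
        // read by the shader) and one is observed to be constant at run time.
        struct Uniforms {
            float exposure;   // actually read on every invocation
            float unused[6];  // dead elements, candidates for pruning
            float gamma;      // constant in the traces (say, always 2.2f)
        };

        // Generic path: every uniform element is uploaded and fetched dynamically.
        float shade_generic(const Uniforms& u, float c) {
            return u.exposure * std::pow(c, 1.0f / u.gamma);
        }

        // Specialized variant: gamma is constant-folded and dead elements pruned,
        // shrinking the uniform upload from 8 floats to 1 and removing a division.
        float shade_specialized(float exposure, float c) {
            constexpr float kInvGamma = 1.0f / 2.2f;
            return exposure * std::pow(c, kInvGamma);
        }

        int main() {
            Uniforms u{1.5f, {}, 2.2f};
            std::printf("%f %f\n", shade_generic(u, 0.5f),
                        shade_specialized(u.exposure, 0.5f));
            return 0;
        }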

    Near-Memory Address Translation

    Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation, MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages, respectively. Comment: 15 pages, 9 figures
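
    The central mechanism, restricting the associativity of the virtual-to-physical mapping so that a virtual address by itself pins down a small set of candidate frames, can be sketched roughly as follows. The parameters and helper names are illustrative assumptions, not the paper's actual configuration.

        #include <cstdint>
        #include <cstdio>
        #include <vector>

        // Sketch of a set-associative virtual-to-physical mapping (assumed sizes).
        constexpr uint64_t kPageBits = 12;        // 4KB pages
        constexpr uint64_t kWays     = 4;         // associativity of the mapping
        constexpr uint64_t kSets     = 1u << 20;  // number of sets in memory

        // With limited associativity, the set is a pure function of the virtual
        // page number, so candidate frames are known before translation finishes.
        uint64_t set_of(uint64_t vaddr) {
            return (vaddr >> kPageBits) % kSets;
        }

        // The frame holding the data must be one of these kWays frames; a memory-
        // side unit can start fetching from them in parallel with looking up the
        // exact way in its local share of the inverted page table.
        std::vector<uint64_t> candidate_frames(uint64_t vaddr) {
            std::vector<uint64_t> frames;
            const uint64_t set = set_of(vaddr);
            for (uint64_t way = 0; way < kWays; ++way)
                frames.push_back(set * kWays + way);
            return frames;
        }

        int main() {
            const uint64_t vaddr = 0x7f1234567000ULL;
            for (uint64_t f : candidate_frames(vaddr))
                std::printf("candidate frame %llu\n", (unsigned long long)f);
            return 0;
        }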

    Cloud engineering is search based software engineering too

    Many of the problems posed by the migration of computation to cloud platforms can be formulated and solved using techniques associated with Search Based Software Engineering (SBSE). Much of cloud software engineering involves problems of optimisation: performance, allocation, assignment and the dynamic balancing of resources to achieve pragmatic trade-offs between many competing technical and business objectives. SBSE is concerned with the application of computational search and optimisation to solve precisely these kinds of software engineering challenges. Interest in both cloud computing and SBSE has grown rapidly in the past five years, yet there has been little work on SBSE as a means of addressing cloud computing challenges. Like many computationally demanding activities, SBSE has the potential to benefit from the cloud; ‘SBSE in the cloud’. However, this paper focuses, instead, on the ways in which SBSE can benefit cloud computing. It thus develops the theme of ‘SBSE for the cloud’, formulating cloud computing challenges in ways that can be addressed using SBSE.
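
    As a toy illustration of this framing (not an example from the paper), a cloud allocation decision can be cast as a search problem: candidate assignments of services to VM types are scored by a fitness function that balances cost against a crude response-time estimate, and a metaheuristic, here a plain random search, explores the space. The VM types, loads and weights below are made-up values.

        #include <cstdio>
        #include <random>
        #include <vector>

        // Toy SBSE-style formulation: assign each service to one of the available
        // VM types, trading hourly cost against an estimated processing latency.
        struct VmType { double cost_per_hour; double speed; };

        double fitness(const std::vector<int>& assign, const std::vector<VmType>& vms,
                       const std::vector<double>& load) {
            double cost = 0.0, latency = 0.0;
            for (size_t i = 0; i < assign.size(); ++i) {
                cost    += vms[assign[i]].cost_per_hour;
                latency += load[i] / vms[assign[i]].speed;
            }
            return cost + 0.5 * latency;  // weighted single objective; lower is better
        }

        int main() {
            const std::vector<VmType> vms  = {{0.05, 1.0}, {0.20, 4.0}, {0.80, 16.0}};
            const std::vector<double> load = {3.0, 1.0, 8.0, 0.5};

            std::mt19937 rng(42);
            std::uniform_int_distribution<int> pick(0, (int)vms.size() - 1);

            std::vector<int> best(load.size(), 0);
            double best_fit = fitness(best, vms, load);
            for (int iter = 0; iter < 10000; ++iter) {  // random-search baseline;
                std::vector<int> cand(load.size());     // a GA or hill climber would
                for (int& a : cand) a = pick(rng);      // replace this loop
                double f = fitness(cand, vms, load);
                if (f < best_fit) { best_fit = f; best = cand; }
            }
            std::printf("best fitness %.3f\n", best_fit);
            return 0;
        }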

    Assessing Opportunities of SYCL and Intel oneAPI for Biological Sequence Alignment

    Background and objectives. The computational biology area has been growing over the years. The interest in researching and developing computational tools for the acquisition, storage, organization, analysis, and visualization of biological data generates the need to create new hardware architectures and software tools that allow processing big data in acceptable times. In this sense, heterogeneous computing takes an important role in providing solutions, but at the same time generates new challenges for developers in relation to the impossibility of porting source code between different architectures. Methods. Intel has recently introduced oneAPI, a new unified programming environment that allows code developed in the SYCL-based Data Parallel C++ (DPC++) language to be run on different devices such as CPUs, GPUs, and FPGAs, among others. Due to the large amount of CUDA software in the field of bioinformatics, this paper presents the migration process of the SW# suite, a biological sequence alignment tool developed in CUDA, to DPC++ through the oneAPI compatibility tool dpct (recently renamed SYCLomatic). Results. SW# has been completely migrated with little programmer intervention in terms of hand-coding. Moreover, it has been possible to port the migrated code between different architectures (considering different target platforms and vendors) with no noticeable performance degradation. Conclusions. The SYCLomatic tool presented a great performance-portability rate. SYCL and Intel oneAPI can offer attractive opportunities for the Bioinformatics community, especially considering the vast existence of CUDA-based legacy codes.
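
    For readers unfamiliar with the target model, the fragment below shows roughly what a trivial CUDA-style elementwise kernel looks like once expressed in SYCL/DPC++. It is a generic illustration of the programming model, assumed for this summary, not code taken from the SW# suite or from SYCLomatic's output.

        #include <sycl/sycl.hpp>
        #include <cstdio>
        #include <vector>

        // Minimal SYCL (DPC++) equivalent of a CUDA elementwise addition kernel.
        int main() {
            constexpr size_t n = 1024;
            std::vector<int> a(n, 1), b(n, 2), c(n, 0);

            sycl::queue q;  // selects a default device: CPU, GPU, FPGA, ...
            {
                sycl::buffer<int> ba(a.data(), sycl::range<1>(n));
                sycl::buffer<int> bb(b.data(), sycl::range<1>(n));
                sycl::buffer<int> bc(c.data(), sycl::range<1>(n));
                q.submit([&](sycl::handler& h) {
                    sycl::accessor xa(ba, h, sycl::read_only);
                    sycl::accessor xb(bb, h, sycl::read_only);
                    sycl::accessor xc(bc, h, sycl::write_only, sycl::no_init);
                    // Plays the role of the CUDA thread/block index arithmetic.
                    h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                        xc[i] = xa[i] + xb[i];
                    });
                });
            }  // buffers are destroyed here, copying results back to the host

            std::printf("c[0] = %d\n", c[0]);  // expected: 3
            return 0;
        }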

    Comparative Study Of Implementing The On-Premises and Cloud Business Intelligence On Business Problems In a Multi-National Software Development Company

    Internship Report presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence.

    Nowadays every enterprise wants to be competitive. In the last decade, data volumes have increased dramatically. As the amount of data in the market grows each year, the ability to extract, analyze and manage data becomes the backbone condition for an organization to be competitive. In this context, organizations need to adapt their technologies to the new business reality in order to remain competitive and provide new solutions that meet new requests. Business Intelligence, by its main definition, is the ability to extract, analyze and manage data, through which an organization gains a competitive advantage. Before adopting this approach, it is important to decide which computing system it will be based on, considering the volume of data, the business context of the organization and the technology requirements of the market. In the last 10 years, the popularity of cloud computing has increased and divided computing systems into On-Premises and cloud. The cloud's benefits are scalability, availability and lower costs. On the other hand, traditional On-Premises systems provide independence of software configuration, control over data and high security. The final decision as to which computing paradigm to follow is not an easy task and depends on the business context of the organization and on how the current On-Premises systems perform in business processes. In this case, Business Intelligence functions require in-depth analysis in order to understand whether cloud computing technologies could perform better in those processes than traditional systems. The objective of this internship is to conduct a comparative study between the two computing systems in routine Business Intelligence functions. The study will compare On-Premises Business Intelligence based on Oracle architecture with cloud Business Intelligence based on Google Cloud services. The comparative study will be conducted over 12 months through participation in activities and projects in the Business Intelligence department of a company that develops digital software solutions for the telecommunications market, as an internship student in the 2nd year of a master's degree in Information Management, with a specialization in Knowledge Management and Business Intelligence at Nova Information Management School (NOVA IMS).

    A Perspective on Safety and Real-Time Issues for GPU Accelerated ADAS

    The current trend in designing Advanced Driving Assistance Systems (ADAS) is to enhance their computing power by using modern multi/many-core accelerators. For many critical applications such as pedestrian detection, line following, and path planning, the Graphics Processing Unit (GPU) is the most popular choice for obtaining orders-of-magnitude increases in performance at modest power consumption. This is made possible by exploiting the general-purpose nature of today's GPUs, as such devices are known to deliver unprecedented performance per watt on generic embarrassingly parallel workloads (as opposed to just graphical rendering, which previous GPU generations were designed solely to sustain). In this work, we explore novel challenges that system engineers have to face in terms of real-time constraints and functional safety when the GPU is the chosen accelerator. More specifically, we investigate how much of the adopted safety standards currently applied to traditional platforms can be translated to a GPU-accelerated platform used in critical scenarios.