263 research outputs found

    A learning experience toward the understanding of abstraction-level interactions in parallel applications

    Get PDF
    In the curriculum of a Computer Engineering program, concepts like parallelism, concurrency, consistency, or atomicity are usually addressed in separate courses due to their thoroughness and extension. Isolating such concepts in courses helps students not only to focus on specific aspects, but also to experience the reality of working with modern computer systems, where those concepts are often detached in different abstraction levels. However, due to such an isolation, it exists a risk of inducing to the students an absence of interactions between these concepts, and, by extension, between the different abstraction levels of a system. This paper proposes a learning experience showcasing the interactions between abstraction levels addressed in laboratory sessions of different courses. The driving example is a parallel ray tracer. In the different courses, students implement and assemble components of this application from the algorithmic level of the tracer to the assembly instructions required to guarantee atomicity. Each lab focuses on a single abstraction level, but shows students the interactions with the rest of the levels. Technical results and student learning outcomes through the analysis of surveys validate the proposed experience and confirm the students learning improvement with a more integrated view of the system

    Software Approaches to Manage Resource Tradeoffs of Power and Energy Constrained Applications

    Get PDF
    Power and energy efficiency have become an increasingly important design metric for a wide spectrum of computing devices. Battery efficiency, which requires a mixture of energy and power efficiency, is exceedingly important especially since there have been no groundbreaking advances in battery capacity recently. The need for energy and power efficiency stretches from small embedded devices to portable computers to large scale data centers. The projected future of computing demand, referred to as exascale computing, demands that researchers find ways to perform exaFLOPs of computation at a power bound much lower than would be required by simply scaling today's standards. There is a large body of work on power and energy efficiency for a wide range of applications and at different levels of abstraction. However, there is a lack of work studying the nuances of different tradeoffs that arise when operating under a power/energy budget. Moreover, there is no work on constructing a generalized model of applications running under power/energy constraints, which allows the designer to optimize their resource consumption, be it power, energy, time, bandwidth, or space. There is need for an efficient model that can provide bounds on the optimality of an application's resource consumption, becoming a basis against which online resource management heuristics can be measured. In this thesis, we tackle the problem of managing resource tradeoffs of power/energy constrained applications. We begin by studying the nuances of power/energy tradeoffs with the response time and throughput of stream processing applications. We then study the power performance tradeoff of batch processing applications to identify a power configuration that maximizes performance under a power bound. Next, we study the tradeoff of power/energy with network bandwidth and precision. Finally, we study how to combine tradeoffs into a generalized model of applications running under resource constraints. The work in this thesis presents detailed studies of the power/energy tradeoff with response time, throughput, performance, network bandwidth, and precision of stream and batch processing applications. To that end, we present an adaptive algorithm that manages stream processing tradeoffs of response time and throughput at the CPU level. At the task-level, we present an online heuristic that adaptively distributes bounded power in a cluster to improve performance, as well as an offline approach to optimally bound performance. We demonstrate how power can be used to reduce bandwidth bottlenecks and extend our offline approach to model bandwidth tradeoffs. Moreover, we present a tool that identifies parts of a program that can be downgraded in precision with minimal impact on accuracy, and maximal impact on energy consumption. Finally, we combine all the above tradeoffs into a flexible model that is efficient to solve and allows for bounding and/or optimizing the consumption of different resources

    Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware

    Get PDF
    The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet, the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its availability using the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware using C/C++. Using the HARPv2 as a vehicle for exploration, we investigate the utility of several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications for such optimizations on this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were for existing architectures when used on emerging heterogeneous systems

    Hardware Acceleration for Unstructured Big Data and Natural Language Processing.

    Full text link
    The confluence of the rapid growth in electronic data in recent years, and the renewed interest in domain-specific hardware accelerators presents exciting technical opportunities. Traditional scale-out solutions for processing the vast amounts of text data have been shown to be energy- and cost-inefficient. In contrast, custom hardware accelerators can provide higher throughputs, lower latencies, and significant energy savings. In this thesis, I present a set of hardware accelerators for unstructured big-data processing and natural language processing. The first accelerator, called HAWK, aims to speed up the processing of ad hoc queries against large in-memory logs. HAWK is motivated by the observation that traditional software-based tools for processing large text corpora use memory bandwidth inefficiently due to software overheads, and, thus, fall far short of peak scan rates possible on modern memory systems. HAWK is designed to process data at a constant rate of 32 GB/s—faster than most extant memory systems. I demonstrate that HAWK outperforms state-of-the-art software solutions for text processing, almost by an order of magnitude in many cases. HAWK occupies an area of 45 sq-mm in its pareto-optimal configuration and consumes 22 W of power, well within the area and power envelopes of modern CPU chips. The second accelerator I propose aims to speed up similarity measurement calculations for semantic search in the natural language processing space. By leveraging the latency hiding concepts of multi-threading and simple scheduling mechanisms, my design maximizes functional unit utilization. This similarity measurement accelerator provides speedups of 36x-42x over optimized software running on server-class cores, while requiring 56x-58x lower energy, and only 1.3% of the area.PhDComputer Science and EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/116712/1/prateekt_1.pd

    High-performance and hardware-aware computing: proceedings of the second International Workshop on New Frontiers in High-performance and Hardware-aware Computing (HipHaC\u2711), San Antonio, Texas, USA, February 2011 ; (in conjunction with HPCA-17)

    Get PDF
    High-performance system architectures are increasingly exploiting heterogeneity. The HipHaC workshop aims at combining new aspects of parallel, heterogeneous, and reconfigurable microprocessor technologies with concepts of high-performance computing and, particularly, numerical solution methods. Compute- and memory-intensive applications can only benefit from the full hardware potential if all features on all levels are taken into account in a holistic approach

    Mobile Crowd Sensing in Edge Computing Environment

    Get PDF
    abstract: The mobile crowdsensing (MCS) applications leverage the user data to derive useful information by data-driven evaluation of innovative user contexts and gathering of information at a high data rate. Such access to context-rich data can potentially enable computationally intensive crowd-sourcing applications such as tracking a missing person or capturing a highlight video of an event. Using snippets and pictures captured from multiple mobile phone cameras with specific contexts can improve the data acquired in such applications. These MCS applications require efficient processing and analysis to generate results in real time. A human user, mobile device and their interactions cause a change in context on the mobile device affecting the quality contextual data that is gathered. Usage of MCS data in real-time mobile applications is challenging due to the complex inter-relationship between: a) availability of context, context is available with the mobile phones and not with the cloud, b) cost of data transfer to remote cloud servers, both in terms of communication time and energy, and c) availability of local computational resources on the mobile phone, computation may lead to rapid battery drain or increased response time. The resource-constrained mobile devices need to offload some of their computation. This thesis proposes ContextAiDe an end-end architecture for data-driven distributed applications aware of human mobile interactions using Edge computing. Edge processing supports real-time applications by reducing communication costs. The goal is to optimize the quality and the cost of acquiring the data using a) modeling and prediction of mobile user contexts, b) efficient strategies of scheduling application tasks on heterogeneous devices including multi-core devices such as GPU c) power-aware scheduling of virtual machine (VM) applications in cloud infrastructure e.g. elastic VMs. ContextAiDe middleware is integrated into the mobile application via Android API. The evaluation consists of overheads and costs analysis in the scenario of ``perpetrator tracking" application on the cloud, fog servers, and mobile devices. LifeMap data sets containing actual sensor data traces from mobile devices are used to simulate the application run for large scale evaluation.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201
    • 

    corecore