
    Coupling Memory and Computation for Locality Management

    We articulate the need for managing (data) locality automatically rather than leaving it to the programmer, especially in parallel programming systems. To this end, we propose techniques for tightly coupling the computation (including the thread scheduler) and the memory manager so that data and computation can be positioned closely in hardware. Such tight coupling of computation and memory management is in sharp contrast with the prevailing practice of considering each in isolation. For example, memory-management techniques usually abstract the computation as an unknown "mutator", which is treated as a "black box". As an example of the approach, in this paper we consider a specific class of parallel computations: nested-parallel computations, which dynamically create a nesting of parallel tasks. We propose a method for organizing memory as a tree of heaps reflecting the structure of the nesting. More specifically, our approach creates a heap for a task if it is separately scheduled on a processor. This allows us to couple garbage collection with the structure of the computation and the way in which it is dynamically scheduled on the processors. This coupling makes it possible to exploit locality in the program by mapping it to the locality of the hardware. For example, for improved locality, a heap can be garbage collected immediately after its task finishes, when the heap's contents are likely still in cache.
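    To make the tree-of-heaps idea concrete, here is a minimal sketch of a heap that is created when a subtask is separately scheduled and collected as soon as its task finishes; all names (TaskHeap, spawn, collect) are illustrative, not the paper's actual interface.

```python
# Hypothetical sketch of a tree of heaps mirroring the nesting of
# parallel tasks; not the paper's actual implementation.

class TaskHeap:
    """A heap owned by one scheduled task; children mirror task nesting."""
    def __init__(self, parent=None):
        self.parent = parent
        self.objects = []       # objects allocated by this task
        self.children = []      # heaps of separately scheduled subtasks

    def spawn(self):
        """Create a child heap when a subtask is scheduled on a processor."""
        child = TaskHeap(parent=self)
        self.children.append(child)
        return child

    def allocate(self, obj):
        self.objects.append(obj)
        return obj

    def collect(self, live):
        """Collect right after the task finishes, while the heap's
        contents are likely still in cache; survivors move to the parent."""
        survivors = [o for o in self.objects if o in live]
        if self.parent is not None:
            self.parent.objects.extend(survivors)
        self.objects.clear()
        self.children.clear()
```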

    Performance Evaluation Analysis of Spark Streaming Backpressure for Data-Intensive Pipelines

    A significant rise in the adoption of streaming applications has changed decision-making processes in the last decade. This movement has led to the emergence of several Big Data technologies for in-memory processing, such as Apache Storm, Spark, Heron, Samza, and Flink. Spark Streaming, a widespread open-source implementation, processes data-intensive applications that often require large amounts of memory. However, Spark's Unified Memory Manager cannot properly manage sudden or intensive data surges and their related in-memory caching needs, resulting in performance and throughput degradation, high latency, a large number of garbage collection operations, out-of-memory issues, and data loss. This work presents a comprehensive performance evaluation of Spark Streaming backpressure to investigate the hypothesis that it could support data-intensive pipelines under specific pressure requirements. The results reveal that backpressure is suitable only for small and medium pipelines, for both stateless and stateful applications. Furthermore, the evaluation points out the Spark Streaming limitations that lead to in-memory issues for data-intensive pipelines and stateful applications. In addition, the work indicates potential solutions.
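    For context, backpressure in Spark Streaming is switched on through standard configuration keys; below is a minimal PySpark sketch (the application name, batch interval, and rate values are illustrative, not the evaluation's actual settings).

```python
# Minimal sketch: enabling Spark Streaming backpressure via standard
# configuration keys. Rate values below are illustrative only.
from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

conf = (SparkConf()
        .setAppName("backpressure-demo")
        .set("spark.streaming.backpressure.enabled", "true")      # adapt ingestion rate to processing rate
        .set("spark.streaming.backpressure.initialRate", "1000")  # records/sec before the first rate estimate
        .set("spark.streaming.receiver.maxRate", "10000"))        # hard upper bound per receiver

sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, batchDuration=5)  # 5-second micro-batches
```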

    Purdue Contribution of Fusion Simulation Program


    One-sided differentiability: a challenge for computer algebra systems

    Computer Algebra Systems (CASs) are extremely powerful and widely used digital tools. Focusing on differentiation, CASs include a command that computes the derivative of functions in one variable (and also the partial derivatives of functions in several variables). In this article we focus on real-valued functions of one real variable. Since CASs usually compute the derivative of a real-valued function as a whole, the value of the computed derivative at points where the left derivative and the right derivative differ (which we will call conflicting points) should be something like "undefined", although this isn't always the case: the output can differ strongly depending on the chosen CAS. We have analysed and compared in this article how some well-known CASs behave when addressing differentiation at the conflicting points of five different functions chosen by the authors. Finally, the ability of CASs to calculate one-sided limits makes it possible to compute the result directly in these cumbersome cases using the formal definition of the one-sided derivative, which we have also analysed and compared for the selected CASs. Regarding teaching, this is an important issue, as the topic belongs to secondary education and the use of CASs as auxiliary digital tools for teaching mathematics is nowadays very common.
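    As an illustration of the behaviour discussed above, here is a small SymPy session (SymPy being one CAS; the article's selection may differ) contrasting whole-function differentiation of f(x) = |x| with one-sided limits of the difference quotient at the conflicting point x = 0.

```python
# One-sided derivatives of f(x) = |x| at the conflicting point x = 0.
import sympy as sp

x, h = sp.symbols('x h', real=True)
f = sp.Abs(x)

# Whole-function differentiation: SymPy returns sign(x), which does not
# flag that the left and right derivatives disagree at x = 0.
print(sp.diff(f, x))                  # sign(x)

# One-sided derivatives via the formal definition and one-sided limits:
quotient = (f.subs(x, h) - f.subs(x, 0)) / h
print(sp.limit(quotient, h, 0, '+'))  # 1  (right derivative)
print(sp.limit(quotient, h, 0, '-'))  # -1 (left derivative)
```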

    Code offloading in opportunistic computing

    With the advent of cloud computing, applications are no longer tied to a single device: they can be migrated to a high-performance machine located in a distant data center. The key advantage is the enhancement of performance and, consequently, of the user experience. This activity is commonly referred to as computational offloading, and it has been investigated extensively in the past years. The natural candidate for computational offloading is the cloud, but recent results point out the hidden costs of cloud reliance in terms of latency and energy; Cuervo et al. illustrate the limitations of cloud-based computational offloading caused by WAN latency. This dissertation confirms the results of Cuervo et al. and illustrates more use cases where the cloud may not be the right choice. It addresses the following question: is it possible to build a novel approach to offloading computation that overcomes the limitations of the state of the art? In other words, is it possible to create a computational offloading solution that is able to use local resources when the cloud is not usable, and to remove the strong bond with the local infrastructure? To this end, I propose a novel paradigm for computation offloading named anyrun computing, whose goal is to use any piece of higher-end hardware (locally or remotely accessible) to offload a portion of the application. With anyrun computing I remove the boundaries that tie the solution to an infrastructure by adding locally available devices to increase the chances of successful offloading. To achieve the goals of the dissertation, it is fundamental to have a clear view of all the steps that take part in the offloading process. To this end, I first provide a categorization of these activities, together with their interactions, and assess their impact on the system. The outcome of the analysis is the mapping of the problem to a combinatorial optimization problem that is notoriously NP-hard. There is a set of well-known approaches to solving such problems, but in this scenario they cannot be used, because they require a global view that can only be maintained by a centralized infrastructure; thus, local solutions are needed. To tackle the anyrun computing paradigm empirically, I propose the anyrun computing framework (ARC), a novel software framework whose objective is to decide whether offloading to any resource-rich device willing to lend assistance is advantageous compared to local execution, with respect to a rich array of performance dimensions. The core of ARC is the inference model, which receives a rich set of information about the available remote devices from the SCAMPI opportunistic computing framework (developed within the European project SCAMPI) and employs this information to profile a given device; in other words, it decides whether offloading is advantageous compared to local execution, i.e. whether it can reduce the local footprint in the dimensions of interest (CPU and RAM usage, execution time, and energy consumption). To evaluate ARC empirically, I present a set of experimental results in the cloud, cloudlet, and opportunistic domains. In the cloud domain, I use the state of the art in cloud solutions over a set of significant benchmark problems and with three WAN access technologies (3G, 4G, and high-speed WAN).

    The main outcome is that the cloud is an appealing solution for a wide variety of problems, but there is a set of circumstances where it performs poorly. Moreover, I have empirically shown the limitations of cloud-based approaches: in some circumstances, problems with high transmission costs tend to perform poorly unless they have high computational needs. The second part of the evaluation is carried out in opportunistic/cloudlet scenarios, where I use a custom-made testbed to compare ARC with MAUI, the state of the art in computation offloading. To this end, I have performed two distinct experiments: the first in a cloudlet environment and the second in an opportunistic environment. The key outcome is that ARC virtually matches the performance of MAUI (in terms of energy savings) in the cloudlet environment, but improves on it by 50% to 60% in the opportunistic domain.
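    To convey the kind of decision an inference model like ARC's makes, here is a minimal sketch of a weighted cost comparison across the dimensions of interest; the Profile fields, weights, and cost model are hypothetical, not the dissertation's actual formulation.

```python
# Hypothetical offload-or-not decision over the dimensions of interest
# (CPU, RAM, execution time, energy); weights and cost model are illustrative.
from dataclasses import dataclass

@dataclass
class Profile:
    cpu: float        # expected CPU usage (normalized)
    ram: float        # expected RAM usage (normalized)
    time_s: float     # expected execution time (seconds)
    energy_j: float   # expected energy drawn from the local battery (joules)

def should_offload(local: Profile, remote: Profile,
                   weights=(0.25, 0.25, 0.25, 0.25)) -> bool:
    """Offload only if the weighted remote footprint (with transfer costs
    folded into `remote`) beats local execution."""
    def score(p: Profile) -> float:
        w_cpu, w_ram, w_time, w_energy = weights
        return (w_cpu * p.cpu + w_ram * p.ram +
                w_time * p.time_s + w_energy * p.energy_j)
    return score(remote) < score(local)

# Example: a task that is cheap to ship but heavy to compute locally.
local = Profile(cpu=0.9, ram=0.7, time_s=12.0, energy_j=40.0)
remote = Profile(cpu=0.1, ram=0.1, time_s=4.0, energy_j=8.0)  # includes transfer cost
print(should_offload(local, remote))  # True
```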

    The Future of Information Sciences: INFuture2015: e-Institutions – Openness, Accessibility, and Preservation


    Acceleration of Astrophysical Simulations with Special Hardware

    This work presents the raceSPH and raceGRAV accelerator libraries, designed to interface astrophysical simulations with special-purpose hardware. raceSPH focuses on the acceleration of Smoothed Particle Hydrodynamics (SPH), a method for approximating force interactions in fluid dynamics. The accelerators used range from the vectorizing units of microprocessors to Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs); speed-ups range from 1.2x to 28x when measured in a synthetic benchmark and from 6x to 19x when used inside astrophysical simulations, for a total wallclock-time speed-up of 1.6x to 2.4x, close to the theoretical maximum of 2.5x. The raceGRAV library computes gravitational forces with high accuracy and is designed to complement the GRAPE accelerator. In direct-summation tests, it provides performance on par with the vectorizing units of the processor and comparable to the GRAPE-6 when normalized by the number of pipelines. For the development of these libraries, a set of supporting modules was developed, including a PCI driver for modern Linux kernel versions, the MPRACE library for communication with FPGA boards, and a buffer management library for the efficient handling of data transfers.
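    For reference, this is the kind of SPH summation such libraries accelerate: a minimal O(N²) density computation with the standard cubic spline kernel. The NumPy baseline below is purely illustrative and bears no relation to raceSPH's actual interface.

```python
# Reference SPH density summation: the hot loop typically offloaded to
# vector units, FPGAs, or GPUs. Illustrative only.
import numpy as np

def cubic_spline_kernel(r, h):
    """Standard cubic spline (M4) smoothing kernel in 3D."""
    q = r / h
    sigma = 1.0 / (np.pi * h**3)
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
        np.where(q < 2.0, 0.25 * (2.0 - q)**3, 0.0))
    return sigma * w

def densities(positions, masses, h):
    """O(N^2) direct summation of particle densities."""
    diff = positions[:, None, :] - positions[None, :, :]
    r = np.linalg.norm(diff, axis=-1)
    return (masses[None, :] * cubic_spline_kernel(r, h)).sum(axis=1)
```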

    Validity contracts for software transactions

    Software Transactional Memory is a promising approach to concurrent programming, freeing programmers from error-prone concurrency control decisions that are complicated and not composable. But few such systems address the consistency of transactional objects. In this thesis, I propose a contract-based transactional programming model toward more secure transactional software. In this general model, a validity contract specifies both requirements and effects for transactions. Validity contracts bring numerous benefits, including reasoning about and verifying transactional programs, detecting and resolving transactional conflicts, automating object revalidation, and easing program debugging. I introduce an ownership-based framework, named AVID, derived from the general model, which uses object ownership as a mechanism for specifying and reasoning about validity contracts. I have specified a formal type system and implemented a prototype type checker to support static checking. I have also built a transactional library framework, AVID, based on the existing Java DSTM2 framework, for expressing transactions and validity contracts. Experimental results on a multi-core system show that contracts add little overhead to the original STM. I find that contract-aware contention management yields significant speedups in some cases. The results suggest compiler-directed optimisation for tuning contract-based transactional programs. My further work will investigate the applications of transaction contracts to various aspects of TM research, such as hardware support and open-nesting.
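    To convey the idea of a validity contract, here is a minimal conceptual sketch: an operation whose requirements are checked before the effect runs and whose effects are validated afterwards. The names and checking scheme are hypothetical; AVID itself is a Java framework built on DSTM2.

```python
# Conceptual sketch of a validity contract (requirements + effects) around
# a transaction-like operation. Hypothetical names; not AVID's API.
class ContractViolation(Exception):
    pass

class Account:
    def __init__(self, balance):
        self.balance = balance

def transfer(src, dst, amount):
    # Requirement: the source must be able to cover the transfer.
    if src.balance < amount:
        raise ContractViolation("requires: src.balance >= amount")
    total_before = src.balance + dst.balance
    src.balance -= amount
    dst.balance += amount
    # Effect: money is conserved across the transaction.
    if src.balance + dst.balance != total_before:
        raise ContractViolation("ensures: total balance preserved")
```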