
    LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing

    LEGaTO is a three-year EU H2020 project that started in December 2017. The project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain energy savings of one order of magnitude from the edge to the converged cloud/HPC. (Peer reviewed; postprint, author's final draft.)

    Reliability-Aware Optimization of Approximate Computational Kernels with Rely

    Emerging high-performance architectures are anticipated to contain unreliable components (e.g., ALUs) that offer low power consumption at the expense of soft errors. Some applications (such as multimedia processing, machine learning, and big data analytics) can often naturally tolerate soft errors and can therefore trade accuracy of their results for reduced energy consumption by utilizing these unreliable hardware components. We present and evaluate a technique for reliability-aware optimization of approximate computational kernel implementations. Our technique takes a standard implementation of a computation and automatically replaces some of its arithmetic operations with unreliable versions that consume less power, but may produce incorrect results with some probability. Our technique works with a developer-provided specification of the required reliability of a computation -- the probability that it returns the correct result -- and produces an unreliable implementation that satisfies that specification. We evaluate our approach on five applications from the image processing, numerical analysis, and financial analysis domains and demonstrate how our technique enables automatic exploration of the trade-off between the reliability of a computation and its performance.
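    The core composition rule behind such an analysis is simple: if each unreliable operation succeeds independently, the probability that a kernel returns a correct result is bounded below by the product of the per-operation success probabilities. The following minimal sketch illustrates that rule; it is not the Rely system itself, and the per-operation probabilities and the convolution example are hypothetical:

        # Simplified reliability model, assuming each unreliable operation
        # succeeds independently with a fixed, known probability.
        UNRELIABLE_ADD = 0.99999  # hypothetical per-op success probabilities
        UNRELIABLE_MUL = 0.99990

        def kernel_reliability(op_reliabilities):
            # Lower bound on the probability that the whole kernel is
            # correct: the product of per-operation success probabilities.
            bound = 1.0
            for r in op_reliabilities:
                bound *= r
            return bound

        # A hypothetical 3x3 convolution: 9 multiplies and 8 adds per pixel.
        per_pixel = kernel_reliability([UNRELIABLE_MUL] * 9 + [UNRELIABLE_ADD] * 8)

        SPEC = 0.999  # developer-provided: "correct with probability >= 0.999"
        print(f"bound: {per_pixel:.6f}, meets spec: {per_pixel >= SPEC}")

    An optimizer in this style can then search over which operations to execute unreliably, keeping the bound above the specification while minimizing expected energy.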

    SCADA System Testbed for Cybersecurity Research Using Machine Learning Approach

    This paper presents the development of a Supervisory Control and Data Acquisition (SCADA) system testbed used for cybersecurity research. The testbed consists of a water storage tank's control system, a stage in the process of water treatment and distribution. Sophisticated cyber-attacks were conducted against the testbed; during the attacks, the network traffic was captured, and features were extracted from the traffic to build a dataset for training and testing different machine learning algorithms. Five traditional machine learning algorithms were trained to detect the attacks: Random Forest, Decision Tree, Logistic Regression, Naive Bayes and KNN. The trained models were then deployed in the network and tested on live traffic, and their online performance was compared with the performance obtained during training and testing. The results show the efficiency of the machine learning models in detecting the attacks in real time. The testbed provides a good understanding of the effects and consequences of attacks on real SCADA environments. (E-preprint.)
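    A minimal sketch of the training stage, using scikit-learn; the feature file and its columns are hypothetical placeholders, since the paper's dataset was extracted from captured SCADA network traffic:

        import pandas as pd
        from sklearn.model_selection import train_test_split
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.tree import DecisionTreeClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.naive_bayes import GaussianNB
        from sklearn.neighbors import KNeighborsClassifier
        from sklearn.metrics import classification_report

        # Hypothetical CSV of per-flow features with an attack/normal label.
        df = pd.read_csv("scada_traffic_features.csv")
        X, y = df.drop(columns=["label"]), df["label"]
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.3, stratify=y, random_state=42)

        # The five classifiers named in the paper.
        models = {
            "Random Forest": RandomForestClassifier(),
            "Decision Tree": DecisionTreeClassifier(),
            "Logistic Regression": LogisticRegression(max_iter=1000),
            "Naive Bayes": GaussianNB(),
            "KNN": KNeighborsClassifier(),
        }
        for name, model in models.items():
            model.fit(X_train, y_train)
            print(name)
            print(classification_report(y_test, model.predict(X_test)))

    Online deployment then amounts to extracting the same features from live traffic and feeding them to each trained model's predict().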

    Aggressive and reliable high-performance architectures - techniques for thermal control, energy efficiency, and performance augmentation

    As more and more transistors fit on a single chip, consumers of the electronics industry continue to expect a decline in cost-per-function. Advancements in process technology offer steady improvements in system performance, manifesting as shrinking area, faster circuits and improved battery life. However, the migration toward sub-micro/nano-meter technologies presents a new set of challenges, as the system becomes extremely sensitive to any voltage, temperature or process variation. One approach to immunizing the system against the adverse effects of these variations is to add sufficient safety margins to the operating clock frequency. Clearly, this approach is overly conservative, because the worst-case scenarios it guards against rarely occur. Moreover, process technology in the nanoscale era has already hit the power and frequency walls. Regardless of these challenges, present processors need to run not only faster, but also cooler, using less energy.

    At a juncture where no further improvement in clock frequency is possible, data-dependent latching through Timing Speculation (TS) provides a silver lining. Timing speculation is a widely known method for realizing better-than-worst-case systems. TS is aggressive in nature: the system frequency is dynamically tuned beyond the worst-case limits, exploiting application characteristics, to enhance the performance of systems-on-chip (SoCs). However, such aggressive tuning has adverse consequences that need to be overcome: power dissipation, on-chip temperature and reliability are key issues that cannot be ignored. A carefully designed power management technique combined with reliable, controlled, aggressive clocking not only constrains power dissipation within a limit, but also improves performance whenever possible.

    In this dissertation, we present a novel power-level switching mechanism that redefines the existing voltage-frequency pairs, and we introduce an aggressive yet reliable framework for energy-efficient thermal control. We achieve up to 40% speed-up compared to a base scheme without overclocking, and comparing our method against different schemes, we observe that up to 75% savings in Energy-Delay squared product (ED2) relative to the base architecture are possible. We also showcase the loss of efficiency in present chip multiprocessor systems due to excess supplied power, and propose Utilization-aware Task Scheduling (UTS), a power management scheme that increases the energy efficiency of chip multiprocessors. Our experiments demonstrate that UTS, together with aggressive timing speculation, squeezes maximum performance out of the system without loss of efficiency and without breaching power and thermal constraints. From our evaluation we infer that UTS improves performance by up to 12% through aggressive power-level switching, and delivers over 50% ED2 savings compared to traditional power management techniques.

    Aggressively clocked systems with TS as their central theme operate at clock frequencies beyond the specified safe limits, exploiting the data dependence of circuit critical paths. However, the margin for performance enhancement is restricted by the extreme difference between short paths and critical paths. In this thesis, we show that increasing the lengths of the circuit's short paths increases the margin of TS, leading to performance improvement in aggressively designed systems. We develop the Min-arc algorithm to efficiently add delay buffers to selected short paths while keeping the area penalty down. We show that, using this algorithm, the circuit's contamination delay can be increased by up to 30% without affecting the propagation delay, at moderate area overhead. We also explore increasing short-path delays further by relaxing the constraint on propagation delay, achieving even higher performance.

    Overall, we bring out the inter-relationship between power, temperature and reliability in aggressively clocked systems. Our main objective is to achieve maximal performance benefits and improved energy efficiency within thermal constraints by effectively combining dynamic frequency scaling, dynamic voltage scaling and reliable overclocking. We provide solutions that improve existing power management in chip multiprocessors to dynamically maximize system utilization and satisfy power constraints within safe thermal limits.
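    The principle behind short-path padding can be shown on a toy circuit graph. This is only a sketch of the idea, not the Min-arc algorithm itself, which selects buffer locations so as to bound the area penalty:

        # Toy illustration: raise the contamination delay (shortest
        # input-to-output path) by padding edges with buffer delay,
        # without letting any path exceed the propagation delay.

        # Hypothetical combinational circuit as a DAG: (src, dst) -> delay.
        edges = {("a", "x"): 1, ("b", "x"): 4, ("x", "y"): 2, ("b", "y"): 1}
        inputs, outputs = {"a", "b"}, {"y"}

        def arrive(node, pick):
            # Longest (pick=max) or shortest (pick=min) delay from any input.
            if node in inputs:
                return 0
            return pick(arrive(s, pick) + d for (s, t), d in edges.items() if t == node)

        def depart(node, pick):
            # Longest or shortest delay from node to any output.
            if node in outputs:
                return 0
            return pick(depart(t, pick) + d for (s, t), d in edges.items() if s == node)

        prop = max(arrive(o, max) for o in outputs)  # propagation (critical path) delay
        cont = min(arrive(o, min) for o in outputs)  # contamination (shortest path) delay

        # Greedily pad each edge by its slack: the gap between the critical
        # path delay and the longest path running through that edge. Padding
        # lengthens short paths only; the propagation delay is preserved.
        for (s, t) in list(edges):
            edges[(s, t)] += prop - (arrive(s, max) + edges[(s, t)] + depart(t, max))

        print("contamination delay:", cont, "->", min(arrive(o, min) for o in outputs))
        print("propagation delay preserved:", prop == max(arrive(o, max) for o in outputs))

    On this toy graph the contamination delay rises from 1 to 6 while the propagation delay stays at 6; a real circuit cannot be padded this completely, since buffer area and shared edges limit the achievable gap, which is why a cost-aware selection such as Min-arc is needed.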

    Proxy compilation for Java via a code migration technique

    There is an increasing trend of using intermediate representations (IRs) to deliver programs in more and more languages, such as Java. Although Java provides many advantages, including wider portability and better optimisation opportunities at execution time, it introduces extra overhead by requiring an IR translation for program execution. For maximum execution performance, an optimising compiler is placed in the runtime to selectively optimise code regions regarded as "hotspots". This common approach has been effectively deployed in many implementations of programming languages. However, the computational resources it demands make it less efficient, or even impractical, to deploy directly in a resource-constrained environment. One alternative is a remote compilation technique that supports compilation during execution. The work presented in this dissertation supports the thesis that execution performance can be improved by efficient optimising compilation performed by a proxy dynamic optimising compiler. After surveying various approaches to the design and implementation of remote compilation, a proxy compilation system called Apus is defined. To demonstrate the effectiveness of using a dynamic optimising compiler as a proxy compiler, a complete proxy compilation system was written on top of a research-oriented Java Virtual Machine (JVM). The system is discussed in detail, showing how to deliver remote binaries and manage a cache of binaries using a code migration approach. The proxy compilation client shows how the proxy compilation service is integrated with the selective optimisation system to maximise execution performance. Empirical measurements of the system are given, showing the efficiency of code optimisation from either the proxy compilation service or a local binary cache. The conclusion of this work is that Java execution performance can be improved by efficient optimising compilation with a proxy compilation service using a code migration technique.
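    In outline, the client side of such a system behaves like a memoizing remote-compilation stub. The sketch below is illustrative only; the class names and the wire format are invented, not Apus's actual interfaces:

        import hashlib

        class ProxyCompilationClient:
            def __init__(self, proxy):
                self.proxy = proxy      # remote optimising compile service
                self.binary_cache = {}  # bytecode digest -> compiled binary

            def compile(self, method_name, bytecode):
                # Identify the method body by its content, so migrated
                # binaries can be reused across runs and across clients.
                key = hashlib.sha256(bytecode).hexdigest()
                if key not in self.binary_cache:
                    self.binary_cache[key] = self.proxy.compile(method_name, bytecode)
                return self.binary_cache[key]

        class StubProxyCompiler:
            # Stand-in for the remote optimising compiler.
            def compile(self, method_name, bytecode):
                return b"OPT:" + bytecode  # pretend this is optimised code

        client = ProxyCompilationClient(StubProxyCompiler())
        hot_method = b"\x2a\xb4\x00\x02\xac"  # some hot method's bytecode
        client.compile("Foo.getX", hot_method)  # remote compilation
        client.compile("Foo.getX", hot_method)  # served from the binary cache

    The point of the cache is that the expensive step, remote optimising compilation, is paid once per distinct method body, while the resource-constrained client only performs hashing and lookups.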

    System Support for Bandwidth Management and Content Adaptation in Internet Applications

    This paper describes the implementation and evaluation of an operating system module, the Congestion Manager (CM), which provides integrated network flow management and exports a convenient programming interface that allows applications to be notified of, and adapt to, changing network conditions. We describe the API by which applications interface with the CM, and the architectural considerations that factored into the design. To evaluate the architecture and API, we describe our implementations of TCP, a streaming layered audio/video application, and an interactive audio application using the CM, and show that they achieve adaptive behavior without incurring much end-system overhead. All flows, including TCP, benefit from the sharing of congestion information, and applications are able to incorporate new functionality such as congestion control and adaptive behavior. (14 pages; appeared in OSDI 2000.)
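    The general shape of such an interface can be sketched as follows; the names and callback signature here are illustrative, not the paper's actual API:

        # Sketch of a CM-style adaptation interface: flows to the same
        # destination share one congestion state, and applications register
        # callbacks that fire when the rate estimate changes.
        class CongestionManager:
            def __init__(self):
                self.rate_bps = {}   # destination -> shared rate estimate
                self.callbacks = {}  # destination -> registered callbacks

            def register(self, dest, callback):
                self.callbacks.setdefault(dest, []).append(callback)

            def update_rate(self, dest, rate_bps):
                # Invoked by the CM's probing/feedback machinery; notifies
                # every flow sharing this destination's congestion state.
                self.rate_bps[dest] = rate_bps
                for cb in self.callbacks.get(dest, []):
                    cb(rate_bps)

        cm = CongestionManager()
        # A layered video sender adapts by choosing how many layers fit.
        LAYER_BPS = 128_000
        cm.register("10.0.0.7", lambda r: print(f"send {r // LAYER_BPS} layers"))
        cm.update_rate("10.0.0.7", 300_000)  # -> send 2 layers
        cm.update_rate("10.0.0.7", 900_000)  # -> send 7 layers

    Because the rate estimate is shared per destination rather than per flow, a new flow starts from the current estimate instead of probing from scratch, which is how all flows, including TCP, benefit from shared congestion information.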

    Dependable Computing on Inexact Hardware through Anomaly Detection.

    Reliability of transistors is on the decline as transistors continue to shrink in size, and aggressive voltage scaling is making the problem even worse. Scaled-down transistors are more susceptible to transient faults as well as permanent in-field hardware failures. In order to continue to reap the benefits of technology scaling, it has become imperative to tackle the challenges arising from the decreasing reliability of devices for the mainstream commodity market. Alongside worsening reliability, achieving energy efficiency and performance improvement through scaling yields increasingly diminishing marginal returns. More than at any other time in history, the semiconductor industry faces the crossroad of unreliability and the need to improve energy efficiency. These challenges can be tackled by splitting the target applications into two categories: traditional applications, which have relatively strict correctness requirements on their outputs, and an emerging class of soft applications, from domains such as multimedia, machine learning, and computer vision, that are inherently tolerant of a certain degree of inaccuracy. Traditional applications can be protected against hardware failures by low-cost detection and protection methods, while soft applications can trade output quality for better performance or energy efficiency.

    For traditional applications, I propose an efficient, software-only application analysis and transformation solution to detect data-flow and control-flow transient faults. The intelligence of the data-flow solution lies in its use of dynamic application information such as control flow, memory and value profiling. The control-flow protection technique achieves its efficiency by simplifying signature calculations in each basic block and by performing checking at a coarse-grain level. For soft applications, I develop a quality control technique that employs continuous, lightweight checkers to ensure that the approximation is controlled and the application output is acceptable. Overall, I show that using low-cost checkers to produce dependable results on commodity systems constructed from inexact hardware components is efficient and practical.
    (PhD dissertation, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/113341/1/dskhudia_1.pd)
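    The quality-control idea can be sketched in a few lines: spot-check a small random sample of the approximate output against an exact recomputation and trigger recovery when the error exceeds a threshold. The kernels, tolerance and sample size below are illustrative, not the thesis's exact mechanism:

        import random

        def exact_kernel(x):
            return x * x  # ground-truth computation on reliable hardware

        def approx_kernel(x):
            # Stand-in for an inexact-hardware result with bounded noise.
            return x * x * (1 + random.uniform(-0.02, 0.02))

        def quality_check(inputs, approx_out, sample=8, tol=0.05):
            # Continuously re-verify a few random elements; returning False
            # signals an anomaly and triggers fallback or recomputation.
            for i in random.sample(range(len(inputs)), min(sample, len(inputs))):
                exact = exact_kernel(inputs[i])
                if exact and abs(approx_out[i] - exact) / abs(exact) > tol:
                    return False
            return True

        xs = [float(i) for i in range(1, 1001)]
        ys = [approx_kernel(x) for x in xs]
        print("output accepted:", quality_check(xs, ys))

    The checker's cost is proportional to the sample size, not the input size, which is what keeps the quality control lightweight.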

    Application acceleration for wireless and mobile data networks

    This work studies application acceleration for wireless and mobile data networks. The problem can be addressed along multiple dimensions. The first dimension is advanced network protocol design: optimizing the underlying network protocols, particularly the transport-layer and link-layer protocols. Even with advanced protocol design, however, we observe that certain application behaviors can fundamentally limit the performance achievable over wireless and mobile data networks; the shortfall is caused by the complex behaviors of applications other than bulk transfer (non-FTP applications). Explicitly dealing with application behaviors can improve application performance in these environments. Along this second dimension, overcoming application behavior, we accelerate specific types of applications, including client-server, peer-to-peer and location-based applications. In exploring this dimension, we identify a set of application behaviors that significantly affect application performance. To accommodate these behaviors, we first extract general design principles that apply to any application whenever possible; these principles can also be integrated into new application designs. We then apply the principles to specific applications and build prototypes to demonstrate the effectiveness of the solutions.

    Even when the challenges along the two aforementioned dimensions, advanced network protocol design and overcoming application behavior, are addressed, application performance can still be limited by the underlying network capability, particularly the physical bandwidth. We therefore study the possibility of speeding up data delivery by eliminating the redundancy present in application traffic. Specifically, we first analyze traffic redundancy along multiple dimensions using traces obtained from several real wireless network deployments. Based on the insights from this analysis, we propose Wireless Memory (WM), a two-ended AP-client solution that effectively exploits traffic redundancy in wireless and mobile environments.

    Application acceleration can also be pursued along two other dimensions: network provisioning and quality of service (QoS). Network provisioning allocates network resources such as physical bandwidth or wireless spectrum, while QoS gives different priorities to different applications, users or data flows. Both have their respective limitations in the context of application acceleration; in this work, we focus on overcoming application behavior and eliminating traffic redundancy.

    The contribution of this work is as follows. First, we study the problem of application acceleration for wireless and mobile data networks and characterize the dimensions along which to address it. Second, we identify that application behaviors can significantly affect application performance, propose a set of design principles to deal with those behaviors, and build prototypes for systems research. Third, we consider traffic redundancy elimination and propose the wireless memory approach.
    (Ph.D. dissertation. Committee Chair: Sivakumar, Raghupathy; Committee Members: Ammar, Mostafa; Fekri, Faramarz; Ji, Chuanyi; Ramachandran, Umakishore.)
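    The redundancy-elimination idea at the heart of WM can be sketched as chunk-level caching at both ends of the wireless link. The sketch below is a simplification: it uses fixed-size chunks instead of content-defined boundaries, and plain dictionaries standing in for the synchronized AP-side and client-side caches:

        import hashlib

        CHUNK = 64  # bytes; real systems tune this and chunk by content

        def encode(payload, cache):
            # AP side: replace chunks the client is known to hold with refs.
            tokens = []
            for i in range(0, len(payload), CHUNK):
                chunk = payload[i:i + CHUNK]
                key = hashlib.sha1(chunk).digest()
                if key in cache:
                    tokens.append(("ref", key))  # 20-byte ref, not 64 bytes
                else:
                    cache[key] = chunk
                    tokens.append(("raw", chunk))
            return tokens

        def decode(tokens, cache):
            # Client side: expand refs from its own chunk cache.
            parts = []
            for kind, value in tokens:
                if kind == "ref":
                    parts.append(cache[value])
                else:
                    cache[hashlib.sha1(value).digest()] = value
                    parts.append(value)
            return b"".join(parts)

        ap_cache, client_cache = {}, {}
        page = b"HTTP/1.1 200 OK\r\n" + b"boilerplate " * 40
        decode(encode(page, ap_cache), client_cache)  # first transfer: raw
        again = encode(page, ap_cache)                # repeat: mostly refs
        print(sum(1 for kind, _ in again if kind == "ref"), "of", len(again), "chunks elided")
        print(decode(again, client_cache) == page)

    The savings scale with the redundancy actually present in the traffic, which is why the trace analysis across real deployments precedes the design.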