4,027 research outputs found

    Adaptive Knobs for Resource Efficient Computing

    Get PDF
    Performance demands of emerging domains such as artificial intelligence, machine learning, vision, and the Internet of Things continue to grow. Meeting such requirements on modern multi-/many-core systems with higher power densities, fixed power and energy budgets, and thermal constraints exacerbates the run-time management challenge. This leaves open the problem of extracting the required performance within the power and energy limits while also ensuring thermal safety. Existing architectural solutions, including asymmetric and heterogeneous cores and custom acceleration, improve performance-per-watt in specific design-time and static scenarios. However, satisfying applications' performance requirements under dynamic and unknown workload scenarios, subject to varying system dynamics of power, temperature, and energy, requires intelligent run-time management. Adaptive strategies are necessary for maximizing resource efficiency, considering i) diverse requirements and characteristics of concurrent applications, ii) dynamic workload variation, iii) core-level heterogeneity, and iv) power, thermal, and energy constraints. This dissertation proposes such adaptive techniques for efficient run-time resource management to maximize performance within fixed budgets under unknown and dynamic workload scenarios. The resource management strategies proposed in this dissertation comprehensively consider application and workload characteristics, as well as the variable effect of power actuation on performance, to make proactive and appropriate allocation decisions.
Specific contributions include i) a run-time mapping approach that improves power budgets for higher throughput, ii) thermal-aware performance boosting for efficient utilization of the power budget and higher performance, iii) approximation as a run-time knob that exploits accuracy-performance trade-offs to maximize performance under power caps at minimal loss of accuracy, and iv) coordinated approximation for heterogeneous systems through joint actuation of dynamic approximation and power knobs for performance guarantees with minimal power consumption. The approaches presented in this dissertation focus on adapting existing mapping techniques, performance boosting strategies, and software and dynamic approximations to meet performance requirements while simultaneously considering system constraints. The proposed strategies are compared against relevant state-of-the-art run-time management frameworks to qualitatively evaluate their efficacy.
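As a toy illustration of treating approximation and power as coordinated run-time knobs, the selection step can be sketched as a constrained search over candidate settings. Everything here (the setting table, the `pick_setting` helper, the numbers) is hypothetical and only mirrors the idea of maximizing performance under a power cap at minimal accuracy loss:

```python
# Hypothetical run-time knob selection: among candidate (approximation,
# frequency) settings, pick the highest-performance one whose predicted
# power stays under the cap. Settings and figures are invented.

def pick_setting(settings, power_cap):
    """settings: list of dicts with 'perf', 'power', 'accuracy_loss'."""
    feasible = [s for s in settings if s["power"] <= power_cap]
    if not feasible:
        return min(settings, key=lambda s: s["power"])  # fail-safe: cheapest
    # Prefer performance; break ties toward lower accuracy loss.
    return max(feasible, key=lambda s: (s["perf"], -s["accuracy_loss"]))

settings = [
    {"perf": 10, "power": 8.0,  "accuracy_loss": 0.00},
    {"perf": 14, "power": 11.5, "accuracy_loss": 0.01},
    {"perf": 17, "power": 15.0, "accuracy_loss": 0.05},
]
best = pick_setting(settings, power_cap=12.0)
```

A real manager would run such a selection periodically, refreshing the power and performance estimates from sensors rather than a static table.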

    Optimizing energy-efficiency for multi-core packet processing systems in a compiler framework

    Get PDF
    Network applications are becoming increasingly computation-intensive, and traffic volumes are soaring at an unprecedented rate. Multi-core and multi-threaded techniques are thus widely employed in packet processing systems to meet these changing requirements. However, the available processing power cannot be fully utilized without a suitable programming environment. The compilation procedure is decisive for the quality of the code: it largely determines overall system performance in terms of packet throughput, individual packet latency, core utilization, and energy efficiency. This thesis first investigates compilation issues in the networking domain, with particular attention to energy consumption. As a cornerstone for any compiler optimization, a code analysis module for collecting program dependencies is presented and incorporated into a compiler framework. Using that dependency information, a strategy based on graph bi-partitioning and mapping is proposed to search for an optimal configuration in a parallel-pipeline fashion. The energy-aware extension is particularly effective in enhancing the energy efficiency of the whole system. Finally, a generic evaluation framework for simulating the performance and energy consumption of a packet processing system is presented. It accepts flexible architectural configurations and is capable of performing arbitrary code mappings, and its simulation time is extremely short compared to full-fledged simulators. A set of our optimization results is gathered using this framework.
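To make the graph bi-partitioning idea concrete, here is a minimal, exhaustive sketch: split a small packet-processing task graph into two balanced pipeline stages while minimizing the number of cut dependency edges. The task names and the brute-force search are illustrative stand-ins; a production compiler would use a heuristic partitioner:

```python
# Exhaustive balanced bi-partitioning of a tiny task dependency graph.
# Fine for toy graphs; real compilers use heuristics (e.g. KL/FM).
from itertools import combinations

def best_bipartition(nodes, edges):
    """Find a balanced 2-way split minimizing the edge cut."""
    n = len(nodes)
    best = None
    for a in combinations(nodes, n // 2):
        part_a = set(a)
        # An edge is cut when its endpoints land in different stages.
        cut = sum(1 for u, v in edges if (u in part_a) != (v in part_a))
        if best is None or cut < best[0]:
            best = (cut, part_a, set(nodes) - part_a)
    return best

nodes = ["parse", "classify", "lookup", "modify", "queue", "tx"]
edges = [("parse", "classify"), ("classify", "lookup"), ("lookup", "modify"),
         ("modify", "queue"), ("queue", "tx"), ("parse", "lookup")]
cut, stage1, stage2 = best_bipartition(nodes, edges)
```

The cut size is a proxy for inter-stage communication; an energy-aware variant would weight edges by the cost of moving packet data between cores.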

    Doctor of Philosophy

    Get PDF
    The embedded system space is characterized by a rapid evolution in the complexity and functionality of applications. In addition, the short time-to-market nature of the business motivates the use of programmable devices capable of meeting the conflicting constraints of low energy, high performance, and short design times. The keys to achieving these conflicting constraints are specialization and maximally extracting available application parallelism. General-purpose processors are flexible but are either too power hungry or lack the necessary performance. Application-specific integrated circuits (ASICs) efficiently meet the performance and power needs but are inflexible. Programmable domain-specific architectures (DSAs) are an attractive middle ground, but their design requires significant time, resources, and expertise in a variety of specialties, ranging from application algorithms to architecture and, ultimately, circuit design. This dissertation presents CoGenE, a design framework that automates the design of energy-performance-optimal DSAs for embedded systems. For a given application domain and a user-chosen initial architectural specification, CoGenE consists of a Compiler to generate the execution binary, a simulator Generator to collect performance/energy statistics, and an Explorer that modifies the current architecture to improve its energy-performance-area characteristics. This process repeats automatically until the user-specified constraints are achieved, removing or reducing the time needed to understand the application, manually design the DSA, and generate object code for the DSA. Thus, CoGenE is a new design methodology that delivers significant improvements in performance, energy dissipation, design time, and resources. This dissertation employs the face recognition domain to showcase a flexible architectural design methodology that creates "ASIC-like" DSAs.
The DSAs are instruction set architecture (ISA)-independent and achieve good energy-performance characteristics by co-scheduling the often-conflicting constraints of data access, data movement, and computation through a flexible interconnect. This flexibility, however, comes at a significant increase in programming complexity and code generation time. To address this problem, the CoGenE compiler employs integer linear programming (ILP)-based 'interconnect-aware' scheduling techniques for automatic code generation. The CoGenE explorer employs an iterative technique to search the complete design space and select a set of energy-performance-optimal candidates. When compared to manual designs, results demonstrate that CoGenE produces superior designs for three application domains: face recognition, speech recognition, and wireless telephony. While CoGenE is well suited to applications that exhibit streaming behavior, multithreaded applications like ray tracing present a different but important challenge. To demonstrate its generality, CoGenE is evaluated in designing a novel multicore N-wide SIMD architecture, known as StreamRay, for the ray tracing domain. CoGenE is used to synthesize the SIMD execution cores, the compiler that generates the application binary, and the interconnection subsystem. Further, separating address and data computations in space reduces data movement and contention for resources, thereby significantly improving performance compared to existing ray tracing approaches.
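The compile-simulate-explore cycle described above can be caricatured as a simple improvement loop over architectural parameters. The `evaluate` cost model, the `lanes` parameter, and the mutation rule below are invented purely for illustration; the real framework drives a generated simulator and searches a far richer design space:

```python
# Sketch of an explorer loop: evaluate the current architecture, mutate it,
# and keep improvements until a user-specified constraint is met.

def explore(arch, constraint, mutate, evaluate, max_iters=100):
    best = arch
    best_cost = evaluate(arch)
    for _ in range(max_iters):
        if best_cost <= constraint:      # user-specified target reached
            break
        cand = mutate(best)              # propose a modified architecture
        cost = evaluate(cand)
        if cost < best_cost:             # keep only improvements
            best, best_cost = cand, cost
    return best, best_cost

# Toy cost model: cost falls as SIMD width approaches a sweet spot of 8.
evaluate = lambda a: abs(a["lanes"] - 8) + 1
mutate = lambda a: {"lanes": a["lanes"] + 1}
arch, cost = explore({"lanes": 4}, constraint=1, mutate=mutate, evaluate=evaluate)
```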

    Improving efficiency and resilience in large-scale computing systems through analytics and data-driven management

    Full text link
    Applications running in large-scale computing systems such as high performance computing (HPC) or cloud data centers are essential to many aspects of modern society, from weather forecasting to financial services. As the number and size of data centers increase with the growing computing demand, scalable and efficient management becomes crucial. However, data center management is a challenging task due to the complex interactions between applications, middleware, and hardware layers such as processors, network, and cooling units. This thesis claims that to improve robustness and efficiency of large-scale computing systems, significantly higher levels of automated support than what is available in today's systems are needed, and this automation should leverage the data continuously collected from various system layers. Towards this claim, we propose novel methodologies to automatically diagnose the root causes of performance and configuration problems and to improve efficiency through data-driven system management. We first propose a framework to diagnose software and hardware anomalies that cause undesired performance variations in large-scale computing systems. We show that by training machine learning models on resource usage and performance data collected from servers, our approach successfully diagnoses 98% of the injected anomalies at runtime in real-world HPC clusters with negligible computational overhead. We then introduce an analytics framework to address another major source of performance anomalies in cloud data centers: software misconfigurations. Our framework discovers and extracts configuration information from cloud instances such as containers or virtual machines. This is the first framework to provide comprehensive visibility into software configurations in multi-tenant cloud platforms, enabling systematic analysis for validating the correctness of software configurations. 
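The diagnosis idea (classify per-server telemetry against models trained on labeled resource-usage data) can be sketched with a nearest-centroid rule. The feature vectors (CPU, memory, I/O utilization) and anomaly labels below are invented; the thesis trains full machine learning models on much richer telemetry:

```python
# Nearest-centroid stand-in for runtime anomaly diagnosis: learn one
# centroid per known anomaly class from labeled resource-usage samples,
# then label new telemetry by its closest centroid.

def centroid(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def train(samples):
    """samples: {label: [feature_vectors]} -> {label: centroid}"""
    return {label: centroid(rows) for label, rows in samples.items()}

def diagnose(model, x):
    dist = lambda c: sum((a - b) ** 2 for a, b in zip(x, c))
    return min(model, key=lambda label: dist(model[label]))

model = train({
    "healthy": [[0.30, 0.40, 0.10], [0.35, 0.45, 0.12]],
    "memleak": [[0.30, 0.95, 0.10], [0.32, 0.90, 0.15]],
    "cpu_hog": [[0.97, 0.40, 0.10], [0.93, 0.50, 0.08]],
})
label = diagnose(model, [0.31, 0.92, 0.11])  # suspiciously high memory use
```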
This thesis also contributes to the design of robust and efficient system management methods that leverage continuously monitored resource usage data. To improve performance under power constraints, we propose a workload- and cooling-aware power budgeting algorithm that distributes the available power among servers and cooling units in a data center, achieving up to 21% improvement in throughput per Watt compared to the state-of-the-art. Additionally, we design a network- and communication-aware HPC workload placement policy that reduces communication overhead by up to 30% in terms of hop-bytes compared to existing policies.
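A workload-aware power budgeting step of the kind described above can be sketched as a proportional split with per-server caps and surplus redistribution. The `budget` helper and all numbers are illustrative; the actual algorithm additionally coordinates with cooling units:

```python
# Proportional power budgeting: split the total budget by demand, clamp
# each server to its peak draw, and re-split any surplus among the rest.

def budget(total, demand, p_max):
    alloc = [0.0] * len(demand)
    active = set(range(len(demand)))
    remaining = float(total)
    while active and remaining > 1e-9:
        d = sum(demand[i] for i in active)
        for i in active:
            alloc[i] += remaining * demand[i] / d
        remaining = 0.0
        for i in list(active):
            if alloc[i] > p_max[i]:          # server saturated at its peak
                remaining += alloc[i] - p_max[i]
                alloc[i] = p_max[i]
                active.remove(i)             # surplus goes to the others
    return alloc

# Server 2 demands twice as much but is capped at 120 W.
alloc = budget(300.0, demand=[1.0, 1.0, 2.0], p_max=[120.0, 120.0, 120.0])
```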

    Prediction of Properties of Liquid Systems Using Deep Learning

    Get PDF
    Thesis (Ph.D.)--Seoul National University Graduate School, College of Natural Sciences, Department of Chemistry, February 2020. Advisor: YounJoon Jung. Recent advances in machine learning technologies and their chemical applications have accelerated the development of diverse structure-property-relationship-based prediction models for various chemical properties; the free energy of solvation is one of them and plays a dominant role as a fundamental measure of solvation chemistry. Here, we introduce a novel machine learning-based solvation model, which calculates the target solvation free energy from pairwise atomistic interactions. The novelty of the proposed solvation model lies in its rather simple architecture: two encoding functions extract vector representations of the atomic and molecular features from the given chemical structure, and the inner product between two atomistic feature vectors calculates their interaction, instead of black-boxed perceptron networks. Cross-validation on 6,493 experimental measurements for 952 organic solutes and 147 organic solvents achieves an outstanding mean unsigned error (MUE) of 0.2 kcal/mol. The scaffold-based split yields an MUE of 0.6 kcal/mol, showing that the proposed model maintains reasonable accuracy even for extrapolated cases. Moreover, the proposed model shows excellent transferability for enlarging the training data owing to its solvent-non-specific nature. Analysis of the atomistic interaction map shows great potential that the proposed model reproduces group contributions to the solvation energy, suggesting that it not only predicts the target property but also offers more detailed physicochemical insight.
Contents: 1. Introduction. 2. Delfos: Deep Learning Model for Prediction of Solvation Free Energies in Generic Organic Solvents (Methods: Embedding of Chemical Contexts, Encoder-Predictor Network; Results and Discussions: Computational Setup and Results, Transferability of the Model for New Compounds, Visualization of Attention Mechanism). 3. Group Contribution Method for the Solvation Energy Estimation with Vector Representations of Atoms (Model Description: Word Embedding, Network Architecture; Results and Discussions: Computational Details, Prediction Accuracy, Model Transferability, Group Contributions of Solvation Energy). 4. Empirical Structure-Property Relationship Model for Liquid Transport Properties. 5. Concluding Remarks. A. Analyzing Kinetic Trapping as a First-Order Dynamical Phase Transition in the Ensemble of Stochastic Trajectories. B. Reaction-Path Thermodynamics of the Michaelis-Menten Kinetics.
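The model's central trick, replacing a pairwise perceptron network with an inner product between learned atomic feature vectors, can be sketched as follows. The two-dimensional vectors are random stand-ins for learned embeddings, and the plain sum over pairs omits any attention weighting the real model may apply:

```python
# Sketch: solvation free energy as a sum of inner products over all
# solute-solvent atom pairs. Embeddings here are illustrative constants.

def solvation_energy(solute_vecs, solvent_vecs):
    """Sum of dot products over all solute-solvent feature-vector pairs."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    return sum(dot(u, v) for u in solute_vecs for v in solvent_vecs)

solute = [[1.0, 0.0], [0.5, -0.5]]   # two solute "atoms"
solvent = [[0.2, 0.4]]               # one solvent feature vector
dG = solvation_energy(solute, solvent)
```

The appeal of this design is interpretability: each pairwise dot product is a candidate atomistic contribution, which is what makes the group-contribution analysis in the thesis possible.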

    A method for mapping between ASMs and implementation language

    Get PDF
    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 193-196). One of the challenges of model-based engineering is traceability: the ability to relate the set of models developed during the design stages to the implemented system. This thesis develops a language-specific method for creating bidirectional traceability, a mapping between model and implementation, suitable for tracing requirements from model through implementation and vice versa. The mapping is created as a byproduct of code generation and reverse engineering, and can be used to subsequently synchronize changes between the model and the implementation. The creation of the mapping is demonstrated by generating Java code from an abstract state machine (ASM) based modeling language, the Timed Abstract State Machine (TASM) language. This code generation process involves a series of three transformations. The first transformation creates a specialised System Dependency Graph (SDG), called a TASM SDG, from a TASM specification. The second uses Triple Graph Grammars to transform the TASM SDG into a Java SDG (JSDG); the applied grammars are saved as the mapping information. The third transformation procedurally generates Java code. To make this methodology possible, this thesis introduces the TASM SDG, as well as a novel algorithm, generally applicable to ASM languages, that explicates state transitions. The approach presented extends the bidirectional traceability capabilities inherent in the TASM language to Java. The code generation technique is demonstrated using an industrial case study from the automotive domain, an Electronic Throttle Controller (ETC). by David Cheng-Ping Wang. S.M.

    Graph Pattern Matching on Symmetric Multiprocessor Systems

    Get PDF
    Graph-structured data can be found in nearly every aspect of today's world, be it road networks, social networks, or the internet itself. From a processing perspective, finding comprehensive patterns in graph-structured data is a core processing primitive in a variety of applications, such as fraud detection, biological engineering, or social graph analytics. On the hardware side, multiprocessor systems, which consist of multiple processors in a single scale-up server, are the next important wave on top of multi-core systems. In particular, symmetric multiprocessor (SMP) systems are characterized by the fact that each processor has the same architecture (e.g., every processor is a multi-core processor) and that all processors share a common, huge main memory space. Moreover, large SMPs feature non-uniform memory access (NUMA), whose impact on the design of efficient data processing concepts should not be neglected. The efficient usage of SMP systems, which continue to grow in size, is an interesting and ongoing research topic. Current state-of-the-art architectural design principles provide different, and partly disjoint, suggestions on which data should be partitioned and/or how intra-process communication should be realized. In this thesis, we propose the NORAD architecture, which stands for NUMA-aware DORA with Delegation, as a new synthesis of four of the most well-known principles: Shared Everything, Partition Serial Execution, Data Oriented Architecture, and Delegation. We built our research prototype, NeMeSys, on top of the NORAD architecture to fully exploit the hardware capacities of SMPs for graph pattern matching. Being an in-memory engine, NeMeSys allows for online data ingestion as well as online query generation and processing through a terminal-based user interface. Storing a graph on a NUMA system inherently requires data partitioning to cope with the aforementioned NUMA effect.
Hence, we need to dissect the graph into a disjoint set of partitions, which can then be stored on the individual memory domains. This thesis analyzes the capabilities of the NORAD architecture to perform scalable graph pattern matching on SMP systems. To increase the system's performance, we further develop, integrate, and evaluate suitable optimization techniques. That is, we investigate the influence of the inherent data partitioning, the interplay of messaging with and without sufficient locality information, and the actual placement of partitions on the NUMA sockets in the system. To underline the applicability of our approach, we evaluate NeMeSys on synthetic datasets and perform an end-to-end evaluation of the whole system stack on the real-world knowledge graph of Wikidata.
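The partitioning step described above can be illustrated with the simplest deterministic scheme: assign each vertex to a NUMA memory domain by its id. This modulo scheme is only a stand-in (the actual partitioning and placement strategies are what the thesis evaluates); the sketch shows just the disjointness property:

```python
# Toy vertex partitioning for NUMA placement: one disjoint partition per
# memory domain, owner determined by vertex id modulo the domain count.

def partition(vertex_ids, num_domains):
    parts = [[] for _ in range(num_domains)]
    for v in vertex_ids:
        parts[v % num_domains].append(v)   # v's owning memory domain
    return parts

def owner(v, num_domains):
    """Locality lookup a worker would use before messaging/delegating."""
    return v % num_domains

parts = partition(range(10), num_domains=4)
```

Because ownership is computable from the id alone, a worker can decide locally whether to access a vertex directly or delegate the request to the owning socket, which is exactly the locality information the messaging experiments vary.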