6,266 research outputs found

    Improving latency tolerance of multithreading through decoupling

    Get PDF
    The increasing hardware complexity of dynamically scheduled superscalar processors may compromise the scalability of this organization to make an efficient use of future increases in transistor budget. SMT processors, designed over a superscalar core, are therefore directly concerned by this problem. The article presents and evaluates a novel processor microarchitecture which combines two paradigms: simultaneous multithreading and access/execute decoupling. Since its decoupled units issue instructions in order, this architecture is significantly less complex, in terms of critical path delays, than a centralized out-of-order design, and it is more effective for future growth in issue-width and clock speed. We investigate how both techniques complement each other. Since decoupling features an excellent memory latency hiding efficiency, the large amount of parallelism exploited by multithreading may be used to hide the latency of functional units and keep them fully utilized. The study shows that, by adding decoupling to a multithreaded architecture, fewer threads are needed to achieve maximum throughput. Therefore, in addition to the obvious hardware complexity reduction, it places lower demands on the memory system. The study also reveals that multithreading by itself exhibits little memory latency tolerance. Results suggest that most of the latency hiding effectiveness of SMT architectures comes from the dynamic scheduling. On the other hand, decoupling is very effective at hiding memory latency. An increase in the cache miss penalty from 1 to 32 cycles reduces the performance of a 4-context multithreaded decoupled processor by less than 2 percent. For the nondecoupled multithreaded processor, the loss of performance is about 23 percent.Peer ReviewedPostprint (published version

    A deep reinforcement learning based homeostatic system for unmanned position control

    Get PDF
    Deep Reinforcement Learning (DRL) has been proven to be capable of designing an optimal control theory by minimising the error in dynamic systems. However, in many of the real-world operations, the exact behaviour of the environment is unknown. In such environments, random changes cause the system to reach different states for the same action. Hence, application of DRL for unpredictable environments is difficult as the states of the world cannot be known for non-stationary transition and reward functions. In this paper, a mechanism to encapsulate the randomness of the environment is suggested using a novel bio-inspired homeostatic approach based on a hybrid of Receptor Density Algorithm (an artificial immune system based anomaly detection application) and a Plastic Spiking Neuronal model. DRL is then introduced to run in conjunction with the above hybrid model. The system is tested on a vehicle to autonomously re-position in an unpredictable environment. Our results show that the DRL based process control raised the accuracy of the hybrid model by 32%.N/

    The 1990 progress report and future plans

    Get PDF
    This document describes the progress and plans of the Artificial Intelligence Research Branch (RIA) at ARC in 1990. Activities span a range from basic scientific research to engineering development and to fielded NASA applications, particularly those applications that are enabled by basic research carried out at RIA. Work is conducted in-house and through collaborative partners in academia and industry. Our major focus is on a limited number of research themes with a dual commitment to technical excellence and proven applicability to NASA short, medium, and long-term problems. RIA acts as the Agency's lead organization for research aspects of artificial intelligence, working closely with a second research laboratory at JPL and AI applications groups at all NASA centers

    Decompose and Conquer: Addressing Evasive Errors in Systems on Chip

    Full text link
    Modern computer chips comprise many components, including microprocessor cores, memory modules, on-chip networks, and accelerators. Such system-on-chip (SoC) designs are deployed in a variety of computing devices: from internet-of-things, to smartphones, to personal computers, to data centers. In this dissertation, we discuss evasive errors in SoC designs and how these errors can be addressed efficiently. In particular, we focus on two types of errors: design bugs and permanent faults. Design bugs originate from the limited amount of time allowed for design verification and validation. Thus, they are often found in functional features that are rarely activated. Complete functional verification, which can eliminate design bugs, is extremely time-consuming, thus impractical in modern complex SoC designs. Permanent faults are caused by failures of fragile transistors in nano-scale semiconductor manufacturing processes. Indeed, weak transistors may wear out unexpectedly within the lifespan of the design. Hardware structures that reduce the occurrence of permanent faults incur significant silicon area or performance overheads, thus they are infeasible for most cost-sensitive SoC designs. To tackle and overcome these evasive errors efficiently, we propose to leverage the principle of decomposition to lower the complexity of the software analysis or the hardware structures involved. To this end, we present several decomposition techniques, specific to major SoC components. We first focus on microprocessor cores, by presenting a lightweight bug-masking analysis that decomposes a program into individual instructions to identify if a design bug would be masked by the program's execution. We then move to memory subsystems: there, we offer an efficient memory consistency testing framework to detect buggy memory-ordering behaviors, which decomposes the memory-ordering graph into small components based on incremental differences. We also propose a microarchitectural patching solution for memory subsystem bugs, which augments each core node with a small distributed programmable logic, instead of including a global patching module. In the context of on-chip networks, we propose two routing reconfiguration algorithms that bypass faulty network resources. The first computes short-term routes in a distributed fashion, localized to the fault region. The second decomposes application-aware routing computation into simple routing rules so to quickly find deadlock-free, application-optimized routes in a fault-ridden network. Finally, we consider general accelerator modules in SoC designs. When a system includes many accelerators, there are a variety of interactions among them that must be verified to catch buggy interactions. To this end, we decompose such inter-module communication into basic interaction elements, which can be reassembled into new, interesting tests. Overall, we show that the decomposition of complex software algorithms and hardware structures can significantly reduce overheads: up to three orders of magnitude in the bug-masking analysis and the application-aware routing, approximately 50 times in the routing reconfiguration latency, and 5 times on average in the memory-ordering graph checking. These overhead reductions come with losses in error coverage: 23% undetected bug-masking incidents, 39% non-patchable memory bugs, and occasionally we overlook rare patterns of multiple faults. In this dissertation, we discuss the ideas and their trade-offs, and present future research directions.PHDComputer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147637/1/doowon_1.pd

    Correct-by-Construction Development of Dynamic Topology Control Algorithms

    Get PDF
    Wireless devices are influencing our everyday lives today and will even more so in the future. A wireless sensor network (WSN) consists of dozens to hundreds of small, cheap, battery-powered, resource-constrained sensor devices (motes) that cooperate to serve a common purpose. These networks are applied in safety- and security-critical areas (e.g., e-health, intrusion detection). The topology of such a system is an attributed graph consisting of nodes representing the devices and edges representing the communication links between devices. Topology control (TC) improves the energy consumption behavior of a WSN by blocking costly links. This allows a mote to reduce its transmission power. A TC algorithm must fulfill important consistency properties (e.g., that the resulting topology is connected). The traditional development process for TC algorithms only considers consistency properties during the initial specification phase. The actual implementation is carried out manually, which is error prone and time consuming. Thus, it is difficult to verify that the implementation fulfills the required consistency properties. The problem becomes even more severe if the development process is iterative. Additionally, many TC algorithms are batch algorithms, which process the entire topology, irrespective of the extent of the topology modifications since the last execution. Therefore, dynamic TC is desirable, which reacts to change events of the topology. In this thesis, we propose a model-driven correct-by-construction methodology for developing dynamic TC algorithms. We model local consistency properties using graph constraints and global consistency properties using second-order logic. Graph transformation rules capture the different types of topology modifications. To specify the control flow of a TC algorithm, we employ the programmed graph transformation language story-driven modeling. We presume that local consistency properties jointly imply the global consistency properties. We ensure the fulfillment of the local consistency properties by synthesizing weakest preconditions for each rule. The synthesized preconditions prohibit the application of a rule if and only if the application would lead to a violation of a consistency property. Still, this restriction is infeasible for topology modifications that need to be executed in any case. Therefore, as a major contribution of this thesis, we propose the anticipation loop synthesis algorithm, which transforms the synthesized preconditions into routines that anticipate all violations of these preconditions. This algorithm also enables the correct-by-construction runtime reconfiguration of adaptive WSNs. We provide tooling for both common evaluation steps. Cobolt allows to evaluate the specified TC algorithms rapidly using the network simulator Simonstrator. cMoflon generates embedded C code for hardware testbeds that build on the sensor operating system Contiki

    The development of CFD methods for rotor applications

    Get PDF
    The optimum design of the advancing helicopter rotor for high-speed forward flight always involves a tradeoff between transonic and stall limitations. However, the preoccupation of the rotor industry was primarily concerned with stall until well into the 1970s. This emphasis on stall resulted from the prevalent use of low-solidity rotors with rather outdated airfoil sections. The use of cambered airfoil sections and higher-solidity rotors substantially reduced stall and revealed the advancing transonic flow to be a more persistent limitation to high-speed rotor performance. Work in this area was spurred not only by operational necessity but also by the development of a tool for the prediction of these flows (the method of computational fluid dynamics). The development of computational fluid dynamics for these rotor problems was a major Army and NASA achievement. This work is now being extended to other rotor flow problems. The developments are outlined

    Arithmetic coding revisited

    Get PDF
    Over the last decade, arithmetic coding has emerged as an important compression tool. It is now the method of choice for adaptive coding on multisymbol alphabets because of its speed, low storage requirements, and effectiveness of compression. This article describes a new implementation of arithmetic coding that incorporates several improvements over a widely used earlier version by Witten, Neal, and Cleary, which has become a de facto standard. These improvements include fewer multiplicative operations, greatly extended range of alphabet sizes and symbol probabilities, and the use of low-precision arithmetic, permitting implementation by fast shift/add operations. We also describe a modular structure that separates the coding, modeling, and probability estimation components of a compression system. To motivate the improved coder, we consider the needs of a word-based text compression program. We report a range of experimental results using this and other models. Complete source code is available
    corecore