554 research outputs found

    Locality-aware cache random replacement policies

    Get PDF
    Measurement-Based Probabilistic Timing Analysis (MBPTA) facilitates the analysis of complex software running on hardware comprising high-performance features. MBPTA also aims at preventing additional analysis costs for timing analysis techniques and preserving the confidence on derived WCET estimates. Cache behavior has a deep influence on WCET estimates and hence on “the amount of software” that can be consolidated onto a single hardware platform. Deterministic replacement policies such as LRU (Least Recently Used) and NMRU (Non-Most Recently Used) have systematic pathological cases that may lead to high execution times and WCET estimates. Instead, random replacement (RR) decreases pathological cases probability, at the cost of temporal locality. We present two new MBPTA-amenable replacement policies that completely remove the presented pathological cases. The first policy, Random Permutations (RP) preserves higher temporal locality than RR; while the second, NMRU Random Permutations (NMRURP), also protects the Most Recently Used line from eviction. Both proposed policies build upon restricted random replacement choices. Our simulation evaluation (validated against a real prototype) using the Mälardalen benchmarks and a case study shows that RP and NMRURP deliver both high average performance (within 1% of LRUs and NRMU performance) and tight WCET estimates 11% and 24% lower than those of RR.This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773) and the HiPEACH Network of Excellence. Pedro Benedicte and Jaume Abella have been partially supported by the MINECO under FPU15/01394 grant and Ramon y Cajal postdoctoral fellowship number RYC- 2019-14717 respectively.Peer ReviewedPostprint (author's final draft

    Decision tree learning for intelligent mobile robot navigation

    Get PDF
    The replication of human intelligence, learning and reasoning by means of computer algorithms is termed Artificial Intelligence (Al) and the interaction of such algorithms with the physical world can be achieved using robotics. The work described in this thesis investigates the applications of concept learning (an approach which takes its inspiration from biological motivations and from survival instincts in particular) to robot control and path planning. The methodology of concept learning has been applied using learning decision trees (DTs) which induce domain knowledge from a finite set of training vectors which in turn describe systematically a physical entity and are used to train a robot to learn new concepts and to adapt its behaviour. To achieve behaviour learning, this work introduces the novel approach of hierarchical learning and knowledge decomposition to the frame of the reactive robot architecture. Following the analogy with survival instincts, the robot is first taught how to survive in very simple and homogeneous environments, namely a world without any disturbances or any kind of "hostility". Once this simple behaviour, named a primitive, has been established, the robot is trained to adapt new knowledge to cope with increasingly complex environments by adding further worlds to its existing knowledge. The repertoire of the robot behaviours in the form of symbolic knowledge is retained in a hierarchy of clustered decision trees (DTs) accommodating a number of primitives. To classify robot perceptions, control rules are synthesised using symbolic knowledge derived from searching the hierarchy of DTs. A second novel concept is introduced, namely that of multi-dimensional fuzzy associative memories (MDFAMs). These are clustered fuzzy decision trees (FDTs) which are trained locally and accommodate specific perceptual knowledge. Fuzzy logic is incorporated to deal with inherent noise in sensory data and to merge conflicting behaviours of the DTs. In this thesis, the feasibility of the developed techniques is illustrated in the robot applications, their benefits and drawbacks are discussed

    Patmos: a time-predictable microprocessor

    Get PDF

    Analog Spiking Neuromorphic Circuits and Systems for Brain- and Nanotechnology-Inspired Cognitive Computing

    Get PDF
    Human society is now facing grand challenges to satisfy the growing demand for computing power, at the same time, sustain energy consumption. By the end of CMOS technology scaling, innovations are required to tackle the challenges in a radically different way. Inspired by the emerging understanding of the computing occurring in a brain and nanotechnology-enabled biological plausible synaptic plasticity, neuromorphic computing architectures are being investigated. Such a neuromorphic chip that combines CMOS analog spiking neurons and nanoscale resistive random-access memory (RRAM) using as electronics synapses can provide massive neural network parallelism, high density and online learning capability, and hence, paves the path towards a promising solution to future energy-efficient real-time computing systems. However, existing silicon neuron approaches are designed to faithfully reproduce biological neuron dynamics, and hence they are incompatible with the RRAM synapses, or require extensive peripheral circuitry to modulate a synapse, and are thus deficient in learning capability. As a result, they eliminate most of the density advantages gained by the adoption of nanoscale devices, and fail to realize a functional computing system. This dissertation describes novel hardware architectures and neuron circuit designs that synergistically assemble the fundamental and significant elements for brain-inspired computing. Versatile CMOS spiking neurons that combine integrate-and-fire, passive dense RRAM synapses drive capability, dynamic biasing for adaptive power consumption, in situ spike-timing dependent plasticity (STDP) and competitive learning in compact integrated circuit modules are presented. Real-world pattern learning and recognition tasks using the proposed architecture were demonstrated with circuit-level simulations. A test chip was implemented and fabricated to verify the proposed CMOS neuron and hardware architecture, and the subsequent chip measurement results successfully proved the idea. The work described in this dissertation realizes a key building block for large-scale integration of spiking neural network hardware, and then, serves as a step-stone for the building of next-generation energy-efficient brain-inspired cognitive computing systems

    A Survey of Probabilistic Timing Analysis Techniques for Real-Time Systems

    Get PDF
    This survey covers probabilistic timing analysis techniques for real-time systems. It reviews and critiques the key results in the field from its origins in 2000 to the latest research published up to the end of August 2018. The survey provides a taxonomy of the different methods used, and a classification of existing research. A detailed review is provided covering the main subject areas: static probabilistic timing analysis, measurement-based probabilistic timing analysis, and hybrid methods. In addition, research on supporting mechanisms and techniques, case studies, and evaluations is also reviewed. The survey concludes by identifying open issues, key challenges and possible directions for future research

    Next-Gen Hybrid Memory and Interconnect System Architectures

    Get PDF
    This dissertation mainly addresses two problems that emerge along with the 'big data' trend: the increasing demands of memory capacity for mobile computing platform, and the needs for interconnection network with higher bandwidth/energy efficiency in the HPC/Data Center. The current mobile applications have rapidly growing memory footprints, posing a great challenge for memory system design. Insufficient DRAM main memory will incur frequent data swaps between memory and storage, a process that hurts performance, consumes energy and deteriorates the write endurance of typical flash storage devices. Alternately, a larger DRAM has higher leakage power and drains the battery faster. Further, DRAM scaling trends make further growth of DRAM in the mobile space prohibitive due to cost. Emerging non-volatile memory (NVM) has the potential to alleviate these issues due to its higher capacity per cost than DRAM and minimal static power. Recently, a wide spectrum of NVM technologies, including phase-change memories (PCM), memristor, and 3D XPoint have emerged. Despite the mentioned advantages, NVM has longer access latency compared to DRAM and NVM writes can incur higher latencies and wear costs. Therefore integration of these new memory technologies in the memory hierarchy requires a fundamental rearchitecting of traditional system designs. In this work, we propose a hardware-accelerated memory manager (HMMU) that addresses both types of memory in a flat space address space. We design a set of data placement and data migration policies within this memory manager, such that we may exploit the advantages of each memory technology. By augmenting the system with this HMMU, we reduce the overall memory latency while also reducing energy consumption and writes to the NVM. Experimental results show that our design achieves a 39% reduction in energy consumption with only a 12% performance degradation versus an all-DRAM baseline that is likely untenable in the future. After developing the pure hardware memory management for the data migration between DRAM and NVM, we consider to integrate information from the software stack into our system. These software information, such as programmers' hints or application profiling results, reveals the longer-term memory access pattern and data object properties; but they come at the cost of high software latency. Hardware approaches can avoid the latencies of software kernel processes related to page migration, such as page fault handling. However, hardware's vision is limited to a short time window, as it can only monitor and analyze the recently received memory requests. Ideally, the execution time advantages of pure hardware approaches, should be combined with the data object properties in a global scope. Further, application programmer's hints could guide the data placement at the allocation time, thus data objects with similar property could be congregated to reduce unnecessary page migrations. In this work, we propose such a hardware-software cooperative approach. In particular, we built a heap memory manager that allows the programmer to choose the memory type for each data object allocation. Such denotations are relayed to the hardware memory manager as hints for the decisions on data placement and migration. Meanwhile the hardware memory manager is still capable of capturing the per-application phase changes and maintaining flexibility in its data redistribution. The integration of the two mechanisms leads to optimal results from both long-term and short-term aspects. Experiment results show that our design shortens the overall memory latency while also reducing energy consumption and writes to the NVM versus prior approaches. Our design achieves a 40% reduction in energy consumption with only a 16% performance degradation versus the all-DRAM memory system. As for the HPC/Data domain, a primary problem is how to scale up the interconnection network to service the ever-increasing number of nodes. Photonic-links, with its high bandwidth and low signal loss across long distance propagation, is a promising technology to solve this problem. The higher bandwidth allows the router to connect more nodes while the long-distance connection makes it possible to implement more advanced typologies, such as the flattened butterfly. Both factors help to reduce the average number of hops between nodes across the network. Such high-radix and short distance network is essential to provisioning low latency communications in massive scale systems. However, due to the different physical and device properties, interconnection network needs redesign to adopt the photonic links. We first listed the basic formulas and design flow for interconnection network, and introduced a highly efficient event-driven simulator. Then we conducted a series of experiments to explore the design space, and gave a quantitative comparison between interconnection networks made of pure electrical links and those with electronic/photonic hybrid design

    A Survey on Cache Management Mechanisms for Real-Time Embedded Systems

    Get PDF
    © ACM, 2015. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Computing Surveys, {48, 2, (November 2015)} http://doi.acm.org/10.1145/2830555Multicore processors are being extensively used by real-time systems, mainly because of their demand for increased computing power. However, multicore processors have shared resources that affect the predictability of real-time systems, which is the key to correctly estimate the worst-case execution time of tasks. One of the main factors for unpredictability in a multicore processor is the cache memory hierarchy. Recently, many research works have proposed different techniques to deal with caches in multicore processors in the context of real-time systems. Nevertheless, a review and categorization of these techniques is still an open topic and would be very useful for the real-time community. In this article, we present a survey of cache management techniques for real-time embedded systems, from the first studies of the field in 1990 up to the latest research published in 2014. We categorize the main research works and provide a detailed comparison in terms of similarities and differences. We also identify key challenges and discuss future research directions.King Saud University NSER

    Exploring Adaptive Implementation of On-Chip Networks

    Get PDF
    As technology geometries have shrunk to the deep submicron regime, the communication delay and power consumption of global interconnections in high performance Multi- Processor Systems-on-Chip (MPSoCs) are becoming a major bottleneck. The Network-on- Chip (NoC) architecture paradigm, based on a modular packet-switched mechanism, can address many of the on-chip communication issues such as performance limitations of long interconnects and integration of large number of Processing Elements (PEs) on a chip. The choice of routing protocol and NoC structure can have a significant impact on performance and power consumption in on-chip networks. In addition, building a high performance, area and energy efficient on-chip network for multicore architectures requires a novel on-chip router allowing a larger network to be integrated on a single die with reduced power consumption. On top of that, network interfaces are employed to decouple computation resources from communication resources, to provide the synchronization between them, and to achieve backward compatibility with existing IP cores. Three adaptive routing algorithms are presented as a part of this thesis. The first presented routing protocol is a congestion-aware adaptive routing algorithm for 2D mesh NoCs which does not support multicast (one-to-many) traffic while the other two protocols are adaptive routing models supporting both unicast (one-to-one) and multicast traffic. A streamlined on-chip router architecture is also presented for avoiding congested areas in 2D mesh NoCs via employing efficient input and output selection. The output selection utilizes an adaptive routing algorithm based on the congestion condition of neighboring routers while the input selection allows packets to be serviced from each input port according to its congestion level. Moreover, in order to increase memory parallelism and bring compatibility with existing IP cores in network-based multiprocessor architectures, adaptive network interface architectures are presented to use multiple SDRAMs which can be accessed simultaneously. In addition, a smart memory controller is integrated in the adaptive network interface to improve the memory utilization and reduce both memory and network latencies. Three Dimensional Integrated Circuits (3D ICs) have been emerging as a viable candidate to achieve better performance and package density as compared to traditional 2D ICs. In addition, combining the benefits of 3D IC and NoC schemes provides a significant performance gain for 3D architectures. In recent years, inter-layer communication across multiple stacked layers (vertical channel) has attracted a lot of interest. In this thesis, a novel adaptive pipeline bus structure is proposed for inter-layer communication to improve the performance by reducing the delay and complexity of traditional bus arbitration. In addition, two mesh-based topologies for 3D architectures are also introduced to mitigate the inter-layer footprint and power dissipation on each layer with a small performance penalty.Siirretty Doriast
    • …
    corecore