49 research outputs found

    Embedded Social Insect-Inspired Intelligence Networks for System-level Runtime Management

    Large-scale distributed computing architectures, such as systems on chip or many-core devices, offer advantages over monolithic or centralised single-core systems in terms of speed, power/thermal performance and fault tolerance. However, these are not implicit properties of such systems, and runtime management at software or hardware level is required to unlock these features. Biological systems naturally present such properties and are also adaptive and scalable, so it may be beneficial to consider how these properties can be similarly achieved in hardware. We present social insect behaviours as a suitable model for enabling autonomous runtime management (RTM) in many-core architectures. The emergent properties we seek to establish are self-organisation of task mapping and system-level fault tolerance. For example, large social insect colonies accomplish a wide range of tasks to build and maintain the colony. Many thousands of individuals, each possessing relatively little intelligence, contribute without any centralised control. Hence, it would seem that social insects have evolved a scalable approach to task allocation, load balancing and robustness that can be applied to large many-core computing systems. Based on this, a self-optimising and adaptive, yet fundamentally scalable, design approach for many-core systems based on the emergent behaviours of social-insect colonies is developed. Experiments capture the decision-making processes of each colony member to exhibit such high-level behaviours, and embed these decision engines within the routers of the many-core system.

    VLSI hardware neural accelerator using reduced precision arithmetic

    A novel approach for the hardware implementation of a PPMC statistical data compressor

    This thesis aims to understand how to design high-performance compression algorithms suitable for hardware implementation and to provide hardware support for an efficient compression algorithm. Lossless data compression techniques have been developed to exploit the available bandwidth of applications in data communications and computer systems by reducing the amount of data they transmit or store. As the amount of data to handle is ever increasing, traditional methods for compressing data become insufficient. To overcome this problem, more powerful methods have been developed. Among those are the so-called statistical data compression methods, which compress data based on their statistics. However, their high complexity and space requirements have prevented their hardware implementation and the full exploitation of their potential benefits. This thesis looks into the feasibility of the hardware implementation of one of these statistical data compression methods by exploring the potential for reorganising and restructuring the method for hardware implementation and by investigating efficient and effective designs that yield an efficient and cost-effective algorithm. [Continues.]

    Zero-maintenance of electronic systems: Perspectives, challenges, and opportunities

    Self-engineering systems that are capable of repairing themselves in situ without the need for human decision (or intervention) could be used to achieve zero-maintenance. This philosophy is analogous to the way in which the human body heals and repairs itself up to a point. This article synthesises issues related to an emerging area of self-healing technologies that links software and hardware mitigation strategies. Efforts are concentrated on built-in detection, masking and active mitigation that comprises self-recovery or self-repair capability, with a focus on system resilience and recovery from fault events. Design techniques are critically reviewed to clarify the role of fault coverage, resource allocation and fault awareness, set in the context of existing and emerging printable/nanoscale manufacturing processes. The qualitative analysis presents new opportunities to form a view on the research required for a successful integration of zero-maintenance. Finally, the potential cost benefits and future trends are enumerated.

    Social Insect-Inspired Adaptive Hardware

    Modern VLSI transistor densities allow large systems to be implemented within a single chip. As technologies get smaller, fundamental limits of silicon devices are reached, resulting in lower design yields and post-deployment failures. Many-core systems provide a platform for leveraging the computing resource offered by deep sub-micron technologies and also offer high-level capabilities for mitigating the issues with small feature sizes. However, designing many-core systems that can adapt to in-field failures and operational variability involves an extremely large multi-objective optimisation space. When a many-core system reaches the size supported by the densities of modern technologies (thousands of processing cores), finding design solutions in this problem space becomes extremely difficult. Many biological systems show properties that are adaptive and scalable. This thesis proposes a self-optimising and adaptive, yet scalable, design approach for many-core systems based on the emergent behaviours of social-insect colonies. In these colonies there are many thousands of individuals with low intelligence who contribute, without any centralised control, to complete a wide range of tasks to build and maintain the colony. The experiments presented translate biological models of social-insect intelligence into simple embedded intelligence circuits. These circuits sense low-level system events and use them to manage the parameters of the many-core's Network-on-Chip (NoC) during runtime. Centurion, a 128-node many-core, was created to investigate these models at large scale in hardware. The results show that, by monitoring a small number of signals within each NoC router, task allocation emerges from the social-insect intelligence models that can self-configure to support representative applications. It is demonstrated that emergent task allocation supports fault tolerance with no extra hardware overhead.
The response-threshold decision-making circuitry uses a negligible amount of hardware resources relative to the size of the many-core and is an ideal technology for implementing embedded intelligence for system runtime management of large-complexity single-chip systems.
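
The response-threshold idea named in this abstract can be illustrated with a minimal software sketch. It assumes the fixed-threshold form commonly used in the social-insect literature, where an individual takes on a task with probability s^n / (s^n + θ^n) for stimulus s and threshold θ. The abstract does not specify the circuits themselves, so the function names, the steepness n = 2 and the example thresholds below are all hypothetical.

```python
import random


def response_probability(stimulus: float, threshold: float, steepness: int = 2) -> float:
    """Probability of engaging in a task under a fixed response-threshold rule.

    Assumed form: s^n / (s^n + theta^n). A low threshold means the node
    responds even to a weak stimulus.
    """
    if stimulus <= 0:
        return 0.0
    s_n = stimulus ** steepness
    return s_n / (s_n + threshold ** steepness)


def choose_workers(stimulus: float, thresholds: list, rng) -> list:
    """Return the indices of the nodes that decide to take the task.

    Each node decides locally from the shared stimulus and its own
    threshold; there is no central allocator.
    """
    return [
        index
        for index, theta in enumerate(thresholds)
        if rng.random() < response_probability(stimulus, theta)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    thresholds = [1.0, 5.0, 20.0]  # hypothetical per-node thresholds
    for stimulus in (0.5, 5.0, 50.0):
        print(stimulus, choose_workers(stimulus, thresholds, rng))
```

In this sketch a rising stimulus (for example, a growing backlog of unserved work) recruits nodes with progressively higher thresholds, which is one way task allocation can emerge without central control.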

    Architectural soup: a proposed very general purpose computer

    PhD Thesis. This thesis is concerned with architecture for long-term general-purpose computers. The work is based on current trends in machine architecture and technology. Projections from these generated "Architectural Soups". An Architectural Soup has the potential to emulate many different machine architectures. The characteristics of this class of machine are a three-dimensional structure, simple cells and a simple communications topology, which can be reconfigured at a very low level. This thesis aims to show the potential usefulness and viability of machines with such capability. Methods of programming are considered, and important design issues are investigated. A specific implementation architecture is described and illustrated through simulation. An assessment is made of the architecture and of the simulator used. In addition, the implementation architecture is used as the basis for a VLSI design, which shows the simplicity of a Soup cell and provides estimates of the possible number of cells in future machines. The Science and Engineering Research Council.

    Gbit/second lossless data compression hardware

    This thesis investigates how to improve the performance of lossless data compression hardware as a tool to reduce the cost per bit stored in a computer system or transmitted over a communication network. Lossless data compression allows the exact reconstruction of the original data after decompression. Its deployment in some high-bandwidth applications has been hampered by performance limitations in the compression hardware, which needs to match the performance of the original system to avoid becoming a bottleneck. Advancing the area of lossless data compression hardware hence offers a valid motivation, with the potential of doubling the performance of the system that incorporates it with minimum investment. This work starts by presenting an analysis of current compression methods with the objective of identifying the factors that limit performance and also the factors that increase it. [Continues.]

    Novel III-V compound semiconductor technologies for low power digital logic applications

    As silicon (Si) complementary metal oxide semiconductor (CMOS) technology continues to scale into the 10 nm node, chip power consumption is approaching 200 W/cm² and any further increase is unsustainable. Incorporating III-V compound semiconductor n-type devices into future CMOS generations could allow for a reduction in supply voltage, and therefore power consumption, while simultaneously improving on-state performance. The advanced state of Si CMOS places stringent demands on III-V devices, however: the current 14 nm Si tri-gate devices employ high aspect ratio, densely spaced fins which serve to significantly increase current per chip surface area. III-V devices need to significantly outperform state-of-the-art Si devices in order to merit their disruptive incorporation into the well-established CMOS process. This necessitates that they too exploit the vertical dimension. To this end, this thesis reports on the fabrication, measurement and analysis of high aspect ratio junctionless InGaAs FinFETs. The junctionless architecture was first demonstrated in 2010 and was shown to circumvent prohibitive fabrication challenges for devices with ultra-short gate lengths. This work investigated the impact of fin width on both the on-state and off-state performance of 200 nm gate length devices, with nominal fin widths of 10, 15 and 20 nm. Excellent subthreshold performance was demonstrated, with the narrowest fin width exhibiting a minimum subthreshold swing (SS) of 73 mV/dec and an average SS of 80 mV/dec over two decades of current. A maximum on-current, Ion, of 80.51 μA/cm² was measured at a gate overdrive of 0.5 V from an off-state current, Ioff, of 100 nA/cm² and a drain voltage, Vd, of 0.5 V, with current normalised by gated perimeter. This is competitive with other III-V junctionless devices at similar gate lengths.
With current normalised to base fin width, however, Ion increases to 371.8 μA/cm², which is a record value among equivalently normalised non-planar III-V junctionless devices at any gate length. This technology, therefore, clearly demonstrates the feasibility of incorporating scaled, etched InGaAs fins into future logic generations. Perhaps the greatest bottleneck to the incorporation of III-V compounds into future CMOS technology nodes, however, is the lack of a suitable III-V PMOS candidate: co-integrating different material systems onto a common substrate incurs great fabrication complexity, and therefore cost. III-V antimonides, however, have recently emerged as promising candidates for III-V PMOS and exhibit the highest bulk electron mobility of all III-Vs in addition to a hole mobility second only to germanium. InGaSb ternary compounds have been shown to offer the best combined performance for electrons and holes in the same material and, as such, have the potential to enable the simplest incarnation of III-V CMOS, provided, of course, that it is possible to form a gate stack on both device polarities with sufficient electrical properties. To date, however, there has been no investigation into the high-k dielectric interface to InGaSb. To this end, this thesis presents results of the first investigation into the impact of in-situ H2 plasma exposure on the electrical properties of the p/n-In0.3Ga0.7Sb-Al2O3 interface. The parameter space was explored systematically in terms of H2 plasma power and exposure time, and further, the impact of in-situ trimethylaluminium (TMA) pre-cleaning and annealing in forming gas was assessed. Metal oxide semiconductor capacitors (MOSCAPs) were fabricated subsequent to H2 plasma processing and Al2O3 deposition, and the corresponding capacitance-voltage and conductance-voltage measurements were analysed both qualitatively and quantitatively via the simulation of an equivalent circuit model.
X-ray photoelectron spectroscopy (XPS) analysis of samples processed as part of the plasma power series revealed that a combination of ex-situ HCl cleaning and in-situ H2 plasma exposure completely removes In and Sb sub-oxides, with the Ga-O content reduced to Ga-O:InGaSb <0.1. The optimal process, which included ex-situ HCl surface cleaning, in-situ H2 plasma and TMA pre-cleaning, and a post-gate-metal forming gas anneal, was unequivocally demonstrated to yield a fully unpinned MOS interface, with both n-type and p-type MOSCAPs explicitly demonstrating a genuine minority carrier response. Interface state and border trap densities were extracted, with a minimum Dit of 1.73 × 10¹² cm⁻² eV⁻¹ located at ~110 meV below the conduction band edge, and peak border trap densities of 3 × 10¹⁹ cm⁻³ eV⁻¹ and 6.5 × 10¹⁹ cm⁻³ eV⁻¹ approximately aligned with the valence and conduction band edges, respectively. These results indicate that the optimal gate stack process is indeed applicable to both p-type and n-type InGaSb MOSFETs and therefore represent a critical advancement towards achieving high-performance III-V CMOS.

    An Efficient NoC-based Framework To Improve Dataflow Thread Management At Runtime

    This doctoral thesis focuses on how application threads based on the dataflow execution model can be managed at the Network-on-Chip (NoC) level. The roots of the dataflow execution model date back to the early 1970s. Applications adhering to this program execution model follow a simple producer-consumer communication scheme for synchronising parallel thread-related activities. In a dataflow execution environment, a thread can run if and only if all its required inputs are available. Applications running on a large and complex computing environment can benefit significantly from the adoption of the dataflow model. The first part of the thesis focuses on the thread distribution mechanism. It shows how a scalable hash-based thread distribution mechanism can be implemented at the router level with low overheads. To enhance the support further, a tool to monitor the dataflow threads’ status and a simple functional model are also incorporated into the design. Next, a software-defined NoC is proposed to manage the distribution of dataflow threads by exploiting its reconfigurability. The second part of this work focuses on the NoC microarchitecture level. A traditional 2D-mesh topology is combined with a standard ring to understand how such a hybrid network topology can outperform a traditional topology such as the 2D-mesh. Finally, a mixed-integer linear programming based analytical model is proposed to verify whether the mapping of application threads onto the free cores is optimal. The proposed mathematical model can be used as a yardstick to verify the solution quality of the newly developed mapping policy. It is not trivial to provide a complete low-level framework for dataflow thread execution for better resource and power management. However, this work could be considered a primary framework on which improvements could be carried out.
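
The dataflow firing rule and the hash-based distribution described in this abstract can be sketched in software as follows. This is an illustration under stated assumptions, not the thesis's router-level design: the `Scheduler` and `DataflowThread` classes, the `home_core` function and the choice of CRC-32 as the hash are all hypothetical.

```python
import zlib
from dataclasses import dataclass, field


def home_core(thread_id: int, num_cores: int) -> int:
    """Pick a core for a thread by hashing its identifier (assumed scheme)."""
    return zlib.crc32(thread_id.to_bytes(8, "little")) % num_cores


@dataclass
class DataflowThread:
    thread_id: int
    pending_inputs: int                             # inputs still missing
    consumers: list = field(default_factory=list)   # ids fed by this thread


class Scheduler:
    """Toy scheduler: a thread becomes ready only when no input is missing."""

    def __init__(self, num_cores: int):
        self.num_cores = num_cores
        self.threads = {}
        self.ready = [[] for _ in range(num_cores)]  # one ready queue per core

    def _enqueue(self, thread: DataflowThread) -> None:
        core = home_core(thread.thread_id, self.num_cores)
        self.ready[core].append(thread.thread_id)

    def add(self, thread: DataflowThread) -> None:
        self.threads[thread.thread_id] = thread
        if thread.pending_inputs == 0:
            self._enqueue(thread)

    def deliver_input(self, thread_id: int) -> None:
        """A producer finished: one fewer input is missing for the consumer."""
        thread = self.threads[thread_id]
        thread.pending_inputs -= 1
        if thread.pending_inputs == 0:
            self._enqueue(thread)

    def run_all(self) -> list:
        """Run ready threads until none remain; return the execution order."""
        order = []
        while any(self.ready):
            for queue in self.ready:
                while queue:
                    thread_id = queue.pop(0)
                    order.append(thread_id)
                    for consumer in self.threads[thread_id].consumers:
                        self.deliver_input(consumer)
        return order
```

A thread whose inputs never all arrive is simply never enqueued, which mirrors the "if and only if all its required inputs are available" condition in the abstract.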

    Scalable Parameterised Algorithms for two Steiner Problems

    In the Steiner Problem, we are given as input (i) a connected graph with nonnegative integer weights associated with the edges; and (ii) a subset of vertices called terminals. The task is to find a minimum-weight subgraph connecting all the terminals. In the Group Steiner Problem, we are given as input (i) a connected graph with nonnegative integer weights associated with the edges; and (ii) a collection of subsets of vertices called groups. The task is to find a minimum-weight subgraph that contains at least one vertex from each group. Even though the Steiner Problem and the Group Steiner Problem are NP-complete, they are known to admit parameterised algorithms that run in linear time in the size of the input graph, with the exponential part restricted to the number of terminals and the number of groups, respectively. In this thesis, we discuss two parameterised algorithms for solving the Steiner Problem and, by reduction, the Group Steiner Problem: (a) a dynamic programming algorithm presented by Dreyfus and Wagner in 1971; and (b) an improvement of the Dreyfus-Wagner algorithm presented by Erickson, Monma and Veinott in 1987 that runs in linear time in the size of the input graph. We develop a parallel implementation of the Erickson-Monma-Veinott algorithm, and carry out extensive experiments to study the scalability of our implementation with respect to its runtime, memory bandwidth, and memory usage. Our experimental results demonstrate that the implementation can scale up to a billion edges on a single modern compute node provided that the number of terminals is small. For example, using our parallel implementation, a Steiner tree for a graph with one hundred million edges and ten terminals can be found in approximately twenty minutes.
For an input graph with one hundred million edges and ten terminals, our parallel implementation is at least fifteen times faster than its serial counterpart on a Haswell compute node with two processors and twelve cores in each processor. Our implementation of the Erickson-Monma-Veinott algorithm is available as open source.
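
For readers unfamiliar with this family of algorithms, the following is a small serial sketch of a dynamic programme over terminal subsets in the spirit of Dreyfus-Wagner, with the "grow" step done by Dijkstra's algorithm. It is not the thesis's parallel implementation: the function name and input format are hypothetical, and the binary heap used here adds a logarithmic factor, so this sketch does not achieve the linear dependence on graph size mentioned in the abstract. It returns only the optimal weight, not the tree.

```python
import heapq


def steiner_tree_weight(num_vertices, edges, terminals):
    """Minimum total weight of a subgraph connecting all terminals.

    num_vertices: vertices are numbered 0 .. num_vertices - 1.
    edges: iterable of (u, v, w) undirected edges with w >= 0.
    terminals: list of distinct vertices that must be connected.
    """
    inf = float("inf")
    adjacency = [[] for _ in range(num_vertices)]
    for u, v, w in edges:
        adjacency[u].append((v, w))
        adjacency[v].append((u, w))

    k = len(terminals)
    if k == 0:
        return 0
    full = (1 << k) - 1
    # best[mask][v]: cheapest tree spanning the terminals in mask plus vertex v.
    best = [[inf] * num_vertices for _ in range(full + 1)]
    for i, terminal in enumerate(terminals):
        best[1 << i][terminal] = 0

    for mask in range(1, full + 1):
        row = best[mask]
        # Merge step: join two subtrees for complementary subsets at vertex v.
        sub = (mask - 1) & mask
        while sub:
            left, right = best[sub], best[mask ^ sub]
            for v in range(num_vertices):
                candidate = left[v] + right[v]
                if candidate < row[v]:
                    row[v] = candidate
            sub = (sub - 1) & mask
        # Grow step: extend trees along edges with a multi-source Dijkstra.
        heap = [(cost, v) for v, cost in enumerate(row) if cost < inf]
        heapq.heapify(heap)
        while heap:
            cost, u = heapq.heappop(heap)
            if cost > row[u]:
                continue  # stale heap entry
            for v, w in adjacency[u]:
                if cost + w < row[v]:
                    row[v] = cost + w
                    heapq.heappush(heap, (cost + w, v))

    return min(best[full])
```

The merge step enumerates every split of each terminal subset, which is where the exponential dependence on the number of terminals comes from; the graph itself is only touched in the grow step.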