
    Exploration and Design of Power-Efficient Networked Many-Core Systems

    Multiprocessing is a promising solution to meet the requirements of near-future applications. To get the full benefit of parallel processing, a many-core system needs an efficient on-chip communication architecture. Network-on-Chip (NoC) is a general-purpose communication concept that offers high throughput and reduced power consumption, and keeps complexity in check through a regular composition of basic building blocks. This thesis presents power-efficient communication approaches for networked many-core systems. We address a range of issues that are important for designing power-efficient many-core systems at two different levels: the network level and the router level. From the network-level point of view, exploiting state-of-the-art concepts such as Globally Asynchronous Locally Synchronous (GALS), Voltage/Frequency Island (VFI), and 3D Networks-on-Chip approaches may be a solution to the excessive power consumption demanded by today’s and future many-core systems. To this end, a low-cost 3D NoC architecture, based on high-speed GALS-based vertical channels, is proposed to mitigate high peak temperatures, power densities, and area footprints of vertical interconnects in 3D ICs. To further exploit the negligible inter-layer distance of 3D ICs, we propose a novel hybridization scheme for inter-layer communication. In addition, an efficient adaptive routing algorithm is presented which enables congestion-aware and reliable communication for the hybridized NoC architecture. An integrated monitoring and management platform on top of this architecture is also developed in order to implement more scalable power optimization techniques. From the router-level perspective, four design styles for implementing power-efficient reconfigurable interfaces in VFI-based NoC systems are proposed.
To enhance the utilization of virtual channel buffers and to manage their power consumption, a partial virtual channel sharing method for NoC routers is devised and implemented. Extensive experiments with synthetic and real benchmarks show significant power savings and mitigated hotspots, with performance similar to that of the latest NoC architectures. The thesis concludes that carefully co-designed elements from different network levels enable considerable power savings for many-core systems.

    The MANGO clockless network-on-chip: Concepts and implementation


    Design Space Exploration for MPSoC Architectures

    Multiprocessor system-on-chip (MPSoC) designs utilize the available technology and communication architectures to meet the requirements of upcoming applications. In an MPSoC, the communication platform is both the key enabler and the key differentiator for realizing efficient systems. It provides product differentiation to meet a diverse, multi-dimensional set of design constraints, including performance, power, energy, reconfigurability, scalability, cost, reliability and time-to-market. The communication resources of a single interconnection platform cannot be fully utilized by all kinds of applications; for example, providing higher communication bandwidth for applications that are computation-intensive but not data-intensive is often infeasible in a practical implementation. This thesis aims to perform architecture-level design space exploration towards efficient and scalable resource utilization for MPSoC communication architectures. In order to meet the performance requirements within the design constraints, careful selection of the MPSoC communication platform and resource-aware partitioning and mapping of the application play an important role. To enhance the utilization of communication resources, a variety of techniques can be used, such as resource sharing, multicast to avoid re-transmission of identical data, and adaptive routing. For implementation, these techniques should be customized according to the platform architecture. To address the resource utilization of MPSoC communication platforms, a variety of architectures with different design parameters and performance levels are selected, namely Segmented bus (SegBus), Network-on-Chip (NoC) and Three-Dimensional NoC (3D-NoC). Average packet latency and power consumption are the evaluation parameters for the proposed techniques. In conventional computing architectures, a fault in one component makes the connected fault-free components inoperative.
A resource-sharing approach can utilize the fault-free components to retain system performance by reducing the impact of faults. Design space exploration also helps to narrow down the selection of an MPSoC architecture that can meet the performance requirements within the design constraints.

    MemPool: A Scalable Manycore Architecture with a Low-Latency Shared L1 Memory

    Shared L1 memory clusters are a common architectural pattern (e.g., in GPGPUs) for building efficient and flexible multi-processing-element (PE) engines. However, it is a common belief that these tightly-coupled clusters would not scale beyond a few tens of PEs. In this work, we tackle scaling shared L1 clusters to hundreds of PEs while supporting a flexible and productive programming model and maintaining high efficiency. We present MemPool, a manycore system with 256 RV32IMAXpulpimg "Snitch" cores featuring application-tunable functional units. We designed and implemented an efficient low-latency PE to L1-memory interconnect, an optimized instruction path to ensure each PE's independent execution, and a powerful DMA engine and system interconnect to stream data in and out. MemPool is easy to program, with all the cores sharing a global view of a large, multi-banked, L1 scratchpad memory, accessible within at most five cycles in the absence of conflicts. We provide multiple runtimes to program MemPool at different abstraction levels and illustrate its versatility with a wide set of applications. MemPool runs at 600 MHz (60 gate delays) in typical conditions (TT/0.80 V/25 °C) in 22 nm FDX technology and achieves a performance of up to 229 GOPS or 192 GOPS/W with less than 2% of execution stalls. Comment: 14 pages, 17 figures, 2 tables

    Resource Allocation in Ad Hoc Networks

    Unlike a centralized network, an ad hoc network has no central administration, and its energy is constrained (e.g., by battery capacity), so resource allocation plays a very important role in efficiently managing the limited energy in ad hoc networks. This thesis focuses on resource allocation in ad hoc networks and aims to develop novel techniques that improve network performance at different network layers, such as the physical layer, the Medium Access Control (MAC) layer and the network layer. This thesis examines energy utilization in High Speed Downlink Packet Access (HSDPA) systems at the physical layer. Two resource allocation techniques, known as channel adaptive HSDPA and two-group HSDPA, are developed to improve the performance of an ad hoc radio system by reducing the residual energy, which in turn should improve the data rate in HSDPA systems. Channel adaptive HSDPA removes the constraint on the number of channels used for transmissions. The two-group allocation minimizes the residual energy in HSDPA systems and therefore enhances the physical data rates in transmissions due to adaptive modulation. These proposed approaches provide better data rates than those achieved with the current HSDPA type of algorithm. By considering both physical transmission power and data rates in the cost function of the routing scheme, an energy-aware routing scheme is proposed in order to find the routing path with the least energy consumption. By focusing on the routing paths with low energy consumption, computational complexity is significantly reduced. The data rate enhancement achieved by two-group resource allocation further reduces the required amount of energy per bit for each path. With a novel load balancing technique, the information bits can be allocated to each path in such a way that the overall amount of energy consumed is minimized.
After loading bits onto multiple routing paths, an end-to-end delay minimization solution along a routing path is developed by studying the MAC distributed coordination function (DCF) service time. Furthermore, the overhead effect and the related throughput reduction are studied. In order to enhance the network throughput at the MAC layer, two MAC DCF-based adaptive payload allocation approaches are developed by introducing Lagrange optimization and studying the equal data transmission period.
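The idea of an energy-aware routing cost built from transmission power and data rate can be illustrated with a minimal sketch. It is not the thesis's algorithm: it assumes that the per-link cost is energy per bit (transmit power divided by data rate) and that a plain Dijkstra search picks the path. The function names, graph layout and numbers are hypothetical.

```python
import heapq


def energy_per_bit(tx_power_w, rate_bps):
    """Assumed link cost: joules per bit = transmit power / data rate."""
    return tx_power_w / rate_bps


def least_energy_path(links, source, target):
    """Dijkstra search where each edge weight is the link's energy per bit.

    links maps node -> list of (neighbor, tx_power_w, rate_bps).
    Returns (total_energy_per_bit, path), or (inf, []) if unreachable.
    """
    best = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            # Walk the predecessor chain back to the source.
            path = [node]
            while node in prev:
                node = prev[node]
                path.append(node)
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nbr, power, rate in links.get(node, []):
            new_cost = cost + energy_per_bit(power, rate)
            if new_cost < best.get(nbr, float("inf")):
                best[nbr] = new_cost
                prev[nbr] = node
                heapq.heappush(heap, (new_cost, nbr))
    return float("inf"), []


# Hypothetical four-node network: (neighbor, transmit power in W, rate in bit/s).
links = {
    "A": [("B", 0.1, 1e6), ("C", 0.2, 1e6)],
    "B": [("D", 0.1, 1e6)],
    "C": [("D", 0.05, 2e6)],
}
cost, path = least_energy_path(links, "A", "D")
```

In this toy network the search prefers the route through B, because the higher data rate on the C to D link does not make up for the more expensive first hop through C.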

    Real-Time Cross-Layer Routing Protocol for Ad Hoc Wireless Sensor Networks

    Reliable and energy-efficient routing is a critical issue in Wireless Sensor Network (WSN) deployments. Many approaches have been proposed for WSN routing, but sensor field implementations, compared to computer simulations and fully controlled testbeds, tend to be lacking in the literature and not fully documented. Typically, WSNs provide the ability to gather information cheaply, accurately and reliably over both small and vast physical regions. Unlike other large data network forms, where the ultimate input/output interface is a human being, WSNs are about collecting data from unattended physical environments. Although WSNs are being studied on a global scale, most current research still focuses on simulation experiments. For sensor networks in particular, which have to deal with very stringent resource limitations and are exposed to severe physical conditions, real experiments with real applications are essential. In addition, the effectiveness of simulation studies is severely limited by the difficulty of modeling the complexities of the radio environment, power consumption on sensor devices, and the interactions between the physical, network and application layers. The routing problem in ad hoc WSNs is a nontrivial issue because of sensor node failures due to restricted resources. Thus, the routing protocols of WSNs encounter two conflicting issues: on the one hand, frequent topology updates are required in order to optimise routes, while on the other hand, frequent topology updates result in imbalanced energy dissipation and higher message overhead. In the literature, such as in (Rahul et al., 2002), (Woo et al., 2003), (TinyOS, 2004), (Gnawali et al., 2009) and (Burri et al., 2007), several authors have presented routing algorithms for WSNs that consider only one or two metrics at most in attempting to optimise routes while keeping message overhead small and energy dissipation balanced.
Recent studies on energy-efficient routing in multihop WSNs have shown a great reliance on radio link quality in the path selection process. If sensor nodes along the routing path and closer to the base station advertise a high-quality link for forwarding upstream packets, these sensor nodes will experience a faster depletion of their residual energy. This results in a topological routing hole or network partitioning, as stated and resolved in (Daabaj 2010). This chapter presents an empirical study on how to improve energy efficiency for reliable multihop communication by developing a real-time cross-layer lifetime-oriented routing protocol and integrating useful routing information from different layers to examine their joint benefit on the lifetime of individual sensor nodes and the entire sensor network. The proposed approach aims to balance the workload and energy usage among relay nodes to achieve balanced energy dissipation, thereby maximizing the functional network lifetime. The experimental results presented were obtained from prototype real-network experiments based on Crossbow’s sensor motes (Crossbow, 2010), i.e., Mica2 low-power wireless sensor platforms (Crossbow, 2010). The distributed real-time routing protocol proposed in this chapter aims to cope with the dynamics of real-world sensor networks and also to discover multiple paths between the base station and source sensor nodes. The proposed routing protocol is compared experimentally with a reliability-oriented collection-tree protocol, i.e., the TinyOS MintRoute protocol (Woo et al., 2003). The experimental results show that our proposed protocol has higher node energy efficiency, lower control overhead, and fair average delay.

    A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

    Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such a scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed. Comment: 17 pages, 14 figures

    Techniques for Processing TCP/IP Flow Content in Network Switches at Gigabit Line Rates

    The growth of the Internet has enabled it to become a critical component used by businesses, governments and individuals. While most of the traffic on the Internet is legitimate, a proportion of the traffic includes worms, computer viruses, network intrusions, computer espionage, security breaches and illegal behavior. This rogue traffic causes computer and network outages, reduces network throughput, and costs governments and companies billions of dollars each year. This dissertation investigates the problems associated with TCP stream processing in high-speed networks. It describes an architecture that simplifies the processing of TCP data streams in these environments and presents a hardware circuit capable of TCP stream processing on multi-gigabit networks for millions of simultaneous network connections. Live Internet traffic is analyzed using this new TCP processing circuit.

    RAID-2: Design and implementation of a large scale disk array controller

    We describe the implementation of a large-scale disk array controller and subsystem incorporating over 100 high-performance 3.5-inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19-inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high-bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three-year university implementation effort to construct a truly large-scale system assembly.
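The abstract mentions an XOR calculation engine but does not describe how it is used. As a rough illustration only, the sketch below shows the generic XOR parity technique used by parity-based disk arrays: the parity block is the bytewise XOR of the data blocks, and any single lost block can be rebuilt by XOR-ing the survivors with the parity. The helper name and the sample blocks are hypothetical and are not taken from the RAID-2 design.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the bytewise XOR of a list of equal-length byte strings."""
    length = len(blocks[0])
    if any(len(block) != length for block in blocks):
        raise ValueError("all blocks must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per disk in a stripe.
data = [b"disk0-block", b"disk1-block", b"disk2-block"]
parity = xor_blocks(data)

# Simulate losing disk 1 and rebuilding it from the survivors plus parity.
recovered = xor_blocks([data[0], data[2], parity])
```

A hardware engine would perform the same operation on whole buffers at memory bandwidth. The software version only demonstrates the property that makes single-disk reconstruction possible.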