134 research outputs found

    Floorplan-Aware High Performance NoC Design

    Full text link
    Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables. En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip. Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red. En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci

    Novel Metric for Load Balance and Congestion Reducing in Network on-Chip

    Get PDF
    The Network-on-Chip (NoC) is an alternative pattern that is considered as an emerging technology for distributed embedded systems. The traditional use of multi-cores in computing increase the calculation performance; but affect the network communication causing congestion on nodes which therefore decrease the global performance of the NoC. To alleviate this problematic phenomenon, several strategies were implemented, to reduce or prevent the occurrence of congestion, such as network status metrics, new routing algorithm, packets injection control, and switching strategies. In this paper, we carried out a study on congestion in a 2D mesh network, through various detailed simulations. Our focus was on the most used congestion metrics in NoC. According to our experiments and performed simulations under different traffic scenarios, we found that these metrics are less representative, less significant and yet they do not give a true overview of reading within the NoC nodes at a given cycle. Our study shows that the use of other complementary information regarding the state of nodes and network traffic flow in the design of a novel metric, can really improve the results. In this paper, we put forward a novel metric that takes into account the overall operating state of a router in the design of adaptive XY routing algorithm, aiming to improve routing decisions and network performance. We compare the throughput, latency, resource utilization, and congestion occurrence of proposed metric to three published metrics on two specific traffic patterns in a varied packets injection rate. Our results indicate that our novel metric-based adaptive XY routing has overcome congestion and significantly improve resource utilization through load balancing; achieving an average improvement rate up to 40 % compared to adaptive XY routing based on the previous congestion metrics

    Doctor of Philosophy

    Get PDF
    dissertationPortable electronic devices will be limited to available energy of existing battery chemistries for the foreseeable future. However, system-on-chips (SoCs) used in these devices are under a demand to offer more functionality and increased battery life. A difficult problem in SoC design is providing energy-efficient communication between its components while maintaining the required performance. This dissertation introduces a novel energy-efficient network-on-chip (NoC) communication architecture. A NoC is used within complex SoCs due it its superior performance, energy usage, modularity, and scalability over traditional bus and point-to-point methods of connecting SoC components. This is the first academic research that combines asynchronous NoC circuits, a focus on energy-efficient design, and a software framework to customize a NoC for a particular SoC. Its key contribution is demonstrating that a simple, asynchronous NoC concept is a good match for low-power devices, and is a fruitful area for additional investigation. The proposed NoC is energy-efficient in several ways: simple switch and arbitration logic, low port radix, latch-based router buffering, a topology with the minimum number of 3-port routers, and the asynchronous advantages of zero dynamic power consumption while idle and the lack of a clock tree. The tool framework developed for this work uses novel methods to optimize the topology and router oorplan based on simulated annealing and force-directed movement. It studies link pipelining techniques that yield improved throughput in an energy-efficient manner. A simulator is automatically generated for each customized NoC, and its traffic generators use a self-similar message distribution, as opposed to Poisson, to better match application behavior. Compared to a conventional synchronous NoC, this design is superior by achieving comparable message latency with half the energy

    Toward Reliable, Secure, and Energy-Efficient Multi-Core System Design

    Get PDF
    Computer hardware researchers have perennially focussed on improving the performance of computers while stipulating the energy consumption under a strict budget. While several innovations over the years have led to high performance and energy efficient computers, more challenges have also emerged as a fallout. For example, smaller transistor devices in modern multi-core systems are afflicted with several reliability and security concerns, which were inconceivable even a decade ago. Tackling these bottlenecks happens to negatively impact the power and performance of the computers. This dissertation explores novel techniques to gracefully solve some of the pressing challenges of the modern computer design. Specifically, the proposed techniques improve the reliability of on-chip communication fabric under a high power supply noise, increase the energy-efficiency of low-power graphics processing units, and demonstrate an unprecedented security loophole of the low-power computing paradigm through rigorous hardware-based experiments

    Toward a sustainable cybersecurity ecosystem

    Get PDF
    © 2020 by the authors. Licensee MDPI, Basel, Switzerland. Cybersecurity issues constitute a key concern of today’s technology-based economies. Cybersecurity has become a core need for providing a sustainable and safe society to online users in cyberspace. Considering the rapid increase of technological implementations, it has turned into a global necessity in the attempt to adapt security countermeasures, whether direct or indirect, and prevent systems from cyberthreats. Identifying, characterizing, and classifying such threats and their sources is required for a sustainable cyber-ecosystem. This paper focuses on the cybersecurity of smart grids and the emerging trends such as using blockchain in the Internet of Things (IoT). The cybersecurity of emerging technologies such as smart cities is also discussed. In addition, associated solutions based on artificial intelligence and machine learning frameworks to prevent cyber-risks are also discussed. Our review will serve as a reference for policy-makers from the industry, government, and the cybersecurity research community

    The MANGO clockless network-on-chip: Concepts and implementation

    Get PDF

    BlackOut: Enabling fine-grained power gating of buffers in Network-on-Chip routers

    Get PDF
    The Network-on-Chip (NoC) router buffers play an instrumental role in the performance of both the interconnection fabric and the entire multi-/many-core system. Nevertheless, the buffers also constitute the major leakage power consumers in NoC implementations. Traditionally, they are designed to accommodate worst-case traffic scenarios, so they tend to remain idle, or under-utilized, for extended periods of time. The under-utilization of these valuable resources is exemplified when one profiles real application workloads; the generated traffic is bursty in nature, whereby high traffic periods are sporadic and infrequent, in general. The mitigation of the leakage power consumption of NoC buffers via power gating has been explored in the literature, both at coarse (router-level) and fine (buffer-level) granularities. However, power gating at the router granularity is suitable only for low and medium traffic conditions, where the routers have enough opportunities to be powered down. Under high traffic, the sleeping potential rapidly diminishes. Moreover, disabling an entire router greatly affects the NoC functionality and the network connectivity. This article presents BlackOut, a fine-grained power-gating methodology targeting individual router buffers. The goal is to minimize leakage power consumption, without adversely impacting the system performance. The proposed framework is agnostic of the routing algorithm and the network topology, and it is applicable to any router micro-architecture. Evaluation results obtained using both synthetic traffic patterns and real applications in 64-core systems indicate energy savings of up to 70%, as compared to a baseline NoC, with a near-negligible performance overhead of around 2%. BlackOut is also shown to significantly outperformby 35%, on averagetwo current state-of-the-art power-gating solutions, in terms of energy savings. Not tailored to any topology, routing algorithm and NoC router architecture.Router-to-router communication. No need for custom, region-based/global networks.Effective at low, medium and high traffic. Other solutions are more restrictive.+35% energy saving, on average, against two state-of-the-art power-gating solutions.Negligible performance overhead (+2%) compared to the baseline architecture

    Evaluation of preformed particle gel as a diverting agent for acidizing

    Get PDF
    Preformed particle gels (PPGs) have been developed and successfully used to improve conformance in heterogeneous reservoirs. However, no research has been reported to show their potential diverting applications in acidizing because of their unknown performance in this process. This work targets on evaluating the potential of PPGs to be used as diverting agents and factors influencing the diverting performance. The swelling properties of three types of PPGs (PPG #1, #2, and #3) were first studied in brine and acid, including the swelling kinetics and equilibrium, PPG strength, and deswelling performance. Effects of different brine concentration, acid concentration, and temperature were also conducted. Then, coreflooding experiments were carried out to investigate their plugging, acid diversion, and permeability recovery performance in carbonate rocks. Effects of PPGs injection volume and particle size were finally investigated. PPG #1 and #2 can swell up to about 10 times and maintain stable in acid for about one day before deswelling. Moreover, PPG #2 even swells a little more in acid at first several hours at 45℃. PPG #3\u27s swelling ratio ranges from 20 to 65, which is sensitive to the brine concentration and it reaches deswelling equilibrium after 20 minutes in acid. The gel strength of PPG #2 is the highest and it can be degraded in brine at 80°C in five days. Coreflooding results show that these three PPGs have different diversion ability and can be used as diverting agents to enhance the acid stimulation. Larger injection volume and large particles are favorable by using PPGs as diverting agents in acidizing. The properties evaluation and coreflooding experiments show that PPG #2 can be the best candidate as a diverting agent --Abstract, page iii

    High performance communication on reconfigurable clusters

    Get PDF
    High Performance Computing (HPC) has matured to where it is an essential third pillar, along with theory and experiment, in most domains of science and engineering. Communication latency is a key factor that is limiting the performance of HPC, but can be addressed by integrating communication into accelerators. This integration allows accelerators to communicate with each other without CPU interactions, and even bypassing the network stack. Field Programmable Gate Arrays (FPGAs) are the accelerators that currently best integrate communication with computation. The large number of Multi-gigabit Transceivers (MGTs) on most high-end FPGAs can provide high-bandwidth and low-latency inter-FPGA connections. Additionally, the reconfigurable FPGA fabric enables tight coupling between computation kernel and network interface. Our thesis is that an application-aware communication infrastructure for a multi-FPGA system makes substantial progress in solving the HPC communication bottleneck. This dissertation aims to provide an application-aware solution for communication infrastructure for FPGA-centric clusters. Specifically, our solution demonstrates application-awareness across multiple levels in the network stack, including low-level link protocols, router microarchitectures, routing algorithms, and applications. We start by investigating the low-level link protocol and the impact of its latency variance on performance. Our results demonstrate that, although some link jitter is always present, we can still assume near-synchronous communication on an FPGA-cluster. This provides the necessary condition for statically-scheduled routing. We then propose two novel router microarchitectures for two different kinds of workloads: a wormhole Virtual Channel (VC)-based router for workloads with dynamic communication, and a statically-scheduled Virtual Output Queueing (VOQ)-based router for workloads with static communication. For the first (VC-based) router, we propose a framework that generates application-aware router configurations. Our results show that, by adding application-awareness into router configuration, the network performance of FPGA clusters can be substantially improved. For the second (VOQ-based) router, we propose a novel offline collective routing algorithm. This shows a significant advantage over a state-of-the-art collective routing algorithm. We apply our communication infrastructure to a critical strong-scaling HPC kernel, the 3D FFT. The experimental results demonstrate that the performance of our design is faster than that on CPUs and GPUs by at least one order of magnitude (achieving strong scaling for the target applications). Surprisingly, the FPGA cluster performance is similar to that of an ASIC-cluster. We also implement the 3D FFT on another multi-FPGA platform: the Microsoft Catapult II cloud. Its performance is also comparable or superior to CPU and GPU HPC clusters. The second application we investigate is Molecular Dynamics Simulation (MD). We model MD on both FPGA clouds and clusters. We find that combining processing and general communication in the same device leads to extremely promising performance and the prospect of MD simulations well into the us/day range with a commodity cloud

    Design and analysis of fair, efficient and low-latency schedulers for high-speed packet-switched networks

    Get PDF
    A variety of emerging applications in education, medicine, business, and entertainment rely heavily on high-quality transmission of multimedia data over high speed networks. Packet scheduling algorithms in switches and routers play a critical role in the overall Quality of Service (QoS) strategy to ensure the performance required by such applications. Fair allocation of the link bandwidth among the traffic flows that share the link is an intuitively desirable property of packet schedulers. In addition, strict fairness can improve the isolation between users, help in countering certain kinds of denial-of-service attacks and offer a more predictable performance. Besides fairness, efficiency of implementation and low latency are among the most desirable properties of packet schedulers. The first part of this dissertation presents a novel scheduling discipline called Elastic Round Robin (ERR) which is simple, fair and efficient with a low latency bound. The perpacket work complexity of ERR is O(1). Our analysis also shows that, in comparison to all previously proposed scheduling disciplines of equivalent complexity, ERR has significantly better fairness properties as well as a lower latency bound. However, all frame-based schedulers including ERR suffer from high start-up latencies, burstiness in the output anddelayed correction of fairness. In the second part of this dissertation we propose a new scheduling discipline called Prioritized Elastic Round Robin (PERR) which overcomes the limitations associated with the round robin service order of ERR. The PERR scheduler achieves this by rearranging the sequence in which packets are transmitted in each round of the ERR scheduler. Our analysis reveals that PERR has a low work complexity which is independent of the number of flows. We also prove that PERR has better fairness and latency characteristics than other known schedulers of equivalent complexity. In addition to their obvious applications in Internet routers and switches, both the ERR and PERR schedulers also satisfy the unique requirements of wormhole switching, popular in interconnection networks of parallel systems. Finally, using real gateway traces and based on a new measure of instantaneous fairness borrowed from the field of economics, we present simulation results that demonstrate the improved fairness characteristics and latency bounds of the ERR and and PERR schedulers in comparison with other scheduling disciplines of equivalent efficiency.Ph.D., Electrical Engineering -- Drexel University, 200
    corecore