1,808 research outputs found

    Evaluating and Optimizing IP Lookup on Many core Processors

    Get PDF
    In recent years, there has been growing interest in multi-/many-core processors as a target architecture for high-performance software routers. Because of its key position in routers, hardware IP lookup has been studied intensively on TCAM- and FPGA-based architectures; however, interest in software implementations has also increased. In this paper, we evaluate the performance of software-only IP lookup on a many-core chip, the TILEPro64 processor. For this purpose we have implemented two widely used IP lookup algorithms, DIR-24-8-BASIC and Tree Bitmap, and we evaluate both on the TILEPro64 with synthetic and real-world traces. After a detailed analysis, we propose a hybrid scheme that provides high lookup speed and low worst-case update overhead. Our work shows how to exploit the architectural features of the TILEPro64 to improve performance, including many optimizations at both the single-core and parallel levels. Experimental results show that, using only 18 cores, we can achieve a lookup throughput of 60 Mpps (almost 40 Gbps) with low power consumption, which demonstrates the great performance potential of many-core processors.
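
    The DIR-24-8-BASIC scheme trades memory for speed: a 2^24-entry table indexed by the top 24 bits of the destination address resolves most prefixes in one access, and prefixes longer than 24 bits chain to a 256-entry second-level block. The Python sketch below illustrates that structure under the assumption that routes are inserted in increasing prefix-length order; the entry encoding and the example routes are illustrative, not the paper's implementation.

```python
# Minimal sketch of the DIR-24-8-BASIC two-level lookup table (illustrative).
from array import array

TBL24_SIZE = 1 << 24          # one entry per possible 24-bit address prefix

class Dir24_8:
    def __init__(self):
        # TBL24 entry: 0 = no route, >0 = next hop,
        # <0 = -(block index + 1) into tbllong (256-entry blocks for /25../32).
        self.tbl24 = array('i', [0]) * TBL24_SIZE
        self.tbllong = []

    def add_route(self, prefix, plen, next_hop):
        # Assumes routes are added in increasing prefix-length order.
        base = prefix >> 8
        if plen <= 24:
            for i in range(base, base + (1 << (24 - plen))):
                self.tbl24[i] = next_hop
        else:
            entry = self.tbl24[base]
            if entry >= 0:                       # spawn a second-level block
                self.tbllong.append([entry] * 256)
                self.tbl24[base] = -len(self.tbllong)
            block = self.tbllong[-self.tbl24[base] - 1]
            low = prefix & 0xFF
            for i in range(low, low + (1 << (32 - plen))):
                block[i] = next_hop

    def lookup(self, addr):
        entry = self.tbl24[addr >> 8]            # first (and usually only) access
        if entry >= 0:
            return entry
        return self.tbllong[-entry - 1][addr & 0xFF]   # second access for long prefixes

# Example: 10.1.0.0/16 -> hop 7, 10.1.2.0/25 -> hop 9
rt = Dir24_8()
rt.add_route(0x0A010000, 16, 7)
rt.add_route(0x0A010200, 25, 9)
assert rt.lookup(0x0A010101) == 7
assert rt.lookup(0x0A010210) == 9
```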

    Optimizing energy-efficiency for multi-core packet processing systems in a compiler framework

    Get PDF
    Network applications are becoming increasingly computation-intensive and traffic volumes are growing at an unprecedented rate. Multi-core and multi-threaded techniques are therefore widely employed in packet processing systems to meet these requirements. However, the available processing power cannot be fully utilized without a suitable programming environment. The compilation procedure is decisive for code quality and largely determines overall system performance in terms of packet throughput, per-packet latency, core utilization, and energy efficiency. This thesis first investigates compilation issues in the networking domain, with a particular focus on energy consumption. As a cornerstone for any compiler optimization, a code analysis module for collecting program dependencies is presented and incorporated into a compiler framework. Using that dependency information, a strategy based on graph bi-partitioning and mapping is proposed to search for an optimal parallel-pipeline configuration; its energy-aware extension is particularly effective in improving the energy efficiency of the whole system. Finally, a generic evaluation framework for simulating the performance and energy consumption of a packet processing system is presented. It accepts flexible architectural configurations and can perform arbitrary code mappings, with simulation times that are extremely short compared to full-fledged simulators. A set of our optimization results was gathered using this framework.
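
    As a rough illustration of the bi-partitioning-and-mapping idea, the sketch below splits a small packet-processing task graph into two pipeline stages, rejecting cuts with backward edges and scoring the remaining cuts by stage imbalance plus inter-stage traffic as a crude stand-in for an energy-aware cost. The task graph, weights, and cost coefficients are hypothetical, and the exhaustive search is only practical for tiny graphs.

```python
# Illustrative bi-partitioning of a packet-processing task graph into two
# pipeline stages (hypothetical weights and cost model, not the thesis's heuristic).
from itertools import combinations

# Task -> processing weight (cycles per packet)
weights = {"parse": 30, "classify": 80, "meter": 40, "modify": 50, "queue": 20}
# Dependency edges: (producer, consumer, bytes transferred per packet)
edges = [("parse", "classify", 64), ("classify", "meter", 16),
         ("classify", "modify", 64), ("meter", "queue", 8),
         ("modify", "queue", 64)]

def cost(stage_a, stage_b, alpha=1.0, beta=0.5):
    """Lower is better: balance the two stages and keep inter-stage traffic small."""
    imbalance = abs(sum(weights[t] for t in stage_a) - sum(weights[t] for t in stage_b))
    cut = sum(b for u, v, b in edges if (u in stage_a) != (v in stage_a))
    return alpha * imbalance + beta * cut

def best_bipartition(tasks):
    tasks, best = set(tasks), None
    for k in range(1, len(tasks)):
        for stage_a in combinations(sorted(tasks), k):
            stage_a = set(stage_a)
            stage_b = tasks - stage_a
            if any(u in stage_b and v in stage_a for u, v, _ in edges):
                continue          # backward edge: not a valid pipeline cut
            c = cost(stage_a, stage_b)
            if best is None or c < best[0]:
                best = (c, stage_a, stage_b)
    return best

c, a, b = best_bipartition(weights)
print(f"stage 1: {sorted(a)}  stage 2: {sorted(b)}  cost: {c}")
```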

    A Computational Approach to Packet Classification

    Full text link
    Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die caches; however, they do not scale well with the number of rules. We present a novel approach, NuevoMatch, which improves the memory scaling of existing methods. A new data structure, the Range Query Recursive Model Index (RQ-RMI), is the key component that enables NuevoMatch to replace most accesses to main memory with model inference computations. We describe an efficient training algorithm that guarantees the correctness of RQ-RMI-based classification. The use of RQ-RMI allows the rules to be compressed into model weights that fit into the hardware cache. Further, it takes advantage of the growing support for fast neural network processing in modern CPUs, such as wide vector instructions, achieving a rate of tens of nanoseconds per lookup. Our evaluation using 500K multi-field rules from the standard ClassBench benchmark shows geometric mean compression factors of 4.9x, 8x, and 82x, and average throughput improvements of 2.4x, 2.6x, and 1.6x compared to CutSplit, NeuroCuts, and TupleMerge, all state-of-the-art algorithms. (To appear in SIGCOMM 2020.)
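
    The core idea behind RQ-RMI can be sketched with a much simpler learned index: a small model predicts where a key falls in a sorted list of range boundaries, and a recorded worst-case prediction error turns that guess into a short bounded search. The single linear model, the example rule ranges, and the error-window handling below are simplifications of the paper's multi-stage RQ-RMI and its correctness-guaranteeing training procedure.

```python
# Simplified learned-range-lookup sketch (single linear model, illustrative ranges).
import bisect

# Non-overlapping rule ranges on one field: (low, high, rule_id)
rules = [(0, 99, 0), (100, 499, 1), (500, 1023, 2), (1024, 4095, 3), (4096, 65535, 4)]
lows = [lo for lo, _, _ in rules]
n = len(lows)

# "Train": least-squares fit of position ~ a*key + b over the range starts.
mean_x = sum(lows) / n
mean_y = (n - 1) / 2
a = sum((x - mean_x) * (i - mean_y) for i, x in enumerate(lows)) / \
    sum((x - mean_x) ** 2 for x in lows)
b = mean_y - a * mean_x
# Worst-case error over training keys, +1 to cover keys between training keys.
err = max(abs(round(a * x + b) - i) for i, x in enumerate(lows)) + 1

def classify(key):
    guess = min(n - 1, max(0, round(a * key + b)))   # model inference
    lo, hi = max(0, guess - err), min(n - 1, guess + err)
    # Bounded local search inside the error window instead of a full lookup.
    i = bisect.bisect_right(lows, key, lo, hi + 1) - 1
    low, high, rule = rules[i]
    return rule if low <= key <= high else None

print(classify(750))    # -> 2
print(classify(64))     # -> 0
```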

    High-level services for networks-on-chip

    Get PDF
    Future technology trends envision that next-generation Multiprocessor Systems-on-Chip (MPSoCs) will be composed of a large number of processing and storage elements interconnected by complex communication architectures. Communication and interconnection between these basic blocks play a crucially important role as the number of elements increases, and enabling reliable communication channels between cores therefore becomes a challenge for system designers. Networks-on-Chip (NoCs) emerged as a strategy for connecting and managing the communication between the many design elements and IP blocks required in complex Systems-on-Chip (SoCs). The topic can be considered a multidisciplinary synthesis of multiprocessing, parallel computing, networking, and on-chip communication. In addition to standard communication services, Networks-on-Chip can be employed to support the implementation of system-level services. This dissertation demonstrates how high-level services can be added to an MPSoC platform by embedding appropriate hardware/software support in the network interfaces (NIs) of the NoC. The implementation of innovative modules acting in parallel with protocol translation and data transmission in the NIs is proposed and evaluated; these modules can support the execution of high-level services in the NoC at a relatively low cost in terms of area and energy consumption. Three types of services are addressed and discussed: security, monitoring, and fault tolerance. With respect to security, the dissertation discusses an innovative data protection mechanism for detecting and preventing illegal accesses to protected memory blocks and/or memory-mapped peripherals. Monitoring is addressed by proposing a system based on programmable multipurpose probes aimed at detecting NoC internal events and run-time characteristics. Finally, new architectural solutions for the design of fault-tolerant network interfaces are presented and discussed.
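
    As an illustration of the security service, the sketch below shows the kind of address-range check a network interface could apply before injecting a request into the NoC: a small protection table maps address ranges to the initiators allowed to read or write them. The table layout, initiator IDs, and permission encoding are assumptions for the example, not the dissertation's hardware design.

```python
# Illustrative NI-side access check against a programmable protection table.
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    base: int            # first protected address
    limit: int           # last protected address (inclusive)
    readers: frozenset   # initiator IDs allowed to read
    writers: frozenset   # initiator IDs allowed to write

# Protection table programmed into the NI (e.g. by a trusted core at boot).
regions = [
    Region(0x1000_0000, 0x1000_FFFF, frozenset({0, 1}), frozenset({0})),  # shared buffer
    Region(0x2000_0000, 0x2000_00FF, frozenset({0}),    frozenset({0})),  # peripheral
]

def ni_check(initiator, addr, is_write):
    """Return True if the transaction may be forwarded into the NoC."""
    for r in regions:
        if r.base <= addr <= r.limit:
            allowed = r.writers if is_write else r.readers
            return initiator in allowed
    return True   # addresses outside any protected region are unrestricted

assert ni_check(initiator=1, addr=0x1000_0040, is_write=False)      # read allowed
assert not ni_check(initiator=1, addr=0x1000_0040, is_write=True)   # write blocked
assert not ni_check(initiator=2, addr=0x2000_0010, is_write=False)  # peripheral blocked
```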

    Implementation and Evaluation of an NoC Architecture for FPGAs

    Get PDF
    The Networks-on-Chip (NoC) approach to designing Systems-on-Chip (SoC) is emerging as an advanced concept for overcoming the scalability and efficiency problems of traditional bus-based systems. A great deal of theoretical research in this area provides good insight and shows promising results, but there is a strong need for research on hardware implementations of NoC-based systems, both to determine the feasibility of implementing various topologies and protocols and to accurately determine the design trade-offs involved. This thesis addresses the challenges of implementing an NoC-based system on FPGAs for running real benchmark applications. The NoC uses a mesh topology and a circuit-switched communication protocol. An experimental framework was developed that allows implementation of an NoC-based system from a high-level specification, using the Celoxica Handel-C hardware description language. Two test applications, charge-coupled device (CCD) processing and JPEG, were developed in Handel-C as benchmark applications; both are computationally expensive and require large amounts of data transfer that exercise the NoC. Implementation results show that the NoC-based system gives superior area utilization and speed compared to a bus-based system running the same benchmarks.
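
    To make the mesh/circuit-switched combination concrete, the sketch below pairs dimension-ordered (XY) routing, a common choice for mesh NoCs, with the link reservation a circuit-switched network performs at connection setup. The 4x4 mesh, the choice of XY routing, and the reservation bookkeeping are illustrative assumptions; the thesis's Handel-C implementation is not reproduced here.

```python
# Illustrative XY routing plus circuit setup/teardown on a small mesh.
MESH_W, MESH_H = 4, 4

def xy_path(src, dst):
    """Dimension-ordered routing: move in X first, then in Y."""
    (sx, sy), (dx, dy) = src, dst
    path = [(sx, sy)]
    while sx != dx:
        sx += 1 if dx > sx else -1
        path.append((sx, sy))
    while sy != dy:
        sy += 1 if dy > sy else -1
        path.append((sx, sy))
    return path

busy_links = set()   # links currently held by an established circuit

def setup_circuit(src, dst):
    """Reserve every link on the XY path, or fail if any hop is already taken."""
    path = xy_path(src, dst)
    links = list(zip(path, path[1:]))
    if any(l in busy_links for l in links):
        return None                    # setup fails; the sender must retry later
    busy_links.update(links)
    return links

def teardown_circuit(links):
    busy_links.difference_update(links)

c1 = setup_circuit((0, 0), (2, 3))
print("circuit 1:", c1)
print("conflicting setup:", setup_circuit((0, 0), (3, 0)))  # shares link (0,0)->(1,0)
teardown_circuit(c1)
```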

    Towards Terabit Carrier Ethernet and Energy Efficient Optical Transport Networks

    Get PDF

    Memory Hierarchy Hardware-Software Co-design in Embedded Systems

    Get PDF
    The memory hierarchy is the main bottleneck in modern computer systems, as the gap between processor and memory speed continues to grow. The situation in embedded systems is even worse: the memory hierarchy consumes a large amount of chip area and energy, both of which are precious resources, and embedded systems must meet multiple design objectives such as performance, energy consumption, and area. Customizing the memory hierarchy for specific applications is therefore an important way to make full use of limited resources. However, traditional custom memory hierarchy design methodologies are phase-ordered: they separate application optimization from memory hierarchy architecture design, which tends to yield locally optimal solutions. In traditional hardware/software co-design methodologies, much of the work has focused on using reconfigurable logic to partition the computation, whereas using reconfigurable logic for memory hierarchy design is seldom addressed. In this paper, we propose a new framework for designing the memory hierarchy of embedded systems. The framework takes advantage of flexible reconfigurable logic to customize the memory hierarchy for specific applications, and it combines application optimization and memory hierarchy design to obtain a globally optimal solution. Using the framework, we performed a case study designing a new software-controlled instruction memory that showed promising potential. (Singapore-MIT Alliance, SMA)
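
    A software-controlled instruction memory shifts the placement decision to the compiler: profiling identifies hot code, and a fixed-capacity on-chip memory is filled with the blocks that save the most instruction fetches. The greedy value-density heuristic and the profile numbers in the sketch below are illustrative assumptions rather than the paper's actual allocation algorithm.

```python
# Illustrative compile-time placement of hot code into a small instruction memory.
SPM_SIZE = 2048   # bytes of software-controlled instruction memory (assumed)

# (basic block, size in bytes, dynamic fetch count from a hypothetical profile run)
blocks = [("loop_body", 512, 900_000), ("pkt_parse", 1024, 400_000),
          ("init", 4096, 1_000), ("error_path", 256, 50),
          ("csum", 384, 700_000)]

def place_blocks(blocks, capacity):
    """Greedy knapsack: prefer blocks with the most fetches saved per byte."""
    chosen, used = [], 0
    for name, size, fetches in sorted(blocks, key=lambda b: b[2] / b[1], reverse=True):
        if size <= capacity - used:
            chosen.append(name)
            used += size
    return chosen, used

chosen, used = place_blocks(blocks, SPM_SIZE)
print(f"placed in SPM: {chosen} ({used}/{SPM_SIZE} bytes)")
# Blocks left in slower memory still work; they simply pay the extra fetch latency.
```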