Resource Modification On Multicore Server With Kernel Bypass

Author: Ashari Ahmad
Priambodo Dimas Febriyan
Publication venue: 'Universitas Gadjah Mada'
Publication date: 31/10/2020

Technology is developing very fast, marked by many innovations in both hardware and software. Multicore servers with a growing number of cores require efficient software. The kernel and hardware used to handle various operational needs have some limitations. These limitations stem from the high level of complexity, especially when operating as a server, such as a single socket descriptor, a single IRQ, and a lack of pooling, so some modifications are required. Kernel bypass is one method of overcoming the deficiencies of the kernel. The modifications on this server combine increased throughput with decreased server latency. Modifications at the driver level, with hashing of the rx signal and multiple-receive modifications (multiple IP receivers, multiple thread receivers, and multiple port listeners), are used to increase throughput. Modifications using pooling principles at either the kernel level or the program level are used to decrease latency. This combination of modifications makes the server more reliable, with an average throughput increase of 250.44% and a latency decrease of 65.83%.
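The abstract above mentions hashing received traffic across multiple receivers but gives no implementation details. The following is only a minimal user-space sketch of that general idea, assuming a CRC32 hash over a flow 4-tuple; the names `pick_receiver` and `dispatch` and the packet layout are hypothetical and are not taken from the paper.

```python
import zlib


def pick_receiver(src_ip, src_port, dst_ip, dst_port, num_receivers):
    """Map a flow 4-tuple to a receiver index (assumed hashing scheme)."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    # CRC32 is deterministic, so packets of one flow always land together.
    return zlib.crc32(key) % num_receivers


def dispatch(packets, num_receivers):
    """Spread packets over per-receiver queues, keeping each flow on one queue."""
    queues = [[] for _ in range(num_receivers)]
    for pkt in packets:
        queues[pick_receiver(*pkt["flow"], num_receivers)].append(pkt)
    return queues


packets = [
    {"flow": ("10.0.0.1", 5000, "10.0.0.9", 80), "seq": 1},
    {"flow": ("10.0.0.2", 5001, "10.0.0.9", 80), "seq": 1},
    {"flow": ("10.0.0.1", 5000, "10.0.0.9", 80), "seq": 2},
]
queues = dispatch(packets, num_receivers=4)
```

Keeping a flow on one receiver avoids reordering within that flow while still letting different flows be handled in parallel.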

A Bloom Filter-Based Monitoring Station for a Lawful Interception Platform

Author: Hernández Gutiérrez José Alberto
Muñoz Muñoz Alfonso
Rodríguez de los Santos López Gerson
Urueña Pascual Manuel
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

Lawful Interception (LI) is a fundamental tool in today's police investigations. Therefore, it is important to perform it as quickly and securely as possible, and at a reasonable cost per suspect. This makes traffic capture in aggregation links quite attractive, although this implies high wirespeeds, which require the use of specific hardware-based architectures. This paper proposes a novel Bloom filter-based monitoring station architecture for efficient packet capture in aggregation links. With this Bloom filter, we filter out most of the packets in the link and capture only those belonging to lawful interception wiretaps. Next, we present an FPGA-based implementation of this architecture and obtain the maximum capture rate achievable by injecting traffic through four parallel Gigabit Ethernet lines. Finally, we identify the limitations of our current design and suggest the possibility of further extending it to higher wirespeeds. Best Paper Award. The work presented in this paper has been funded by the INDECT project (grant number FP7-ICT-218086) and the Spanish CramNet project (grant no. TEC2012-38362-C03-01). European Community's Seventh Framework Progra
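The Bloom-filter pre-filtering described in the abstract above can be illustrated in software. This is a minimal sketch, not the paper's FPGA design: the filter size, the number of hash functions, the use of SHA-256, and the `capture` helper with its packet layout are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


def capture(packets, wiretaps):
    """Keep only packets whose source or destination may be under a wiretap."""
    return [p for p in packets if p["src"] in wiretaps or p["dst"] in wiretaps]


wiretaps = BloomFilter()
wiretaps.add("192.0.2.10")  # hypothetical monitored address
packets = [
    {"src": "192.0.2.10", "dst": "198.51.100.7"},
    {"src": "203.0.113.5", "dst": "198.51.100.8"},
    {"src": "198.51.100.9", "dst": "192.0.2.10"},
]
captured = capture(packets, wiretaps)
```

Because a Bloom filter never produces false negatives, every packet of a monitored address is captured; the occasional false positive only means a later exact-match stage has slightly more traffic to discard.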

Universidad Carlos III de Madrid e-Archivo

Energy and Delay Optimization of Heterogeneous Multicore Wireless Multimedia Sensor Nodes by Adaptive Genetic-Simulated Annealing Algorithm

Author: Christophe de Vaulx
Haiying Zhou
Huan Wang
Jianwen Xiang
Kun Mean Hou
Qing Wang
Shengwu Xiong
Tianhui Shen
Xing Liu
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2018

ECG Signal Reconstruction on the IoT-Gateway and Efficacy of Compressive Sensing Under Real-time Constraints

Author: Alinier Guillaume
Amira Abbes
Bensaali Faycal
Dimitrakopoulos George
Disi Mohammed Al
Djelouat Hamza
Kotronis Christos
Politis Elena
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Remote health monitoring is becoming indispensable, but Internet of Things (IoT)-based solutions have many implementation challenges, including energy consumption at the sensing node, and delay and instability due to cloud computing. Compressive sensing (CS) has been explored as a method to extend the battery lifetime of medical wearable devices. However, it is usually associated with computational complexity at the decoding end, increasing the latency of the system. Meanwhile, mobile processors are becoming computationally stronger and more efficient. Heterogeneous multicore platforms (HMPs) offer a local processing solution that can alleviate the limitations of remote signal processing. This paper demonstrates the real-time performance of compressed ECG reconstruction on ARM's big.LITTLE HMP and the advantages such platforms provide as the primary processing unit of the IoT architecture. It also investigates the efficacy of CS in minimizing the power consumption of a wearable device under real-time and hardware constraints. Results show that both the orthogonal matching pursuit and subspace pursuit reconstruction algorithms can be executed on the platform in real time and yield optimum performance on a single A15 core at minimum frequency. CS extends the battery life of wearable medical devices by up to 15.4% for ECGs suitable for wellness applications and by up to 6.6% for clinical-grade ECGs. Energy consumption at the gateway is largely due to an active internet connection; hence, processing the signals locally both mitigates the system's latency and improves the gateway's battery life. Many remote health solutions can benefit from an architecture centered around the use of HMPs, a step toward better remote health monitoring systems. Peer reviewed. Final Published versio
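The abstract above names orthogonal matching pursuit (OMP) as one of the reconstruction algorithms but does not show it. The sketch below is a generic, dependency-free textbook version of OMP on a toy 4x8 sensing matrix; the matrix, the signal, and the helper names `dot`, `solve`, and `omp` are assumptions for illustration and have nothing to do with the paper's ECG data or its ARM implementation.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def solve(G, b):
    """Solve G c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(G, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    c = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][k] * c[k] for k in range(i + 1, n))
        c[i] = (M[i][n] - s) / M[i][i]
    return c


def omp(columns, y, sparsity):
    """Greedy sparse recovery: columns are the sensing-matrix columns."""
    residual, support, coeffs = list(y), [], []
    for _ in range(sparsity):
        # 1. Pick the unused column most correlated with the residual.
        best = max((j for j in range(len(columns)) if j not in support),
                   key=lambda j: abs(dot(columns[j], residual)))
        support.append(best)
        # 2. Least-squares fit on the chosen columns (normal equations).
        G = [[dot(columns[i], columns[j]) for j in support] for i in support]
        coeffs = solve(G, [dot(columns[i], y) for i in support])
        # 3. Update the residual with the new fit.
        residual = [yi - sum(c * columns[j][r] for c, j in zip(coeffs, support))
                    for r, yi in enumerate(y)]
    x = [0.0] * len(columns)
    for c, j in zip(coeffs, support):
        x[j] = c
    return x


# Toy sensing matrix: identity columns followed by normalized Hadamard columns.
columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1],
           [0.5, 0.5, 0.5, 0.5], [0.5, -0.5, 0.5, -0.5],
           [0.5, 0.5, -0.5, -0.5], [0.5, -0.5, -0.5, 0.5]]
y = [2.0, -2.0, 3.0, -2.0]  # 4 measurements of a signal with x[2] = 1, x[5] = 4
x_hat = omp(columns, y, sparsity=2)
```

On this toy input the two nonzero entries are recovered from four measurements of an eight-element signal. The per-iteration least-squares solve is the decoding cost the abstract refers to as computational complexity at the decoding end.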

Robust and Traffic Aware Medium Access Control Mechanisms for Energy-Efficient mm-Wave Wireless Network-on-Chip Architectures

Author: Mansoor Naseef
Publication venue: RIT Scholar Works
Publication date: 07/08/2017

To cater to the performance/watt needs, processors with multiple processing cores on the same chip have become the de facto design choice. In such multicore systems, a Network-on-Chip (NoC) serves as the communication infrastructure for data transfer among the cores on the chip. However, conventional metallic interconnect-based NoCs are constrained by their long multi-hop latencies and high power consumption, limiting the performance gain in these systems. Among different alternatives, the low-latency wireless interconnect operating in the millimeter-wave (mm-wave) band is a nearer-term solution to this multi-hop communication problem, owing to its CMOS compatibility and energy efficiency. This has led to the recent exploration of mm-wave wireless technologies in wireless NoC architectures (WiNoC). To realize the mm-wave wireless interconnect in a WiNoC, a wireless interface (WI) equipped with an on-chip antenna and a transceiver circuit operating in the 60 GHz frequency range is integrated into the ports of some NoC switches. The WIs are also equipped with a medium access control (MAC) mechanism that ensures collision-free and energy-efficient communication among the WIs located at different parts of the chip. However, due to shrinking feature sizes and complex integration in CMOS technology, high-density chips like multicore systems are prone to manufacturing defects and dynamic faults during chip operation. Such failures can result in permanently broken wireless links or cause the MAC to malfunction in a WiNoC. Consequently, energy-efficient communication through the wireless medium will be compromised. Furthermore, the energy efficiency of wireless channel access also depends on the traffic pattern of the applications running on the multicore system. Due to the bursty and self-similar nature of NoC traffic patterns, the traffic demand of the WIs can vary both spatially and temporally. Ineffective management of such traffic variation limits the performance and energy benefits of the novel mm-wave interconnect technology. Hence, to utilize the full potential of this technology in WiNoCs, the design of a simple, fair, robust, and efficient MAC is of paramount importance. The main goal of this dissertation is to propose design principles for robust and traffic-aware MAC mechanisms that provide high-bandwidth, low-latency, and energy-efficient data communication in mm-wave WiNoCs. The proposed solution has two parts. In the first part, we propose a cross-layer design methodology for a robust WiNoC architecture that can minimize the effect of permanent failure of the wireless links and recover from transient failures caused by single event upsets (SEU). Then, in the second part, we present a traffic-aware MAC mechanism that can adjust the transmission slots of the WIs based on their traffic demand. The proposed MAC is also robust against failure of the wireless access mechanism. Finally, as a future research direction, this idea of traffic awareness is extended throughout the whole NoC by enabling adaptiveness in both the wired and wireless interconnection fabric.
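The abstract above describes a MAC that adjusts the transmission slots of the wireless interfaces according to their traffic demand, without giving the allocation rule. The sketch below shows one plausible rule under stated assumptions: every WI keeps one guaranteed slot for fairness, and the remaining slots are split in proportion to demand using the largest-remainder method. The function name `allocate_slots` and the demand figures are hypothetical and are not the dissertation's mechanism.

```python
def allocate_slots(demands, total_slots):
    """Split a frame of total_slots among WIs in proportion to their demand."""
    n = len(demands)
    if total_slots < n:
        raise ValueError("need at least one slot per wireless interface")
    slots = [1] * n                      # guaranteed minimum keeps access fair
    spare = total_slots - n
    total = sum(demands)
    if total == 0:                       # idle network: share spare slots evenly
        for k in range(spare):
            slots[k % n] += 1
        return slots
    # Integer proportional share, then hand leftovers to the largest remainders.
    base = [spare * d // total for d in demands]
    rem = [spare * d % total for d in demands]
    for i in range(n):
        slots[i] += base[i]
    leftover = spare - sum(base)
    for i in sorted(range(n), key=lambda i: -rem[i])[:leftover]:
        slots[i] += 1
    return slots


# Four WIs with bursty, uneven demand sharing a 14-slot frame.
frame = allocate_slots([6, 3, 1, 0], total_slots=14)
```

Re-running the allocation each frame with fresh demand estimates lets the slot shares follow spatial and temporal traffic variation, while the guaranteed slot prevents a quiet WI from being starved.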

RIT Scholar Works

A Survey of Prediction and Classification Techniques in Multicore Processor Systems

Author: Ababei Cristinel
Moghaddam Milad Ghorbani
Publication venue: e-Publications@Marquette
Publication date: 01/05/2019

In multicore processor systems, being able to accurately predict the future provides new optimization opportunities, which otherwise could not be exploited. For example, an oracle able to predict a certain application's behavior running on a smart phone could direct the power manager to switch to appropriate dynamic voltage and frequency scaling modes that would guarantee minimum levels of desired performance while saving energy consumption and thereby prolonging battery life. Using predictions enables systems to become proactive rather than continue to operate in a reactive manner. This prediction-based proactive approach has become increasingly popular in the design and optimization of integrated circuits and of multicore processor systems. Prediction transforms from simple forecasting to sophisticated machine learning based prediction and classification that learns from existing data, employs data mining, and predicts future behavior. This can be exploited by novel optimization techniques that can span across all layers of the computing stack. In this survey paper, we present a discussion of the most popular techniques on prediction and classification in the general context of computing systems with emphasis on multicore processors. The paper is far from comprehensive, but it will help the reader interested in employing prediction in optimization of multicore processor systems.

Virtualization services: scalable methods for virtualizing multicore systems

Author: Raj Himanshu
Publication venue: Georgia Institute of Technology
Publication date: 10/01/2008

Multi-core technology is bringing parallel processing capabilities from servers to laptops and even handheld devices. At the same time, platform support for system virtualization is making it easier to consolidate server and client resources, when and as needed by applications. This consolidation is achieved by dynamically mapping the virtual machines on which applications run to underlying physical machines and their processing cores. Low-cost processor and I/O virtualization methods that scale efficiently to different numbers of processing cores and I/O devices are key enablers of such consolidation. This dissertation develops and evaluates new methods for scaling virtualization functionality to multi-core and future many-core systems. Specifically, it re-architects virtualization functionality to improve scalability and better exploit multi-core system resources. Results from this work include a self-virtualized I/O abstraction, which virtualizes I/O so as to flexibly use different platforms' processing and I/O resources. Flexibility affords improved performance and resource usage and, most importantly, better scalability than that offered by current I/O virtualization solutions. Further, by describing system virtualization as a service provided to virtual machines and the underlying computing platform, this service can be enhanced to provide new and innovative functionality. For example, a virtual device may provide obfuscated data to guest operating systems to maintain data privacy; it could mask differences in device APIs or properties to deal with heterogeneous underlying resources; or it could control access to data based on the 'trust' properties of the guest VM. This thesis demonstrates that extended virtualization services are superior to existing operating system or user-level implementations of such functionality, for multiple reasons. First, this solution technique makes more efficient use of the key performance-limiting resources in multi-core systems, namely memory and I/O bandwidth. Second, this solution technique better exploits the parallelism inherent in multi-core architectures and exhibits good scalability properties, in part because at the hypervisor level there is greater control over precisely which resources are used, and how, to realize extended virtualization services. Improved control over resource usage makes it possible to provide value-added functionalities for both guest VMs and the platform. Specific instances of virtualization services described in this thesis are a network virtualization service that exploits heterogeneous processing cores, a storage virtualization service that provides location-transparent access to block devices by extending the functionality provided by the network virtualization service, a multimedia virtualization service that allows efficient media device sharing based on semantic information, and an object-based storage service with enhanced access control. Ph.D. Committee Chair: Schwan, Karsten; Committee Member: Ahamad, Mustaq; Committee Member: Fujimoto, Richard; Committee Member: Gavrilovska, Ada; Committee Member: Owen, Henry; Committee Member: Xenidis, Jim

SEUSS: rapid serverless deployment using environment snapshots

Author: Appavoo Jonathan
Awad Yara
Cadden James
Dong Han
Krieger Orran
Unger Thomas
Publication date: 01/01/2019

Modern FaaS systems perform well in the case of repeat executions when function working sets stay small. However, these platforms are less effective when applied to more complex, large-scale and dynamic workloads. In this paper, we introduce SEUSS (serverless execution via unikernel snapshot stacks), a new system-level approach for rapidly deploying serverless functions. Through our approach, we demonstrate orders of magnitude improvements in function start times and cacheability, which improves common re-execution paths while also unlocking previously-unsupported large-scale bursty workloads. Published versio

arXiv.org e-Print Archive

Boston University Institutional Repository (OpenBU)

FDMA Enabled Phase-based Wireless Network-on-Chip using Graphene-based THz-band Antennas

Author: Shenoy Manur Deekshith
Publication venue: RIT Scholar Works
Publication date: 01/11/2017

Future growth in System-on-Chip design is moving in the direction of multicore systems, a trend driven by the benefits multicore systems provide in terms of power reduction and scalability. The design of efficient interconnects between cores is crucial for improving the performance of a multicore processor. Networks-on-Chip (NoCs) are viewed as an emerging solution for the design of interconnects in multicore systems. However, traditional NoC architectures are no longer able to satisfy the performance requirements due to long-distance communication over multi-hop wireline paths. Multi-hop communication leads to higher energy consumption, increased latency, and reduced bandwidth. Research in recent years has explored emerging technologies such as 3D integration and photonic and radio-frequency-based NoCs. Wireless interconnects using mm-wave antennas can alleviate the performance issues of a wireline interconnect system. However, to satisfy the increasing demand for higher bandwidth and lower energy consumption, a wireless NoC with high-speed direct links operating in the THz band between distant cores is desired. Recent research has brought to light highly efficient graphene-based antennas operating in the THz band. These antennas can provide high data rates and are found to consume less power with low area overheads. In this thesis, an innovative approach using novel devices based on graphene structures is proposed to provide a high-performance on-chip interconnection. This approach combines the regular NoC structure with the proposed wireless infrastructure to exploit the performance benefits. An architecture with wireless interfaces on every core is explored in this work. Simultaneous multiple communications in a network can be achieved by adopting Frequency Division Multiple Access (FDMA). However, in a system where all cores are equipped with a wireless interface, FDMA requires a larger number of frequency bands. This becomes difficult to achieve as the system scales and the number of cores increases. Therefore, an FDMA protocol along with a 4-phased repetitive multi-band architecture is envisioned in this work. The phase-based protocol allows multiple wireless links to be active at a time; combined with the FDMA protocol, it provides reliable data transfer between cores with fewer frequency bands. In this thesis, an architecture with a combination of FDMA and the phase-based protocol using point-to-point graphene-based wireless links is proposed. The proposed architecture is also extended to a multichip system. With cycle-accurate system-level simulations, it is shown that the proposed architecture provides large gains in performance and energy efficiency in data transfer in both NoC-based multicore and multichip systems.
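The abstract above states that combining FDMA with a 4-phase protocol lowers the number of frequency bands needed, but it does not describe the actual schedule. The sketch below only illustrates why time phases reduce the band count, under a purely hypothetical assignment in which core `i` transmits in phase `i % 4` on band `i // 4`; the function names and this mapping are assumptions, not the thesis's protocol.

```python
NUM_PHASES = 4  # from the abstract's "4-phased" wording; the mapping below is assumed


def assign(num_cores, num_phases=NUM_PHASES):
    """Return {core: (phase, band)} for a hypothetical round-robin mapping."""
    return {core: (core % num_phases, core // num_phases)
            for core in range(num_cores)}


def bands_needed(num_cores, num_phases=NUM_PHASES):
    """Bands required when only one phase group transmits at a time."""
    return -(-num_cores // num_phases)  # ceiling division


def active_in_phase(schedule, phase):
    """Cores allowed to transmit during the given phase."""
    return sorted(core for core, (p, _) in schedule.items() if p == phase)


schedule = assign(16)
phase0 = active_in_phase(schedule, 0)   # cores transmitting together in phase 0
bands = bands_needed(16)                # versus 16 bands for pure FDMA
```

In this toy mapping, the cores active in one phase all use distinct bands, so several links run at once, while the band count grows with the number of cores divided by the number of phases rather than with the number of cores.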

RIT Scholar Works