Search CORE

3,613 research outputs found

Near-Memory Address Translation

Author: Falsafi Babak
Jevdjic Djordje
Picorel Javier
Publication venue
Publication date: 21/08/2017
Field of study

Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing for translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Brief Announcement: Persistent Software Combining

Author: Fatourou Panagiota
Kallimanis Nikolaos D.
Kosmas Eleftherios
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 35th International Symposium on Distributed Computing (DISC 2021)
Publication date: 01/01/2021
Field of study

We study the performance power of software combining in designing recoverable algorithms and data structures. We present two recoverable synchronization protocols, one blocking and another wait-free, which illustrate how to use software combining to achieve both low persistence and synchronization cost. Our experiments show that these protocols outperform by far state-of-the-art recoverable universal constructions and transactional memory systems. We built recoverable queues and stacks, based on these protocols, that exhibit much better performance than previous such implementations

Dagstuhl Research Online Publication Server

SemperOS: A Distributed Capability System

Author: Asmussen Nils
Bhatotia Pramod
Hille Matthias
Härtig Hermann
Publication venue
Publication date: 10/07/2019
Field of study

Edinburgh Research Explorer

Coupled Depth Learning

Author: Baig Mohammad Haris
Torresani Lorenzo
Publication venue
Publication date: 09/02/2016
Field of study

In this paper we propose a method for estimating depth from a single image using a coarse to fine approach. We argue that modeling the fine depth details is easier after a coarse depth map has been computed. We express a global (coarse) depth map of an image as a linear combination of a depth basis learned from training examples. The depth basis captures spatial and statistical regularities and reduces the problem of global depth estimation to the task of predicting the input-specific coefficients in the linear combination. This is formulated as a regression problem from a holistic representation of the image. Crucially, the depth basis and the regression function are {\bf coupled} and jointly optimized by our learning scheme. We demonstrate that this results in a significant improvement in accuracy compared to direct regression of depth pixel values or approaches learning the depth basis disjointly from the regression function. The global depth estimate is then used as a guidance by a local refinement method that introduces depth details that were not captured at the global level. Experiments on the NYUv2 and KITTI datasets show that our method outperforms the existing state-of-the-art at a considerably lower computational cost for both training and testing.Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation

arXiv.org e-Print Archive

Crossref

Atomic-SDN: Is Synchronous Flooding the Solution to Software-Defined Networking in IoT?

Author: Baddeley Michael
Nejabati Reza
Oikonomou George
Raza Usman
Simeonidou Dimitra
Sooriyabandara Mahesh
Stanoev Aleksandar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/05/2019
Field of study

The adoption of Software Defined Networking (SDN) within traditional networks has provided operators the ability to manage diverse resources and easily reconfigure networks as requirements change. Recent research has extended this concept to IEEE 802.15.4 low-power wireless networks, which form a key component of the Internet of Things (IoT). However, the multiple traffic patterns necessary for SDN control makes it difficult to apply this approach to these highly challenging environments. This paper presents Atomic-SDN, a highly reliable and low-latency solution for SDN in low-power wireless. Atomic-SDN introduces a novel Synchronous Flooding (SF) architecture capable of dynamically configuring SF protocols to satisfy complex SDN control requirements, and draws from the authors' previous experiences in the IEEE EWSN Dependability Competition: where SF solutions have consistently outperformed other entries. Using this approach, Atomic-SDN presents considerable performance gains over other SDN implementations for low-power IoT networks. We evaluate Atomic-SDN through simulation and experimentation, and show how utilizing SF techniques provides latency and reliability guarantees to SDN control operations as the local mesh scales. We compare Atomic-SDN against other SDN implementations based on the IEEE 802.15.4 network stack, and establish that Atomic-SDN improves SDN control by orders-of-magnitude across latency, reliability, and energy-efficiency metrics

arXiv.org e-Print Archive

Explore Bristol Research

A New System Architecture for Heterogeneous Compute Units

Author: Asmussen Nils
Publication venue
Publication date: 09/08/2019
Field of study

The ongoing trend to more heterogeneous systems forces us to rethink the design of systems. In this work, I study a new system design that considers heterogeneous compute units (general-purpose cores with different instruction sets, DSPs, FPGAs, fixed-function accelerators, etc.) from the beginning instead of as an afterthought. The goal is to treat all compute units (CUs) as first-class citizens, enabling (1) isolation and secure communication between all types of CUs, (2) a direct interaction of all CUs, removing the conventional CPU from the critical path, and (3) access to operating system (OS) services such as file systems and network stacks for all CUs. To study this system design, I am using a hardware/software co-design based on two key ideas: 1) introduce a new hardware component next to each CU used by the OS as the CUs' common interface and 2) let the OS kernel control applications remotely from a different CU. The hardware component is called data transfer unit (DTU) and offers the minimal set of features to reach the stated goals: secure message passing and memory access. The OS is called M³ and runs its kernel on a dedicated CU and runs the OS services and applications on the remaining CUs. The kernel is responsible for establishing DTU-based communication channels between services and applications. After a channel has been set up, services and applications communicate directly without involving the kernel. This approach allows to support arbitrary CUs as aforementioned first-class citizens, ranging from fixed-function accelerators to complex general-purpose cores

Qucosa

HSSS - Hochschulschriftenserver der SLUB

Technische Universität Dresden: Qucosa