Packet Chasing: Spying on Network Packets over a Cache Side-Channel
This paper presents Packet Chasing, an attack that spies on network traffic
without requiring access to the network, and works regardless of the privilege level of
the process receiving the packets. A spy process can easily probe and discover
the exact cache location of each buffer used by the network driver. Even more
useful, it can discover the exact sequence in which those buffers are used to
receive packets. This then enables packet frequency and packet sizes to be
monitored through cache side channels. This allows both covert channels between
a sender and a remote spy with no access to the network, as well as direct
attacks that can identify, among other things, the web page access patterns of
a victim on the network. In addition to identifying the potential attack, this
work proposes a software-based short-term mitigation as well as a light-weight,
adaptive, cache partitioning mitigation that blocks the interference of I/O and
CPU requests in the last-level cache.
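The core mechanism can be sketched as a toy prime+probe loop: the spy learns which ring-buffer cache lines a received packet evicted and turns that count into a size estimate. Everything below (buffer counts, line sizes, the set-based "cache") is a simplifying assumption for illustration, not the paper's actual attack code; a real spy would time memory accesses rather than query a set.

```python
# Toy simulation of the Packet Chasing idea: a spy infers packet sizes from
# which ring-buffer cache lines a received packet touched. Assumptions: one
# buffer per group of sets, 64-byte lines, and a set standing in for the cache.

NUM_BUFFERS = 8          # ring buffers used by the (simulated) NIC driver
LINES_PER_BUFFER = 32    # 64-byte lines per 2 KB buffer

def driver_receive(packet_len, cache):
    """Simulated driver: writing a packet pulls its buffer's lines into cache."""
    buf = driver_receive.next_buf
    driver_receive.next_buf = (buf + 1) % NUM_BUFFERS   # fixed buffer sequence
    lines_touched = (packet_len + 63) // 64
    for line in range(min(lines_touched, LINES_PER_BUFFER)):
        cache.add((buf, line))          # evicts the spy's primed data
    return buf

driver_receive.next_buf = 0

def spy_probe(cache):
    """Prime+probe: count evicted (slow) lines per buffer to infer packet size."""
    evictions = [0] * NUM_BUFFERS
    for buf in range(NUM_BUFFERS):
        for line in range(LINES_PER_BUFFER):
            if (buf, line) in cache:    # a real spy would time this access
                evictions[buf] += 1
    cache.clear()                        # re-prime for the next packet
    return evictions

cache = set()
sizes = []
for pkt_len in (64, 1500, 300):
    driver_receive(pkt_len, cache)
    evictions = spy_probe(cache)
    sizes.append(max(evictions) * 64)   # coarse estimate: touched lines * 64 B

print(sizes)  # → [64, 1536, 320]: cache-line-granular size estimates
```

Note that the spy never sees packet contents; sizes and the buffer-use sequence alone are what enable the covert channel and web-page fingerprinting the abstract describes.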
Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy
When receiving machine learning services from the cloud, the provider does
not need to receive all features; in fact, only a subset of the features is
necessary for the target prediction task. Discerning this subset is the key
problem of this work. We formulate this problem as a gradient-based
perturbation maximization method that discovers this subset in the input
feature space with respect to the functionality of the prediction model used by
the provider. After identifying the subset, our framework, Cloak, suppresses
the rest of the features using utility-preserving constant values that are
discovered through a separate gradient-based optimization process. We show that
Cloak does not necessarily require collaboration from the service provider
beyond its normal service, and can be applied in scenarios where we only have
black-box access to the service provider's model. We theoretically guarantee
that Cloak's optimizations reduce the upper bound of the Mutual Information
(MI) between the data and the sifted representations that are sent out.
Experimental results show that Cloak reduces the mutual information between the
input and the sifted representations by 85.01% with only a negligible reduction
in utility (1.42%). In addition, we show that Cloak greatly diminishes
adversaries' ability to learn and infer non-conducive features.

Comment: This paper is presented at the 2021 Web Conference (WWW 2021).
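The sift-and-suppress idea can be illustrated in a few lines: rank features by how strongly the model's output depends on them, keep only the essential subset, and overwrite the rest with fixed constants before sending the input out. The linear model, the use of |w| as the sensitivity signal, and the mean as the suppression constant are simplifying assumptions of this sketch, not Cloak's actual gradient-based perturbation-maximization or constant-discovery procedures.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, 0.1, -2.5, 0.05, 0.02])   # toy provider model (linear)
X = rng.normal(size=(1000, 5))

def predict(X):
    return X @ w > 0

# For a linear model the gradient of the score w.r.t. the input is just w,
# so |w| ranks feature importance (standing in for Cloak's gradient method).
importance = np.abs(w)
keep = importance.argsort()[::-1][:2]          # essential subset: top 2 features

# Suppress non-essential features with a constant (here the feature mean,
# standing in for Cloak's separately optimized constants).
X_sifted = np.repeat(X.mean(axis=0)[None, :], len(X), axis=0)
X_sifted[:, keep] = X[:, keep]

agreement = np.mean(predict(X) == predict(X_sifted))
print(f"prediction agreement after sifting: {agreement:.2%}")
```

Because the two retained weights dominate the score, predictions on the sifted inputs agree with the originals almost everywhere, while three of five raw feature values never leave the client.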
Simultaneous multithreading
Thesis (Ph. D.)--University of Washington, 1996.

This dissertation examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar processor's functional units in a single cycle. Simultaneous multithreading significantly increases processor utilization in the face of both long instruction latencies and limited available parallelism per thread.

This research presents several models of simultaneous multithreading and compares them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. The results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue superscalar processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. Simultaneous multithreading is also an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources.

This dissertation also shows that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. An architecture for simultaneous multithreading is presented that achieves three goals: (1) it minimizes the architectural impact on a conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor, for fetch and issue, those threads that will use the processor most efficiently each cycle, thereby providing the "best" instructions to the processor.

An analytic response-time model shows that the benefits of simultaneous multithreading in a multiprogrammed environment are not limited to increased throughput. Those throughput increases lead to significant reductions in queueing time for runnable processes, leading to response-time improvements that in many cases are significantly greater than the throughput improvements themselves.
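The last point, that response-time gains can exceed throughput gains, can be illustrated with a simple M/M/1 queue. This is an assumption of the sketch (the dissertation's analytic model may differ): raising the service rate in proportion to the reported throughput improvement (2.16 → 5.4 IPC, the 2.5x gain) shrinks the mean time in system by far more than 2.5x when the processor is heavily loaded.

```python
# M/M/1 illustration of why throughput gains amplify response-time gains
# under load: T = 1 / (mu - lambda), so the queueing term collapses as the
# service rate pulls away from the arrival rate. The 2.16 and 5.4 IPC figures
# come from the abstract; the arrival rate of 2.0 is a made-up heavy load.

def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: T = 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue must be stable"
    return 1.0 / (service_rate - arrival_rate)

arrival = 2.0                                   # offered load, work per cycle
t_superscalar = mm1_response_time(arrival, 2.16)  # near saturation
t_smt = mm1_response_time(arrival, 5.4)           # ample headroom

speedup = t_superscalar / t_smt
print(f"response-time improvement: {speedup:.1f}x (throughput gain was 2.5x)")
```

Near saturation the baseline spends almost all of its response time queueing, so even a modest service-rate increase yields an outsized latency improvement, which is the qualitative effect the dissertation's model captures.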
A New Direction in Tree Based Search Engine Architectures Using Balanced Single Port Memories
This paper examines the microarchitecture of a novel network search
processor which provides both high execution throughput and balanced memory
distribution. Pipelined forwarding engines are used in core routers to meet
speed demands. Most algorithmic-based solutions for these engines use tree
based search structures. The tree traversal is pipelined across a number of
stages to achieve high throughput. Prior work has shown that the pipelining of
these trees results in unevenly distributed memory. To address this imbalance,
conventional approaches use either complex dynamic memory allocation schemes
(which dramatically increase hardware complexity) or over provision each of the
pipeline stages (which results in memory waste). This paper has three primary
contributions: i) a novel logical pipeline architecture in which search
operations can start execution at any stage, ii) a new allocation
algorithm that leverages this degree of freedom to eliminate memory imbalance
and thus memory waste, iii) a practical implementation of our logical pipeline
which eliminates non-neighbor communication and guarantees in-order completion
without using a dedicated task scheduler. The implementation also minimizes
interconnect complexity by having searches enter and exit the pipeline through
one location (while still allowing the search to begin at any stage). We
validate our new scheme by implementing and simulating state of the art
solutions for IPv4 lookup, VPN forwarding and packet classification. In our
simulation we use both real life and synthetically generated routing tables and
classifiers. We show that our new pipeline scheme and memory allocator can
provide searches with a memory allocation efficiency that is within 1% of
non-pipelined schemes. This allows us to obtain a forwarding rate of 1 packet
every 6 ns using memories with 2 ns cycle time, with a constant latency of 48
ns and near-perfect memory efficiency.

Pre-2018 CSE ID: CS2004-079
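The key degree of freedom, letting a search enter the logical pipeline at any stage, can be sketched with a greedy allocator: start each subtrie at the currently least-loaded stage, so successive trie levels wrap around the pipeline and per-stage memory evens out. The subtrie sizes and the greedy policy below are illustrative assumptions, not the paper's actual allocation algorithm.

```python
# Greedy start-stage allocation sketch: because a search may begin at any
# stage of the logical pipeline, each subtrie's levels can be mapped starting
# at an arbitrary stage. Starting each subtrie at the least-loaded stage
# balances memory that a fixed-start mapping would pile onto stage 0.

NUM_STAGES = 4

def allocate(subtries, num_stages=NUM_STAGES):
    """Map each subtrie (a list of per-level node counts) onto the pipeline,
    rotating its start stage to keep per-stage memory balanced."""
    load = [0] * num_stages
    placement = []                       # chosen start stage per subtrie
    for levels in subtries:
        start = min(range(num_stages), key=lambda s: load[s])
        for depth, nodes in enumerate(levels):
            load[(start + depth) % num_stages] += nodes
        placement.append(start)
    return load, placement

# Skewed subtries: every one is top-heavy, so a fixed start stage would give
# stage 0 a load of 32 while stage 3 holds only 4.
subtries = [[8, 4, 2, 1], [8, 4, 2, 1], [8, 4, 2, 1], [8, 4, 2, 1]]
load, placement = allocate(subtries)
print(load, placement)  # → [15, 15, 15, 15] [0, 3, 2, 1]
```

Rotating the entry point spreads the heavy root levels across all four stages, yielding a perfectly even load here; this is the memory-balance effect that lets the paper's scheme approach non-pipelined allocation efficiency without dynamic allocation hardware or over-provisioning.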