Packet Chasing: Spying on Network Packets over a Cache Side-Channel
This paper presents Packet Chasing, an attack that spies on network traffic
without requiring access to the network, and works regardless of the privilege level of
the process receiving the packets. A spy process can easily probe and discover
the exact cache location of each buffer used by the network driver. Even more
useful, it can discover the exact sequence in which those buffers are used to
receive packets. This then enables packet frequency and packet sizes to be
monitored through cache side channels. This allows both covert channels between
a sender and a remote spy with no access to the network, as well as direct
attacks that can identify, among other things, the web page access patterns of
a victim on the network. In addition to identifying the potential attack, this
work proposes a software-based short-term mitigation as well as a light-weight,
adaptive, cache partitioning mitigation that blocks the interference of I/O and
CPU requests in the last-level cache.
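The core mechanism can be sketched as a toy prime+probe loop: the spy learns which ring-buffer cache lines a received packet evicted and turns that count into a size estimate. Everything below (buffer counts, line sizes, the set-based "cache") is a simplifying assumption for illustration, not the paper's actual attack code; a real spy would time memory accesses rather than query a set.

```python
# Toy simulation of the Packet Chasing idea: a spy infers packet sizes from
# which ring-buffer cache lines a received packet touched. Assumptions: one
# buffer per group of sets, 64-byte lines, and a set standing in for the cache.

NUM_BUFFERS = 8          # ring buffers used by the (simulated) NIC driver
LINES_PER_BUFFER = 32    # 64-byte lines per 2 KB buffer

def driver_receive(packet_len, cache):
    """Simulated driver: writing a packet pulls its buffer's lines into cache."""
    buf = driver_receive.next_buf
    driver_receive.next_buf = (buf + 1) % NUM_BUFFERS   # fixed buffer sequence
    lines_touched = (packet_len + 63) // 64
    for line in range(min(lines_touched, LINES_PER_BUFFER)):
        cache.add((buf, line))          # evicts the spy's primed data
    return buf

driver_receive.next_buf = 0

def spy_probe(cache):
    """Prime+probe: count evicted (slow) lines per buffer to infer packet size."""
    evictions = [0] * NUM_BUFFERS
    for buf in range(NUM_BUFFERS):
        for line in range(LINES_PER_BUFFER):
            if (buf, line) in cache:    # a real spy would time this access
                evictions[buf] += 1
    cache.clear()                        # re-prime for the next packet
    return evictions

cache = set()
sizes = []
for pkt_len in (64, 1500, 300):
    driver_receive(pkt_len, cache)
    evictions = spy_probe(cache)
    sizes.append(max(evictions) * 64)   # coarse estimate: touched lines * 64 B

print(sizes)  # → [64, 1536, 320]: cache-line-granular size estimates
```

Note that the spy never sees packet contents; sizes and the buffer-use sequence alone are what enable the covert channel and web-page fingerprinting the abstract describes.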
Not All Features Are Equal: Discovering Essential Features for Preserving Prediction Privacy
When receiving machine learning services from the cloud, the provider does
not need to receive all features; in fact, only a subset of the features is
necessary for the target prediction task. Discerning this subset is the key
problem of this work. We formulate this problem as a gradient-based
perturbation maximization method that discovers this subset in the input
feature space with respect to the functionality of the prediction model used by
the provider. After identifying the subset, our framework, Cloak, suppresses
the rest of the features using utility-preserving constant values that are
discovered through a separate gradient-based optimization process. We show that
Cloak does not necessarily require collaboration from the service provider
beyond its normal service, and can be applied in scenarios where we only have
black-box access to the service provider's model. We theoretically guarantee
that Cloak's optimizations reduce the upper bound of the Mutual Information
(MI) between the data and the sifted representations that are sent out.
Experimental results show that Cloak reduces the mutual information between the
input and the sifted representations by 85.01% with only a negligible reduction
in utility (1.42%). In addition, we show that Cloak greatly diminishes
adversaries' ability to learn and infer non-conducive features.

Comment: This paper is presented at the 2021 Web Conference (WWW 2021).
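The sift-and-suppress idea can be illustrated in a few lines: rank features by how strongly the model's output depends on them, keep only the essential subset, and overwrite the rest with fixed constants before sending the input out. The linear model, the use of |w| as the sensitivity signal, and the mean as the suppression constant are simplifying assumptions of this sketch, not Cloak's actual gradient-based perturbation-maximization or constant-discovery procedures.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, 0.1, -2.5, 0.05, 0.02])   # toy provider model (linear)
X = rng.normal(size=(1000, 5))

def predict(X):
    return X @ w > 0

# For a linear model the gradient of the score w.r.t. the input is just w,
# so |w| ranks feature importance (standing in for Cloak's gradient method).
importance = np.abs(w)
keep = importance.argsort()[::-1][:2]          # essential subset: top 2 features

# Suppress non-essential features with a constant (here the feature mean,
# standing in for Cloak's separately optimized constants).
X_sifted = np.repeat(X.mean(axis=0)[None, :], len(X), axis=0)
X_sifted[:, keep] = X[:, keep]

agreement = np.mean(predict(X) == predict(X_sifted))
print(f"prediction agreement after sifting: {agreement:.2%}")
```

Because the two retained weights dominate the score, predictions on the sifted inputs agree with the originals almost everywhere, while three of five raw feature values never leave the client.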
Simultaneous multithreading
Thesis (Ph. D.)--University of Washington, 1996.

This dissertation examines simultaneous multithreading, a technique permitting several independent threads to issue instructions to a superscalar processor's functional units in a single cycle. Simultaneous multithreading significantly increases processor utilization in the face of both long instruction latencies and limited available parallelism per thread.

This research presents several models of simultaneous multithreading and compares them with alternative organizations: a wide superscalar, a fine-grain multithreaded processor, and single-chip, multiple-issue multiprocessing architectures. The results show that both (single-threaded) superscalar and fine-grain multithreaded architectures are limited in their ability to utilize the resources of a wide-issue superscalar processor. Simultaneous multithreading has the potential to achieve 4 times the throughput of a superscalar, and double that of fine-grain multithreading. Simultaneous multithreading is also an attractive alternative to single-chip multiprocessors; simultaneous multithreaded processors with a variety of organizations outperform corresponding conventional multiprocessors with similar execution resources.

This dissertation also shows that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. An architecture for simultaneous multithreading is presented that achieves three goals: (1) it minimizes the architectural impact on a conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor, for fetch and issue, those threads that will use the processor most efficiently each cycle, thereby providing the "best" instructions to the processor.

An analytic response-time model shows that the benefits of simultaneous multithreading in a multiprogrammed environment are not limited to increased throughput. Those throughput increases lead to significant reductions in queueing time for runnable processes, leading to response-time improvements that in many cases are significantly greater than the throughput improvements themselves.
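The last point, that response-time gains can exceed throughput gains, can be illustrated with a simple M/M/1 queue. This is an assumption of the sketch (the dissertation's analytic model may differ): raising the service rate in proportion to the reported throughput improvement (2.16 → 5.4 IPC, the 2.5x gain) shrinks the mean time in system by far more than 2.5x when the processor is heavily loaded.

```python
# M/M/1 illustration of why throughput gains amplify response-time gains
# under load: T = 1 / (mu - lambda), so the queueing term collapses as the
# service rate pulls away from the arrival rate. The 2.16 and 5.4 IPC figures
# come from the abstract; the arrival rate of 2.0 is a made-up heavy load.

def mm1_response_time(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: T = 1 / (mu - lambda)."""
    assert arrival_rate < service_rate, "queue must be stable"
    return 1.0 / (service_rate - arrival_rate)

arrival = 2.0                                   # offered load, work per cycle
t_superscalar = mm1_response_time(arrival, 2.16)  # near saturation
t_smt = mm1_response_time(arrival, 5.4)           # ample headroom

speedup = t_superscalar / t_smt
print(f"response-time improvement: {speedup:.1f}x (throughput gain was 2.5x)")
```

Near saturation the baseline spends almost all of its response time queueing, so even a modest service-rate increase yields an outsized latency improvement, which is the qualitative effect the dissertation's model captures.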
A New Direction in Tree Based Search Engine Architectures Using Balanced Single Port Memories
This paper examines the microarchitecture of a novel network search
processor which provides both high execution throughput and balanced memory
distribution. Pipelined forwarding engines are used in core routers to meet
speed demands. Most algorithmic-based solutions for these engines use tree
based search structures. The tree traversal is pipelined across a number of
stages to achieve high throughput. Prior work has shown that the pipelining of
these trees results in unevenly distributed memory. To address this imbalance,
conventional approaches use either complex dynamic memory allocation schemes
(which dramatically increase hardware complexity) or over provision each of the
pipeline stages (which results in memory waste). This paper has three primary
contributions: i) a novel logical pipeline architecture in which search
operations can start execution at any stage, ii) a new allocation
algorithm that leverages this degree of freedom to eliminate memory imbalance
and thus memory waste, iii) a practical implementation of our logical pipeline
which eliminates non-neighbor communication and guarantees in-order completion
without using a dedicated task scheduler. The implementation also minimizes
interconnect complexity by having searches enter and exit the pipeline through
one location (while still allowing the search to begin at any stage). We
validate our new scheme by implementing and simulating state of the art
solutions for IPv4 lookup, VPN forwarding and packet classification. In our
simulation we use both real life and synthetically generated routing tables and
classifiers. We show that our new pipeline scheme and memory allocator can
provide searches with a memory allocation efficiency that is within 1% of
non-pipelined schemes. This allows us to obtain a forwarding rate of 1 packet
every 6 ns using memories with 2 ns cycle time, with a constant latency of 48
ns and near-perfect memory efficiency.

Pre-2018 CSE ID: CS2004-079
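The key degree of freedom, letting a search enter the logical pipeline at any stage, can be sketched with a greedy allocator: start each subtrie at the currently least-loaded stage, so successive trie levels wrap around the pipeline and per-stage memory evens out. The subtrie sizes and the greedy policy below are illustrative assumptions, not the paper's actual allocation algorithm.

```python
# Greedy start-stage allocation sketch: because a search may begin at any
# stage of the logical pipeline, each subtrie's levels can be mapped starting
# at an arbitrary stage. Starting each subtrie at the least-loaded stage
# balances memory that a fixed-start mapping would pile onto stage 0.

NUM_STAGES = 4

def allocate(subtries, num_stages=NUM_STAGES):
    """Map each subtrie (a list of per-level node counts) onto the pipeline,
    rotating its start stage to keep per-stage memory balanced."""
    load = [0] * num_stages
    placement = []                       # chosen start stage per subtrie
    for levels in subtries:
        start = min(range(num_stages), key=lambda s: load[s])
        for depth, nodes in enumerate(levels):
            load[(start + depth) % num_stages] += nodes
        placement.append(start)
    return load, placement

# Skewed subtries: every one is top-heavy, so a fixed start stage would give
# stage 0 a load of 32 while stage 3 holds only 4.
subtries = [[8, 4, 2, 1], [8, 4, 2, 1], [8, 4, 2, 1], [8, 4, 2, 1]]
load, placement = allocate(subtries)
print(load, placement)  # → [15, 15, 15, 15] [0, 3, 2, 1]
```

Rotating the entry point spreads the heavy root levels across all four stages, yielding a perfectly even load here; this is the memory-balance effect that lets the paper's scheme approach non-pipelined allocation efficiency without dynamic allocation hardware or over-provisioning.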