Secure storage systems for untrusted cloud environments
The cloud has become established for applications that need to be scalable and highly
available. However, moving data to data centers owned and operated by a third party,
i.e., the cloud provider, raises security concerns because a cloud provider could easily
access and manipulate the data or program flow. This prevents the cloud from being
used for certain applications, such as medical or financial ones.
Hardware vendors are addressing these concerns by developing Trusted Execution
Environments (TEEs) that make the CPU state and parts of memory inaccessible from
the host software. While TEEs protect the current execution state, they do not provide
security guarantees for data that does not fit or reside in the protected memory
area, such as data sent over the network or kept in persistent storage.
In this work, we aim to address TEEs’ limitations in three ways: first, we
extend the trust of TEEs to persistent storage; second, we extend the trust to multiple
nodes in a network; and third, we propose a compiler-based solution for accessing
heterogeneous memory regions. More specifically,
• SPEICHER extends the trust provided by TEEs to persistent storage. SPEICHER
implements a key-value interface. Its design is based on LSM data structures, but
extends them to provide confidentiality, integrity, and freshness for the stored
data. Thus, SPEICHER can prove to the client that the data has not been tampered
with by an attacker.
• AVOCADO is a distributed in-memory key-value store (KVS) that extends the
trust that TEEs provide across the network to multiple nodes, allowing KVSs to
scale beyond the boundaries of a single node. On each node, AVOCADO carefully
divides data between trusted memory and untrusted host memory, to maximize
the amount of data that can be stored on each node. AVOCADO leverages the
fact that we can model network attacks as crash-faults to trust other nodes with
a hardened ABD replication protocol.
• TOAST is based on the observation that modern high-performance systems
often use several different heterogeneous memory regions that are not easily
distinguishable by the programmer. The number of regions is increased by the
fact that TEEs divide memory into trusted and untrusted regions. TOAST is a
compiler-based approach to unify access to different heterogeneous memory
regions and provides programmability and portability. TOAST uses a
load/store interface to abstract most library interfaces for different memory
regions.
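The integrity and freshness guarantees described for SPEICHER above can be illustrated with a minimal sketch. This is not SPEICHER's LSM-based design: the class `FreshStore`, its fields, and the per-key version counter are hypothetical names chosen for illustration, and confidentiality is left out because it would need an authenticated encryption scheme outside the Python standard library. The sketch assumes that only the MAC key and the version table live in trusted memory, while records live in storage an attacker can modify or roll back.

```python
import hashlib
import hmac
import os


class FreshStore:
    """Toy key-value store whose records live in untrusted storage.

    Hypothetical sketch: `mac_key` and `versions` are assumed to sit in
    trusted (TEE-protected) memory; `untrusted` stands in for the disk.
    """

    def __init__(self):
        self.mac_key = os.urandom(32)  # trusted: never leaves protected memory
        self.versions = {}             # trusted: latest version number per key
        self.untrusted = {}            # untrusted: may be modified or rolled back

    def _tag(self, key, value, version):
        # Bind key, version, and value together; the length prefix keeps
        # the encoding of (key, value) unambiguous.
        msg = (len(key).to_bytes(4, "big") + key
               + version.to_bytes(8, "big") + value)
        return hmac.new(self.mac_key, msg, hashlib.sha256).digest()

    def put(self, key, value):
        version = self.versions.get(key, 0) + 1
        self.versions[key] = version
        self.untrusted[key] = (value, version, self._tag(key, value, version))

    def get(self, key):
        value, version, tag = self.untrusted[key]
        # Integrity: the record must carry a valid MAC.
        if not hmac.compare_digest(tag, self._tag(key, value, version)):
            raise ValueError("integrity violation")
        # Freshness: a valid but outdated record is a rollback.
        if version != self.versions[key]:
            raise ValueError("stale record (rollback)")
        return value
```

In this sketch, replaying an older record with a valid MAC fails the version check, while editing a record in place fails the MAC check. A real system would also have to persist the trusted version state across restarts, which this example does not attempt.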
Shufflecake: Plausible Deniability for Multiple Hidden Filesystems on Linux
We present Shufflecake, a new plausible deniability design to hide the existence of encrypted data on a storage medium, making it very difficult for an adversary to prove the existence of such data. Shufflecake can be considered a "spiritual successor" of tools such as TrueCrypt and VeraCrypt, but vastly improved: it works natively on Linux, it supports any filesystem of choice, and it can manage multiple volumes per device, so as to make deniability of the existence of hidden partitions really plausible.
Compared to ORAM-based solutions, Shufflecake is extremely fast and simpler, but it does not offer native protection against multi-snapshot adversaries. However, we discuss security extensions that are made possible by its architecture, and we present evidence that these extensions might be enough to thwart more powerful adversaries.
We implemented Shufflecake as an in-kernel tool for Linux, adding useful features, and we benchmarked its performance, showing only a minor slowdown compared to a base encrypted system. We believe Shufflecake represents a useful tool for people whose freedom of expression is threatened by repressive authorities or dangerous criminal organizations, in particular whistleblowers, investigative journalists, and activists for human rights in oppressive regimes.
DORAM revisited: Maliciously secure RAM-MPC with logarithmic overhead
Distributed Oblivious Random Access Memory (DORAM) is a secure multiparty protocol that allows a group of participants holding a secret-shared array to read and write to secret-shared locations within the array. The efficiency of a DORAM protocol is measured by the amount of communication and computation required per read/write query into the array. DORAM protocols are a necessary ingredient for executing Secure Multiparty Computation (MPC) in the RAM model.
Although DORAM has been widely studied, all existing DORAM protocols have focused on the setting where the DORAM servers are semi-honest. Generic techniques for upgrading a semi-honest DORAM protocol to the malicious model typically increase the asymptotic communication complexity of the DORAM scheme.
In this work, we present a 3-party DORAM protocol which requires communication and computation per query, for a database of size with -bit values, where is the security parameter. The constants hidden in the big-O notation are small. We show that our protocol is UC-secure in the presence of a malicious, static adversary. This matches the communication and computation complexity of the best semi-honest DORAM protocol and makes ours the first maliciously secure DORAM protocol with this complexity.
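To make the notion of a "secret-shared array" concrete, the following sketch shows plain additive secret sharing among three parties. It illustrates only the data representation, not the DORAM protocol itself, and gives no protection against malicious parties. The modulus `MOD` and the helper names `share`, `reconstruct`, `share_array`, and `add_shared` are assumptions made for this example.

```python
import secrets

MOD = 2 ** 64  # assumed share domain for this sketch


def share(value, parties=3):
    """Split `value` into additive shares that sum to it modulo MOD."""
    shares = [secrets.randbelow(MOD) for _ in range(parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares


def reconstruct(shares):
    """Recombine the shares of one value."""
    return sum(shares) % MOD


def share_array(values, parties=3):
    """Return one share vector per party for the whole array."""
    per_element = [share(v, parties) for v in values]
    return [[elem[p] for elem in per_element] for p in range(parties)]


def add_shared(a, b):
    """Each party adds its own shares locally; no communication needed."""
    return [(x + y) % MOD for x, y in zip(a, b)]
```

Each party holds one vector returned by `share_array`. A single vector is uniformly random on its own, and an element is recovered only by combining the corresponding entries from all parties. Reading or writing at a secret-shared index without revealing that index is the part that requires an actual DORAM protocol.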
Towards Fast and Scalable Private Inference
Privacy and security have rapidly emerged as first-order design constraints.
Users now demand more protection over who can see their data (confidentiality)
as well as how it is used (control). Here, existing cryptographic techniques
for security fall short: they secure data when stored or communicated but must
decrypt it for computation. Fortunately, a new paradigm of computing exists,
which we refer to as privacy-preserving computation (PPC). Emerging PPC
technologies can be leveraged for secure outsourced computation or to enable
two parties to compute without revealing either party's secret data. Despite
their phenomenal potential to revolutionize user protection in the digital age,
the realization has been limited due to exorbitant computational,
communication, and storage overheads.
This paper reviews recent efforts on addressing various PPC overheads using
private inference (PI) in neural networks as a motivating application. First,
the problem and various technologies, including homomorphic encryption (HE),
secret sharing (SS), garbled circuits (GCs), and oblivious transfer (OT), are
introduced. Next, a characterization of their overheads when used to implement
PI is covered. The characterization motivates the need for both GCs and HE
accelerators. Then two solutions are presented: HAAC for accelerating GCs and
RPU for accelerating HE. To conclude, results and effects are shown with a
discussion on what future work is needed to overcome the remaining overheads of
PI.
Comment: Appears in the 20th ACM International Conference on Computing
Frontiers
Adaptive Microarchitectural Optimizations to Improve Performance and Security of Multi-Core Architectures
With the current technological barriers, microarchitectural optimizations are increasingly important to ensure performance scalability of computing systems. The shift to multi-core architectures increases the demands on the memory system and amplifies the role of microarchitectural optimizations in performance improvement. In a multi-core system, microarchitectural resources such as the cache are usually shared to maximize utilization, but sharing can also lead to contention and lower performance. This can be mitigated through partitioning of shared caches.
However, microarchitectural optimizations, which were long assumed to be fundamentally secure, can be used in side-channel attacks to leak secrets such as cryptographic keys. Timing-based side-channels exploit predictable timing variations due to the interaction with microarchitectural optimizations during program execution. Going forward, there is a strong need to be able to leverage microarchitectural optimizations for performance without compromising security. This thesis contributes three adaptive microarchitectural resource management optimizations to improve security and/or performance of multi-core architectures and a systematization-of-knowledge of timing-based side-channel attacks.
We observe that to achieve high-performance cache partitioning in a multi-core system three requirements need to be met: i) fine granularity of partitions, ii) locality-aware placement and iii) frequent changes. These requirements lead to high overheads for current centralized partitioning solutions, especially as the number of cores in the system increases. To address this problem, we present an adaptive and scalable cache partitioning solution (DELTA) using a distributed and asynchronous allocation algorithm. The allocations occur through core-to-core challenges, where applications with a larger performance benefit gain cache capacity. The solution is implementable in hardware, due to low computational complexity, and can scale to large core counts.
According to our analysis, better performance can be achieved by coordinating multiple optimizations for different resources, e.g., off-chip bandwidth and cache, but this is challenging due to the increased number of possible allocations that need to be evaluated. Based on these observations, we present a solution (CBP) for coordinated management of three optimizations: cache partitioning, bandwidth partitioning and prefetching. Efficient allocations, considering the inter-resource interactions and trade-offs, are achieved using local resource managers to limit the solution space.
The continuously growing number of side-channel attacks leveraging microarchitectural optimizations prompts us to review attacks and defenses to understand the vulnerabilities of different microarchitectural optimizations. We identify the four root causes of timing-based side-channel attacks: determinism, sharing, access violation and information flow. Our key insight is that eliminating any of the exploited root causes, in any of the attack steps, is enough to provide protection. Based on our framework, we present a systematization of the attacks and defenses on a wide range of microarchitectural optimizations, which highlights their key similarities. Shared caches are an attractive attack surface for side-channel attacks, while defenses need to be efficient since the cache is crucial for performance. To address this issue, we present an adaptive and scalable cache partitioning solution (SCALE) for protection against cache side-channel attacks. The solution leverages randomness and provides quantifiable and information-theoretic security guarantees using differential privacy. The solution closes the performance gap to a state-of-the-art non-secure allocation policy for a mix of secure and non-secure applications.
ZeroLeak: Using LLMs for Scalable and Cost Effective Side-Channel Patching
Security-critical software, e.g., OpenSSL, comes with numerous side-channel
leakages left unpatched due to a lack of resources or experts. The situation
will only worsen as the pace of code development accelerates, with developers
relying on Large Language Models (LLMs) to automatically generate code. In this
work, we explore the use of LLMs in generating patches for vulnerable code with
microarchitectural side-channel leakages. For this, we investigate the
generative abilities of powerful LLMs by carefully crafting prompts following a
zero-shot learning approach. All generated code is dynamically analyzed by
leakage detection tools, which are capable of pinpointing information leakage
at the instruction level, whether it stems from secret-dependent accesses or
branches or from vulnerable Spectre gadgets. Carefully crafted prompts
are used to generate candidate replacements for vulnerable code, which are then
analyzed for correctness and for leakage resilience. From a cost/performance
perspective, the GPT4-based configuration costs a mere few cents in API calls
per vulnerability fixed. Our results show that LLM-based patching is far more
cost-effective and thus provides a scalable solution. Finally, the framework we
propose will improve over time, especially as vulnerability detection tools and
LLMs mature.
Drones, Signals, and the Techno-Colonisation of Landscape
This research project is a cross-disciplinary, creative practice-led investigation that interrogates increasing military interest in the electromagnetic spectrum (EMS). The project’s central argument is that painted visualisations of normally invisible aspects of contemporary EMS-enabled warfare can reveal useful, novel, and speculative but informed perspectives that contribute to debates about war and technology. It pays particular attention to how visualising normally invisible signals reveals an insidious techno-colonisation of our extended environment from Earth to orbiting satellites.
Waks-On/Waks-Off: Fast Oblivious Offline/Online Shuffling and Sorting with Waksman Networks
As more privacy-preserving solutions leverage trusted execution environments (TEEs) like Intel SGX, it becomes pertinent that these solutions can by design thwart TEE side-channel attacks that research has brought to light. In particular, such solutions need to be fully oblivious to circumvent leaking private information through memory or timing side channels.
In this work, we present fast fully oblivious algorithms for shuffling and sorting data. Oblivious shuffling and sorting are two fundamental primitives that are frequently used for permuting data in privacy-preserving solutions. We present novel oblivious shuffling and sorting algorithms in the offline/online model such that the bulk of the computation can be done in an offline phase that is independent of the data to be permuted. The resulting online phase provides performance improvements over state-of-the-art oblivious shuffling and sorting algorithms both asymptotically ( vs. ) and concretely ( and speedups), when permuting items each of size .
Our work revisits Waksman networks, and it uses the key observation that setting the control bits of a Waksman network for a uniformly random shuffle is independent of the data to be shuffled. However, setting the control bits of a Waksman network efficiently and fully obliviously poses a challenge, and we provide a novel algorithm to this end. The total costs (inclusive of offline computation) of our WaksShuffle shuffling algorithm and our WaksSort sorting algorithm are lower than all other fully oblivious shuffling and sorting algorithms when the items are at least moderately sized (i.e., > 1400 B), and the performance gap only widens as the item sizes increase. Furthermore, WaksShuffle improves the online cost of oblivious shuffling by for shuffling items of any size; similarly WaksShuffle+QS, our other sorting algorithm, provides speedups in the online cost of oblivious sorting
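The building block of a Waksman network is a two-input switch that either passes or swaps its inputs according to a control bit. The sketch below shows such a switch written without a branch on the control bit, plus one layer of switches applied to adjacent pairs. It is an illustration only: the function names are hypothetical, a single layer is not a full Waksman network, the algorithm for setting the control bits is not shown, and Python offers no real constant-time guarantees. The point is the data-independent access pattern, where every pair is read and written regardless of the bit values.

```python
def oblivious_swap(a, b, bit):
    """Return (a, b) if bit == 0 and (b, a) if bit == 1, with no branch on bit.

    Assumes a and b are non-negative integers and bit is 0 or 1.
    """
    mask = -bit              # 0 stays 0; 1 becomes -1 (all ones)
    diff = (a ^ b) & mask    # either 0 or a ^ b
    return a ^ diff, b ^ diff


def apply_switch_layer(items, control_bits):
    """Apply one layer of switches to adjacent pairs (0,1), (2,3), ...

    The control bits can be fixed in an offline phase, before the data
    is known; the online phase only touches every pair once.
    """
    out = list(items)
    for i, bit in enumerate(control_bits):
        out[2 * i], out[2 * i + 1] = oblivious_swap(out[2 * i], out[2 * i + 1], bit)
    return out
```

For example, `apply_switch_layer([1, 2, 3, 4], [1, 0])` swaps the first pair and leaves the second pair in place, yet performs the same sequence of reads and writes as any other choice of control bits.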
Design Space Exploration of Sparsity-Aware Application-Specific Spiking Neural Network Accelerators
Spiking Neural Networks (SNNs) offer a promising alternative to Artificial
Neural Networks (ANNs) for deep learning applications, particularly in
resource-constrained systems. This is largely due to their inherent sparsity,
influenced by factors such as the input dataset, the length of the spike train,
and the network topology. While a few prior works have demonstrated the
advantages of incorporating sparsity into the hardware design, especially in
terms of reducing energy consumption, the impact on hardware resources has not
yet been explored. This is where design space exploration (DSE) becomes
crucial, as it allows for the optimization of hardware performance by tailoring
both the hardware and model parameters to suit specific application needs.
However, DSE can be extremely challenging given the potentially large design
space and the interplay of hardware architecture design choices and
application-specific model parameters.
In this paper, we propose a flexible hardware design that leverages the
sparsity of SNNs to identify highly efficient, application-specific accelerator
designs. We develop a high-level, cycle-accurate simulation framework for this
hardware and demonstrate the framework's benefits in enabling detailed and
fine-grained exploration of SNN design choices, such as the layer-wise
logical-to-hardware ratio (LHR). Our experimental results show that our design
can (i) achieve up to reduction in hardware resources and (ii) deliver a
speed increase of up to , while requiring fewer hardware
resources compared to sparsity-oblivious designs. We further showcase the
robustness of our framework by varying spike train lengths with different
neuron population sizes to find the optimal trade-off points between accuracy
and hardware latency.
Towards trustworthy computing on untrustworthy hardware
Historically, hardware was thought to be inherently secure and trusted due to its
obscurity and the isolated nature of its design and manufacturing. In the last two
decades, however, hardware trust and security have emerged as pressing issues.
Modern-day hardware is surrounded by threats manifested mainly in undesired
modifications by untrusted parties in its supply chain, unauthorized and pirated
selling, injected faults, and system and microarchitectural level attacks. These threats,
if realized, are expected to push hardware to abnormal and unexpected behaviour
causing real-life damage and significantly undermining our trust in the electronic and
computing systems we use in our daily lives and in safety-critical applications. A
large number of detective and preventive countermeasures have been proposed in
the literature. It is a fact, however, that our knowledge of the potential consequences of
real-life threats to hardware trust is lacking, given the limited number of real-life
reports and the plethora of ways in which hardware trust could be undermined. With
this in mind, run-time monitoring of hardware combined with active mitigation of
attacks, referred to as trustworthy computing on untrustworthy hardware, is proposed
as the last line of defence. This last line of defence allows us to face the issue of live
hardware mistrust rather than turning a blind eye to it or being helpless once it occurs.
This thesis proposes three different frameworks towards trustworthy computing
on untrustworthy hardware. The presented frameworks are adaptable to different
applications, independent of the design of the monitored elements, based on
autonomous security elements, and are computationally lightweight. The first
framework is concerned with explicit violations and breaches of trust at run-time,
with an untrustworthy on-chip communication interconnect presented as a potential
offender. The framework is based on the guiding principles of component guarding,
data tagging, and event verification. The second framework targets hardware elements
with inherently variable and unpredictable operational latency and proposes a
machine-learning-based characterization of these latencies to infer undesired latency
extensions or denial of service attacks. The framework is implemented on a DDR3
DRAM after showing its vulnerability to obscured latency extension attacks. The
third framework studies the possibility of the deployment of untrustworthy hardware
elements in the analog front end, and the consequent integrity issues that might arise
at the analog-digital boundary of systems-on-chip. The framework uses machine
learning methods and the unique temporal and arithmetic features of signals at this
boundary to monitor their integrity and assess their trust level.