Covariance-aware private mean estimation without private covariance estimation
https://proceedings.neurips.cc/paper/2021/file/42778ef0b5805a96f9511e20b5611fce-Paper.pd
Fast Summary-based Whole-program Analysis to Identify Unsafe Memory Accesses in Rust
Rust is one of the most promising systems programming languages to
fundamentally solve the memory safety issues that have plagued low-level
software for over forty years. However, to accommodate the scenarios where
Rust's type rules might be too restrictive for certain systems programming and
where programmers opt for performance over security checks, Rust opens security
escape hatches allowing writing unsafe source code or calling unsafe libraries.
Consequently, unsafe Rust code and directly-linked unsafe foreign libraries may
not only introduce memory safety violations themselves but also compromise the
entire program as they run in the same monolithic address space as the safe
Rust.
This problem can be mitigated by isolating unsafe memory objects (those
accessed by unsafe code) and sandboxing memory accesses to the unsafe memory.
One category of prior work utilizes existing program analysis frameworks on
LLVM IR to identify unsafe memory objects and accesses. However, these approaches
suffer from prolonged analysis time and low precision. In this paper, we
tackle these two challenges using summary-based whole-program analysis on
Rust's MIR. The summary-based analysis computes information on demand so as to
save analysis time. Performing analysis on Rust's MIR exploits the rich
high-level type information inherent to Rust, which is unavailable in LLVM IR.
This manuscript is a preliminary study of ongoing research. We have prototyped
a whole-program analysis for identifying both unsafe heap allocations and
memory accesses to those unsafe heap objects. We report the overhead and the
efficacy of the analysis in this paper.
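The idea of computing summaries on demand can be illustrated with a toy call graph. This is a minimal sketch, not the paper's analysis: it does not operate on Rust's MIR, and the `PROGRAM` table, the function names, and the boolean "contains an unsafe access" flag are all hypothetical. It only shows how a per-function summary ("may this function reach unsafe code?") can be computed when first queried and then reused.

```python
# Hypothetical toy program: function name -> (has_unsafe_access, callees).
PROGRAM = {
    "main": (False, ["parse", "render"]),
    "parse": (False, ["ffi_read"]),
    "ffi_read": (True, []),
    "render": (False, ["format"]),
    "format": (False, []),
}


def reaches_unsafe(func, program, summaries):
    """Return True if `func` may transitively reach an unsafe access.

    `summaries` caches answers so later queries reuse earlier work.
    """
    if func in summaries:
        return summaries[func]

    visited, stack, found = set(), [func], False
    while stack:
        current = stack.pop()
        if current in visited:
            continue
        visited.add(current)
        cached = summaries.get(current)
        if cached is True or program[current][0]:
            found = True
            break
        if cached is False:
            continue  # already known to reach nothing unsafe
        stack.extend(program[current][1])

    if found:
        summaries[func] = True
    else:
        # Nothing reachable was unsafe, so every visited function is safe too.
        for name in visited:
            summaries[name] = False
    return found


summaries = {}
main_result = reaches_unsafe("main", PROGRAM, summaries)
render_result = reaches_unsafe("render", PROGRAM, summaries)
```

Only the functions actually queried (and those explored on their behalf) receive summaries, which is the source of the time savings in an on-demand design.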
New Container Architectures for Mobile, Drone, and Cloud Computing
Containers are increasingly used across many different types of computing to isolate and control apps while efficiently sharing computing resources. By using lightweight operating system virtualization, they can provide apps with a virtual computing abstraction while imposing minimal hardware requirements and a small footprint. My thesis is that new container architectures can provide additional functionality, better resource utilization, and stronger security for mobile, drone, and cloud computing. To demonstrate this, we introduce three new container architectures that enable new mobile app migration functionality, a new notion of virtual drones and efficient utilization of drone hardware, and stronger security for cloud computing by protecting containers against untrusted operating systems.
First, we introduce Flux to support multi-surface apps, apps that seamlessly run across multiple user devices, through app migration. Flux introduces two key mechanisms to overcome the device heterogeneity and residual dependencies associated with app migration. The first, Selective Record/Adaptive Replay, records just those device-agnostic app calls that lead to the generation of app-specific device-dependent state in services and replays them on the target. The second, Checkpoint/Restore in Android (CRIA), transitions an app into a state in which device-specific information the app contains can be safely discarded before checkpointing and restoring the app within a containerized environment on the new device.
Second, we introduce AnDrone, a drone-as-a-service solution that makes drones accessible in the cloud. AnDrone provides a drone virtualization architecture to leverage the fact that computational costs are cheap compared to the operational and energy costs of putting a drone in the air. This enables multiple virtual drones to run simultaneously on the same physical drone at very little additional cost. To enable multiple virtual drones to run in an isolated and secure manner, each virtual drone runs its own containerized operating system instance. AnDrone introduces a new device container architecture, providing virtual drones with secure access to a full range of drone hardware devices, including sensors such as cameras and geofenced flight control.
Finally, we introduce BlackBox, a new container architecture that provides fine-grained protection of application data confidentiality and integrity without the need to trust the operating system. BlackBox introduces a container security monitor, a small trusted computing base that creates separate and independent physical address spaces for each container, such that there is no direct information flow from container to operating system or other container physical address spaces. Containerized apps do not need to be modified and can still make full use of operating system services via system calls, yet their CPU and memory state are isolated and protected from other containers and the operating system.
Covariance-Aware Private Mean Estimation Without Private Covariance Estimation
We present two sample-efficient differentially private mean estimators for
$d$-dimensional (sub)Gaussian distributions with unknown covariance.
Informally, given $n \gtrsim d/\alpha^2$ samples from such a distribution with
mean $\mu$ and covariance $\Sigma$, our estimators output $\tilde\mu$ such that
$\| \tilde\mu - \mu \|_{\Sigma} \leq \alpha$, where $\| \cdot \|_{\Sigma}$ is
the Mahalanobis distance. All previous estimators with the same guarantee
either require strong a priori bounds on the covariance matrix or require
$\Omega(d^{3/2})$ samples.
Each of our estimators is based on a simple, general approach to designing
differentially private mechanisms, but with novel technical steps to make the
estimator private and sample-efficient. Our first estimator samples a point
with approximately maximum Tukey depth using the exponential mechanism, but
restricted to the set of points of large Tukey depth. Proving that this
mechanism is private requires a novel analysis. Our second estimator perturbs
the empirical mean of the data set with noise calibrated to the empirical
covariance, without releasing the covariance itself. Its sample complexity
guarantees hold more generally for subgaussian distributions, albeit with a
slightly worse dependence on the privacy parameter. For both estimators,
careful preprocessing of the data is required to satisfy differential privacy.
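For reference, the Mahalanobis distance named in the abstract above is the standard covariance-weighted norm. The block below writes it out under the assumption that the covariance matrix $\Sigma$ is positive definite; the symbols $x$, $\mu$, $\tilde\mu$, and $\Sigma$ follow common convention and are used here for illustration.

```latex
\[
  \| x \|_{\Sigma}
    \;=\; \bigl\| \Sigma^{-1/2} x \bigr\|_{2}
    \;=\; \sqrt{x^{\top} \Sigma^{-1} x},
  \qquad
  \| \tilde\mu - \mu \|_{\Sigma}
    \;=\; \sqrt{(\tilde\mu - \mu)^{\top} \Sigma^{-1} (\tilde\mu - \mu)} .
\]
```

When $\Sigma$ is the identity, this reduces to the ordinary Euclidean distance. In general it measures error relative to the spread of the distribution in each direction, which is why the guarantee is called covariance-aware.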
The Design, Implementation, and Evaluation of Software and Architectural Support for Nested Virtualization on Modern Architectures
Nested virtualization, the discipline of running virtual machines inside other virtual machines, is increasingly important because of the need to deploy workloads that are already using virtualization on top of virtualized cloud infrastructures. However, nested virtualization performance on modern computer architectures is far from native execution speed, which remains a key impediment to further adoption. My thesis is that simple changes to hardware, software, and virtual machine configuration that are transparent to nested virtual machines can provide near-native execution speed for real application workloads. This dissertation presents three mechanisms that improve nested virtualization performance.
First, we present NEsted Virtualization Extensions for Arm (NEVE). As Arm servers make inroads in cloud infrastructure deployments, supporting nested virtualization on Arm is a key requirement. The requirement has recently been met with the introduction of nested virtualization support for the Arm architecture. We built the first hypervisor using Arm nested virtualization support and show that, despite similarities between Arm and x86 nested virtualization support, performance on Arm is much worse than on x86. This is due to excessive traps to the hypervisor caused by differences in non-nested virtualization support. To address this problem, we introduce a novel paravirtualization technique to rapidly prototype architectural changes for virtualization and evaluate their performance impact using existing hardware. Using this technique, we introduce NEVE, a set of simple architectural changes to Arm that can be used by software to coalesce and defer traps by logging the results of hypervisor instructions until the results are actually needed by the hypervisor. We show that NEVE allows hypervisors running real application workloads to provide an order of magnitude improvement in performance over current Arm nested virtualization support and up to three times less overhead than x86 nested virtualization. NEVE is included in the Armv8.4 architecture.
Second, we introduce virtual-passthrough, a new approach for providing virtual I/O devices for nested virtualization without the intervention of multiple levels of hypervisors. Virtual-passthrough preserves I/O interposition while addressing the performance problem of I/O intensive workloads as they perform many times worse with nested virtualization than without virtualization. With virtual-passthrough, virtual devices provided by a host hypervisor, the hypervisor that runs directly on the hardware, can be assigned to nested virtual machines directly without delivering data and control through multiple layers of hypervisors. The approach leverages the existing direct device assignment mechanism and implementation, so it only requires virtual machine configuration changes. Virtual-passthrough is platform-agnostic and easily supports important virtualization features such as migration. We have applied virtual-passthrough in the Linux KVM hypervisor for both x86 and Arm hardware, and show that it can provide more than an order of magnitude improvement in performance over current KVM virtual device support on real application workloads.
Third, we introduce Direct Virtual Hardware (DVH), a new approach that enables a host hypervisor to directly provide virtual hardware to nested virtual machines without the intervention of multiple levels of hypervisors. DVH is a generalization of virtual-passthrough and does not limit virtual hardware to I/O devices. Beyond virtual-passthrough, we introduce three additional DVH mechanisms: virtual timers, virtual inter-processor interrupts, and virtual idle. DVH provides virtual hardware for these mechanisms that mimics the underlying hardware and, in some cases, adds new enhancements that leverage the flexibility of software without the need for matching physical hardware support. We have implemented DVH in KVM. Our experimental results show that combining the four DVH mechanisms can provide even greater performance than virtual-passthrough alone and provide near-native execution speeds on real application workloads.
Energy-aware design of hardware and software for ultra-low-power systems
Future visions of the Internet of Things and Industry 4.0
demand large-scale deployments of mobile devices while removing
the numerous disadvantages of using batteries: degradation, scale, weight,
pollution, and costs. However, this requires computing platforms with extremely
low energy consumption, which in turn calls for ultra-low-power hardware,
energy-harvesting solutions, and highly efficient power-management hardware and
software.
The goal of these power management solutions is to either achieve power
neutrality, a condition where energy harvest and energy consumption equalize
while maximizing the service quality, or to enhance power efficiency for
conserving energy reserves. To reach these goals, intelligent power-management
decisions are needed that utilize precise energy data.
This thesis discusses the measurement of energy in embedded systems, both
online and by external equipment, and the utilization of the acquired data for
modeling the power consumption states of each involved hardware component.
Furthermore, a method is shown to use the resulting models by instrumenting
preexisting device drivers.
These drivers enable new functionalities, such as online energy accounting and
energy application interfaces, and facilitate intelligent power management
decisions.
To avoid the additional effort of reimplementing device drivers and
violating the separation-of-concerns paradigm, the approach shown
in this thesis synthesizes instrumentation aspects for an
aspect-oriented programming language, so that the original device-driver
source code remains unaffected.
Finally, an automated process of energy measurement and data
analysis is presented. This process is able to yield precise energy models
with low manual effort. In combination with the instrumentation synthesis of
aspect code, this method enables an accelerated creation process for energy
models of ultra-low-power systems. For all proposed methods,
empirical accuracy and overhead measurements are presented.
To support the author's claims, first practical energy-aware and
wireless-radio networked applications are showcased: an energy-neutral light
sensor, a photovoltaic-powered seminar-room door plate, and a sensor network
experiment testbed for research and education.
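The notion of modeling per-component power states and using the model for online energy accounting can be sketched as follows. This is an illustrative toy under stated assumptions, not the thesis's tooling: the state names, the power values in `POWER_MW`, and both helper functions are hypothetical, and a real model would be derived from measurements.

```python
# Hypothetical power model: component state -> average power draw in milliwatts.
# The numbers are placeholders chosen for easy arithmetic, not measured values.
POWER_MW = {"sleep": 1.0, "active": 10.0}


def account_energy(trace, power_mw):
    """Sum energy per state in millijoules.

    `trace` is a list of (state, duration_in_seconds) entries, as an
    instrumented driver might record on each state transition.
    """
    energy = {}
    for state, seconds in trace:
        energy[state] = energy.get(state, 0.0) + power_mw[state] * seconds
    return energy


def is_power_neutral(harvested_mj, consumed_mj):
    """Power neutrality here means harvested energy covers consumption."""
    return harvested_mj >= consumed_mj


trace = [("sleep", 8), ("active", 2), ("sleep", 5)]
per_state = account_energy(trace, POWER_MW)
total_mj = sum(per_state.values())
```

A power manager could compare `total_mj` against the energy harvested over the same window and, for example, lengthen sleep phases when consumption exceeds the harvest.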
Interaction-aware analysis and optimization of real-time application and operating system
Mechanical and electronic automation was a key component of the technological advances in the last two hundred years.
With the use of special-purpose machines, manual labor was replaced by mechanical motion, leaving workers with the operation of these machines, before this task, too, was taken over by embedded control systems.
With the advances of general-purpose computing, the development of these control systems shifted more and more from a problem-specific one to a one-size-fits-all mentality as the trade-off between per-instance overheads and development costs was in favor of flexible and reusable implementations.
However, with a scaling factor of thousands, if not millions, of deployed devices, overheads and inefficiencies accumulate, calling for a higher degree of specialization.
In the area of real-time operating systems (RTOSs), which form the base layer for many of these computerized control systems, we deploy far more flexibility than is actually required by the applications that run on top of them.
Since only the solution, but not the problem, became less specific to the control problem at hand, we have the chance to cut away inefficiencies, improve on system-analyses results, and optimize the resource consumption.
However, such a tailoring will only be favorable if it can be performed without much developer interaction and in an automated fashion.
Here, real-time systems are a good starting point, since we already have to have a large degree of static knowledge in order to guarantee their timeliness.
Until now, this static nature has not been exploited to its full extent, and optimization potential is left unused.
The requirements of a system, with regard to the RTOS, manifest in the interactions between the application and the kernel.
Threads request resources from the RTOS, which in return determines and enforces a scheduling order that will ensure the timely completion of all necessary computations.
Since the RTOS runs only in the exception, its reaction to requests from the application (or from the environment) is its defining feature.
In this thesis, I capture these interactions, and thereby the required RTOS semantics, in a control-flow-sensitive fashion.
Extracted automatically, this knowledge about the reciprocal influence allows me to fit the implementation of a system closer to its actual requirements.
The result is a system that is not only in its usage a special-purpose system, but also in its implementation and in its provided guarantees.
In the development of my approach, it became clear that the focus on these interactions is not only highly fruitful for the optimization of a system, but also for its end-to-end analysis.
Therefore, this thesis does not only provide methods to reduce the kernel-execution overhead and a system's memory consumption, but it also includes methods to calculate tighter response-time bounds and to give guarantees about the correct behavior of the kernel.
All these contributions are enabled by my proposed interaction-aware methodology that takes the whole system, RTOS and application, into account.
With this thesis, I show that a control-flow-sensitive whole-system view on the interactions is feasible and highly rewarding.
With this approach, we can overcome many inefficiencies that arise from analyses that have an isolating focus on individual system components.
Furthermore, the interaction-aware methods stay close to the actual implementation and are therefore able to consider the behavioral patterns of the finally deployed real-time computing system.
Parallel and Flow-Based High Quality Hypergraph Partitioning
Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits.
Given a hypergraph $H$ and an integer $k$, the task is to divide the vertices into $k$ disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks.
In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge.
The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases.
In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs.
Once sufficiently small, an initial partition is computed.
Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level.
An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time.
The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem.
Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality.
While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible.
We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways.
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines.
In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof.
We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation.
For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements.
For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly.
Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework.
It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner.
Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level.
This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential.
We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening.
In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio.
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening.
The last ingredient for high quality is an iterative improvement algorithm based on maximum flows.
In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts.
Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel.
Beyond striving for the highest quality, we present a deterministically parallel partitioning framework.
We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement.
Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small.
All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets.
To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar.
While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed-memory algorithms, the study of shared-memory techniques is not in vain.
With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense.
Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm.
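The connectivity objective mentioned in the abstract above can be made concrete with a short sketch. It uses the usual formulation in which each hyperedge contributes its weight times the number of blocks it touches minus one. The function names and the tiny example hypergraph are hypothetical and are not taken from Mt-KaHyPar.

```python
def connectivity_objective(hyperedges, block_of, weights=None):
    """Sum over hyperedges of weight * (number of distinct blocks touched - 1)."""
    total = 0
    for index, pins in enumerate(hyperedges):
        blocks_touched = {block_of[vertex] for vertex in pins}
        weight = 1 if weights is None else weights[index]
        total += weight * (len(blocks_touched) - 1)
    return total


def block_sizes(block_of, k):
    """Count vertices per block, to check the balance constraint."""
    sizes = [0] * k
    for block in block_of.values():
        sizes[block] += 1
    return sizes


# Hypothetical hypergraph with six vertices and four hyperedges.
hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5], [0, 5]]

two_way = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
three_way = {0: 0, 1: 0, 2: 1, 3: 1, 4: 2, 5: 2}

cost_two_way = connectivity_objective(hyperedges, two_way)
cost_three_way = connectivity_objective(hyperedges, three_way)
```

A hyperedge that lies entirely inside one block contributes zero, so a refinement step such as label propagation or FM aims to move vertices in ways that lower this sum while keeping the block sizes within the allowed bound.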
On the Combination of Game-Theoretic Learning and Multi Model Adaptive Filters
This paper casts coordination of a team of robots within the framework of game-theoretic learning algorithms. In particular, a novel variant of fictitious play is proposed, by considering multi-model adaptive filters as a method to estimate other players’ strategies. The proposed algorithm can be used as a coordination mechanism between players when they must make decisions under uncertainty. Each player chooses an action after taking into account the actions of the other players and also the uncertainty. Uncertainty can occur either in terms of noisy observations or various types of other players. In addition, in contrast to other game-theoretic and heuristic algorithms for distributed optimisation, it is not necessary to find the optimal parameters a priori. Various parameter values can be used initially as inputs to different models. Therefore, the resulting decisions will be aggregate results of all the parameter values. Simulations are used to test the performance of the proposed methodology against other game-theoretic learning algorithms.
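For readers unfamiliar with fictitious play, the sketch below shows the classical version of the algorithm, not the paper's variant. Each player estimates the opponent's strategy from empirical action counts, where the paper instead proposes multi-model adaptive filters. The payoff matrix `GAME`, the uniform prior counts, and the function names are assumptions made for illustration.

```python
def best_response(payoff, opponent_counts):
    """Pick the action with the highest expected payoff against the
    empirical distribution of the opponent's past actions."""
    total = sum(opponent_counts)
    expected = [
        sum(row[b] * opponent_counts[b] / total for b in range(len(opponent_counts)))
        for row in payoff
    ]
    # Ties are broken in favour of the lowest action index.
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(payoff_row, payoff_col, rounds):
    """Classical two-player fictitious play.

    payoff_row[a][b]: row player's payoff when row plays a and column plays b.
    payoff_col[b][a]: column player's payoff when column plays b and row plays a.
    """
    counts_of_col = [1] * len(payoff_col)  # row player's belief (uniform prior)
    counts_of_row = [1] * len(payoff_row)  # column player's belief (uniform prior)
    history = []
    for _ in range(rounds):
        a = best_response(payoff_row, counts_of_col)
        b = best_response(payoff_col, counts_of_row)
        counts_of_col[b] += 1
        counts_of_row[a] += 1
        history.append((a, b))
    return history


# Hypothetical symmetric coordination game: agreeing on action 1 pays more.
GAME = [[1, 0], [0, 2]]
history = fictitious_play(GAME, GAME, rounds=5)
```

In this small game both players settle on the higher-paying joint action immediately. With noisy observations the simple counting estimator can be misled, which is the kind of uncertainty the abstract's filter-based estimation is meant to address.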