Training Overparametrized Neural Networks in Sublinear Time
The success of deep learning comes at a tremendous computational and energy
cost, and the scalability of training massively overparametrized neural
networks is becoming a real barrier to the progress of artificial intelligence
(AI). Despite the popularity and low cost-per-iteration of traditional backpropagation via gradient descent, stochastic gradient descent (SGD) has a prohibitive convergence rate in non-convex settings, both in theory and in practice.
To mitigate this cost, recent works have proposed to employ alternative (Newton-type) training methods with a much faster convergence rate, albeit with
higher cost-per-iteration. For a typical neural network with $m$ parameters and an input batch of $n$ datapoints in $\mathbb{R}^d$, the previous work of [Brand, Peng, Song, and Weinstein, ITCS'2021] requires per-iteration time that scales (near-)linearly with the number of parameters $m$. In this paper, we present a novel training method whose amortized time per iteration is sublinear in $m$, scaling as $m^{1-\alpha}$ in the same overparametrized regime, where $\alpha > 0$ is some fixed constant. This method relies on a new and alternative view of
neural networks, as a set of binary search trees, where each iteration
corresponds to modifying a small subset of the nodes in the tree. We believe
this view would have further applications in the design and analysis of deep
neural networks (DNNs).
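To give a flavor of the tree idea (a generic illustration of the principle, not the paper's actual construction): if the quantities a training step needs are kept in a binary tree over the network's units, then an iteration that changes only a few entries touches only logarithmically many tree nodes. A minimal Python sketch of such a structure:

```python
# Toy illustration: a binary tree over n values that supports updating a few
# entries and querying their aggregate in O(log n) time per update, instead of
# rebuilding the whole aggregate in O(n). The names here are made up for the
# example; the paper's trees encode far more structure.
import math

class SumTree:
    def __init__(self, values):
        self.n = 1 << math.ceil(math.log2(max(len(values), 1)))
        self.tree = [0.0] * (2 * self.n)
        for i, v in enumerate(values):
            self.tree[self.n + i] = v
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, i, value):
        """Change one leaf; only O(log n) internal nodes are touched."""
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]

# If a training step changes only k coordinates, refreshing the aggregate
# costs O(k log n) rather than O(n).
t = SumTree([0.1] * 8)
t.update(3, 0.9)
print(t.total())  # 0.1 * 7 + 0.9 = 1.6
```

The cost accounting is the same flavor as in the abstract: per iteration, only a small subset of tree nodes is modified.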
A Faster $k$-means++ Algorithm
K-means++ is an important algorithm to choose initial cluster centers for the
k-means clustering algorithm. In this work, we present a new algorithm that can
solve the $k$-means++ problem with near-optimal running time. Given $n$ data points in $\mathbb{R}^d$, the current state-of-the-art algorithm proceeds in a number of sequential rounds that grows with the number of clusters $k$, and each round requires a pass over all $n$ points; the overall running time is the product of the two. We propose a new algorithm, \textsc{FastKmeans++}, whose total running time is near-optimal.
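For context, the classical $k$-means++ seeding procedure (the baseline being accelerated, not the new algorithm) picks the first center uniformly and then repeatedly picks the next center with probability proportional to the squared distance to the nearest chosen center. A minimal NumPy sketch:

```python
# Classical k-means++ seeding (D^2 sampling). This is the standard baseline,
# not FastKmeans++; note that every round rescans all n points.
import numpy as np

def kmeanspp_seed(X, k, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    centers = [X[rng.integers(n)]]                       # first center: uniform
    d2 = np.sum((X - centers[0]) ** 2, axis=1)           # squared distance to nearest center
    for _ in range(k - 1):
        probs = d2 / d2.sum()                            # D^2 weighting
        idx = rng.choice(n, p=probs)
        centers.append(X[idx])
        d2 = np.minimum(d2, np.sum((X - X[idx]) ** 2, axis=1))
    return np.stack(centers)

X = np.random.default_rng(1).normal(size=(1000, 5))
print(kmeanspp_seed(X, 4).shape)                         # (4, 5)
```

Each round costs a full pass over the data, which is exactly the per-center cost the faster algorithm aims to avoid paying $k$ times.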
Differentially Oblivious Database Joins: Overcoming the Worst-Case Curse of Fully Oblivious Algorithms
Numerous high-profile works have shown that access patterns to even encrypted databases can leak secret information and sometimes even lead to reconstruction of the entire database. To thwart access pattern leakage, the literature has focused on oblivious algorithms, where obliviousness requires that the access patterns leak nothing about the input data.
In this paper, we consider the Join operator, an important database primitive that has been extensively studied and optimized. Unfortunately, any fully oblivious Join algorithm would require always padding the result to the worst-case length which is quadratic in the data size N. In comparison, an insecure baseline incurs only O(R + N) cost where R is the true result length, and in the common case in practice, R is relatively short. As a typical example, when R = O(N), any fully oblivious algorithm must inherently incur a prohibitive, N-fold slowdown relative to the insecure baseline. Indeed, the (non-private) database and algorithms literature invariably focuses on studying the instance-specific rather than worst-case performance of database algorithms. Unfortunately, the stringent notion of full obliviousness precludes the design of efficient algorithms with non-trivial instance-specific performance.
To overcome this worst-case performance barrier of full obliviousness and enable algorithms with good instance-specific performance, we consider a relaxed notion of access pattern privacy called $(\epsilon, \delta)$-differential obliviousness (DO), originally proposed in the seminal work of Chan et al. (SODA'19). Rather than insisting that the access patterns leak no information whatsoever, the relaxed DO notion requires that the access patterns satisfy $(\epsilon, \delta)$-differential privacy. We show that by adopting the relaxed DO notion, we can obtain efficient database Join mechanisms whose instance-specific performance approximately matches the insecure baseline, while still offering a meaningful notion of privacy to individual users. Complementing our upper bound results, we also prove new lower bounds regarding the performance of any DO Join algorithm.
Differential obliviousness (DO) is a new notion and a relatively unexplored territory. Following the pioneering investigations by Chan et al. and others, our work is among the very first to formally explore how DO can help overcome the worst-case performance curse of full obliviousness; moreover, we motivate our work with database applications. Our work shows new evidence for why DO might be a promising notion, and opens up several exciting future directions.
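A simplified sketch of the padding idea (illustrative constants, not the paper's calibrated mechanism): rather than padding the join output to the worst-case $N^2$ length, a DO mechanism pads the true result length $R$ with shifted, truncated Laplace noise, so the observed output length is differentially private while staying close to $R$:

```python
# Simplified illustration of DO-style output padding. Sensitivity is taken as 1
# for illustration only; a real join mechanism must account for how many output
# rows a single input record can affect.
import math, random

def dp_padded_length(true_len, sensitivity=1, eps=1.0, delta=1e-6):
    # Shift so the noisy length falls below true_len only with probability ~delta,
    # then add Laplace(sensitivity/eps) noise (difference of two exponentials).
    shift = (sensitivity / eps) * math.log(1.0 / (2.0 * delta))
    noise = random.expovariate(eps / sensitivity) - random.expovariate(eps / sensitivity)
    return max(true_len, math.ceil(true_len + shift + noise))

R = 1_037                        # instance-specific true join size
padded = dp_padded_length(R)     # close to R, nowhere near the worst-case N^2
print(R, padded)
```

The insecure baseline reveals $R$ exactly, full obliviousness hides it at quadratic cost, and the DO mechanism sits in between, revealing $R$ only up to DP noise.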
Query Complexity of Active Learning for Function Family With Nearly Orthogonal Basis
Many machine learning algorithms require large amounts of labeled data to deliver state-of-the-art results. In applications such as medical diagnosis and fraud detection, though there is an abundance of unlabeled data, it is costly to label the data by experts, experiments, or simulations. Active learning algorithms aim to reduce the number of required labeled data points while preserving performance. For many convex optimization problems such as linear regression and $\ell_p$-norm regression, there are theoretical bounds on the number
of required labels to achieve a certain accuracy. We call this the query
complexity of active learning. However, today's active learning algorithms
require the underlying learned function to have an orthogonal basis. For
example, when applying active learning to linear regression, the requirement is that the target function be a linear combination of a set of orthogonal linear functions, and active learning can recover the coefficients of these linear functions. We present a theoretical result showing that active learning does
not need an orthogonal basis but rather only requires a nearly orthogonal
basis. We provide the corresponding theoretical proofs for function families with a nearly orthogonal basis, together with the associated applications in an algorithmically efficient active learning framework.
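As a toy illustration of query complexity (a standard leverage-score sampling sketch for linear regression, not the paper's algorithm or its nearly-orthogonal setting; all names and parameters below are made up for the example): labels are queried only for an importance-sampled subset of points, and a weighted least-squares fit recovers the coefficients from far fewer labels than $n$:

```python
# Toy active-learning-style sketch: query labels for m << n points chosen by
# leverage-score sampling, then solve importance-weighted least squares.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 10_000, 8, 200                        # n unlabeled points, m label queries
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)

def label_oracle(idx):                          # the "expensive" labeling step
    return X[idx] @ true_w + 0.01 * rng.normal(size=len(idx))

# Leverage scores of X (row norms of the left singular vectors).
U, _, _ = np.linalg.svd(X, full_matrices=False)
lev = np.sum(U ** 2, axis=1)
probs = lev / lev.sum()
idx = rng.choice(n, size=m, replace=False, p=probs)

# Query labels only for the sampled points; reweight rows by 1/sqrt(p_i).
w_hat = np.linalg.lstsq(X[idx] / np.sqrt(probs[idx])[:, None],
                        label_oracle(idx) / np.sqrt(probs[idx]), rcond=None)[0]
print(np.linalg.norm(w_hat - true_w))           # small, using only m labels
```

The number of queried labels $m$ needed for a given accuracy is the query complexity the abstract refers to; the paper's point is that such guarantees survive when the basis is only nearly orthogonal.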
Agile Development of Linux Schedulers with Ekiben
Kernel task scheduling is important for application performance, adaptability
to new hardware, and complex user requirements. However, developing, testing,
and debugging new scheduling algorithms in Linux, the most widely used cloud
operating system, is slow and difficult. We developed Ekiben, a framework for
high velocity development of Linux kernel schedulers. Ekiben schedulers are
written in safe Rust, and the system supports live upgrade of new scheduling
policies into the kernel, userspace debugging, and bidirectional communication
with applications. A scheduler implemented with Ekiben achieved near-identical performance (within 1% on average) to the default Linux scheduler, CFS, on a wide range of benchmarks. Ekiben is also able to support a range of research schedulers, specifically the Shinjuku scheduler, a locality-aware scheduler, and the Arachne core arbiter,
and the Arachne core arbiter, with good performance.Comment: 13 pages, 5 figures, submitted to Eurosys 202
Punica: Multi-Tenant LoRA Serving
Low-rank adaptation (LoRA) has become an important and popular method to
adapt pre-trained models to specific domains. We present Punica, a system to
serve multiple LoRA models in a shared GPU cluster. Punica contains a new CUDA
kernel design that allows batching of GPU operations for different LoRA models.
This allows a GPU to hold only a single copy of the underlying pre-trained
model when serving multiple, different LoRA models, significantly enhancing GPU
efficiency in terms of both memory and computation. Our scheduler consolidates
multi-tenant LoRA serving workloads in a shared GPU cluster. With a fixed-size
GPU cluster, our evaluations show that Punica achieves 12x higher throughput in
serving multiple LoRA models compared to state-of-the-art LLM serving systems
while only adding 2ms latency per token. Punica is open source at
https://github.com/punica-ai/punica
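At the level of the math (a schematic in NumPy, not Punica's actual CUDA kernel or API): every request in a batch shares one dense multiplication with the base weight, and each request additionally applies its own small low-rank adapter, which is why only a single copy of the pre-trained weights is needed:

```python
# Schematic of multi-tenant LoRA serving math: one shared base weight W serves
# every request; each request i adds its own low-rank delta A_i @ B_i.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, rank, batch = 1024, 1024, 16, 4

W = rng.normal(size=(d_in, d_out)) / np.sqrt(d_in)            # shared, one copy
adapters = [(rng.normal(size=(d_in, rank)) / np.sqrt(d_in),   # A_i
             rng.normal(size=(rank, d_out)))                  # B_i
            for _ in range(batch)]                             # one LoRA per request

x = rng.normal(size=(batch, d_in))                             # one token per request

base = x @ W                                                   # one batched dense GEMM
delta = np.stack([x[i] @ A @ B for i, (A, B) in enumerate(adapters)])
y = base + delta                                               # per-request LoRA output
print(y.shape)                                                 # (4, 1024)
```

Punica's contribution is batching the per-adapter products efficiently on the GPU; the Python loop here is only for clarity.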
Adore: Differentially Oblivious Relational Database Operators
There has been a recent effort to apply differential privacy to memory access patterns in order to enhance data privacy. This is called differential obliviousness. Differential obliviousness is a promising direction because it
provides a principled trade-off between performance and desired level of
privacy. To date, it is still an open question whether differential
obliviousness can speed up database processing with respect to full
obliviousness. In this paper, we present the design and implementation of three
new major database operators: selection with projection, grouping with
aggregation, and foreign key join. We prove that they satisfy the notion of
differential obliviousness. Our differentially oblivious operators have reduced
cache complexity, runtime complexity, and output size compared to their
state-of-the-art fully oblivious counterparts. We also demonstrate that our implementation of these differentially oblivious operators can substantially outperform their state-of-the-art fully oblivious counterparts.