Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds
In this paper, we study parallel algorithms for the correlation clustering
problem, where every pair of distinct entities is labeled either similar or
dissimilar. The goal is to partition the entities into clusters so as to
minimize the number of disagreements with the labels. Currently, all efficient
parallel algorithms have an approximation ratio of at least 3, leaving a
significant gap to the ratio achieved by polynomial-time sequential algorithms
[CLN22].
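For concreteness, the minimize-disagreements objective can be stated in a few lines of Python; the pair-label encoding ('+'/'-' on frozenset pairs) is an illustrative choice, not notation from the paper:

```python
def disagreements(labels, clustering):
    """Count label disagreements for a clustering.

    labels: dict mapping unordered pairs frozenset({u, v}) -> '+' (similar)
            or '-' (dissimilar)
    clustering: dict mapping each entity -> cluster id
    A '+' pair split across clusters, or a '-' pair placed in the same
    cluster, each counts as one disagreement.
    """
    cost = 0
    for pair, label in labels.items():
        u, v = tuple(pair)
        same = clustering[u] == clustering[v]
        if (label == '+' and not same) or (label == '-' and same):
            cost += 1
    return cost

# Triangle with two '+' edges and one '-' edge: no clustering is perfect.
labels = {
    frozenset({'a', 'b'}): '+',
    frozenset({'b', 'c'}): '+',
    frozenset({'a', 'c'}): '-',
}
print(disagreements(labels, {'a': 0, 'b': 0, 'c': 0}))  # 1 ('-' pair a,c together)
print(disagreements(labels, {'a': 0, 'b': 0, 'c': 1}))  # 1 ('+' pair b,c split)
```

The "bad triangle" above is the standard witness that the optimum can be nonzero: at least one of the three constraints must be violated.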
We propose the first poly-logarithmic-depth parallel algorithm that achieves
an approximation ratio better than 3, using polynomial total work.
Additionally, it can be translated into a polynomial-time sequential algorithm
and into a poly-logarithmic-round sublinear-memory MPC algorithm.
Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12]
length-constrained multi-commodity flow algorithm: we develop an efficient
parallel algorithm that solves a truncated version of the correlation
clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. We then
show that the solution of the truncated linear program can be rounded with a
loss factor of at most 2.4 using the framework of [CMSY15]. This rounding
framework can in turn be implemented using parallel pivot-based approaches.
Nested Active-Time Scheduling
The active-time scheduling problem concerns scheduling preemptible jobs with windows (release times and deadlines) on a parallel machine that can run up to g jobs in each timestep. The goal is to minimize the number of active steps, i.e., timesteps in which at least one job is scheduled. In this way, active time models parallel scheduling when there is a fixed cost for turning the machine on at each discrete step.
This paper presents a 9/5-approximation algorithm for the special case of the active-time scheduling problem in which job windows are laminar (nested). This result improves on the previous best 2-approximation, which holds for the general case.
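The two definitions above can be made concrete in a few lines of Python; representing windows as half-open (release, deadline) intervals is an assumption made for this sketch:

```python
def is_laminar(windows):
    """Check that job windows (release, deadline) form a laminar family:
    any two windows are either disjoint or nested."""
    for i, (r1, d1) in enumerate(windows):
        for r2, d2 in windows[i + 1:]:
            disjoint = d1 <= r2 or d2 <= r1
            nested = (r1 <= r2 and d2 <= d1) or (r2 <= r1 and d1 <= d2)
            if not (disjoint or nested):
                return False
    return True

def active_time(schedule):
    """Number of active timesteps: steps where at least one job runs.
    schedule maps timestep -> set of jobs scheduled in that step."""
    return sum(1 for jobs in schedule.values() if jobs)

print(is_laminar([(0, 10), (2, 5), (6, 9)]))  # True: each pair nested or disjoint
print(is_laminar([(0, 4), (2, 6)]))           # False: the windows properly overlap
print(active_time({0: {'j1'}, 1: set(), 2: {'j1', 'j2'}}))  # 2 active steps
```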
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax
Learning representations of Electronic Health Records (EHRs) is an important
yet under-explored research topic. It benefits various clinical decision
support applications, e.g., medication outcome prediction or patient
similarity search. Current approaches focus on task-specific label supervision
over vectorized sequential EHR data, which does not apply to large-scale
unsupervised scenarios.
Recently, contrastive learning has shown great success on self-supervised
representation learning problems; however, complex temporality often degrades
its performance. We propose Graph Kernel Infomax, a self-supervised graph
kernel learning approach on the graphical representation of EHR, to overcome
these problems. Unlike the state of the art, we do not alter the graph
structure to construct augmented views. Instead, we use Kernel Subspace
Augmentation to embed nodes into two geometrically different manifold views.
The entire framework is trained by contrasting node and graph representations
across the two manifold views through commonly used contrastive objectives.
Empirically, on publicly available benchmark EHR datasets, our approach yields
performance on clinical downstream tasks that exceeds the state of the art.
Theoretically, varying the distance metric naturally creates different views
as data augmentation without changing the graph structure.
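The view-contrasting step can be illustrated with a generic InfoNCE-style objective in NumPy; this is a stand-in for the "commonly used contrastive objectives" the abstract refers to, and does not reproduce Kernel Subspace Augmentation or the paper's exact loss:

```python
import numpy as np

def info_nce(view1, view2, tau=0.5):
    """Generic InfoNCE loss between two views of the same n items.
    view1, view2: (n, d) embeddings; row i of each view forms a positive
    pair, and all other cross-view rows serve as negatives.
    (Illustrative stand-in; not the paper's objective.)"""
    a = view1 / np.linalg.norm(view1, axis=1, keepdims=True)
    b = view2 / np.linalg.norm(view2, axis=1, keepdims=True)
    logits = a @ b.T / tau                        # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))         # positives lie on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
noisy = z + 0.01 * rng.standard_normal((8, 16))   # an aligned second view
# The loss is far lower for the aligned view than for an unrelated one.
print(info_nce(z, noisy) < info_nce(z, rng.standard_normal((8, 16))))
```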
Constant bandwidth ORAM with small block size using PIR operations
Recently, the server-with-computation model has been applied to Oblivious RAM (ORAM) schemes to achieve constant communication (a constant number of blocks). However, existing works either require a large block size of O(log^6 N) or have security flaws. Furthermore, a lower bound on sub-logarithmic bandwidth was shown for schemes that do not use expensive fully homomorphic operations. Whether constant bandwidth with a smaller block size is achievable without fully homomorphic operations remained an open question. In this paper, we provide an affirmative answer. We propose a constant-bandwidth ORAM scheme with block size O(log^3 N) using only additive homomorphic operations. Our scheme is secure under the standard model. Technically, we design a non-trivial oblivious clear algorithm with very small bandwidth to improve the eviction algorithm in ORAM, to which the lower bound proof does not apply. As an additional benefit, we reduce the server storage thanks to the reduction in bucket size.
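To illustrate the PIR mechanics that additively homomorphic operations enable, here is a toy linear-PIR server response with the selection vector left in the clear; in a real scheme the query entries would be encryptions of the 0/1 values, so the server could compute the same inner product homomorphically without learning the queried index:

```python
def pir_answer(database, query, p=2**31 - 1):
    """Server side of a linear PIR scheme: the inner product of the
    query vector with the database, mod p. With an additively
    homomorphic scheme the query would be an encrypted 0/1 selection
    vector; here it is in the clear purely to show the mechanics."""
    return sum(q * d for q, d in zip(query, database)) % p

database = [42, 7, 19, 88]
index = 2
query = [1 if i == index else 0 for i in range(len(database))]  # select slot 2
print(pir_answer(database, query))  # 19
```

Because the response is a single field element per block retrieved, this is the kind of primitive that keeps per-access bandwidth constant in the number of blocks.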