Breaking 3-Factor Approximation for Correlation Clustering in Polylogarithmic Rounds
In this paper, we study parallel algorithms for the correlation clustering
problem, where every pair of distinct entities is labeled either similar or
dissimilar. The goal is to partition the entities into clusters so as to
minimize the number of disagreements with the labels. Currently, all efficient
parallel algorithms have an approximation ratio of at least 3, leaving a
significant gap to the ratio achieved by polynomial-time sequential algorithms
[CLN22].
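For concreteness, the minimize-disagreements objective can be stated in a few lines of Python; the pair-label encoding ('+'/'-' on frozenset pairs) is an illustrative choice, not notation from the paper:

```python
def disagreements(labels, clustering):
    """Count label disagreements for a clustering.

    labels: dict mapping unordered pairs frozenset({u, v}) -> '+' (similar)
            or '-' (dissimilar)
    clustering: dict mapping each entity -> cluster id
    A '+' pair split across clusters, or a '-' pair placed in the same
    cluster, each counts as one disagreement.
    """
    cost = 0
    for pair, label in labels.items():
        u, v = tuple(pair)
        same = clustering[u] == clustering[v]
        if (label == '+' and not same) or (label == '-' and same):
            cost += 1
    return cost

# Triangle with two '+' edges and one '-' edge: no clustering is perfect.
labels = {
    frozenset({'a', 'b'}): '+',
    frozenset({'b', 'c'}): '+',
    frozenset({'a', 'c'}): '-',
}
print(disagreements(labels, {'a': 0, 'b': 0, 'c': 0}))  # 1 ('-' pair a,c together)
print(disagreements(labels, {'a': 0, 'b': 0, 'c': 1}))  # 1 ('+' pair b,c split)
```

The "bad triangle" above is the standard witness that the optimum can be nonzero: at least one of the three constraints must be violated.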
We propose the first poly-logarithmic-depth parallel algorithm that achieves
an approximation ratio better than 3, using polynomial total work.
Additionally, it can be translated into a polynomial-time sequential algorithm
and into a poly-logarithmic-round sublinear-memory MPC algorithm.
Our approach is inspired by Awerbuch, Khandekar, and Rao's [AKR12]
length-constrained multi-commodity flow algorithm: we develop an efficient
parallel algorithm that solves a truncated version of the correlation
clustering linear program of Charikar, Guruswami, and Wirth [CGW05]. We then
show that the solution of the truncated linear program can be rounded with a
loss factor of at most 2.4 using the framework of [CMSY15]. This rounding
framework can in turn be implemented using parallel pivot-based approaches.
Nested Active-Time Scheduling
The active-time scheduling problem concerns scheduling preemptible jobs with windows (release times and deadlines) on a parallel machine that can run up to g jobs in each timestep. The goal is to minimize the number of active steps, i.e., timesteps in which at least one job is scheduled. In this way, active time models parallel scheduling when there is a fixed cost for turning the machine on at each discrete step.
This paper presents a 9/5-approximation algorithm for the special case of the active-time scheduling problem in which job windows are laminar (nested). This result improves on the previous best 2-approximation, which holds for the general case.
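The two definitions above can be made concrete in a few lines of Python; representing windows as half-open (release, deadline) intervals is an assumption made for this sketch:

```python
def is_laminar(windows):
    """Check that job windows (release, deadline) form a laminar family:
    any two windows are either disjoint or nested."""
    for i, (r1, d1) in enumerate(windows):
        for r2, d2 in windows[i + 1:]:
            disjoint = d1 <= r2 or d2 <= r1
            nested = (r1 <= r2 and d2 <= d1) or (r2 <= r1 and d1 <= d2)
            if not (disjoint or nested):
                return False
    return True

def active_time(schedule):
    """Number of active timesteps: steps where at least one job runs.
    schedule maps timestep -> set of jobs scheduled in that step."""
    return sum(1 for jobs in schedule.values() if jobs)

print(is_laminar([(0, 10), (2, 5), (6, 9)]))  # True: each pair nested or disjoint
print(is_laminar([(0, 4), (2, 6)]))           # False: the windows properly overlap
print(active_time({0: {'j1'}, 1: set(), 2: {'j1', 'j2'}}))  # 2 active steps
```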
Self-supervised Representation Learning on Electronic Health Records with Graph Kernel Infomax
Learning representations of Electronic Health Records (EHRs) is an important
yet under-explored research topic. It benefits various clinical decision
support applications, e.g., medication outcome prediction or patient
similarity search. Current approaches focus on task-specific label supervision
over vectorized sequential EHR data, which does not apply to large-scale
unsupervised scenarios.
Recently, contrastive learning has shown great success on self-supervised
representation learning problems; however, complex temporality often degrades
its performance. We propose Graph Kernel Infomax, a self-supervised graph
kernel learning approach on the graphical representation of EHR, to overcome
these problems. Unlike the state of the art, we do not alter the graph
structure to construct augmented views. Instead, we use Kernel Subspace
Augmentation to embed nodes into two geometrically different manifold views.
The entire framework is trained by contrasting node and graph representations
across the two manifold views through commonly used contrastive objectives.
Empirically, on publicly available benchmark EHR datasets, our approach yields
performance on clinical downstream tasks that exceeds the state of the art.
Theoretically, varying the distance metric naturally creates different views
as data augmentation without changing the graph structure.
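The view-contrasting step can be illustrated with a generic InfoNCE-style objective in NumPy; this is a stand-in for the "commonly used contrastive objectives" the abstract refers to, and does not reproduce Kernel Subspace Augmentation or the paper's exact loss:

```python
import numpy as np

def info_nce(view1, view2, tau=0.5):
    """Generic InfoNCE loss between two views of the same n items.
    view1, view2: (n, d) embeddings; row i of each view forms a positive
    pair, and all other cross-view rows serve as negatives.
    (Illustrative stand-in; not the paper's objective.)"""
    a = view1 / np.linalg.norm(view1, axis=1, keepdims=True)
    b = view2 / np.linalg.norm(view2, axis=1, keepdims=True)
    logits = a @ b.T / tau                        # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_softmax))         # positives lie on the diagonal

rng = np.random.default_rng(0)
z = rng.standard_normal((8, 16))
noisy = z + 0.01 * rng.standard_normal((8, 16))   # an aligned second view
# The loss is far lower for the aligned view than for an unrelated one.
print(info_nce(z, noisy) < info_nce(z, rng.standard_normal((8, 16))))
```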
Constant bandwidth ORAM with small block size using PIR operations
Recently, the server-with-computation model has been applied to Oblivious RAM (ORAM) schemes to achieve constant communication (a constant number of blocks). However, existing works either require a large block size of O(log^6 N) or have security flaws. Furthermore, a lower bound on sub-logarithmic bandwidth was shown for schemes that do not use expensive fully homomorphic operations. Whether constant bandwidth with a smaller block size is achievable without fully homomorphic operations remained an open question. In this paper, we provide an affirmative answer. We propose a constant-bandwidth ORAM scheme with block size O(log^3 N) using only additive homomorphic operations. Our scheme is secure under the standard model. Technically, we design a non-trivial oblivious clear algorithm with very small bandwidth to improve the eviction algorithm in ORAM, to which the lower bound proof does not apply. As an additional benefit, we reduce the server storage thanks to the reduction in bucket size.
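To illustrate the PIR mechanics that additively homomorphic operations enable, here is a toy linear-PIR server response with the selection vector left in the clear; in a real scheme the query entries would be encryptions of the 0/1 values, so the server could compute the same inner product homomorphically without learning the queried index:

```python
def pir_answer(database, query, p=2**31 - 1):
    """Server side of a linear PIR scheme: the inner product of the
    query vector with the database, mod p. With an additively
    homomorphic scheme the query would be an encrypted 0/1 selection
    vector; here it is in the clear purely to show the mechanics."""
    return sum(q * d for q, d in zip(query, database)) % p

database = [42, 7, 19, 88]
index = 2
query = [1 if i == index else 0 for i in range(len(database))]  # select slot 2
print(pir_answer(database, query))  # 19
```

Because the response is a single field element per block retrieved, this is the kind of primitive that keeps per-access bandwidth constant in the number of blocks.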