A Class of MSR Codes for Clustered Distributed Storage

Author: Choi Beongjun
Moon Jaekyun
Sohn Jy-yong
Publication date: 06/01/2018

Clustered distributed storage models real data centers, where intra- and cross-cluster repair bandwidths differ. In this paper, exact-repair minimum-storage-regenerating (MSR) codes achieving the capacity of clustered distributed storage are designed. The focus is on two cases, $\epsilon=0$ and $\epsilon=1/(n-k)$, where $\epsilon$ is the ratio of the available cross- and intra-cluster repair bandwidths, $n$ is the total number of distributed nodes, and $k$ is the number of contact nodes in data retrieval. The former represents the scenario where cross-cluster communication is not allowed. The latter corresponds to the minimum cross-cluster bandwidth that is possible under the minimum storage overhead constraint. For the $\epsilon=0$ case, two types of locally repairable codes are proven to achieve the MSR point. For $\epsilon=1/(n-k)$, an explicit MSR coding scheme is suggested for the two-cluster situation under the specific condition $n = 2k$.

Comment: 9 pages, part of this paper has been submitted to IEEE ISIT201

arXiv.org e-Print Archive

Crossref

Hierarchical Coding for Distributed Computing

Author: Lee Kangwook
Moon Jaekyun
Park Hyegyeong
Sohn Jy-yong
Suh Changho
Publication date: 15/01/2018

Coding for distributed computing supports low-latency computation by relieving the burden of straggling workers. Most existing works assume a simple master-worker model. We instead consider a hierarchical computational structure consisting of groups of workers, motivated by the need to reflect the architectures of real-world distributed computing systems. In this work, we propose a hierarchical coding scheme for this model and analyze its decoding cost and expected computation time. Specifically, we first provide upper and lower bounds on the expected computing time of the proposed scheme. We also show that our scheme enables efficient parallel decoding, reducing decoding costs by orders of magnitude compared with non-hierarchical schemes. When both decoding cost and computing time are considered, the proposed hierarchical coding is shown to outperform existing schemes in many practical scenarios.

Comment: 7 pages, part of the paper has been submitted to ISIT201
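To make the group-of-workers structure concrete, here is a minimal sketch of one way a two-level coded matrix-vector product could be organized. It is not the scheme proposed in the paper. It assumes a real-valued Vandermonde (MDS-style) code at both levels: an outer (n1, k1) code spreads row blocks of A across groups, and an inner (n2, k2) code spreads each group's block across its workers. The function names, the parameters k1, n1, k2, n2 and the toy matrix are hypothetical. Exact `Fraction` arithmetic is used so decoding is free of rounding error.

```python
from fractions import Fraction

def generator_row(i, k):
    # Hypothetical Vandermonde row with node i + 1; distinct nodes make any k rows invertible.
    return [Fraction(i + 1) ** j for j in range(k)]

def split(rows, k):
    size = len(rows) // k  # assumes the row count is divisible by k
    return [rows[j * size:(j + 1) * size] for j in range(k)]

def encode(blocks, n):
    """Encode k equal-size row blocks into n coded row blocks."""
    k = len(blocks)
    n_rows, n_cols = len(blocks[0]), len(blocks[0][0])
    coded = []
    for i in range(n):
        g = generator_row(i, k)
        coded.append([[sum(g[j] * blocks[j][r][c] for j in range(k))
                       for c in range(n_cols)] for r in range(n_rows)])
    return coded

def matvec(M, x):
    return [sum(a * b for a, b in zip(row, x)) for row in M]

def decode(indices, results, k):
    """Recover k uncoded result vectors from any k coded ones (exact Gauss-Jordan)."""
    aug = [generator_row(i, k) + list(res) for i, res in zip(indices, results)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]

def hierarchical_matvec(A, x, k1, n1, k2, n2, done_groups, done_workers):
    """Two-level coded A @ x. done_groups / done_workers[g] list the
    non-straggling groups / workers in finishing order; only the first k are used."""
    group_blocks = encode(split(A, k1), n1)
    group_ids, group_results = [], []
    for g in done_groups[:k1]:
        worker_blocks = encode(split(group_blocks[g], k2), n2)
        ws = done_workers[g][:k2]
        # Each group decodes independently, so these inner decodes could run in parallel.
        parts = decode(ws, [matvec(worker_blocks[w], x) for w in ws], k2)
        group_ids.append(g)
        group_results.append([v for part in parts for v in part])
    # The master runs one outer decode over the k1 fastest groups.
    parts = decode(group_ids, group_results, k1)
    return [v for part in parts for v in part]

A = [[1, 0, 2], [0, 1, 1], [3, 1, 0], [2, 2, 2]]
x = [1, 2, 3]
# Group 1 straggles; inside groups 2 and 0, one worker each straggles.
result = hierarchical_matvec(A, x, 2, 3, 2, 3, [2, 0], {2: [1, 2], 0: [0, 2]})
print([int(v) for v in result])
```

The design choice the sketch tries to reflect is that inner decodes are small and local to each group, while the master only solves a k1-by-k1 system. The paper's actual code construction and its decoding-cost analysis may differ.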

arXiv.org e-Print Archive

Crossref

Equal Improvability: A New Fairness Notion Considering the Long-term Impact

Author: Guldogan Ozgur
Lee Kangwook
Pedarsani Ramtin
Sohn Jy-yong
Zeng Yuchen
Publication date: 09/04/2023

Devising a fair classifier that does not discriminate against different groups is an important problem in machine learning. Researchers have proposed various ways of defining group fairness, but most of them focus only on immediate fairness. They ignore the long-term impact of a fair classifier in dynamic scenarios where each individual can improve their features over time. Such dynamic scenarios occur in the real world, e.g., college admission and credit lending, where each rejected sample makes an effort to change its features in order to be accepted later. In this dynamic setting, long-term fairness should equalize the samples' feature distributions across different groups after the rejected samples make some effort to improve. To promote long-term fairness, we propose a new fairness notion called Equal Improvability (EI). EI equalizes the potential acceptance rate of the rejected samples across different groups, assuming each rejected sample spends a bounded level of effort. We analyze the properties of EI and its connections with existing fairness notions. To find a classifier that satisfies the EI requirement, we propose and study three different approaches that solve EI-regularized optimization problems. Through experiments on both synthetic and real datasets, we demonstrate that the proposed EI-regularized algorithms encourage finding a classifier that is fair in terms of EI. Finally, we provide experimental results on dynamic scenarios that highlight the advantages of our EI metric in achieving long-term fairness.

Comment: Code is available in a GitHub repository, see https://github.com/guldoganozgur/ei_fairness. ICLR 2023 Poster. 31 pages, 10 figures, 6 tables
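As a rough illustration of the EI notion described above, the sketch below computes, per group, the share of rejected samples that could reach acceptance with bounded effort. It rests on assumptions that are not taken from the paper:

- The classifier is linear and accepts x when w·x + b ≥ 0.
- Effort means any change Δ to the features with ‖Δ‖₂ ≤ δ.
- All features are improvable.

Under these assumptions, the best score reachable by a rejected sample is w·x + b + δ‖w‖₂ (Cauchy-Schwarz), so the check has a closed form. The function names and the toy samples are hypothetical.

```python
import math

def ei_rates(w, b, samples, delta):
    """Per-group share of rejected samples that could be accepted after effort <= delta.

    samples: list of (feature_vector, group_label). Assumes a linear classifier
    that accepts x iff w.x + b >= 0 and an L2-bounded effort ball on every feature.
    """
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    rates = {}
    for group in sorted({g for _, g in samples}):
        scores = [sum(wi * xi for wi, xi in zip(w, x)) + b
                  for x, g in samples if g == group]
        rejected = [s for s in scores if s < 0]
        if not rejected:
            continue  # the rate is undefined for a group with no rejected samples
        # Best-case score after effort is s + delta * ||w||_2 (Cauchy-Schwarz).
        improvable = [s for s in rejected if s + delta * w_norm >= 0]
        rates[group] = len(improvable) / len(rejected)
    return rates

def ei_disparity(rates):
    """Largest gap in potential acceptance rate between any two groups (0 = EI holds)."""
    return max(rates.values()) - min(rates.values()) if rates else 0.0

samples = [([0.0, 0.0], "a"), ([1.0, 0.5], "a"), ([2.0, 1.0], "a"),
           ([0.5, 0.5], "b"), ([1.5, 0.0], "b")]
rates = ei_rates([1.0, 1.0], -2.0, samples, delta=1.0)
print(rates, ei_disparity(rates))
```

This counting version is only an evaluation metric. An EI-regularized training objective, as the abstract describes, would instead need a differentiable surrogate of this gap added to the loss. The three approaches studied in the paper are not reproduced here.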

arXiv.org e-Print Archive

Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment

Author: Dinh Tuan
Hu Junjie
Lee Kangwook
Ming Yifei
Ossowski Timothy
Papailiopoulos Dimitris
Rajput Shashank
Sohn Jy-yong
Publication date: 07/11/2022

Word translation without parallel corpora has become feasible, rivaling the performance of supervised methods. Recent findings have shown that the accuracy and robustness of unsupervised word translation (UWT) can be improved by making use of visual observations, which are universal representations across languages. In this work, we investigate the potential of using not only visual observations but also pretrained language-image models to enable more efficient and robust UWT. Specifically, we develop a novel UWT method dubbed Word Alignment using Language-Image Pretraining (WALIP). WALIP leverages visual observations via the shared embedding space of images and texts provided by CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we retrieve word pairs with high-confidence similarity, computed using our proposed image-based fingerprints; these pairs define the initial pivot for the word alignment. Second, we apply our robust Procrustes algorithm to estimate the linear mapping between the two embedding spaces, iteratively correcting and refining the estimated alignment. Our extensive experiments show that WALIP improves upon the state-of-the-art performance of bilingual word alignment for a few language pairs across different word embeddings. WALIP also displays strong robustness to the dissimilarity of language pairs or training corpora for two word embeddings.

Comment: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)
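The second WALIP step builds on orthogonal Procrustes. The sketch below is a generic version of that idea, not the paper's robust Procrustes algorithm, whose correction rule the abstract does not specify. It assumes the following, none of which comes from the paper:

- Word embeddings are NumPy arrays with one row per word.
- A few seed (pivot) pairs are already known.
- Refinement alternates between two steps. First, fit the closed-form orthogonal map W = UVᵀ, where UΣVᵀ is the SVD of XᵀY. Second, re-pair words by mutual nearest neighbour.

Function names and the synthetic data are hypothetical.

```python
import numpy as np

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F (classical SVD solution)."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def refine_alignment(src, tgt, pivots, n_iter=3):
    """src, tgt: (n, d) embeddings; pivots: list of (src_index, tgt_index) seed pairs.

    Generic fit-then-rematch loop, assumed for illustration only.
    """
    pairs = list(pivots)
    for _ in range(n_iter):
        W = procrustes(src[[i for i, _ in pairs]], tgt[[j for _, j in pairs]])
        mapped = src @ W
        # Squared Euclidean distance between every mapped source row and target row.
        dist = ((mapped[:, None, :] - tgt[None, :, :]) ** 2).sum(axis=-1)
        fwd = dist.argmin(axis=1)  # closest target for each source word
        bwd = dist.argmin(axis=0)  # closest source for each target word
        new_pairs = [(i, int(fwd[i])) for i in range(len(src)) if bwd[fwd[i]] == i]
        if not new_pairs:
            break  # keep the previous dictionary if no mutual neighbours remain
        pairs = new_pairs
    return W, pairs

# Synthetic check: the target space is an exact rotation of the source, rows shuffled.
rng = np.random.default_rng(0)
n, d = 20, 5
src = rng.standard_normal((n, d))
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # ground-truth orthogonal map
perm = rng.permutation(n)
tgt = (src @ Q)[perm]   # row r of tgt translates source row perm[r]
inv = np.argsort(perm)  # source word i sits at target row inv[i]
pivots = [(i, int(inv[i])) for i in range(8)]  # a few correct seed pairs
W, pairs = refine_alignment(src, tgt, pivots)
print(len(pairs), np.allclose(W, Q))
```

With noise-free data and at least d correct seed pairs, the first fit already recovers the rotation. Real embeddings are noisy, which is where a robust variant such as the paper's is meant to help.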

arXiv.org e-Print Archive