A Class of MSR Codes for Clustered Distributed Storage
Clustered distributed storage models real data centers where intra- and
cross-cluster repair bandwidths are different. In this paper, exact-repair
minimum-storage-regenerating (MSR) codes achieving capacity of clustered
distributed storage are designed. Focus is given on two cases: and
, where is the ratio of the available cross- and
intra-cluster repair bandwidths, is the total number of distributed nodes
and is the number of contact nodes in data retrieval. The former represents
the scenario where cross-cluster communication is not allowed, while the latter
corresponds to the case of minimum cross-cluster bandwidth that is possible
under the minimum storage overhead constraint. For the case, two
types of locally repairable codes are proven to achieve the MSR point. As for
, an explicit MSR coding scheme is suggested for the
two-cluster situation under the specific condition of .
Comment: 9 pages, a part of this paper is submitted to IEEE ISIT201
Hierarchical Coding for Distributed Computing
Coding for distributed computing supports low-latency computation by
relieving the burden of straggling workers. While most existing works assume a
simple master-worker model, we consider a hierarchical computational structure
consisting of groups of workers, motivated by the need to reflect the
architectures of real-world distributed computing systems. In this work, we
propose a hierarchical coding scheme for this model and analyze its
decoding cost and expected computation time. Specifically, we first provide
upper and lower bounds on the expected computing time of the proposed scheme.
We also show that our scheme enables efficient parallel decoding, thus reducing
decoding costs by orders of magnitude over non-hierarchical schemes. When
considering both decoding cost and computing time, the proposed hierarchical
coding is shown to outperform existing schemes in many practical scenarios.
Comment: 7 pages, part of the paper is submitted to ISIT201
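The expected computation time discussed in the abstract above can be explored with a small Monte Carlo sketch. The abstract does not specify the code parameters or the delay model, so everything below is an assumption made for illustration: each of `n2` groups holds `n1` workers, a group is done once any `k1` of its workers finish, the master is done once any `k2` groups finish, and worker times are i.i.d. exponential. The function names are hypothetical, and communication delays between groups and the master are ignored.

```python
import random

def kth_smallest(values, k):
    """Return the k-th smallest entry (1-indexed) of a sequence."""
    return sorted(values)[k - 1]

def completion_time(worker_times, k1, k2):
    """Finish time for one realization of worker delays.

    worker_times[g][w] is the delay of worker w in group g (assumed model).
    A group finishes when k1 of its workers are done; the master finishes
    when k2 groups are done.
    """
    group_times = [kth_smallest(group, k1) for group in worker_times]
    return kth_smallest(group_times, k2)

def expected_time(n1, k1, n2, k2, rate=1.0, trials=10000, seed=0):
    """Monte Carlo estimate of the mean finish time under exponential delays."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        times = [[rng.expovariate(rate) for _ in range(n1)] for _ in range(n2)]
        total += completion_time(times, k1, k2)
    return total / trials

# Deterministic check: groups finish at 2, 5 and 8, so the master finishes at 5.
example = completion_time([[1, 2, 3], [4, 5, 6], [7, 8, 9]], k1=2, k2=2)
```

Calling `expected_time(n1, k1, n2, k2)` for different parameter choices gives a rough feel for how redundancy inside and across groups trades off against straggler delay under these assumptions.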
Equal Improvability: A New Fairness Notion Considering the Long-term Impact
Devising a fair classifier that does not discriminate against different
groups is an important problem in machine learning. Although researchers have
proposed various ways of defining group fairness, most of them focus only on
immediate fairness, ignoring the long-term impact of a fair classifier
in the dynamic scenario where each individual can improve their features over
time. Such dynamic scenarios arise in the real world, e.g., college admission and
credit lending, where each rejected sample makes an effort to change its features
to get accepted afterwards. In this dynamic setting, long-term fairness
should equalize the samples' feature distribution across different groups after
the rejected samples make some effort to improve. In order to promote long-term
fairness, we propose a new fairness notion called Equal Improvability (EI),
which equalizes the potential acceptance rate of the rejected samples across
different groups assuming a bounded level of effort will be spent by each
rejected sample. We analyze the properties of EI and its connections with
existing fairness notions. To find a classifier that satisfies the EI
requirement, we propose and study three different approaches that solve
EI-regularized optimization problems. Through experiments on both synthetic and
real datasets, we demonstrate that the proposed EI-regularized algorithms
help find a classifier that is fair in terms of EI. Finally, we provide
experimental results on dynamic scenarios which highlight the advantages of our
EI metric in achieving long-term fairness. Code is available in a GitHub
repository, see https://github.com/guldoganozgur/ei_fairness.
Comment: ICLR 2023 Poster. 31 pages, 10 figures, 6 tables
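To make the EI notion in the abstract above concrete, here is a minimal sketch of how the per-group "potential acceptance rate of rejected samples" might be computed. It is not taken from the linked repository. It assumes a linear score `w·x + b` with acceptance when the score is non-negative, and an L2 effort budget `delta`, in which case the best reachable score is `w·x + b + delta * ||w||`. The function names and the max-minus-min gap used as a disparity measure are illustrative choices.

```python
import math

def ei_rates(X, z, w, b, delta):
    """Per-group fraction of rejected samples that could be accepted
    after spending at most `delta` (L2 norm) of effort on their features."""
    w_norm = math.sqrt(sum(wi * wi for wi in w))
    counts = {}  # group -> [number rejected, number improvable]
    for x, g in zip(X, z):
        score = sum(wi * xi for wi, xi in zip(w, x)) + b
        if score >= 0:
            continue  # already accepted, not part of the EI condition
        rec = counts.setdefault(g, [0, 0])
        rec[0] += 1
        # Best reachable score for a linear model under an L2 budget (assumption).
        if score + delta * w_norm >= 0:
            rec[1] += 1
    return {g: imp / rej for g, (rej, imp) in counts.items()}

def ei_disparity(rates):
    """Gap between the best and worst group rates (illustrative measure)."""
    return max(rates.values()) - min(rates.values())

# Toy one-dimensional data with two groups (hypothetical values).
X = [[-0.5], [-2.0], [1.0], [-0.2], [-0.4], [0.3]]
z = [0, 0, 0, 1, 1, 1]
rates = ei_rates(X, z, w=[1.0], b=0.0, delta=0.5)
gap = ei_disparity(rates)
```

In this toy data, one of the two rejected samples in group 0 can cross the boundary within the budget, while both rejected samples in group 1 can, so the classifier would not satisfy EI under these assumptions.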
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Word translation without parallel corpora has become feasible, rivaling the
performance of supervised methods. Recent findings have shown that the accuracy
and robustness of unsupervised word translation (UWT) can be improved by making
use of visual observations, which are universal representations across
languages. In this work, we investigate the potential of using not only visual
observations but also pretrained language-image models for enabling a more
efficient and robust UWT. Specifically, we develop a novel UWT method dubbed
Word Alignment using Language-Image Pretraining (WALIP), which leverages visual
observations via the shared embedding space of images and texts provided by
CLIP models (Radford et al., 2021). WALIP has a two-step procedure. First, we
retrieve word pairs with high similarity confidence, computed using our
proposed image-based fingerprints, which define the initial pivot for the word
alignment. Second, we apply our robust Procrustes algorithm to estimate the
linear mapping between two embedding spaces, which iteratively corrects and
refines the estimated alignment. Our extensive experiments show that WALIP
improves upon the state-of-the-art performance of bilingual word alignment for
a few language pairs across different word embeddings and displays great
robustness to the dissimilarity of language pairs or training corpora for two
word embeddings.
Comment: In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP Findings)
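The second step in the abstract above estimates a linear mapping between two embedding spaces with a Procrustes-style procedure that is refined iteratively. The abstract does not describe WALIP's robust variant, so the sketch below swaps in a generic technique: the classical orthogonal Procrustes solution via SVD, wrapped in a simple reweighting loop that down-weights word pairs with large residuals. The function names, the weighting rule, and the synthetic data are all assumptions for illustration.

```python
import numpy as np

def procrustes(X, Y, weights=None):
    """Orthogonal W minimizing the weighted sum of ||x_i W - y_i||^2.

    Rows of X and Y are paired embeddings. The solution is W = U V^T,
    where U S V^T is the SVD of X^T diag(weights) Y.
    """
    if weights is None:
        weights = np.ones(X.shape[0])
    M = X.T @ (weights[:, None] * Y)
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def reweighted_procrustes(X, Y, n_iter=5, eps=1e-8):
    """Generic iterative refinement (not WALIP's exact algorithm):
    pairs that fit poorly under the current W receive smaller weights."""
    weights = np.ones(X.shape[0])
    for _ in range(n_iter):
        W = procrustes(X, Y, weights)
        residuals = np.linalg.norm(X @ W - Y, axis=1)
        weights = 1.0 / (residuals + eps)
    return W

# Synthetic check: Y is an exact rotation of X, so W should recover R.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
R, _ = np.linalg.qr(rng.normal(size=(4, 4)))  # random orthogonal matrix
Y = X @ R
W = reweighted_procrustes(X, Y)
```

Restricting `W` to be orthogonal keeps distances in the source embedding space intact, which is a common design choice when aligning monolingual embeddings.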