On Perfect Bases in Finite Abelian Groups
Let $G$ be a finite abelian group and let $h$ be a positive integer. A subset
$A$ of $G$ is called a {\em perfect $h$-basis of $G$} if each element of $G$ can be
written uniquely as the sum of at most $h$ (not necessarily distinct) elements
of $A$; similarly, we say that $A$ is a {\em perfect restricted $h$-basis of
$G$} if each element of $G$ can be written uniquely as the sum of at most $h$
distinct elements of $A$. We prove that perfect $h$-bases exist only in the
trivial cases of $h=1$ or $|G|=1$. The situation is different with restricted
addition, where perfection is more frequent; here we treat the case of $h=2$ and
prove that $G$ has a perfect restricted $2$-basis if, and only if, it is
isomorphic to one of six particular groups. Comment: To appear in Involve.
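To make the definitions concrete, here is a small brute-force check (an illustration, not code from the paper) of whether a candidate set is a perfect restricted 2-basis of the cyclic group Z_n; for instance, {1, 2} passes for Z_4 and {1, 2, 4} passes for Z_7.

```python
from itertools import combinations

def is_perfect_restricted_2_basis(A, n):
    """Return True iff every element of Z_n has exactly one representation
    as a sum of at most 2 distinct elements of A (the empty sum gives 0)."""
    counts = {x: 0 for x in range(n)}
    counts[0] += 1                      # the empty sum represents 0
    for a in A:                         # sums of exactly one element
        counts[a % n] += 1
    for a, b in combinations(A, 2):     # sums of two distinct elements
        counts[(a + b) % n] += 1
    return all(c == 1 for c in counts.values())

print(is_perfect_restricted_2_basis({1, 2}, 4))     # True
print(is_perfect_restricted_2_basis({1, 2, 4}, 7))  # True
print(is_perfect_restricted_2_basis({1, 2}, 5))     # False: 4 is never represented
```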
2D-Shapley: A Framework for Fragmented Data Valuation
Data valuation -- quantifying the contribution of individual data sources to
certain predictive behaviors of a model -- is of great importance to enhancing
the transparency of machine learning and designing incentive systems for data
sharing. Existing work has focused on evaluating data sources that share a
feature or sample space. How to value fragmented data sources, each of which
contains only a subset of the features and samples, remains an open question. We start
by presenting a method to calculate the counterfactual of removing a fragment
from the aggregated data matrix. Based on the counterfactual calculation, we
further propose 2D-Shapley, a theoretical framework for fragmented data
valuation that uniquely satisfies some appealing axioms in the fragmented data
context. 2D-Shapley empowers a range of new use cases, such as selecting useful
data fragments, providing interpretation for sample-wise data values, and
fine-grained data-issue diagnosis. Comment: ICML 2023.
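As a rough sketch of how such values might be estimated (an illustrative Monte Carlo reading with an assumed black-box utility, not the paper's exact axiomatic construction): credit each fragment at row block r and column block c with its average mixed marginal contribution when row and column blocks arrive in random order.

```python
import random
import numpy as np

def mc_2d_values(n_row_blocks, n_col_blocks, utility, n_perms=100, seed=0):
    """Monte Carlo sketch of a 2D-Shapley-style value per fragment.

    `utility(rows, cols)` scores the sub-matrix given by a set of row blocks
    and a set of column blocks (e.g., accuracy of a model trained on it).
    Fragment (r, c) is credited with the mixed difference
    u(S+r, T+c) - u(S, T+c) - u(S+r, T) + u(S, T), averaged over random
    arrival orders of the remaining blocks.
    """
    rng = random.Random(seed)
    vals = np.zeros((n_row_blocks, n_col_blocks))
    for _ in range(n_perms):
        rp = list(range(n_row_blocks)); rng.shuffle(rp)
        cp = list(range(n_col_blocks)); rng.shuffle(cp)
        for i, r in enumerate(rp):
            S = set(rp[:i])
            for j, c in enumerate(cp):
                T = set(cp[:j])
                vals[r, c] += (utility(S | {r}, T | {c}) - utility(S, T | {c})
                               - utility(S | {r}, T) + utility(S, T))
    return vals / n_perms

# Toy check: with an additive utility the estimate recovers each cell's signal.
signal = np.array([[3.0, 0.0], [1.0, 2.0]])
u = lambda rows, cols: sum(signal[r, c] for r in rows for c in cols)
print(mc_2d_values(2, 2, u, n_perms=20))  # ~= signal
```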
LAVA: Data Valuation without Pre-Specified Learning Algorithms
Traditionally, data valuation (DV) is posed as a problem of equitably
splitting the validation performance of a learning algorithm among the training
data. As a result, the calculated data values depend on many design choices of
the underlying learning algorithm. However, this dependence is undesirable for
many DV use cases, such as setting priorities over different data sources in a
data acquisition process and informing pricing mechanisms in a data
marketplace. In these scenarios, data needs to be valued before the actual
analysis, at a point when the choice of learning algorithm is still undetermined.
Another side-effect of the dependence is that to assess the value of individual
points, one needs to re-run the learning algorithm with and without each point,
which incurs a large computational burden. This work leapfrogs over the current
limits of data valuation methods by introducing a new framework that can value
training data in a way that is oblivious to the downstream learning algorithm.
Our main results are as follows. (1) We develop a proxy for the validation
performance associated with a training set based on a non-conventional
class-wise Wasserstein distance between training and validation sets. We show
that the distance characterizes the upper bound of the validation performance
for any given model under certain Lipschitz conditions. (2) We develop a novel
method to value individual data based on the sensitivity analysis of the
class-wise Wasserstein distance. Importantly, these values can be directly
obtained for free from the output of off-the-shelf optimization solvers when
computing the distance. (3) We evaluate our new data valuation framework over
various use cases related to detecting low-quality data and show that,
surprisingly, the learning-agnostic feature of our framework enables a
significant improvement over SOTA performance while being orders of magnitude
faster. Comment: ICLR 2023 Spotlight. Latest updated version: 2023/12/1
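A minimal sketch of the flavor of results (1) and (2), assuming the POT optimal-transport library and a plain Euclidean cost (the paper's distance is class-wise, i.e. label-aware, which this toy version omits): solve exact OT between the training and validation sets, read the training-side dual potentials from the solver's log at no extra cost, and calibrate each potential against the mean of the others so that large values flag points that inflate the train-validation distance.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def ot_dual_values(X_train, X_val):
    n, m = len(X_train), len(X_val)
    a = np.full(n, 1.0 / n)               # uniform weights on training points
    b = np.full(m, 1.0 / m)               # uniform weights on validation points
    M = ot.dist(X_train, X_val)           # pairwise squared-Euclidean costs
    _, log = ot.emd(a, b, M, log=True)    # exact OT; duals come for free in `log`
    f = log['u']                          # one dual potential per training point
    # Calibrate against the average of the other potentials; higher values
    # mean a point contributes more to the distance (likely lower quality).
    return f - (f.sum() - f) / (n - 1)

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 5))
X_tr[:5] += 5.0                           # plant five out-of-distribution points
X_va = rng.normal(size=(50, 5))
v = ot_dual_values(X_tr, X_va)
print(np.argsort(v)[-5:])                 # the planted points should score highest
```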
Augment Your Past
The project focuses on the past by providing historical facts and photos of the Gettysburg College campus from the 1890s to the 1920s. It allows the audience to compare the present campus with the past using an augmented reality mobile application developed in Unity. The project hopes to interest the College community in learning more about the College's history through modern approaches to conveying the past.
Learning to Refit for Convex Learning Problems
Machine learning (ML) models need to be frequently retrained on changing
datasets in a wide variety of application scenarios, including data valuation
and uncertainty quantification. To efficiently retrain the model, linear
approximation methods such as the influence function have been proposed to estimate
the impact of data changes on model parameters. However, these methods become
inaccurate for large dataset changes. In this work, we focus on convex learning
problems and propose a general framework to learn to estimate optimized model
parameters for different training sets using neural networks. We propose to
enforce the predicted model parameters to obey optimality conditions and
maintain utility through regularization techniques, which significantly improve
generalization. Moreover, we rigorously characterize the expressive power of
neural networks to approximate the optimizer of convex problems. Empirical
results demonstrate the advantage of the proposed method in accurate and
efficient model parameter estimation compared to the state of the art.
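A minimal sketch of the idea for one convex problem, ridge regression, with an illustrative DeepSets-style encoder and loss weights that are assumptions rather than the paper's design: a permutation-invariant network maps a training set to predicted parameters, and a regularizer penalizes violation of the first-order optimality condition (the gradient of the convex objective should vanish at the prediction).

```python
import torch
import torch.nn as nn

def ridge_grad(theta, X, y, lam=0.1):
    # Gradient of 0.5*||X@theta - y||^2 / n + 0.5*lam*||theta||^2;
    # it is exactly zero at the true ridge solution.
    return X.T @ (X @ theta - y) / X.shape[0] + lam * theta

class SetToParams(nn.Module):
    """Permutation-invariant map from a training set (X, y) to parameters."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.ReLU(), nn.Linear(hidden, d))

    def forward(self, X, y):
        z = self.phi(torch.cat([X, y[:, None]], dim=1)).mean(dim=0)
        return self.rho(z)

d = 5
model = SetToParams(d)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    X = torch.randn(32, d)                         # a fresh random training set
    y = X @ torch.randn(d) + 0.1 * torch.randn(32)
    theta_hat = model(X, y)
    fit = ((X @ theta_hat - y) ** 2).mean()        # maintain utility on the set
    kkt = ridge_grad(theta_hat, X, y).norm() ** 2  # optimality-condition penalty
    loss = fit + kkt
    optim.zero_grad(); loss.backward(); optim.step()
```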