On Perfect Bases in Finite Abelian Groups
Let $G$ be a finite abelian group and let $h$ be a positive integer. A subset
$A$ of $G$ is called a {\em perfect $h$-basis of $G$} if each element of $G$ can be
written uniquely as the sum of at most $h$ (not necessarily distinct) elements
of $A$; similarly, we say that $A$ is a {\em perfect restricted $h$-basis of
$G$} if each element of $G$ can be written uniquely as the sum of at most $h$
distinct elements of $A$. We prove that perfect $h$-bases exist only in the
trivial cases of $h=1$ or $|G|=1$. The situation is different with restricted
addition, where perfection is more frequent; here we treat the case of $h=2$ and
prove that $G$ has a perfect restricted $2$-basis if, and only if, it is
isomorphic to one of six particular groups. Comment: To appear in Involve.
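To make the definitions concrete, here is a small brute-force check (an illustration, not code from the paper) of whether a candidate set is a perfect restricted 2-basis of the cyclic group Z_n; for instance, {1, 2} passes for Z_4 and {1, 2, 4} passes for Z_7.

```python
from itertools import combinations

def is_perfect_restricted_2_basis(A, n):
    """Return True iff every element of Z_n has exactly one representation
    as a sum of at most 2 distinct elements of A (the empty sum gives 0)."""
    counts = {x: 0 for x in range(n)}
    counts[0] += 1                      # the empty sum represents 0
    for a in A:                         # sums of exactly one element
        counts[a % n] += 1
    for a, b in combinations(A, 2):     # sums of two distinct elements
        counts[(a + b) % n] += 1
    return all(c == 1 for c in counts.values())

print(is_perfect_restricted_2_basis({1, 2}, 4))     # True
print(is_perfect_restricted_2_basis({1, 2, 4}, 7))  # True
print(is_perfect_restricted_2_basis({1, 2}, 5))     # False: 4 is never represented
```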
2D-Shapley: A Framework for Fragmented Data Valuation
Data valuation -- quantifying the contribution of individual data sources to
certain predictive behaviors of a model -- is of great importance to enhancing
the transparency of machine learning and designing incentive systems for data
sharing. Existing work has focused on evaluating data sources that share a
feature or sample space. How to value fragmented data sources, each of which
contains only a subset of the features and samples, remains an open question. We start
by presenting a method to calculate the counterfactual of removing a fragment
from the aggregated data matrix. Based on the counterfactual calculation, we
further propose 2D-Shapley, a theoretical framework for fragmented data
valuation that uniquely satisfies some appealing axioms in the fragmented data
context. 2D-Shapley empowers a range of new use cases, such as selecting useful
data fragments, providing interpretation for sample-wise data values, and
fine-grained data-issue diagnosis. Comment: ICML 2023.
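As a rough sketch of how such values might be estimated (an illustrative Monte Carlo reading with an assumed black-box utility, not the paper's exact axiomatic construction): credit each fragment at row block r and column block c with its average mixed marginal contribution when row and column blocks arrive in random order.

```python
import random
import numpy as np

def mc_2d_values(n_row_blocks, n_col_blocks, utility, n_perms=100, seed=0):
    """Monte Carlo sketch of a 2D-Shapley-style value per fragment.

    `utility(rows, cols)` scores the sub-matrix given by a set of row blocks
    and a set of column blocks (e.g., accuracy of a model trained on it).
    Fragment (r, c) is credited with the mixed difference
    u(S+r, T+c) - u(S, T+c) - u(S+r, T) + u(S, T), averaged over random
    arrival orders of the remaining blocks.
    """
    rng = random.Random(seed)
    vals = np.zeros((n_row_blocks, n_col_blocks))
    for _ in range(n_perms):
        rp = list(range(n_row_blocks)); rng.shuffle(rp)
        cp = list(range(n_col_blocks)); rng.shuffle(cp)
        for i, r in enumerate(rp):
            S = set(rp[:i])
            for j, c in enumerate(cp):
                T = set(cp[:j])
                vals[r, c] += (utility(S | {r}, T | {c}) - utility(S, T | {c})
                               - utility(S | {r}, T) + utility(S, T))
    return vals / n_perms

# Toy check: with an additive utility the estimate recovers each cell's signal.
signal = np.array([[3.0, 0.0], [1.0, 2.0]])
u = lambda rows, cols: sum(signal[r, c] for r in rows for c in cols)
print(mc_2d_values(2, 2, u, n_perms=20))  # ~= signal
```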
LAVA: Data Valuation without Pre-Specified Learning Algorithms
Traditionally, data valuation (DV) is posed as a problem of equitably
splitting the validation performance of a learning algorithm among the training
data. As a result, the calculated data values depend on many design choices of
the underlying learning algorithm. However, this dependence is undesirable for
many DV use cases, such as setting priorities over different data sources in a
data acquisition process and informing pricing mechanisms in a data
marketplace. In these scenarios, data needs to be valued before the actual
analysis, at a point when the choice of learning algorithm is still undetermined.
Another side-effect of the dependence is that to assess the value of individual
points, one needs to re-run the learning algorithm with and without each point,
which incurs a large computational burden. This work leapfrogs over the current
limits of data valuation methods by introducing a new framework that can value
training data in a way that is oblivious to the downstream learning algorithm.
Our main results are as follows. (1) We develop a proxy for the validation
performance associated with a training set based on a non-conventional
class-wise Wasserstein distance between training and validation sets. We show
that the distance characterizes the upper bound of the validation performance
for any given model under certain Lipschitz conditions. (2) We develop a novel
method to value individual data based on the sensitivity analysis of the
class-wise Wasserstein distance. Importantly, these values can be directly
obtained for free from the output of off-the-shelf optimization solvers when
computing the distance. (3) We evaluate our new data valuation framework over
various use cases related to detecting low-quality data and show that,
surprisingly, the learning-agnostic feature of our framework enables a
significant improvement over SOTA performance while being orders of magnitude
faster. Comment: ICLR 2023 Spotlight. Latest updated version: 2023/12/1
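A minimal sketch of the flavor of results (1) and (2), assuming the POT optimal-transport library and a plain Euclidean cost (the paper's distance is class-wise, i.e. label-aware, which this toy version omits): solve exact OT between the training and validation sets, read the training-side dual potentials from the solver's log at no extra cost, and calibrate each potential against the mean of the others so that large values flag points that inflate the train-validation distance.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (pip install pot)

def ot_dual_values(X_train, X_val):
    n, m = len(X_train), len(X_val)
    a = np.full(n, 1.0 / n)               # uniform weights on training points
    b = np.full(m, 1.0 / m)               # uniform weights on validation points
    M = ot.dist(X_train, X_val)           # pairwise squared-Euclidean costs
    _, log = ot.emd(a, b, M, log=True)    # exact OT; duals come for free in `log`
    f = log['u']                          # one dual potential per training point
    # Calibrate against the average of the other potentials; higher values
    # mean a point contributes more to the distance (likely lower quality).
    return f - (f.sum() - f) / (n - 1)

rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 5))
X_tr[:5] += 5.0                           # plant five out-of-distribution points
X_va = rng.normal(size=(50, 5))
v = ot_dual_values(X_tr, X_va)
print(np.argsort(v)[-5:])                 # the planted points should score highest
```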
Augment Your Past
The project focuses on the past by providing historical facts and photos of the Gettysburg College campus from the 1890s to the 1920s. It allows the audience to compare the present campus with the past using an augmented reality mobile application developed in Unity. The project hopes to interest the College community in learning more about the College's history through modern approaches to conveying the past.
Learning to Refit for Convex Learning Problems
Machine learning (ML) models need to be frequently retrained on changing
datasets in a wide variety of application scenarios, including data valuation
and uncertainty quantification. To efficiently retrain the model, linear
approximation methods such as the influence function have been proposed to estimate
the impact of data changes on model parameters. However, these methods become
inaccurate for large dataset changes. In this work, we focus on convex learning
problems and propose a general framework to learn to estimate optimized model
parameters for different training sets using neural networks. We propose to
enforce the predicted model parameters to obey optimality conditions and
maintain utility through regularization techniques, which significantly improve
generalization. Moreover, we rigorously characterize the expressive power of
neural networks to approximate the optimizer of convex problems. Empirical
results demonstrate the advantage of the proposed method in accurate and
efficient model parameter estimation compared to the state of the art.
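A minimal sketch of the idea for one convex problem, ridge regression, with an illustrative DeepSets-style encoder and loss weights that are assumptions rather than the paper's design: a permutation-invariant network maps a training set to predicted parameters, and a regularizer penalizes violation of the first-order optimality condition (the gradient of the convex objective should vanish at the prediction).

```python
import torch
import torch.nn as nn

def ridge_grad(theta, X, y, lam=0.1):
    # Gradient of 0.5*||X@theta - y||^2 / n + 0.5*lam*||theta||^2;
    # it is exactly zero at the true ridge solution.
    return X.T @ (X @ theta - y) / X.shape[0] + lam * theta

class SetToParams(nn.Module):
    """Permutation-invariant map from a training set (X, y) to parameters."""
    def __init__(self, d, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(d + 1, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden))
        self.rho = nn.Sequential(nn.ReLU(), nn.Linear(hidden, d))

    def forward(self, X, y):
        z = self.phi(torch.cat([X, y[:, None]], dim=1)).mean(dim=0)
        return self.rho(z)

d = 5
model = SetToParams(d)
optim = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(500):
    X = torch.randn(32, d)                         # a fresh random training set
    y = X @ torch.randn(d) + 0.1 * torch.randn(32)
    theta_hat = model(X, y)
    fit = ((X @ theta_hat - y) ** 2).mean()        # maintain utility on the set
    kkt = ridge_grad(theta_hat, X, y).norm() ** 2  # optimality-condition penalty
    loss = fit + kkt
    optim.zero_grad(); loss.backward(); optim.step()
```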