6 research outputs found

    On Perfect Bases in Finite Abelian Groups

    Let $G$ be a finite abelian group and $s$ a positive integer. A subset $A$ of $G$ is called a {\em perfect $s$-basis of $G$} if each element of $G$ can be written uniquely as the sum of at most $s$ (not necessarily distinct) elements of $A$; similarly, we say that $A$ is a {\em perfect restricted $s$-basis of $G$} if each element of $G$ can be written uniquely as the sum of at most $s$ distinct elements of $A$. We prove that perfect $s$-bases exist only in the trivial cases of $s=1$ or $|A|=1$. The situation is different with restricted addition, where perfection is more frequent; here we treat the case $s=2$ and prove that $G$ has a perfect restricted $2$-basis if, and only if, it is isomorphic to $\mathbb{Z}_2$, $\mathbb{Z}_4$, $\mathbb{Z}_7$, $\mathbb{Z}_2^2$, $\mathbb{Z}_2^4$, or $\mathbb{Z}_2^2 \times \mathbb{Z}_4$.
    Comment: To appear in Involve
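
    As a concrete illustration of the restricted case, the brute-force search below (my own sketch, not code from the paper) confirms that $\mathbb{Z}_7$ admits a perfect restricted $2$-basis, for instance $\{1, 2, 4\}$: the empty sum, the three singletons, and the three pairwise sums $3, 5, 6$ hit each element of $\mathbb{Z}_7$ exactly once. The counting constraint $1 + |A| + \binom{|A|}{2} = n$ also explains why, say, $\mathbb{Z}_5$ admits none.

        from itertools import combinations

        def is_perfect_restricted_2_basis(A, n):
            # Every element of Z_n must be a sum of at most 2 distinct elements
            # of A in exactly one way; list all such sums and check that they
            # cover Z_n with no repeats.
            sums = [0]                                              # the empty sum
            sums += [a % n for a in A]                              # singletons
            sums += [(a + b) % n for a, b in combinations(A, 2)]    # distinct pairs
            return len(sums) == n and len(set(sums)) == n

        def perfect_restricted_2_bases(n):
            return [set(A) for k in range(n + 1)
                    for A in combinations(range(n), k)
                    if is_perfect_restricted_2_basis(A, n)]

        print(perfect_restricted_2_bases(7))   # includes {1, 2, 4}
        print(perfect_restricted_2_bases(5))   # [] -- no perfect restricted 2-basis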

    2D-Shapley: A Framework for Fragmented Data Valuation

    Data valuation -- quantifying the contribution of individual data sources to certain predictive behaviors of a model -- is of great importance to enhancing the transparency of machine learning and designing incentive systems for data sharing. Existing work has focused on evaluating data sources that share a feature space or a sample space. How to value fragmented data sources, each of which contains only partial features and samples, remains an open question. We start by presenting a method to calculate the counterfactual of removing a fragment from the aggregated data matrix. Based on the counterfactual calculation, we further propose 2D-Shapley, a theoretical framework for fragmented data valuation that uniquely satisfies some appealing axioms in the fragmented data context. 2D-Shapley empowers a range of new use cases, such as selecting useful data fragments, providing interpretation for sample-wise data values, and fine-grained data issue diagnosis.
    Comment: ICML 2023
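
    The permutation-sampling sketch below illustrates the flavor of a two-dimensional marginal contribution: fragments arrive in a random row-block order and a random column-block order, and each fragment is credited with the mixed difference of utilities at its arrival. The utility function here is a hypothetical placeholder (in practice, e.g., validation performance of a model trained on the selected sub-matrix), and the paper's counterfactual calculation and axiomatic treatment are more refined than this uniform Monte-Carlo estimate.

        import random
        from collections import defaultdict

        def utility(row_blocks, col_blocks):
            # Placeholder: score of a model trained on the data restricted to
            # these row and column blocks. Dummy value for demonstration only.
            return len(row_blocks) * len(col_blocks)

        def two_d_marginal_values(n_row_blocks, n_col_blocks, n_samples=1000):
            values = defaultdict(float)
            rows, cols = list(range(n_row_blocks)), list(range(n_col_blocks))
            for _ in range(n_samples):
                random.shuffle(rows)
                random.shuffle(cols)
                for i, r in enumerate(rows):
                    for j, c in enumerate(cols):
                        # Mixed difference: utility gained when fragment (r, c)'s
                        # row and column blocks both join the arrived prefix.
                        with_frag = utility(rows[:i + 1], cols[:j + 1])
                        wo_row    = utility(rows[:i],     cols[:j + 1])
                        wo_col    = utility(rows[:i + 1], cols[:j])
                        wo_both   = utility(rows[:i],     cols[:j])
                        values[(r, c)] += (with_frag - wo_row - wo_col + wo_both) / n_samples
            return dict(values)

        print(two_d_marginal_values(3, 2, n_samples=200))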

    LAVA: Data Valuation without Pre-Specified Learning Algorithms

    Traditionally, data valuation (DV) is posed as a problem of equitably splitting the validation performance of a learning algorithm among the training data. As a result, the calculated data values depend on many design choices of the underlying learning algorithm. However, this dependence is undesirable for many DV use cases, such as setting priorities over different data sources in a data acquisition process and informing pricing mechanisms in a data marketplace. In these scenarios, data needs to be valued before the actual analysis, and the choice of learning algorithm is still undetermined at that point. Another side effect of the dependence is that, to assess the value of individual points, one needs to re-run the learning algorithm with and without each point, which incurs a large computational burden. This work leapfrogs over the current limits of data valuation methods by introducing a new framework that can value training data in a way that is oblivious to the downstream learning algorithm. Our main results are as follows. (1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets. We show that the distance characterizes the upper bound of the validation performance for any given model under certain Lipschitz conditions. (2) We develop a novel method to value individual data based on the sensitivity analysis of the class-wise Wasserstein distance. Importantly, these values can be obtained for free from the output of off-the-shelf optimization solvers when computing the distance. (3) We evaluate our new data valuation framework over various use cases related to detecting low-quality data and show that, surprisingly, the learning-agnostic feature of our framework enables a significant improvement over SOTA performance while being orders of magnitude faster.
    Comment: ICLR 2023 Spotlight. Latest updated version: 2023/12/1
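
    A minimal sketch of the sensitivity idea, using the POT (Python Optimal Transport) library: solve an exact OT problem between training and validation features and rank training points by the dual potentials the solver returns for free. Note that this plain, label-free feature-space distance is my simplification; the paper's class-wise Wasserstein distance and calibrated gradients differ in the details.

        import numpy as np
        import ot  # POT: Python Optimal Transport (pip install pot)

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(100, 5))
        X_valid = rng.normal(size=(50, 5))
        X_train[:10] += 5.0                     # plant 10 out-of-distribution training points

        a = np.full(len(X_train), 1 / len(X_train))   # uniform mass on training points
        b = np.full(len(X_valid), 1 / len(X_valid))   # uniform mass on validation points
        M = ot.dist(X_train, X_valid)                 # squared-Euclidean cost matrix

        # log=True makes the exact EMD solver return its dual potentials:
        # u (one per training point) and v (one per validation point).
        _, log = ot.emd(a, b, M, log=True)
        u = log['u']

        # A point with a large dual potential is expensive to transport toward the
        # validation distribution; ranking by u surfaces the planted outliers first.
        worst_first = np.argsort(-u)
        print("lowest-value training points:", worst_first[:10])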

    Augment Your Past

    The project focuses on the past by providing historical facts and photos of the Gettysburg College campus from the 1890s to the 1920s. It allows the audience to compare the present campus with the past using an augmented reality mobile application developed in Unity. The project hopes to interest the College community in learning more about the College's history through modern approaches to conveying the past.

    Learning to Refit for Convex Learning Problems

    Machine learning (ML) models need to be frequently retrained on changing datasets in a wide variety of application scenarios, including data valuation and uncertainty quantification. To retrain a model efficiently, linear approximation methods such as influence functions have been proposed to estimate the impact of data changes on model parameters. However, these methods become inaccurate for large dataset changes. In this work, we focus on convex learning problems and propose a general framework that uses neural networks to estimate optimized model parameters for different training sets. We enforce the predicted model parameters to obey optimality conditions and maintain utility through regularization techniques, which significantly improves generalization. Moreover, we rigorously characterize the expressive power of neural networks to approximate the optimizers of convex problems. Empirical results demonstrate the advantage of the proposed method in accurate and efficient model parameter estimation compared to the state of the art.
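
    The sketch below illustrates the core ingredients on ridge regression, a convex problem whose true optimizer is available in closed form to supervise the network; the architecture and loss weights here are my assumptions, not the paper's design. A small network maps fixed-size sufficient statistics of a training set to predicted parameters, and a regularizer drives the gradient of the convex objective at the prediction toward zero, i.e., toward the first-order optimality condition.

        import torch
        import torch.nn as nn

        d, lam = 5, 0.1                          # feature dimension, ridge penalty

        def ridge_grad(w, X, y):
            # Gradient of 0.5*||Xw - y||^2 / n + 0.5*lam*||w||^2 at w;
            # it vanishes exactly at the ridge optimizer.
            n = X.shape[0]
            return X.T @ (X @ w - y) / n + lam * w

        def summarize(X, y):
            # Fixed-size sufficient statistics of a dataset: vec(X^T X / n) and X^T y / n.
            n = X.shape[0]
            return torch.cat([(X.T @ X / n).reshape(-1), X.T @ y / n])

        net = nn.Sequential(nn.Linear(d * d + d, 64), nn.ReLU(), nn.Linear(64, d))
        opt = torch.optim.Adam(net.parameters(), lr=1e-3)

        for step in range(2000):
            # Sample a random training set; its exact ridge solution is the target.
            X = torch.randn(32, d)
            y = torch.randn(32)
            w_star = torch.linalg.solve(X.T @ X / 32 + lam * torch.eye(d), X.T @ y / 32)

            w_pred = net(summarize(X, y))
            loss = ((w_pred - w_star) ** 2).mean()                      # match the optimizer
            loss = loss + 0.1 * (ridge_grad(w_pred, X, y) ** 2).mean()  # optimality-condition penalty
            opt.zero_grad()
            loss.backward()
            opt.step()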