
    Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases

    Many applications of machine learning on discrete domains, such as learning preference functions in recommender systems or auctions, can be reduced to estimating a set function that is sparse in the Fourier domain. In this work, we present a new family of algorithms for learning Fourier-sparse set functions. They require at most $nk - k \log_2 k + k$ queries (set function evaluations), under mild conditions on the Fourier coefficients, where $n$ is the size of the ground set and $k$ the number of non-zero Fourier coefficients. In contrast to other work that focused on the orthogonal Walsh-Hadamard transform, our novel algorithms operate with recently introduced non-orthogonal Fourier transforms that offer different notions of Fourier-sparsity. These naturally arise when modeling, e.g., sets of items forming substitutes and complements. We demonstrate effectiveness on several real-world applications.
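    A minimal illustration of the setting, not the paper's algorithm: the sketch below evaluates a hypothetical toy valuation on all $2^n$ subsets and applies the classical orthogonal Walsh-Hadamard transform, showing that complement-style item interactions make the function Fourier-sparse. The paper's algorithms instead use non-orthogonal transforms and need only about $nk - k \log_2 k + k$ queries rather than all $2^n$.

```python
import numpy as np

n = 4  # toy ground set {0, 1, 2, 3}

def set_function(S):
    """Hypothetical valuation: items 0 and 1 are complements, item 2 is additive."""
    return 3.0 * (0 in S and 1 in S) + 1.0 * (2 in S)

# Evaluate f on every subset, indexing subsets by characteristic bit vectors.
f = np.array([set_function({i for i in range(n) if (b >> i) & 1})
              for b in range(2 ** n)])

# Classical orthogonal Walsh-Hadamard transform via the Kronecker construction.
H = np.array([[1.0]])
for _ in range(n):
    H = np.kron(np.array([[1.0, 1.0], [1.0, -1.0]]), H)
coeffs = H @ f / (2 ** n)

k = int(np.sum(~np.isclose(coeffs, 0)))
print(f"{k} of {2 ** n} Walsh-Hadamard coefficients are non-zero")
```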

    Fourier Analysis-based Iterative Combinatorial Auctions

    Recent advances in Fourier analysis have brought new tools to efficiently represent and learn set functions. In this paper, we bring the power of Fourier analysis to the design of combinatorial auctions (CAs). The key idea is to approximate bidders' value functions using Fourier-sparse set functions, which can be computed using a relatively small number of queries. Since this number is still too large for real-world CAs, we propose a new hybrid design: we first use neural networks to learn bidders' values and then apply Fourier analysis to the learned representations. On a technical level, we formulate a Fourier transform-based winner determination problem and derive its mixed integer program formulation. Based on this, we devise an iterative CA that asks Fourier-based queries. We experimentally show that our hybrid ICA achieves higher efficiency than prior auction designs, leads to a fairer distribution of social welfare, and significantly reduces runtime. With this paper, we are the first to leverage Fourier analysis in CA design and lay the foundation for future work in this area.
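    To make the winner determination step concrete, here is a hedged, brute-force sketch, not the paper's mixed integer program: it enumerates all allocations of a tiny item set to two bidders and picks the welfare-maximizing one. The toy valuations below are hypothetical stand-ins for the learned Fourier-sparse approximations.

```python
import itertools

items = (0, 1, 2)
bidders = ("alice", "bob")

def value(bidder, bundle):
    """Hypothetical toy valuations standing in for learned Fourier-sparse ones."""
    bundle = set(bundle)
    if bidder == "alice":                  # items 0 and 1 are complements
        return 5.0 if {0, 1} <= bundle else 0.0
    return 2.0 * len(bundle & {1, 2})      # bob values items 1 and 2 additively

best_welfare, best_allocation = -1.0, None
# Winner determination: every item goes to exactly one bidder.
for assignment in itertools.product(bidders, repeat=len(items)):
    bundles = {b: {i for i, who in zip(items, assignment) if who == b}
               for b in bidders}
    welfare = sum(value(b, s) for b, s in bundles.items())
    if welfare > best_welfare:
        best_welfare, best_allocation = welfare, bundles

print(best_allocation, "welfare:", best_welfare)
```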

    Ensemble Analysis of Adaptive Compressed Genome Sequencing Strategies

    Acquiring genomes at single-cell resolution has many applications, such as in the study of microbiota. However, deep sequencing and assembly of all of the millions of cells in a sample is prohibitively costly. A property that can come to the rescue is that deep sequencing of every cell should not be necessary to capture all distinct genomes, as the majority of cells are biological replicates. Biologically important samples are often sparse in that sense. In this paper, we propose an adaptive compressed method, also known as distilled sensing, to capture all distinct genomes in a sparse microbial community with reduced sequencing effort. As opposed to group testing, in which the number of distinct events is often constant and sparsity is equivalent to rarity of an event, sparsity in our case means scarcity of distinct events in comparison to the data size. Previously, we introduced the problem and proposed a distilled sensing solution based on the breadth-first search strategy. We simulated the whole process, which constrained our ability to study the behavior of the algorithm for the entire ensemble due to its computational intensity. In this paper, we modify our previous breadth-first search strategy and introduce a depth-first search strategy. Instead of simulating the entire process, which is intractable for a large number of experiments, we provide a dynamic programming algorithm to analyze the behavior of the method for the entire ensemble. The ensemble analysis algorithm recursively calculates the probability of capturing every distinct genome as well as the expected total number of sequenced nucleotides for a given population profile. Our results suggest that the expected total number of sequenced nucleotides grows proportionally to the logarithm of the number of cells and linearly with the number of distinct genomes.
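    The following is a hedged illustration of the sparsity notion only, not the paper's dynamic program: a Monte Carlo estimate of the probability that sequencing a given number of uniformly sampled cells captures every distinct genome, for a hypothetical population profile in which most cells are replicates.

```python
import random

# Hypothetical population profile: number of cells carrying each distinct genome.
profile = {"genome_A": 9000, "genome_B": 900, "genome_C": 100}
cells = [g for g, count in profile.items() for _ in range(count)]

def capture_probability(cells_sequenced, trials=2000):
    """Estimate P(every distinct genome appears among the sequenced cells)."""
    hits = sum(set(random.sample(cells, cells_sequenced)) == set(profile)
               for _ in range(trials))
    return hits / trials

for m in (10, 50, 200):
    print(f"{m:4d} cells sequenced -> capture probability {capture_probability(m):.3f}")
```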

    Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

    We investigate three related and important problems connected to machine learning: approximating a submodular function everywhere, learning a submodular function (in a PAC-like setting [53]), and constrained minimization of submodular functions. We show that the complexity of all three problems depends on the 'curvature' of the submodular function, and provide lower and upper bounds that refine and improve previous results [3, 16, 18, 52]. Our proof techniques are fairly generic. We either use a black-box transformation of the function (for approximation and learning), or a transformation of algorithms to use an appropriate surrogate function (for minimization). Curiously, curvature has been known to influence approximations for submodular maximization [7, 55], but its effect on minimization, approximation and learning has hitherto been open. We complete this picture, and also support our theoretical claims by empirical results. Comment: 21 pages. A shorter version appeared in Advances of NIPS 2013.
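    As a concrete anchor for the curvature notion, the sketch below computes the standard total curvature $\kappa_f = 1 - \min_{j \in V} \,[f(V) - f(V \setminus \{j\})] / f(\{j\})$ of a toy monotone submodular coverage function. The coverage sets are hypothetical, and this is only a definition check, not any of the paper's algorithms.

```python
# Hypothetical coverage sets; f(S) = number of elements covered is
# monotone submodular, so the total curvature below is well defined.
V = {0, 1, 2, 3}
covers = {0: {"a", "x"}, 1: {"b", "x"}, 2: {"c"}, 3: {"d"}}

def f(S):
    return len(set().union(*(covers[j] for j in S))) if S else 0

# Total curvature: kappa_f = 1 - min_j [f(V) - f(V \ {j})] / f({j}).
kappa = 1 - min((f(V) - f(V - {j})) / f({j}) for j in V)
print(f"total curvature kappa_f = {kappa:.2f}")  # 0 = modular, 1 = fully curved
```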

    Targeted Undersmoothing

    This paper proposes a post-model selection inference procedure, called targeted undersmoothing, designed to construct uniformly valid confidence sets for a broad class of functionals of sparse high-dimensional statistical models. These include dense functionals, which may potentially depend on all elements of an unknown high-dimensional parameter. The proposed confidence sets are based on an initially selected model and two additionally selected models, an upper model and a lower model, which enlarge the initially selected model. We illustrate application of the procedure in two empirical examples. The first example considers estimation of heterogeneous treatment effects using data from the Job Training Partnership Act of 1982, and the second example looks at estimating profitability from a mailing strategy based on estimated heterogeneous treatment effects in a direct mail marketing campaign. We also provide evidence on the finite sample performance of the proposed targeted undersmoothing procedure through a series of simulation experiments.
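    Below is a schematic sketch, not the paper's procedure: in a sparse linear model, a crude marginal-correlation screen plays the role of the initial model selector, and refitting ordinary least squares on an enlarged model shows how the confidence interval for a target coefficient changes. The selection and enlargement rules here are hypothetical stand-ins for the paper's upper- and lower-model construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:2] = [1.0, 0.5]          # sparse ground truth
y = X @ beta + rng.standard_normal(n)

# Stand-in model selection: keep variables with large marginal correlation
# (a formal sparse estimator such as the lasso would be used in practice).
corr = np.abs(X.T @ y) / n
selected = np.flatnonzero(corr > 0.3)
enlarged = np.union1d(selected, np.argsort(corr)[-8:])  # crude enlarged model

def ols_ci(idx, target=0):
    """95% OLS confidence interval for the target coefficient on model idx."""
    Xs = X[:, idx]
    bhat, *_ = np.linalg.lstsq(Xs, y, rcond=None)
    resid = y - Xs @ bhat
    sigma2 = resid @ resid / (n - len(idx))
    cov = sigma2 * np.linalg.inv(Xs.T @ Xs)
    j = int(np.flatnonzero(idx == target)[0])
    half = 1.96 * np.sqrt(cov[j, j])
    return bhat[j] - half, bhat[j] + half

print("initial model :", ols_ci(selected))
print("enlarged model:", ols_ci(enlarged))
```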