22 research outputs found
Learning Set Functions that are Sparse in Non-Orthogonal Fourier Bases
Many applications of machine learning on discrete domains, such as learning
preference functions in recommender systems or auctions, can be reduced to
estimating a set function that is sparse in the Fourier domain. In this work,
we present a new family of algorithms for learning Fourier-sparse set
functions. They require at most queries (set function
evaluations), under mild conditions on the Fourier coefficients, where is
the size of the ground set and the number of non-zero Fourier coefficients.
In contrast to other work that focused on the orthogonal Walsh-Hadamard
transform, our novel algorithms operate with recently introduced non-orthogonal
Fourier transforms that offer different notions of Fourier-sparsity. These
naturally arise when modeling, e.g., sets of items forming substitutes and
complements. We demonstrate the effectiveness of our algorithms on several
real-world applications.
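The orthogonal Walsh-Hadamard transform mentioned above is easy to make concrete. As a reference point for the orthogonal case (the non-orthogonal transforms of this work are not reproduced here), a minimal fast Walsh-Hadamard transform applied to a toy set function looks like this; all names are illustrative:

```python
def walsh_hadamard(values):
    """Fast Walsh-Hadamard transform of a set function.

    values[i] is the function value on the subset whose membership
    bitmask is i; the result holds the (unnormalized) Fourier
    coefficients over the orthogonal Walsh-Hadamard basis.
    """
    coeffs = list(values)
    n = len(coeffs)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: combine pairs at distance h.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = coeffs[j], coeffs[j + h]
                coeffs[j], coeffs[j + h] = a + b, a - b
        h *= 2
    return coeffs

# A toy set function on a ground set of 3 items: f(S) = |S|.
f = [bin(mask).count("1") for mask in range(8)]
coeffs = walsh_hadamard(f)
# Only the empty-set and singleton coefficients are non-zero,
# i.e. this f is 4-sparse in the Walsh-Hadamard basis.
sparsity = sum(1 for c in coeffs if c != 0)
```

Cardinality functions like this one are maximally sparse; general set functions can have up to 2^n non-zero coefficients, which is what makes sparsity assumptions valuable.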
Fourier Analysis-based Iterative Combinatorial Auctions
Recent advances in Fourier analysis have brought new tools to efficiently
represent and learn set functions. In this paper, we bring the power of Fourier
analysis to the design of combinatorial auctions (CAs). The key idea is to
approximate bidders' value functions using Fourier-sparse set functions, which
can be computed using a relatively small number of queries. Since this number
is still too large for real-world CAs, we propose a new hybrid design: we first
use neural networks to learn bidders' values and then apply Fourier analysis to
the learned representations. On a technical level, we formulate a Fourier
transform-based winner determination problem and derive its mixed integer
program formulation. Based on this, we devise an iterative CA that asks
Fourier-based queries. We experimentally show that our hybrid ICA achieves
higher efficiency than prior auction designs, leads to a fairer distribution of
social welfare, and significantly reduces runtime. With this paper, we are the
first to leverage Fourier analysis in CA design and lay the foundation for
future work in this area.
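The winner determination problem at the core of such a design can be illustrated on a toy instance. The sketch below is a brute-force search, not the mixed integer program of the paper; it only makes the objective (social welfare) and the feasibility constraint (each item sold at most once) concrete, and all names are illustrative:

```python
from itertools import product

def winner_determination(values):
    """Exhaustively solve a toy winner determination problem.

    values[b] maps bidder b to a dict {bundle_mask: value}.
    Returns the welfare-maximizing allocation (one bundle mask per
    bidder) subject to allocated bundles being pairwise disjoint.
    """
    best_welfare, best_alloc = float("-inf"), None
    bundle_choices = [list(v.keys()) for v in values]
    for alloc in product(*bundle_choices):
        # Feasibility: no item may appear in two allocated bundles.
        used, feasible = 0, True
        for bundle in alloc:
            if used & bundle:
                feasible = False
                break
            used |= bundle
        if not feasible:
            continue
        welfare = sum(v[b] for v, b in zip(values, alloc))
        if welfare > best_welfare:
            best_welfare, best_alloc = welfare, alloc
    return best_welfare, best_alloc

# Two bidders, two items (bits 0 and 1); bundle 0b00 is empty.
values = [
    {0b00: 0.0, 0b01: 3.0, 0b10: 1.0, 0b11: 5.0},  # bidder 0: complements
    {0b00: 0.0, 0b01: 2.0, 0b10: 4.0, 0b11: 4.5},  # bidder 1: substitutes
]
welfare, alloc = winner_determination(values)
```

Exhaustive search is exponential in both items and bidders, which is why a tractable formulation of the problem (such as a mixed integer program over a compact value representation) matters for real-world auctions.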
Ensemble Analysis of Adaptive Compressed Genome Sequencing Strategies
Acquiring genomes at single-cell resolution has many applications such as in
the study of microbiota. However, deep sequencing and assembly of all of the
millions of cells in a sample is prohibitively costly. A property that can come
to the rescue is that deep sequencing of every cell should not be necessary to
capture all distinct genomes, as the majority of cells are biological
replicates. Biologically important samples are often sparse in that sense. In
this paper, we propose an adaptive compressed method, also known as distilled
sensing, to capture all distinct genomes in a sparse microbial community with
reduced sequencing effort. As opposed to group testing in which the number of
distinct events is often constant and sparsity is equivalent to rarity of an
event, sparsity in our case means scarcity of distinct events in comparison to
the data size. Previously, we introduced the problem and proposed a distilled
sensing solution based on a breadth-first search strategy. We simulated the
whole process, which, due to its computational intensity, constrained our
ability to study the behavior of the algorithm over the entire ensemble. In
this paper, we modify our previous breadth-first search strategy and introduce
a depth-first search strategy. Instead of simulating the entire process, which is
intractable for a large number of experiments, we provide a dynamic programming
algorithm to analyze the behavior of the method for the entire ensemble. The
ensemble analysis algorithm recursively calculates the probability of capturing
every distinct genome and also the expected total sequenced nucleotides for a
given population profile. Our results suggest that the expected total number of
sequenced nucleotides grows proportionally to the logarithm of the number of
cells and linearly with the number of distinct genomes.
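The distilled-sensing idea, spending a small sequencing budget per cell and reinvesting effort only in cells that look novel, can be sketched with a toy budget model. This is an illustration of the principle only, not the breadth-first or depth-first strategy of the paper: here a "read" simply reveals a cell's genome label, so only the budget accounting is meaningful:

```python
def distilled_capture(cell_genomes, reads_per_round, max_rounds):
    """Toy distilled-sensing loop over a sparse cell population.

    cell_genomes[i] is the (hidden) genome label of cell i. Each round
    spends reads_per_round on every active cell, then drops cells that
    look like replicates of genomes already captured.
    """
    captured = set()
    active = list(range(len(cell_genomes)))
    total_reads = 0
    for _ in range(max_rounds):
        if not active:
            break
        survivors = []
        for cell in active:
            total_reads += reads_per_round  # budget spent on this cell
            genome = cell_genomes[cell]
            if genome not in captured:
                captured.add(genome)        # novel genome found
                survivors.append(cell)      # keep sequencing it deeply
        active = survivors
    return captured, total_reads

cells = ["A", "A", "B", "A", "C"]  # 5 cells, 3 distinct genomes
captured, total_reads = distilled_capture(cells, reads_per_round=10,
                                          max_rounds=5)
```

Replicate cells are discarded after one round of shallow reads, so the sequencing effort concentrates on the few cells carrying distinct genomes, which is the source of the savings relative to deep sequencing every cell.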
Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
We investigate three related and important problems connected to machine
learning: approximating a submodular function everywhere, learning a submodular
function (in a PAC-like setting [53]), and constrained minimization of
submodular functions. We show that the complexity of all three problems depends
on the 'curvature' of the submodular function, and provide lower and upper
bounds that refine and improve previous results [3, 16, 18, 52]. Our proof
techniques are fairly generic. We either use a black-box transformation of the
function (for approximation and learning), or a transformation of algorithms to
use an appropriate surrogate function (for minimization). Curiously, curvature
has been known to influence approximations for submodular maximization [7, 55],
but its effect on minimization, approximation and learning has hitherto been
open. We complete this picture, and also support our theoretical claims by
empirical results.
Comment: 21 pages. A shorter version appeared in Advances of NIPS-201
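The curvature notion at the heart of these bounds is simple to compute for small examples. Below is a minimal sketch using the standard definition of total curvature, with a coverage function as the submodular example; the function names are illustrative:

```python
def coverage(sets, S):
    """Coverage function f(S) = size of the union of the chosen sets."""
    covered = set()
    for j in S:
        covered |= sets[j]
    return len(covered)

def curvature(f, ground):
    """Total curvature kappa = 1 - min_j (f(V) - f(V \\ {j})) / f({j})
    for a monotone submodular f with f({j}) > 0 for every j.
    kappa = 0 means f is modular; kappa = 1 is the fully curved regime.
    """
    V = set(ground)
    return 1 - min((f(V) - f(V - {j})) / f({j}) for j in ground)

# Disjoint sets: coverage degenerates to a modular function.
kappa_disjoint = curvature(lambda S: coverage([{1, 2}, {3, 4}], S),
                           range(2))
# Overlapping sets: marginal gains shrink, so curvature is positive.
kappa_overlap = curvature(lambda S: coverage([{1, 2}, {2, 3}], S),
                          range(2))
```

Intuitively, curvature measures how far the marginal value of an element can drop once the rest of the ground set is present, which is why it governs how well such functions can be approximated, learned, and minimized.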
Targeted Undersmoothing
This paper proposes a post-model selection inference procedure, called
targeted undersmoothing, designed to construct uniformly valid confidence sets
for a broad class of functionals of sparse high-dimensional statistical models.
These include dense functionals, which may potentially depend on all elements
of an unknown high-dimensional parameter. The proposed confidence sets are
based on an initially selected model and two additionally selected models, an
upper model and a lower model, which enlarge the initially selected model. We
illustrate application of the procedure in two empirical examples. The first
example considers estimation of heterogeneous treatment effects using data from
the Job Training Partnership Act of 1982, and the second example looks at
estimating profitability from a mailing strategy based on estimated
heterogeneous treatment effects in a direct mail marketing campaign. We also
provide evidence on the finite sample performance of the proposed targeted
undersmoothing procedure through a series of simulation experiments.