Entropy and information in neural spike trains: Progress on the sampling problem
The major problem in information theoretic analysis of neural responses and
other biological data is the reliable estimation of entropy-like quantities
from small samples. We apply a recently introduced Bayesian entropy estimator
to synthetic data inspired by experiments, and to real experimental spike
trains. The estimator performs admirably even very deep in the undersampled
regime, where other techniques fail. This opens new possibilities for the
information theoretic analysis of experiments, and may be of general interest
as an example of learning from limited data. Comment: 7 pages, 4 figures; referee suggested changes, accepted version
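To make the sampling problem concrete, the sketch below shows how a naive estimator behaves in the undersampled regime. It is not the Bayesian estimator the abstract refers to. It uses the standard maximum-likelihood ("plug-in") estimate and the Miller-Madow bias correction as stand-ins, and the alphabet size `K` and sample count are illustrative assumptions.

```python
import math
import random
from collections import Counter


def plugin_entropy(samples):
    """Maximum-likelihood ("plug-in") entropy estimate, in bits."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def miller_madow_entropy(samples):
    """Plug-in estimate plus the first-order Miller-Madow bias correction."""
    n = len(samples)
    k = len(set(samples))  # number of distinct symbols actually observed
    return plugin_entropy(samples) + (k - 1) / (2 * n * math.log(2))


# Illustrative setup (assumed, not from the paper): a uniform source over
# K symbols, observed through far fewer samples than there are symbols.
random.seed(0)
K = 1024
true_entropy = math.log2(K)
small = [random.randrange(K) for _ in range(100)]

print("true entropy (bits):  ", true_entropy)
print("plug-in estimate:     ", plugin_entropy(small))
print("Miller-Madow estimate:", miller_madow_entropy(small))
```

With 100 samples the plug-in estimate cannot exceed log2(100) bits, so it necessarily falls short of the 10-bit true entropy. The correction narrows the gap but does not close it, which is the kind of failure the abstract attributes to other techniques.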
Quantum algorithms for testing properties of distributions
Suppose one has access to oracles generating samples from two unknown
probability distributions P and Q on some N-element set. How many samples does
one need to test whether the two distributions are close or far from each other
in the L_1-norm? This and related questions have been extensively studied
in recent years in the field of property testing. In the present paper we
study quantum algorithms for testing properties of distributions. It is shown
that the L_1-distance between P and Q can be estimated with a constant
precision using approximately N^{1/2} queries in the quantum setting, whereas
classical computers need \Omega(N) queries. We also describe quantum algorithms
for testing Uniformity and Orthogonality with query complexity O(N^{1/3}). The
classical query complexity of these problems is known to be \Omega(N^{1/2}). Comment: 20 pages
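For orientation, a minimal classical baseline for the L_1-distance question can be sketched as follows. It is not one of the quantum algorithms from the paper: it simply compares empirical frequencies, and the function name and the example distributions are assumptions made for illustration.

```python
import random
from collections import Counter


def empirical_l1_distance(samples_p, samples_q):
    """Estimate ||P - Q||_1 by comparing empirical frequencies."""
    counts_p, counts_q = Counter(samples_p), Counter(samples_q)
    n_p, n_q = len(samples_p), len(samples_q)
    support = set(counts_p) | set(counts_q)
    # Counter returns 0 for elements that were never observed.
    return sum(abs(counts_p[x] / n_p - counts_q[x] / n_q) for x in support)


# Illustrative example (assumed): P is uniform on {0, ..., 49} and Q is
# uniform on {0, ..., 24}, so the true L_1 distance is exactly 1.
rng = random.Random(0)
N = 50
samples_p = [rng.randrange(N) for _ in range(20000)]
samples_q = [rng.randrange(N // 2) for _ in range(20000)]
estimate = empirical_l1_distance(samples_p, samples_q)
print(estimate)
```

This estimator only becomes accurate once the number of samples is large compared with N, which is consistent with the classical \Omega(N) bound quoted in the abstract.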
Testing probability distributions underlying aggregated data
In this paper, we analyze and study a hybrid model for testing and learning
probability distributions. Here, in addition to samples, the testing algorithm
is provided with one of two different types of oracles to the unknown
distribution D over [n]. More precisely, we define both the dual and
cumulative dual access models, in which the algorithm A can both sample from D
and respectively, for any i \in [n],
- query the probability mass D(i) (query access); or
- get the total mass of {1, ..., i}, i.e. \sum_{j=1}^{i} D(j) (cumulative
access).
These two models, by generalizing the previously studied sampling and query
oracle models, allow us to bypass the strong lower bounds established for a
number of problems in these settings, while capturing several interesting
aspects of these problems -- and providing new insight on the limitations of
the models. Finally, we show that while the testing algorithms can be in most
cases strictly more efficient, some tasks remain hard even with this additional
power.
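The two access models can be pictured as an oracle object exposing three operations. The sketch below is an illustration only: the class and method names are hypothetical, the domain is assumed to be the integers 1 through n, and the distribution is stored explicitly, which a real testing algorithm would of course not be able to see.

```python
import bisect
import random
from itertools import accumulate


class DualAccessOracle:
    """Toy oracle for a distribution over {1, ..., n} (hypothetical interface)."""

    def __init__(self, probs, seed=None):
        if abs(sum(probs) - 1.0) > 1e-9:
            raise ValueError("probabilities must sum to 1")
        self.probs = list(probs)
        self.cdf = list(accumulate(self.probs))  # cdf[i-1] = mass of {1..i}
        self.rng = random.Random(seed)

    def sample(self):
        """Sampling access: draw one element according to the distribution."""
        u = self.rng.random()
        idx = bisect.bisect_right(self.cdf, u)
        # Guard against floating-point round-off in the last cdf entry.
        return min(idx, len(self.probs) - 1) + 1

    def query(self, i):
        """Query access: probability mass of element i."""
        return self.probs[i - 1]

    def cumulative_query(self, i):
        """Cumulative access: total mass of the elements 1 through i."""
        return self.cdf[i - 1]


oracle = DualAccessOracle([0.1, 0.2, 0.3, 0.4], seed=1)
draws = [oracle.sample() for _ in range(1000)]
print(oracle.query(3), oracle.cumulative_query(2), draws[:5])
```

In the dual model an algorithm would use `sample` together with `query`. In the cumulative dual model it would use `sample` together with `cumulative_query`.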
The Design of Arbitrage-Free Data Pricing Schemes
Motivated by a growing market that involves buying and selling data over the
web, we study pricing schemes that assign value to queries issued over a
database. Previous work studied pricing mechanisms that compute the price of a
query by extending a data seller's explicit prices on certain queries, or
investigated the properties that a pricing function should exhibit without
detailing a generic construction. In this work, we present a formal framework
for pricing queries over data that allows the construction of general families
of pricing functions, with the main goal of avoiding arbitrage. We consider two
types of pricing schemes: instance-independent schemes, where the price depends
only on the structure of the query, and answer-dependent schemes, where the
price also depends on the query output. Our main result is a complete
characterization of the structure of pricing functions in both settings, by
relating it to properties of a function over a lattice. We use our
characterization, together with information-theoretic methods, to construct a
variety of arbitrage-free pricing functions. Finally, we discuss various
tradeoffs in the design space and present techniques for efficient computation
of the proposed pricing functions. Comment: full paper
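A toy check of arbitrage-freeness might look like the sketch below. The abstract does not spell out the formal conditions, so this assumes a common reading: a query that reveals no more than another must not cost more (monotonicity), and buying two queries together must not cost more than buying them separately (subadditivity). Queries are modelled as sets of hypothetical "information units" with made-up weights, which is a simplification and not the paper's lattice construction.

```python
from itertools import combinations

# Hypothetical per-unit weights; the names and values are illustrative only.
WEIGHTS = {"a": 3.0, "b": 1.5, "c": 2.0}


def price(query):
    """Weighted-coverage price: the sum of the weights of the units revealed."""
    return sum(WEIGHTS[unit] for unit in query)


def all_queries(units):
    """Every subset of the given units, each as a frozenset."""
    units = sorted(units)
    return [frozenset(c)
            for r in range(len(units) + 1)
            for c in combinations(units, r)]


def is_arbitrage_free(price_fn, queries, tol=1e-9):
    """Brute-force check of monotonicity and subadditivity over all pairs."""
    for q1 in queries:
        for q2 in queries:
            # Monotonicity: q1 reveals no more than q2, so it cannot cost more.
            if q1 <= q2 and price_fn(q1) > price_fn(q2) + tol:
                return False
            # Subadditivity: the bundle cannot cost more than its parts.
            if price_fn(q1 | q2) > price_fn(q1) + price_fn(q2) + tol:
                return False
    return True


queries = all_queries(WEIGHTS)
print(is_arbitrage_free(price, queries))                   # weighted coverage
print(is_arbitrage_free(lambda q: len(q) ** 2, queries))   # superadditive price
```

The squared-size price fails the check because a two-unit bundle costs 4 while its two single-unit parts cost 1 each, so a buyer could get the same information more cheaply by splitting the purchase.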