Bounding Embeddings of VC Classes into Maximum Classes
One of the earliest conjectures in computational learning theory, the Sample
Compression conjecture, asserts that concept classes (equivalently, set
systems) admit compression schemes of size linear in their VC dimension. To
date this statement is known to be true for maximum classes: those that
possess maximum cardinality for their VC dimension. The most promising
approach to positively
resolving the conjecture is by embedding general VC classes into maximum
classes without super-linear increase to their VC dimensions, as such
embeddings would extend the known compression schemes to all VC classes. We
show that maximum classes can be characterised by a local-connectivity property
of the graph obtained by viewing the class as a cubical complex. This geometric
characterisation of maximum VC classes is applied to prove a negative embedding
result which demonstrates VC-d classes that cannot be embedded in any maximum
class of VC dimension lower than 2d. On the other hand, we show that every VC-d
class C embeds in a VC-(d+D) maximum class where D is the deficiency of C,
i.e., the difference between the cardinalities of a maximum VC-d class and of
C. For VC-2 classes in binary n-cubes for 4 <= n <= 6, we give best possible
results on embedding into maximum classes. For some special classes of Boolean
functions, relationships with maximum classes are investigated. Finally we give
a general recursive procedure for embedding VC-d classes into VC-(d+k) maximum
classes for smallest k.
Comment: 22 pages, 2 figures
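The deficiency D used in the abstract can be made concrete via the Sauer-Shelah lemma, which gives the cardinality of a maximum VC-d class over the binary n-cube. A minimal Python sketch (function names are illustrative, not from the paper):

```python
from math import comb

def sauer_bound(n, d):
    """Cardinality of a maximum VC-d class over the binary n-cube
    (Sauer-Shelah lemma): sum_{i=0}^{d} C(n, i)."""
    return sum(comb(n, i) for i in range(d + 1))

def deficiency(class_size, n, d):
    """Deficiency D of a VC-d class C in {0,1}^n: the gap between the
    maximum-class cardinality and |C|."""
    return sauer_bound(n, d) - class_size

# Example: in {0,1}^5, a maximum VC-2 class has 1 + 5 + 10 = 16 concepts,
# so a VC-2 class with 12 concepts has deficiency D = 4 and, by the
# abstract's embedding result, embeds in a VC-(2+4) maximum class.
print(sauer_bound(5, 2))     # 16
print(deficiency(12, 5, 2))  # 4
```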
Maximal compression of the redshift space galaxy power spectrum and bispectrum
We explore two methods of compressing the redshift space galaxy power
spectrum and bispectrum with respect to a chosen set of cosmological
parameters. Both methods involve reducing the dimension of the original
data-vector (e.g. 1000 elements) to the number of cosmological parameters
considered (e.g. seven) using the Karhunen-Lo\`eve algorithm. In the first
case, we run MCMC sampling on the compressed data-vector in order to recover
the one-dimensional (1D) and two-dimensional (2D) posterior distributions. The
second option, approximately 2000 times faster, works by orthogonalising the
parameter space through diagonalisation of the Fisher information matrix before
the compression, obtaining the posterior distributions without the need of MCMC
sampling. Using these methods for future spectroscopic redshift surveys like
DESI, EUCLID and PFS would drastically reduce the number of simulations needed
to compute accurate covariance matrices with minimal loss of constraining
power. We consider a redshift bin of a DESI-like experiment. Using the power
spectrum combined with the bispectrum as a data-vector, both compression
methods on average recover the 68% credible regions to within 0.7% and 2% of
those resulting from standard MCMC sampling respectively. These confidence
intervals are also smaller than the ones obtained using only the power spectrum
by (81%, 80%, 82%) respectively for the bias parameter b_1, the growth rate f
and the scalar amplitude parameter A_s.
Comment: 27 pages, 8 figures, 1 table. Accepted 2018 January 28; received
2018 January 25; in original form 2017 September 11. Added clarifications in
the text on the bias modelling and compression limits following the referee's
comments. Removed the tetraspectrum term from the pk-bk cross covariance and
added a correction in the appendix.
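The core reduction step, projecting a long data-vector onto one number per cosmological parameter, can be sketched along Karhunen-Loève/MOPED lines. This is a minimal illustrative sketch, not the paper's pipeline; the function names, the identity covariance, and the toy sizes are all assumptions:

```python
import numpy as np

def compress(data, cov, dmu_dtheta):
    """Compress a length-N data vector to one number per parameter.

    KL/MOPED-style sketch: the weight vectors b_a = C^{-1} dmu/dtheta_a
    project the data onto the directions along which each parameter
    shifts the mean model. (Illustrative; the paper's exact algorithm
    and normalisation may differ.)
    """
    cinv = np.linalg.inv(cov)       # inverse data covariance, N x N
    weights = cinv @ dmu_dtheta     # N x p matrix of weight vectors
    return weights.T @ data         # p compressed numbers

# Toy example: N = 5 data elements compressed to p = 2 parameters.
rng = np.random.default_rng(0)
N, p = 5, 2
cov = np.eye(N)                     # assumed covariance (identity here)
dmu = rng.normal(size=(N, p))       # assumed mean derivatives
d = rng.normal(size=N)              # mock data vector
y = compress(d, cov, dmu)
print(y.shape)                      # (2,)
```

With a realistic N of order 1000 and seven parameters, the same projection shrinks the data-vector by more than two orders of magnitude, which is what makes the covariance matrices cheap to estimate from far fewer simulations.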