
    Bounding Embeddings of VC Classes into Maximum Classes

    One of the earliest conjectures in computational learning theory, the Sample Compression conjecture, asserts that concept classes (equivalently, set systems) admit compression schemes of size linear in their VC dimension. To date this statement is known to be true only for maximum classes, those that possess maximum cardinality for their VC dimension. The most promising approach to positively resolving the conjecture is to embed general VC classes into maximum classes without a super-linear increase in their VC dimensions, as such embeddings would extend the known compression schemes to all VC classes. We show that maximum classes can be characterised by a local-connectivity property of the graph obtained by viewing the class as a cubical complex. This geometric characterisation of maximum VC classes is applied to prove a negative embedding result: there are VC-d classes that cannot be embedded in any maximum class of VC dimension lower than 2d. On the other hand, we show that every VC-d class C embeds in a VC-(d+D) maximum class, where D is the deficiency of C, i.e., the difference between the cardinalities of a maximum VC-d class and of C. For VC-2 classes in binary n-cubes for 4 <= n <= 6, we give best-possible results on embedding into maximum classes. For some special classes of Boolean functions, relationships with maximum classes are investigated. Finally, we give a general recursive procedure for embedding VC-d classes into VC-(d+k) maximum classes for the smallest such k.
    Comment: 22 pages, 2 figures
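    The abstract's central objects, VC dimension and maximum classes, can be made concrete with a small brute-force sketch. A class of concepts over n points shatters an index set S if every 0/1 pattern on S is realised; the VC dimension is the largest size of a shattered set, and a class is maximum when its cardinality meets the Sauer-Shelah bound Phi_d(n) = sum_{i<=d} C(n,i). The function names below are illustrative, not taken from the paper:

    ```python
    from itertools import combinations
    from math import comb

    def shatters(concepts, S):
        """True if every binary pattern on the index set S is realised
        by some concept (each concept is a length-n 0/1 tuple)."""
        patterns = {tuple(c[i] for i in S) for c in concepts}
        return len(patterns) == 2 ** len(S)

    def vc_dimension(concepts, n):
        """Brute-force VC dimension of a finite class over {0, ..., n-1}."""
        d = 0
        for k in range(1, n + 1):
            if any(shatters(concepts, S) for S in combinations(range(n), k)):
                d = k
            else:
                break
        return d

    def is_maximum(concepts, n):
        """A VC-d class is maximum iff its size equals sum_{i<=d} C(n, i)."""
        d = vc_dimension(concepts, n)
        return len(concepts) == sum(comb(n, i) for i in range(d + 1))
    ```

    For example, the class of at-most-one-coordinate-set vectors in the 3-cube, {000, 100, 010, 001}, has VC dimension 1 and cardinality 4 = C(3,0) + C(3,1), so it is maximum.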

    Maximal compression of the redshift space galaxy power spectrum and bispectrum

    We explore two methods of compressing the redshift space galaxy power spectrum and bispectrum with respect to a chosen set of cosmological parameters. Both methods involve reducing the dimension of the original data vector (e.g. 1000 elements) to the number of cosmological parameters considered (e.g. seven) using the Karhunen-Loève algorithm. In the first case, we run MCMC sampling on the compressed data vector in order to recover the one-dimensional (1D) and two-dimensional (2D) posterior distributions. The second option, approximately 2000 times faster, works by orthogonalising the parameter space through diagonalisation of the Fisher information matrix before the compression, obtaining the posterior distributions without the need for MCMC sampling. Using these methods for future spectroscopic redshift surveys like DESI, EUCLID and PFS would drastically reduce the number of simulations needed to compute accurate covariance matrices, with minimal loss of constraining power. We consider a redshift bin of a DESI-like experiment. Using the power spectrum combined with the bispectrum as a data vector, the two compression methods on average recover the 68% credible regions to within 0.7% and 2%, respectively, of those resulting from standard MCMC sampling. These confidence intervals are also smaller than the ones obtained using only the power spectrum by (81%, 80%, 82%) for the bias parameter b_1, the growth rate f and the scalar amplitude parameter A_s, respectively.
    Comment: 27 pages, 8 figures, 1 table. Accepted 2018 January 28. Received 2018 January 25; in original form 2017 September 11. Added clarifications in the text on the bias modelling and compression limits following referee's comments. Removed tetraspectrum term from the pk-bk cross covariance + correction in the appendix.
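    The Karhunen-Loève compression described above, in its standard linear (MOPED-style) form, maps a length-N data vector to one number per parameter by projecting the data residual onto the inverse-covariance-weighted derivatives of the model mean. The sketch below is a minimal illustration of that projection, not the paper's exact pipeline; all names are assumptions:

    ```python
    import numpy as np

    def kl_compress(data, mu, dmu_dtheta, cov):
        """Linear Karhunen-Loève-style compression (illustrative sketch):
        t_a = (d mu / d theta_a)^T C^{-1} (data - mu),
        reducing an N-element data vector to n_params summary numbers.

        data        : (N,) observed data vector
        mu          : (N,) model mean at the fiducial parameters
        dmu_dtheta  : (n_params, N) derivatives of mu w.r.t. each parameter
        cov         : (N, N) data covariance matrix
        """
        cinv = np.linalg.inv(cov)
        weights = dmu_dtheta @ cinv      # one weight vector per parameter
        return weights @ (data - mu)     # shape (n_params,)
    ```

    With a 1000-element power spectrum plus bispectrum vector and seven parameters, the compressed statistic has length seven, which is what makes the covariance estimation from a limited number of simulations tractable.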