
    Competing with stationary prediction strategies

    In this paper we introduce the class of stationary prediction strategies and construct a prediction algorithm that asymptotically performs as well as the best continuous stationary strategy. We make mild compactness assumptions but no stochastic assumptions about the environment. In particular, no assumption of stationarity is made about the environment, and the stationarity of the considered strategies only means that they do not depend explicitly on time; we argue that it is natural to consider only stationary strategies even for highly non-stationary environments. Comment: 20 pages.

    On-line PCA with Optimal Regrets

    We carefully investigate the on-line version of PCA, where in each trial a learning algorithm plays a k-dimensional subspace and suffers the compression loss on the next instance when it is projected into the chosen subspace. In this setting, we analyze two popular on-line algorithms, Gradient Descent (GD) and Exponentiated Gradient (EG). We show that both algorithms are essentially optimal in the worst case. This comes as a surprise, since EG is known to perform sub-optimally when the instances are sparse. This different behavior of EG for PCA is mainly related to the non-negativity of the loss in this case, which makes the PCA setting qualitatively different from other settings studied in the literature. Furthermore, we show that when considering regret bounds as a function of a loss budget, EG remains optimal and strictly outperforms GD. Next, we study an extension of the PCA setting in which Nature is allowed to play dense instances, which are positive matrices with bounded largest eigenvalue. Again we show that EG is optimal and strictly better than GD in this setting.
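The contrast between the two update rules in the abstract can be illustrated on a simplified vector analogue (a probability distribution under linear losses, rather than the density-matrix formulation used in on-line PCA). This is a generic sketch of the GD vs. EG update styles, not the paper's algorithms; the step size `eta` and the crude simplex projection are illustrative assumptions.

```python
import numpy as np

def eg_update(w, loss_grad, eta):
    """Exponentiated Gradient: multiplicative update, then renormalize."""
    w = w * np.exp(-eta * loss_grad)
    return w / w.sum()

def gd_update(w, loss_grad, eta):
    """Gradient Descent: additive update, then a crude projection back to
    the simplex (clip negatives and renormalize; an exact Euclidean
    projection is more involved)."""
    w = w - eta * loss_grad
    w = np.clip(w, 0.0, None)
    return w / w.sum()
```

Both updates shift weight away from directions with high loss; EG does so multiplicatively, which is what gives it a qualitatively different regret profile in the settings the abstract compares.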

    Improved algorithms for online load balancing

    We consider an online load balancing problem and its extensions in the framework of repeated games. On each round, the player chooses a distribution (task allocation) over K servers, and then the environment reveals the load of each server, which determines the computation time of each server for processing the task assigned. After all rounds, the cost of the player is measured by some norm of the cumulative computation-time vector; the cost is the makespan if the norm is the L_∞-norm. The goal is to minimize the regret, i.e., the player's cost relative to the cost of the best fixed distribution in hindsight. We propose algorithms for general norms and prove their regret bounds. In particular, for the L_∞-norm, our regret bound matches the best known bound, and the proposed algorithm runs in polynomial time per trial, involving linear programming and second-order programming, whereas no polynomial-time algorithm was previously known to achieve the bound. Comment: 16 pages; typos corrected.
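The repeated game described above can be sketched with a standard multiplicative-weights (Hedge-style) baseline: allocate proportionally to weights, then penalize servers that reported high loads. This is a generic sketch for intuition, not the LP/second-order-programming algorithm the abstract proposes; the learning rate `eta` is an assumed parameter.

```python
import numpy as np

def hedge_allocation(loads, eta=0.1):
    """Multiplicative-weights baseline for the online load balancing game.

    loads: (T, K) array, loads[t, k] = computation time of server k in round t.
    Returns the (T, K) sequence of allocations played.
    """
    T, K = loads.shape
    w = np.ones(K)
    plays = []
    for t in range(T):
        p = w / w.sum()               # distribution (task allocation) over servers
        plays.append(p)
        w = w * np.exp(-eta * loads[t])  # down-weight heavily loaded servers
    return np.array(plays)
```

Over time the allocation drifts toward the servers with small cumulative load, which is the behavior a regret bound against the best fixed distribution formalizes.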

    Leading strategies in competitive on-line prediction

    We start from a simple asymptotic result for the problem of on-line regression with the quadratic loss function: the class of continuous limited-memory prediction strategies admits a "leading prediction strategy", which not only asymptotically performs at least as well as any continuous limited-memory strategy but also satisfies the property that the excess loss of any continuous limited-memory strategy is determined by how closely it imitates the leading strategy. More specifically, for any class of prediction strategies constituting a reproducing kernel Hilbert space we construct a leading strategy, in the sense that the loss of any prediction strategy whose norm is not too large is determined by how closely it imitates the leading strategy. This result is extended to the loss functions given by Bregman divergences and by strictly proper scoring rules. Comment: 20 pages; a conference version is to appear in the ALT'2006 proceedings.

    Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity

    We study the problem of aggregation under the squared loss in the model of regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds for aggregates defined via exponential weights, under general assumptions on the distribution of errors and on the functions to aggregate. We then apply these results to derive sparsity oracle inequalities.

    Self-Reported Functional Status as Predictor of Observed Functional Capacity in Subjects with Early Osteoarthritis of the Hip and Knee: A Diagnostic Study in the CHECK Cohort

    Objectives: Patients with hip or knee osteoarthritis (OA) may experience functional limitations in work settings. In the Cohort Hip and Cohort Knee study (CHECK), physical function was both self-reported and measured performance-based, using Functional Capacity Evaluation (FCE). Relations between self-reported scores on the SF-36 and WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index, function scales) and FCE performance were studied, and their diagnostic value for clinicians in predicting observed physical work limitations was assessed.
    Methods: Ninety-two subjects scored physical function on the SF-36 (scale 0–100, where 100 indicates the best health level) and the WOMAC (scale 0–68, where 68 indicates maximum restriction) and performed the FCE. Correlations were calculated between all scores. Cross-tables were constructed using both questionnaires as diagnostic tests to identify work limitations. Subjects lifting <22.5 kg on the FCE test 'lifting-low' were labeled as having physical work limitations. Diagnostic properties at different cut-off scores for both questionnaires were analysed.
    Results: Statistically significant correlations (Spearman's ρ 0.34–0.49) were found between questionnaire scores and the lifting and carrying tests. A diagnostic cross-table with cut-off point <60 on SF-36 'physical functioning' yielded sensitivity 0.34, specificity 0.97 and positive predictive value (PV+) 0.95. A cut-off point ≥21 on WOMAC 'function' resulted in sensitivity 0.51, specificity 0.88 and PV+ 0.88.
    Conclusion: Low self-reported function scores on the SF-36 and WOMAC identified subjects with limitations on the FCE; however, high scores did not guarantee performance without physical work limitations. These results are specific to the tested persons with early OA; in populations with a different prevalence of limitations, different diagnostic values will be found. FCE may be indicated to help clinicians assess actual work capacity.
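The diagnostic measures reported in the Results (sensitivity, specificity, PV+) all derive from a 2x2 cross-table of questionnaire classification against FCE-observed limitation. A minimal sketch of that computation, with hypothetical cell counts (the study's actual counts are not given in the abstract):

```python
def diagnostic_stats(tp, fp, fn, tn):
    """Sensitivity, specificity and positive predictive value from a 2x2 table.

    tp: limited on FCE and flagged by the questionnaire
    fp: not limited but flagged
    fn: limited but not flagged
    tn: not limited and not flagged
    """
    sensitivity = tp / (tp + fn)   # share of truly limited subjects flagged
    specificity = tn / (tn + fp)   # share of unlimited subjects correctly cleared
    ppv = tp / (tp + fp)           # share of flagged subjects truly limited
    return sensitivity, specificity, ppv
```

The pattern the abstract reports, high specificity and PV+ with low sensitivity, corresponds to a cut-off that flags few subjects but is usually right when it does.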

    The African Genome Variation Project shapes medical genetics in Africa.

    Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.