Competing with stationary prediction strategies
In this paper we introduce the class of stationary prediction strategies and
construct a prediction algorithm that asymptotically performs as well as the
best continuous stationary strategy. We make mild compactness assumptions but
no stochastic assumptions about the environment. In particular, no assumption
of stationarity is made about the environment, and the stationarity of the
considered strategies only means that they do not depend explicitly on time; we
argue that it is natural to consider only stationary strategies even for highly
non-stationary environments.
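To make the definition concrete, a minimal sketch (the function names and prediction rules are illustrative, not from the paper): a stationary strategy computes its prediction from a bounded window of recent observations, with no explicit dependence on the round index, whereas a non-stationary strategy may consult the clock.

```python
# Illustrative sketch of stationary vs. non-stationary strategies
# (names and prediction rules are ours, not the paper's).

def stationary_strategy(history, k=3):
    """Limited-memory stationary strategy: the prediction depends only
    on the last k observations, never on the round index."""
    recent = history[-k:]
    return sum(recent) / len(recent) if recent else 0.0

def nonstationary_strategy(history, t):
    """Depends explicitly on the round index t, e.g. via a decaying
    bias, and so falls outside the class considered in the paper."""
    last = history[-1] if history else 0.0
    return last + 1.0 / (t + 1)
```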
On-line PCA with Optimal Regrets
We carefully investigate the on-line version of PCA, where in each trial a
learning algorithm plays a k-dimensional subspace and suffers the compression
loss when the next instance is projected onto the chosen subspace. In this
setting, we analyze two popular on-line algorithms, Gradient Descent (GD) and
Exponentiated Gradient (EG). We show that both algorithms are essentially
optimal in the worst case. This comes as a surprise, since EG is known to
perform sub-optimally when the instances are sparse. This different behavior of
EG for PCA is mainly related to the non-negativity of the loss in this case,
which makes the PCA setting qualitatively different from other settings studied
in the literature. Furthermore, we show that when regret bounds are considered
as a function of a loss budget, EG remains optimal and strictly outperforms GD.
Next, we study an extension of the PCA setting in which Nature is allowed
to play dense instances, i.e., positive matrices with bounded largest
eigenvalue. Again we show that EG is optimal and strictly better than GD in
this setting.
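To fix ideas, a minimal sketch of the matrix Exponentiated Gradient update in this setting (assuming unit-norm instances and omitting the eigenvalue-capping projection the full algorithm requires; variable names are ours):

```python
import numpy as np

def eg_update(W, x, eta):
    """Matrix EG step: W <- exp(log W - eta * x x^T), renormalized to
    trace 1. W is a density matrix concentrating its weight on the
    directions the algorithm intends to discard."""
    evals, evecs = np.linalg.eigh(W)
    log_W = evecs @ np.diag(np.log(np.clip(evals, 1e-12, None))) @ evecs.T
    M = log_W - eta * np.outer(x, x)
    mevals, mevecs = np.linalg.eigh(M)
    expM = mevecs @ np.diag(np.exp(mevals)) @ mevecs.T
    return expM / np.trace(expM)

def play_subspace(W, k):
    """Play the k eigenvectors of W with the SMALLEST eigenvalues:
    W puts its weight on directions to discard, so the light
    directions are the ones worth keeping."""
    _, evecs = np.linalg.eigh(W)   # eigenvalues in ascending order
    return evecs[:, :k]
```

The compression loss of a played orthonormal basis U on instance x is ||x - U U^T x||^2; GD differs only in taking an additive step followed by a projection back onto the feasible set.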
Improved algorithms for online load balancing
We consider an online load balancing problem and its extensions in the
framework of repeated games. On each round, the player chooses a distribution
(task allocation) over servers, and then the environment reveals the load
of each server, which determines the computation time of each server for
processing the task assigned. After all rounds, the cost of the player is
measured by some norm of the cumulative computation-time vector. The cost is
the makespan if the norm is the L∞-norm. The goal is to minimize the
regret, i.e., the player's cost relative to the cost of the best
fixed distribution in hindsight. We propose algorithms for general norms and
prove their regret bounds. In particular, for the L∞-norm, our regret bound
matches the best known bound, and the proposed algorithm runs in polynomial
time per trial, involving linear programming and second-order cone programming,
whereas no polynomial-time algorithm was previously known to achieve the bound.
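A minimal sketch of the repeated game, with a simple multiplicative-weights player standing in for the paper's algorithm (which instead solves a linear program and a second-order cone program each trial; all names here are illustrative):

```python
import numpy as np

def online_load_balancing(loads, eta=0.1):
    """loads: T x n array, loads[t, i] = load of server i at trial t.
    Returns the player's cumulative computation-time vector."""
    T, n = loads.shape
    w = np.ones(n)
    cumulative = np.zeros(n)
    for t in range(T):
        alpha = w / w.sum()            # task allocation over servers
        comp_time = alpha * loads[t]   # computation time per server
        cumulative += comp_time
        w *= np.exp(-eta * comp_time)  # shift mass away from slow servers
    return cumulative

# For the L-infinity norm the cost is the makespan:
# cost = np.max(online_load_balancing(loads))
```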
Leading strategies in competitive on-line prediction
We start from a simple asymptotic result for the problem of on-line
regression with the quadratic loss function: the class of continuous
limited-memory prediction strategies admits a "leading prediction strategy",
which not only asymptotically performs at least as well as any continuous
limited-memory strategy but also satisfies the property that the excess loss of
any continuous limited-memory strategy is determined by how closely it imitates
the leading strategy. More specifically, for any class of prediction strategies
constituting a reproducing kernel Hilbert space we construct a leading
strategy, in the sense that the loss of any prediction strategy whose norm is
not too large is determined by how closely it imitates the leading strategy.
This result is extended to the loss functions given by Bregman divergences and
by strictly proper scoring rules.
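The flavor of the leading-strategy property, in schematic form (notation ours; the precise statement involves the RKHS norm of F and lower-order terms):

```latex
% Cumulative quadratic loss over N rounds: the excess loss of a
% strategy F of moderate norm over the leading strategy \mathfrak{L}
% is governed by how closely F's predictions track \mathfrak{L}'s.
\[
  \mathcal{L}_N(F) \;\approx\; \mathcal{L}_N(\mathfrak{L})
  \;+\; \sum_{n=1}^{N} \bigl( F_n - \mathfrak{L}_n \bigr)^2
\]
```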
Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity
We study the problem of aggregation under the squared loss in the model of
regression with deterministic design. We obtain sharp PAC-Bayesian risk bounds
for aggregates defined via exponential weights, under general assumptions on
the distribution of errors and on the functions to aggregate. We then apply
these results to derive sparsity oracle inequalities.
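For orientation, the generic form of an exponential-weights aggregate under squared loss (a standard sketch; the paper's exact prior and temperature may differ). Given functions f_1, ..., f_M, a prior π, and observations Y_i at design points x_i:

```latex
\[
  \hat{f} \;=\; \sum_{j=1}^{M} \theta_j f_j ,
  \qquad
  \theta_j \;\propto\; \pi_j \,
  \exp\!\Bigl( -\tfrac{1}{\beta} \sum_{i=1}^{n}
      \bigl( Y_i - f_j(x_i) \bigr)^2 \Bigr),
  \quad \beta > 0 .
\]
```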
Self-Reported Functional Status as Predictor of Observed Functional Capacity in Subjects with Early Osteoarthritis of the Hip and Knee: A Diagnostic Study in the CHECK Cohort
Objectives: Patients with hip or knee osteoarthritis (OA) may experience functional limitations in work settings. In the Cohort Hip and Cohort Knee (CHECK) study, physical function was both self-reported and measured performance-based using Functional Capacity Evaluation (FCE). Relations between self-reported scores on the SF-36 and WOMAC (Western Ontario and McMaster Universities Osteoarthritis Index) function scales and FCE performance were studied, and their diagnostic value for clinicians in predicting observed physical work limitations was assessed.

Methods: Ninety-two subjects scored physical function on the SF-36 (scale 0–100, with 100 indicating the best health level) and the WOMAC (scale 0–68, with 68 indicating maximum restriction) and performed the FCE. Correlations were calculated between all scores. Cross-tables were constructed using both questionnaires as diagnostic tests to identify work limitations. Subjects lifting <22.5 kg on the FCE test 'lifting-low' were labeled as having physical work limitations. Diagnostic properties at different cut-off scores were analysed for both questionnaires.

Results: Statistically significant correlations (Spearman's ρ 0.34–0.49) were found between questionnaire scores and lifting and carrying tests. A diagnostic cross-table with cut-off point <60 on the SF-36 'physical functioning' scale yielded sensitivity 0.34, specificity 0.97 and positive predictive value (PV+) 0.95. A cut-off point ≥21 on the WOMAC 'function' scale yielded sensitivity 0.51, specificity 0.88 and PV+ 0.88.

Conclusion: Low self-reported function scores on the SF-36 and WOMAC identified subjects with limitations on the FCE. However, high scores did not guarantee performance without physical work limitations. These results are specific to the tested persons with early OA; in populations with a different prevalence of limitations, different diagnostic values will be found. FCE may be indicated to help clinicians assess actual work capacity.
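The reported diagnostic values follow from a standard 2x2 cross-table; a minimal sketch (the counts in the usage example are illustrative, not the study's data):

```python
def diagnostics(tp, fp, fn, tn):
    """tp/fp/fn/tn: counts from a cross-table of questionnaire flag
    (test) against observed FCE limitation (reference)."""
    sensitivity = tp / (tp + fn)   # flagged among the truly limited
    specificity = tn / (tn + fp)   # cleared among the truly unlimited
    ppv = tp / (tp + fp)           # truly limited among the flagged
    return sensitivity, specificity, ppv

# e.g. diagnostics(tp=20, fp=1, fn=38, tn=33)
```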
The African Genome Variation Project shapes medical genetics in Africa.
Given the importance of Africa to studies of human origins and disease susceptibility, detailed characterization of African genetic diversity is needed. The African Genome Variation Project provides a resource with which to design, implement and interpret genomic studies in sub-Saharan Africa and worldwide. The African Genome Variation Project represents dense genotypes from 1,481 individuals and whole-genome sequences from 320 individuals across sub-Saharan Africa. Using this resource, we find novel evidence of complex, regionally distinct hunter-gatherer and Eurasian admixture across sub-Saharan Africa. We identify new loci under selection, including loci related to malaria susceptibility and hypertension. We show that modern imputation panels (sets of reference genotypes from which unobserved or missing genotypes in study sets can be inferred) can identify association signals at highly differentiated loci across populations in sub-Saharan Africa. Using whole-genome sequencing, we demonstrate further improvements in imputation accuracy, strengthening the case for large-scale sequencing efforts of diverse African haplotypes. Finally, we present an efficient genotype array design capturing common genetic variation in Africa.