On-line PCA with Optimal Regrets
We carefully investigate the on-line version of PCA, where in each trial a
learning algorithm plays a k-dimensional subspace and suffers the compression
loss of the next instance when projected onto the chosen subspace. In this
setting, we analyze two popular on-line algorithms, Gradient Descent (GD) and
Exponentiated Gradient (EG). We show that both algorithms are essentially
optimal in the worst-case. This comes as a surprise, since EG is known to
perform sub-optimally when the instances are sparse. This different behavior of
EG for PCA is mainly related to the non-negativity of the loss in this case,
which makes the PCA setting qualitatively different from other settings studied
in the literature. Furthermore, we show that when regret bounds are considered
as a function of a loss budget, EG remains optimal and strictly outperforms GD.
Next, we study an extension of the PCA setting in which Nature is allowed
to play dense instances, i.e., positive matrices with bounded largest
eigenvalue. Again we show that EG is optimal and strictly better than GD in
this setting.
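In the full matrix setting both algorithms maintain a density matrix over directions and project onto a capped spectrahedron; as a hedged illustration only, the following toy sketch shows the diagonal (experts) case, where the loss decomposes per coordinate, using simplified simplex-based updates rather than the paper's exact algorithms:

```python
import numpy as np

def eg_update(w, losses, eta):
    """Exponentiated Gradient: multiplicative update, then renormalize."""
    w = w * np.exp(-eta * losses)
    return w / w.sum()

def gd_update(w, losses, eta):
    """Gradient Descent: additive update, then Euclidean projection
    onto the probability simplex (standard sort-based projection)."""
    v = w - eta * losses
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(u)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1)
    return np.maximum(v + theta, 0.0)

# toy run: 3 directions, direction 0 consistently suffers the least loss
rng = np.random.default_rng(0)
w_eg = np.ones(3) / 3
w_gd = np.ones(3) / 3
for _ in range(100):
    losses = np.array([0.1, 0.5, 0.5]) + 0.05 * rng.random(3)
    w_eg = eg_update(w_eg, losses, eta=0.5)
    w_gd = gd_update(w_gd, losses, eta=0.05)
print(w_eg, w_gd)  # both concentrate mass on direction 0
```

The toy run only illustrates the shape of the two updates (multiplicative vs. additive); the regret analysis in the paper concerns the full matrix versions.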
Leading strategies in competitive on-line prediction
We start from a simple asymptotic result for the problem of on-line
regression with the quadratic loss function: the class of continuous
limited-memory prediction strategies admits a "leading prediction strategy",
which not only asymptotically performs at least as well as any continuous
limited-memory strategy but also satisfies the property that the excess loss of
any continuous limited-memory strategy is determined by how closely it imitates
the leading strategy. More specifically, for any class of prediction strategies
constituting a reproducing kernel Hilbert space we construct a leading
strategy, in the sense that the loss of any prediction strategy whose norm is
not too large is determined by how closely it imitates the leading strategy.
This result is extended to the loss functions given by Bregman divergences and
by strictly proper scoring rules.
Comment: 20 pages; a conference version is to appear in the ALT'2006
proceedings.
Annotation of the modular polyketide synthase and nonribosomal peptide synthetase gene clusters in the genome of Streptomyces tsukubaensis NRRL18488
The high G+C content and large genome size make the sequencing and assembly of Streptomyces genomes more difficult than for other bacteria. Many pharmaceutically important natural products are synthesized by modular polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs). The analysis of such gene clusters is difficult if the genome sequence is not of the highest quality, because clusters can be distributed over several contigs, and sequencing errors can introduce apparent frameshifts into the large PKS and NRPS proteins. An additional problem is that the modular nature of the clusters results in the presence of imperfect repeats, which may cause assembly errors. The genome sequence of Streptomyces tsukubaensis NRRL18488 was scanned for potential PKS and NRPS modular clusters. A phylogenetic approach was used to identify multiple contigs belonging to the same cluster. Four PKS clusters and six NRPS clusters were identified. Contigs containing cluster sequences were analyzed in detail by using the ClustScan program, which suggested the order and orientation of the contigs. The sequencing of the appropriate PCR products confirmed the ordering and allowed the correction of apparent frameshifts resulting from sequencing errors. The product chemistry of such correctly assembled clusters could also be predicted. The analysis of one PKS cluster showed that it should produce a bafilomycin-like compound, and reverse transcription (RT)-PCR was used to show that the cluster was transcribed. © 2012, American Society for Microbiology.
We thank the Government of Slovenia, Ministry of Higher Education, Science and Technology (Slovenian Research Agency [ARRS]), for the award of grants no. J4-9331 and L4-2188 to H.P. We also thank the Ministry of the Economy, the JAPTI Agency, and the European Social Fund (contract no. 102/2008) for the funds awarded for the employment of G.K. This work was also funded by a cooperation grant of the German Academic Exchange Service (DAAD) and the Ministry of Science, Education, and Sports, Republic of Croatia (to J.C. and D.H.), and by grant 09/5 (to D.H.) from the Croatian Science Foundation.
Peer Reviewed
Clustering with Lower Bound on Similarity
Abstract. We propose a new method, called SimClus, for clustering with a lower bound on similarity. Instead of accepting k, the number of clusters to find, as input, the similarity-based approach imposes a lower bound on the similarity between an object and its corresponding cluster representative (with one representative per cluster). SimClus achieves an O(log n) approximation bound on the number of clusters, whereas for the best previous algorithm the bound can be as poor as O(n). Experiments on real and synthetic datasets show that our algorithm produces more than 40% fewer representative objects while offering the same or better clustering quality. We also propose a dynamic variant of the algorithm, which can be used effectively in an on-line setting.
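SimClus itself is not reproduced here; as a hedged illustration of the general approach, a greedy set-cover-style selection of representatives under a similarity lower bound (the classical route to an O(log n) approximation for covering problems) can be sketched as:

```python
def greedy_representatives(objects, sim, s):
    """Greedily pick representatives so every object has similarity >= s
    to at least one representative.

    objects: list of items; sim(a, b): symmetric similarity with
    sim(a, a) = 1; s: the lower bound on similarity. Greedy set cover
    yields an O(log n)-approximation on the number of representatives.
    """
    # coverage set of each candidate: objects within the similarity bound
    cover = {a: {b for b in objects if sim(a, b) >= s} for a in objects}
    uncovered = set(objects)
    reps = []
    while uncovered:
        # pick the candidate covering the most still-uncovered objects
        best = max(objects, key=lambda a: len(cover[a] & uncovered))
        reps.append(best)
        uncovered -= cover[best]
    return reps

# toy example on the real line with similarity 1 - |a - b|
pts = [0.0, 0.05, 0.1, 0.9, 0.95]
reps = greedy_representatives(pts, lambda a, b: 1 - abs(a - b), s=0.9)
print(reps)  # one representative per group of nearby points
```

Because sim(a, a) = 1 >= s, every object covers at least itself, so the loop always terminates with a valid set of representatives.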
Loss bounds for online category ranking
Abstract. Category ranking is the task of ordering labels with respect to their relevance to an input instance. In this paper we describe and analyze several algorithms for online category ranking where the instances are revealed in a sequential manner. We describe additive and multiplicative updates which constitute the core of the learning algorithms. The updates are derived by casting a constrained optimization problem for each new instance. We derive loss bounds for the algorithms by using the properties of the dual solution while imposing additional constraints on the dual form. Finally, we outline and analyze the convergence of a general update that can be employed with any Bregman divergence.
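The paper's updates are derived by solving a constrained optimization problem per instance; as a hedged illustration only, a much simpler perceptron-style additive update for category ranking (not the paper's algorithm) might look like:

```python
import numpy as np

def additive_rank_update(W, x, relevant, eta=1.0):
    """One additive update for category ranking.

    W: (num_labels, dim) weight matrix, one prototype per label.
    x: instance vector; relevant: set of relevant label indices.
    For every relevant label r scored no higher than an irrelevant
    label i, push W[r] toward x and W[i] away from it.
    """
    scores = W @ x
    for r in range(W.shape[0]):
        if r not in relevant:
            continue
        for i in range(W.shape[0]):
            if i in relevant:
                continue
            if scores[r] <= scores[i]:  # ranking violation
                W[r] += eta * x
                W[i] -= eta * x
    return W

# toy run: two labels, label 0 relevant, all weights start at zero
W = np.zeros((2, 3))
x = np.array([1.0, 0.0, 2.0])
W = additive_rank_update(W, x, relevant={0})
print(W @ x)  # label 0 now outranks label 1
```

A multiplicative (EG-style) counterpart would rescale the weights by exponentials of the same violation signal and renormalize, mirroring the additive/multiplicative split described in the abstract.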