
    Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method

    We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown $n \times m$ matrix $A$ (for $m \geq n$) from examples of the form $y = Ax + e$, where $x$ is a random vector in $\mathbb{R}^m$ with at most $\tau m$ nonzero coordinates, and $e$ is a random noise vector in $\mathbb{R}^n$ with bounded magnitude. For the case $m = O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(\log m / \log(\tau^{-1}))}$, in particular achieving polynomial time if $\tau = m^{-\delta}$ for any $\delta > 0$, and time $m^{O(\log m)}$ if $\tau$ is (a sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser (at most $\sqrt{n}$ nonzero coordinates), and there were intrinsic barriers preventing these algorithms from applying for denser $x$. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T'$ that is $\tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T'$ have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.
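
    As a concrete picture of the generative model above (not of the sum-of-squares algorithm itself), here is a minimal NumPy sketch that samples examples y = Ax + e with a tau*m-sparse coefficient vector; all names and parameter choices are illustrative.

```python
# Minimal sketch of the dictionary-learning generative model: y = A x + e,
# with an n x m dictionary A (m >= n), a tau*m-sparse coefficient vector x,
# and bounded noise e. Parameter choices here are illustrative assumptions,
# not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)

def sample_example(A, tau, noise_scale=0.01):
    n, m = A.shape
    k = max(1, int(tau * m))                      # number of nonzero coordinates
    x = np.zeros(m)
    support = rng.choice(m, size=k, replace=False)
    x[support] = rng.choice([-1.0, 1.0], size=k)  # random signs on the support
    e = noise_scale * rng.standard_normal(n)      # noise of bounded magnitude
    return A @ x + e, x

n, m, tau = 50, 100, 0.05
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)                    # unit-norm dictionary columns
y, x = sample_example(A, tau)
```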

    Vector Field Learning via Spectral Filtering


    Robustness and Generalization

    We derive generalization bounds for learning algorithms based on their robustness: the property that if a testing sample is "similar" to a training sample, then the testing error is close to the training error. This provides a novel approach, different from the complexity or stability arguments, for studying the generalization of learning algorithms. We further show that a weak notion of robustness is both sufficient and necessary for generalizability, which implies that robustness is a fundamental property for learning algorithms to work.
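
    To make the robustness notion concrete, the following is a hedged sketch that partitions the input space into grid cells and measures the largest train/test loss discrepancy within a shared cell; the grid partition and the loss arrays are illustrative stand-ins for the paper's formal definition.

```python
# Hedged sketch of the robustness property described above: partition the
# input space into cells and call an algorithm robust if a test sample
# landing in the same cell as a training sample incurs a similar loss.
# The uniform grid below is an assumption of this example, not the paper's
# formal construction.
import numpy as np

def robustness_gap(X_train, loss_train, X_test, loss_test, n_bins=10):
    """Largest |test loss - train loss| over pairs that share a grid cell."""
    lo = np.minimum(X_train.min(0), X_test.min(0))
    hi = np.maximum(X_train.max(0), X_test.max(0)) + 1e-12
    def cells(X):
        return [tuple(c) for c in ((X - lo) / (hi - lo) * n_bins).astype(int)]
    gap = 0.0
    for cz, lz in zip(cells(X_test), loss_test):
        for ct, lt in zip(cells(X_train), loss_train):
            if cz == ct:                          # "similar" = same cell
                gap = max(gap, abs(lz - lt))
    return gap
```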

    Multi-Target Prediction: A Unifying View on Problems and Methods

    Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
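
    As a rough illustration of such a unifying view (an assumption of this example, not the paper's formal framework), many of the listed subfields can be phrased as predicting entries of an instance-by-target score matrix with optional side information:

```python
# Illustrative sketch: phrase MTP settings as predicting entries of an
# instance x target score matrix Y, given optional side information on
# instances and targets. The field names below are assumptions made for
# this example, not the paper's notation.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class MTPProblem:
    Y: np.ndarray                           # instance x target scores; NaN = unobserved
    X_inst: Optional[np.ndarray] = None     # instance side information
    X_targ: Optional[np.ndarray] = None     # target side information

# Special cases, following the subfields listed above:
# - multivariate regression / multi-label classification: dense real or
#   binary Y with instance features only (X_targ is None)
# - matrix completion: partially observed Y, no side information
# - zero-shot learning / dyadic prediction: X_targ allows scoring targets
#   (columns) never observed during training
```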

    Multiple functional regression with both discrete and continuous covariates

    In this paper we present a nonparametric method for extending functional regression methodology to the situation where more than one functional covariate is used to predict a functional response. Borrowing the idea from Kadri et al. (2010a), the method, which supports mixed discrete and continuous explanatory variables, is based on estimating a function-valued function in reproducing kernel Hilbert spaces by means of positive operator-valued kernels.
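
    A minimal sketch of one way such a method can be instantiated, assuming a separable operator-valued kernel k(x, x')*T with responses discretized on a grid; the Gaussian input kernel and the regularization constant are assumptions of this example, not the paper's exact construction.

```python
# Hedged sketch of function-valued kernel ridge regression with a separable
# operator-valued kernel K((x, x')) = k(x, x') * T: k is a scalar kernel on
# the covariates and T is a symmetric positive output operator acting on
# functions sampled at d grid points.
import numpy as np

def fit_function_valued(X, Y, T, gamma=1.0, lam=1e-2):
    """X: (n, p) covariates; Y: (n, d) responses on a grid of d points;
    T: (d, d) symmetric positive output operator. Returns K and C."""
    n, d = Y.shape
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-gamma * sq)                  # Gaussian kernel on the inputs
    # Representer theorem: f(x) = sum_i k(x, x_i) T c_i, so the fitted grid
    # values are F = K C T; solve the ridge system via a Kronecker product.
    G = np.kron(K, T) + lam * n * np.eye(n * d)
    C = np.linalg.solve(G, Y.ravel()).reshape(n, d)
    return K, C

def predict(k_new, C, T):
    """k_new: (n_new, n) kernel values between new and training inputs."""
    return k_new @ C @ T                     # each row is a predicted function
```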

    Does peer learning or higher levels of e-learning improve learning abilities? A randomized controlled trial

    Background and aims: The fast development of e-learning and social forums demands that we update our understanding of e-learning and peer learning. We aimed to investigate whether higher, pre-defined levels of e-learning or social interaction in web forums improved students' learning ability. Methods: One hundred and twenty Danish medical students were randomized into six groups of 20 students each (eCases level 1, eCases level 2, eCases level 2+, eTextbook level 1, eTextbook level 2, and eTextbook level 2+). All students participated in a pre-test; the eCases groups participated in an interactive case-based e-learning program, while the eTextbook groups were presented with textbook material electronically. The 2+ groups were able to discuss the material between themselves in a web forum. The subject was head injury and the associated treatment and observation guidelines in the emergency room. Following the e-learning, all students completed a post-test. Pre- and post-tests both consisted of 25 questions randomly chosen from a pool of 50 different questions. Results: All students concluded the study with comparable pre-test results. Students at level 2 (in both groups) improved statistically significantly compared to students at level 1 (p<0.05). There was no statistically significant difference between level 2 and level 2+. However, level 2+ was associated with statistically significantly greater student satisfaction than the rest of the students (p<0.05). Conclusions: This study applies a new way of comparing different types of e-learning, using a pre-defined level division and the possibility of peer learning. Our findings show that higher levels of e-learning do in fact provide better results when compared with the same type of e-learning at lower levels. While social interaction in web forums increases student satisfaction, learning ability does not seem to change. Both findings are relevant when designing new e-learning materials.

    Efficient Training of Graph-Regularized Multitask SVMs

    We present an optimization framework for graph-regularized multi-task SVMs based on the primal formulation of the problem. Previous approaches employ a so-called multi-task kernel (MTK) and thus are inapplicable when the number of training examples n is large (MTK-based methods typically handle only n < 20,000, even for just a few tasks). In this paper, we present a primal optimization criterion, allowing for general loss functions, and derive its dual representation. Building on the work of Hsieh et al. [1,2], we derive an algorithm for optimizing the large-margin objective and prove its convergence. Our computational experiments show a speedup of up to three orders of magnitude over LibSVM and SVMLight for several standard benchmarks as well as challenging data sets from the application domain of computational biology. Combining our optimization methodology with the COFFIN large-scale learning framework [3], we are able to train a multi-task SVM using over 1,000,000 training points stemming from 4 different tasks. An efficient C++ implementation of our algorithm is made publicly available as part of the SHOGUN machine learning toolbox [4].
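
    For intuition, here is a plain subgradient sketch of the graph-regularized multi-task primal objective (per-task hinge loss plus a Laplacian coupling of task weight vectors); the paper's actual contribution is the dual solver built on Hsieh et al., which this sketch does not reproduce.

```python
# Hedged sketch of the graph-regularized multi-task SVM primal objective:
# sum_t hinge loss on task t, plus lam * sum_{(s,t)} A[s,t] * ||w_s - w_t||^2
# over the task graph. A plain subgradient loop for illustration only; the
# learning rate and epoch count are assumptions of this example.
import numpy as np

def mt_svm_subgradient(tasks, A, lam=0.1, C=1.0, lr=0.01, epochs=100):
    """tasks: list of (X, y) with y in {-1, +1}; A: task-graph weight matrix."""
    T, dim = len(tasks), tasks[0][0].shape[1]
    W = np.zeros((T, dim))
    L = np.diag(A.sum(axis=1)) - A              # graph Laplacian
    for _ in range(epochs):
        G = np.zeros_like(W)
        for t, (X, y) in enumerate(tasks):
            margin = y * (X @ W[t])
            viol = margin < 1                   # hinge-loss subgradient
            G[t] -= C * (y[viol, None] * X[viol]).sum(axis=0)
        G += 2 * lam * (L @ W)                  # coupling pulls related tasks together
        W -= lr * G
    return W
```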

    ProDiGe: Prioritization Of Disease Genes with multitask machine learning from positive and unlabeled examples

    Background: Elucidating the genetic basis of human diseases is a central goal of genetics and molecular biology. While traditional linkage analysis and modern high-throughput techniques often provide long lists of tens or hundreds of disease gene candidates, the identification of disease genes among the candidates remains time-consuming and expensive. Efficient computational methods are therefore needed to prioritize genes within the list of candidates, by exploiting the wealth of information available about the genes in various databases. Results: We propose ProDiGe, a novel algorithm for Prioritization Of Disease Genes. ProDiGe implements a novel machine learning strategy based on learning from positive and unlabeled examples, which makes it possible to integrate various sources of information about the genes, to share information about known disease genes across diseases, and to perform genome-wide searches for new disease genes. Experiments on real data show that ProDiGe outperforms state-of-the-art methods for the prioritization of genes in human diseases. Conclusions: ProDiGe implements a new machine learning paradigm for gene prioritization, which could help the identification of new disease genes. It is freely available at http://cbio.ensmp.fr/prodige.
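
    As a hedged illustration of the positive-unlabeled idea (a standard PU heuristic, not necessarily ProDiGe's exact formulation), one can train a classifier that treats unlabeled genes as down-weighted negatives and rank candidates by decision score; the gene feature matrix is assumed.

```python
# Hedged PU-learning sketch for gene prioritization: known disease genes are
# positives, all unlabeled genes are negatives with reduced sample weight,
# and unlabeled genes are ranked by the resulting decision scores. Feature
# construction and the weighting constant are assumptions of this example.
import numpy as np
from sklearn.svm import SVC

def prioritize(features, positive_idx, unlabeled_weight=0.1):
    n = features.shape[0]
    y = -np.ones(n)
    y[positive_idx] = 1
    w = np.where(y == 1, 1.0, unlabeled_weight)   # trust positives more
    clf = SVC(kernel="rbf", C=1.0).fit(features, y, sample_weight=w)
    scores = clf.decision_function(features)
    unlabeled = np.setdiff1d(np.arange(n), positive_idx)
    return unlabeled[np.argsort(-scores[unlabeled])]  # best candidates first
```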