3 research outputs found

Learning and Data Selection in Big Datasets

Author: Fischione Carlo
Ghauch Hadi
Shokri-Ghadikolaei Hossein
Skoglund Mikael
Publication venue: COMELEC Department, Telecom ParisTech, Paris, France
Publication date: 01/01/2019

Finding a dataset of minimal cardinality to characterize the optimal parameters of a model is of paramount importance in machine learning and distributed optimization over a network. This paper investigates the compressibility of large datasets. More specifically, we propose a framework that jointly learns the input-output mapping as well as the most representative samples of the dataset (the sufficient dataset). Our analytical results show that the cardinality of the sufficient dataset increases sub-linearly with respect to the original dataset size. Numerical evaluations on real datasets reveal large compressibility, up to 95%, without a noticeable drop in learning performance, as measured by the generalization error.

Publikationer från KTH (Publications from KTH)

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Learning and Data Selection in Big Datasets

Author: Fischione Carlo
Ghauch Hadi
Shokri-Ghadikolaei Hossein
Skoglund Mikael
Publication venue: COMELEC Department, Telecom ParisTech, Paris, France
Publication date: 01/01/2019

Publikationer från KTH (Publications from KTH)