4,033 research outputs found

    Distributed Online Big Data Classification Using Context Information

    Distributed, online data mining systems have emerged from applications that require the analysis of large amounts of correlated, high-dimensional data produced by multiple distributed data sources. We propose a distributed online data classification framework in which data is gathered by distributed sources and processed by a heterogeneous set of distributed learners, which learn online, at run-time, how to classify the different data streams, either by using their locally available classification functions or by helping each other and classifying each other's data. Importantly, since the data is gathered at different locations, sending data to another learner for processing incurs additional costs such as delays, so forwarding is only beneficial if the gain from a better classification exceeds the cost. We model the problem of joint classification by the distributed, heterogeneous learners from multiple data sources as a distributed contextual bandit problem in which each data instance is characterized by a specific context. We develop a distributed online learning algorithm for which we prove sublinear regret. Compared to prior work in distributed online data mining, ours is the first to provide analytic regret results characterizing the performance of the proposed algorithm.
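
    The framework above is described only at a high level; the sketch below is an illustrative stand-in rather than the paper's algorithm. It shows how a learner could use a UCB-style contextual bandit rule to decide, per context cell, whether to classify locally or forward data to a peer at a fixed assumed cost; the class name, the uniform partition of the context space, and the cost model are all hypothetical.

```python
import math
from collections import defaultdict

class ContextualArmSelector:
    """Illustrative UCB-style selector (not the paper's algorithm): for each
    context cell, choose between classifying locally (arm 0) and forwarding
    the data to one of k peer learners (arms 1..k), trading classification
    accuracy against an assumed fixed forwarding cost."""

    def __init__(self, n_arms, forward_cost=0.1, n_cells=10):
        self.n_arms = n_arms
        self.forward_cost = forward_cost  # hypothetical cost (e.g., delay) of forwarding
        self.n_cells = n_cells            # uniform partition of a context space [0, 1)
        self.counts = defaultdict(int)    # (cell, arm) -> number of pulls
        self.means = defaultdict(float)   # (cell, arm) -> empirical mean reward

    def _cell(self, context):
        return min(int(context * self.n_cells), self.n_cells - 1)

    def select(self, context, t):
        cell = self._cell(context)
        # Pull each unexplored arm in this cell once, then follow the UCB index.
        for arm in range(self.n_arms):
            if self.counts[(cell, arm)] == 0:
                return arm
        def ucb(arm):
            key = (cell, arm)
            return self.means[key] + math.sqrt(2 * math.log(t + 1) / self.counts[key])
        return max(range(self.n_arms), key=ucb)

    def update(self, context, arm, accuracy):
        # Reward = observed classification accuracy minus the forwarding cost.
        reward = accuracy - (self.forward_cost if arm > 0 else 0.0)
        key = (self._cell(context), arm)
        self.counts[key] += 1
        self.means[key] += (reward - self.means[key]) / self.counts[key]
```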

    The Dual JL Transforms and Superfast Matrix Algorithms

    We call a matrix algorithm superfast (aka running at sublinear cost) if it involves far fewer flops and memory cells than the matrix has entries. Such algorithms are highly desirable, or even imperative, in computations for Big Data, which involve immense matrices and are quite typically reduced to solving a linear least squares problem and/or computing a low-rank approximation of an input matrix. The known algorithms for these problems are not superfast, but we prove that certain superfast modifications of them output reasonable or even nearly optimal solutions for large input classes. We also propose, analyze, and test a novel superfast algorithm for iterative refinement of any crude but sufficiently close low-rank approximation of a matrix. The results of our numerical tests are in good accordance with our formal study.
    Comment: 36.1 pages, 5 figures, and 1 table. arXiv admin note: text overlap with arXiv:1710.07946, arXiv:1906.0411
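
    The abstract does not detail its algorithms, but a standard skeleton (CUR-type) cross approximation conveys the superfast idea: read only a few sampled rows and columns instead of every entry. The sketch below is a generic illustration under the assumption of uniform sampling, not the paper's method.

```python
import numpy as np

def cross_approximation(A, r, seed=None):
    """Skeleton (CUR-type) low-rank approximation that reads only r sampled
    rows and r sampled columns of A, i.e., far fewer entries than A has.
    Uniform sampling is a generic illustration, not the paper's method."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    I = rng.choice(m, size=r, replace=False)   # sampled row indices
    J = rng.choice(n, size=r, replace=False)   # sampled column indices
    C = A[:, J]                                # m x r block of sampled columns
    R = A[I, :]                                # r x n block of sampled rows
    G = np.linalg.pinv(A[np.ix_(I, J)])        # r x r intersection, pseudo-inverted
    return C @ G @ R                           # rank-<=r approximation of A

# Example: a rank-5 matrix is recovered from 5 rows and 5 columns.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 5)) @ rng.standard_normal((5, 400))
A_hat = cross_approximation(A, r=5, seed=1)
print(np.linalg.norm(A - A_hat) / np.linalg.norm(A))   # tiny relative error
```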

    Lepskii Principle in Supervised Learning

    In the setting of supervised learning with reproducing kernel methods, we propose a data-dependent regularization parameter selection rule that is adaptive to the unknown regularity of the target function and is optimal both for the least-squares (prediction) error and for the reproducing kernel Hilbert space (reconstruction) norm error. It is based on a modified Lepskii balancing principle using a varying family of norms.
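
    As a rough illustration of a Lepskii-type balancing rule in kernel ridge regression (not the paper's exact rule), the sketch below fits estimators along a decreasing grid of regularization parameters and keeps decreasing the parameter as long as the new estimator stays within a variance-proxy ball of all previous ones; the proxy and the constant 4 are placeholder assumptions.

```python
import numpy as np

def krr_fit(K, y, lam):
    """Kernel ridge regression coefficients for regularization parameter lam."""
    n = K.shape[0]
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def lepskii_select(K, y, lambdas, sigma=1.0):
    """Schematic Lepskii-type balancing rule with placeholder constants."""
    n = K.shape[0]
    lambdas = sorted(lambdas, reverse=True)    # from strongest to weakest smoothing
    alphas = [krr_fit(K, y, lam) for lam in lambdas]

    def rkhs_dist(a, b):
        d = a - b                              # ||f_a - f_b||_H^2 = (a-b)^T K (a-b)
        return float(np.sqrt(d @ K @ d))

    def proxy(lam):
        return sigma / np.sqrt(n * lam)        # crude stand-in for the variance bound

    best = 0
    for j in range(1, len(lambdas)):
        if all(rkhs_dist(alphas[j], alphas[i]) <= 4 * proxy(lambdas[i])
               for i in range(j)):
            best = j                           # still balanced; accept the smaller lam
        else:
            break
    return lambdas[best], alphas[best]
```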

    Echo State Networks for Proactive Caching in Cloud-Based Radio Access Networks with Mobile Users

    In this paper, the problem of proactive caching is studied for cloud radio access networks (CRANs). In the studied model, the baseband units (BBUs) can predict the content request distribution and mobility pattern of each user and determine which content to cache at the remote radio heads and the BBUs. This problem is formulated as an optimization problem that jointly incorporates the backhaul and fronthaul loads and content caching. To solve it, an algorithm that combines the machine learning framework of echo state networks with sublinear algorithms is proposed. Using echo state networks (ESNs), the BBUs can predict each user's content request distribution and mobility pattern while having only limited information on the network's and users' states. In order to predict each user's periodic mobility pattern with minimal complexity, the memory capacity of the corresponding ESN is derived for a periodic input. This memory capacity is shown to be able to record the maximum amount of user information for the proposed ESN model. Then, a sublinear algorithm is proposed to determine which content to cache while using a limited number of content request distribution samples. Simulation results using real data from Youku and the Beijing University of Posts and Telecommunications show that the proposed approach yields significant gains in terms of sum effective capacity, reaching up to 27.8% and 30.7%, respectively, compared to random caching with clustering and random caching without clustering.
    Comment: Accepted in the IEEE Transactions on Wireless Communications
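
    The paper's ESNs predict request distributions and mobility patterns; the minimal echo state network below is a generic stand-in, not the paper's architecture. It learns one-step-ahead prediction of a periodic signal with a fixed random reservoir and a ridge-regression readout; all sizes and the toy signal are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal echo state network: a fixed random reservoir and a trained linear readout.
n_in, n_res = 1, 100
W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.uniform(-0.5, 0.5, (n_res, n_res))
W *= 0.9 / np.abs(np.linalg.eigvals(W)).max()   # spectral radius < 1 (echo state property)

def run_reservoir(inputs):
    x, states = np.zeros(n_res), []
    for u in inputs:
        x = np.tanh(W @ x + W_in @ np.atleast_1d(u))  # reservoir state update
        states.append(x.copy())
    return np.array(states)

# A toy periodic signal stands in for a user's periodic mobility pattern.
u = np.sin(2 * np.pi * np.arange(500) / 25)
X, y = run_reservoir(u[:-1]), u[1:]             # train a one-step-ahead predictor
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)
print(np.mean((X @ W_out - y) ** 2))            # small one-step prediction error
```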

    Estimating the weight of metric minimum spanning trees in sublinear time

    In this paper we present a sublinear-time $(1+\varepsilon)$-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an $n$-point metric space. The running time of the algorithm is $\widetilde{\mathcal{O}}(n/\varepsilon^{\mathcal{O}(1)})$. Since the full description of an $n$-point metric space has size $\Theta(n^2)$, the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal, as it is not possible to approximate the weight of the minimum spanning tree to within any factor in $o(n)$ time. We also show that no deterministic algorithm can achieve a $B$-approximation in $o(n^2/B^3)$ time. Furthermore, it has been previously shown that no $o(n^2)$-time algorithm exists that returns a spanning tree whose weight is within a constant factor of the optimum.
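
    This line of work builds on the component-counting identity of Chazelle, Rubinfeld, and Trevisan for graphs with integer edge weights in $\{1,\dots,W\}$: $\mathrm{MST}(G) = n - W + \sum_{i=1}^{W-1} c^{(i)}$, where $c^{(i)}$ is the number of connected components of the subgraph restricted to edges of weight at most $i$. A sublinear algorithm estimates each $c^{(i)}$ by sampling; the exact check below merely verifies the identity on a toy graph and is not the paper's metric-space algorithm.

```python
# Verifies MST(G) = n - W + sum_{i=1}^{W-1} c_i on a small integer-weighted graph,
# where c_i counts connected components among edges of weight <= i.

class UnionFind:
    def __init__(self, n):
        self.p = list(range(n))
    def find(self, a):
        while self.p[a] != a:
            self.p[a] = self.p[self.p[a]]   # path halving
            a = self.p[a]
        return a
    def union(self, a, b):
        self.p[self.find(a)] = self.find(b)

def components(n, edges):
    uf = UnionFind(n)
    for u, v in edges:
        uf.union(u, v)
    return len({uf.find(v) for v in range(n)})

def mst_weight_via_components(n, weighted_edges, W):
    total = n - W
    for i in range(1, W):
        total += components(n, [(u, v) for u, v, w in weighted_edges if w <= i])
    return total

# Sanity check on a toy graph: the MST uses the two weight-1 edges plus one
# weight-2 edge, so its weight is 1 + 1 + 2 = 4.
edges = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (0, 3, 3), (1, 3, 2)]
print(mst_weight_via_components(4, edges, W=3))   # 4
```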

    Probabilistic Spectral Sparsification In Sublinear Time

    In this paper, we introduce a variant of spectral sparsification called probabilistic $(\varepsilon,\delta)$-spectral sparsification. Roughly speaking, it preserves the value of any cut $(S,S^{c})$ up to a $1\pm\varepsilon$ multiplicative error and a $\delta|S|$ additive error. We show how to produce a probabilistic $(\varepsilon,\delta)$-spectral sparsifier with $O(n\log n/\varepsilon^{2})$ edges in $\tilde{O}(n/\varepsilon^{2}\delta)$ time for unweighted undirected graphs. This yields the fastest known sublinear-time algorithms for several cut problems on unweighted undirected graphs, such as:
    - an $\tilde{O}(n/\mathrm{OPT}+n^{3/2+t})$-time $O(\sqrt{\log n/t})$-approximation algorithm for the sparsest cut and balanced separator problems;
    - an $n^{1+o(1)}/\varepsilon^{4}$-time approximate minimum s-t cut algorithm with an $\varepsilon n$ additive error.
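
    To make the cut-preservation guarantee concrete, the toy sketch below keeps each edge independently with probability p and reweights survivors by 1/p, so every cut's weight is preserved in expectation and concentration bounds the deviation. This uniform sampling is only an illustration; it is not the paper's sublinear-time, nonuniform construction.

```python
import random

def sample_sparsifier(edges, p, seed=0):
    """Toy cut sparsifier: keep each edge with probability p, weight survivors
    by 1/p so the expected weight of every cut is preserved."""
    rng = random.Random(seed)
    return [(u, v, 1.0 / p) for (u, v) in edges if rng.random() < p]

def cut_weight(edges, S):
    """Weight of the cut (S, S^c); unweighted edges count as weight 1."""
    S = set(S)
    return sum((e[2] if len(e) > 2 else 1.0)
               for e in edges if (e[0] in S) != (e[1] in S))

# Example on a random graph: a cut's value before and after sparsifying.
rng = random.Random(1)
n = 200
edges = [(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < 0.1]
H = sample_sparsifier(edges, p=0.3)
S = range(n // 2)
print(cut_weight(edges, S), round(cut_weight(H, S), 1))   # close in value
```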
