Distributed Online Big Data Classification Using Context Information
Distributed, online data mining systems have emerged as a result of
applications requiring analysis of large amounts of correlated and
high-dimensional data produced by multiple distributed data sources. We propose
a distributed online data classification framework where data is gathered by
distributed data sources and processed by a heterogeneous set of distributed
learners, which learn online, at run-time, how to classify the different data
streams, either by using their locally available classification functions or by
helping one another and classifying each other's data. Importantly, because the
data is gathered at different locations, sending data to another learner for
processing incurs additional costs such as delays, and is therefore beneficial
only if the gain from a better classification exceeds these costs. We model the
problem of joint classification by the distributed and heterogeneous learners
from multiple data sources as a distributed contextual bandit problem in which
each data instance is characterized by a specific context. We develop a
distributed online learning algorithm for which we prove sublinear regret.
Compared to prior work in distributed online data mining, ours is the first to
provide analytic regret results characterizing the performance of the proposed
algorithm.
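A minimal sketch of the decision each learner faces, assuming a discretized one-dimensional context space, an epsilon-greedy rule, and a fixed forwarding-delay cost; the names, the cost model, and the partitioning below are illustrative assumptions, not the paper's algorithm or regret analysis.

```python
import random
from collections import defaultdict

class Learner:
    """One distributed learner choosing, per context, between its local
    classifier ("local") and forwarding the data instance to a peer learner."""

    def __init__(self, actions, forwarding_cost=0.1, epsilon=0.1):
        self.actions = actions            # e.g. ["local", "peer_1", "peer_2"] (hypothetical)
        self.cost = forwarding_cost       # assumed fixed delay cost per forwarded instance
        self.epsilon = epsilon
        self.counts = defaultdict(lambda: defaultdict(int))
        self.values = defaultdict(lambda: defaultdict(float))

    def _cell(self, context):
        # Partition the context space into cells (here: contexts in [0, 1), 10 bins).
        return int(min(context, 0.999) * 10)

    def choose(self, context):
        cell = self._cell(context)
        if random.random() < self.epsilon:        # explore
            return random.choice(self.actions)
        # exploit: highest estimated net reward (accuracy minus forwarding cost)
        return max(self.actions, key=lambda a: self.values[cell][a])

    def update(self, context, action, correct):
        cell = self._cell(context)
        reward = float(correct) - (self.cost if action != "local" else 0.0)
        self.counts[cell][action] += 1
        n = self.counts[cell][action]
        self.values[cell][action] += (reward - self.values[cell][action]) / n
```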
The Dual JL Transforms and Superfast Matrix Algorithms
We call a matrix algorithm superfast (aka running at sublinear cost) if it
involves far fewer flops and memory cells than the matrix has entries. Using
such algorithms is highly desired or even imperative in computations for Big
Data, which involve immense matrices and are quite typically reduced to solving
a linear least squares problem and/or computing a low-rank approximation of an
input matrix. The known algorithms for these problems are not superfast, but we
prove that certain superfast modifications of them output reasonable or even
nearly optimal solutions for large classes of inputs. We also propose, analyze,
and test a novel superfast algorithm for iterative refinement of any crude but
sufficiently close low-rank approximation of a matrix. The results of our
numerical tests are in good accordance with our formal study.
Comment: 36.1 pages, 5 figures, and 1 table. arXiv admin note: text overlap
with arXiv:1710.07946, arXiv:1906.0411
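To illustrate the sketching idea behind such randomized matrix algorithms, here is a generic Gaussian sketch-and-solve for least squares in NumPy, with assumed toy dimensions. Note that forming the dense sketch costs O(kmn) flops and is therefore not superfast; the paper's point is precisely that suitably modified (dual, structured) sketches can be applied at sublinear cost, which this standard sketch does not attempt.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: a tall 10000 x 50 least squares problem.
m, n, k = 10_000, 50, 400              # k = sketch size, k << m
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.01 * rng.standard_normal(m)

# JL-style sketch: multiply both sides by a k x m Gaussian matrix S,
# then solve the much smaller sketched problem min ||S A x - S b||.
S = rng.standard_normal((k, m)) / np.sqrt(k)
x_sketch, *_ = np.linalg.lstsq(S @ A, S @ b, rcond=None)

x_exact, *_ = np.linalg.lstsq(A, b, rcond=None)
print("relative error vs exact LS solution:",
      np.linalg.norm(x_sketch - x_exact) / np.linalg.norm(x_exact))
```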
Lepskii Principle in Supervised Learning
In the setting of supervised learning using reproducing kernel methods, we
propose a data-dependent regularization parameter selection rule that is
adaptive to the unknown regularity of the target function and is optimal both
for the least-squares (prediction) error and for the reproducing kernel Hilbert
space (reconstruction) norm error. It is based on a modified Lepskii balancing
principle using a varying family of norms.
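A schematic NumPy implementation of a standard Lepskii-type balancing rule for kernel ridge regression: the selected parameter is the largest lambda whose estimator stays within a constant multiple of the noise proxy of every estimator with smaller regularization. The Gaussian kernel, the grid, the constant 4, and the noise proxy sigma(lambda) are illustrative assumptions; this is not the modified rule or the varying norm family proposed in the paper.

```python
import numpy as np

def gaussian_kernel(X, Y, width=0.3):
    d2 = (X[:, None] - Y[None, :]) ** 2
    return np.exp(-d2 / (2 * width ** 2))

def lepskii_krr(x, y, noise_std, lambdas, const=4.0):
    """Balancing principle: pick the largest lambda such that
    ||f_lambda - f_mu|| <= const * sigma(mu) for every grid value mu < lambda."""
    n = len(x)
    K = gaussian_kernel(x, x)
    # Estimators f_lambda evaluated on the training points, one per grid value.
    preds = {lam: K @ np.linalg.solve(K + n * lam * np.eye(n), y) for lam in lambdas}
    # Assumed stochastic-error proxy: grows as lambda shrinks.
    sigma = {lam: noise_std / np.sqrt(n * lam) for lam in lambdas}

    lambdas = sorted(lambdas)                    # ascending regularization strength
    best = lambdas[0]
    for j, lam in enumerate(lambdas):
        if all(np.linalg.norm(preds[lam] - preds[mu]) / np.sqrt(n) <= const * sigma[mu]
               for mu in lambdas[:j]):
            best = lam                           # largest admissible lambda so far
    return best

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)
grid = [2.0 ** -k for k in range(2, 14)]
print("selected lambda:", lepskii_krr(x, y, noise_std=0.1, lambdas=grid))
```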
Echo State Networks for Proactive Caching in Cloud-Based Radio Access Networks with Mobile Users
In this paper, the problem of proactive caching is studied for cloud radio
access networks (CRANs). In the studied model, the baseband units (BBUs) can
predict the content request distribution and mobility pattern of each user,
and determine which content to cache at the remote radio heads and the BBUs. This problem
is formulated as an optimization problem which jointly incorporates backhaul
and fronthaul loads and content caching. To solve this problem, an algorithm
that combines the machine learning framework of echo state networks with
sublinear algorithms is proposed. Using echo state networks (ESNs), the BBUs
can predict each user's content request distribution and mobility pattern while
having only limited information on the network's and user's state. In order to
predict each user's periodic mobility pattern with minimal complexity, the
memory capacity of the corresponding ESN is derived for a periodic input. This
memory capacity is shown to be able to record the maximum amount of user
information for the proposed ESN model. Then, a sublinear algorithm is proposed
to determine which content to cache while using limited content request
distribution samples. Simulation results using real data from Youku and the
Beijing University of Posts and Telecommunications show that the proposed
approach yields significant gains in terms of sum effective capacity, reaching
up to 27.8% and 30.7% compared to random caching with clustering and random
caching without clustering, respectively.
Comment: Accepted in the IEEE Transactions on Wireless Communications
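A minimal echo state network in NumPy, showing the reservoir-plus-ridge-readout structure that a BBU could use to predict a user's periodic pattern one step ahead. The reservoir size, spectral radius, and the synthetic periodic signal standing in for a mobility trace are assumptions for illustration, not the ESN model, memory-capacity analysis, or data of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_esn(n_in, n_res, spectral_radius=0.9):
    W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
    W = rng.uniform(-0.5, 0.5, (n_res, n_res))
    W *= spectral_radius / max(abs(np.linalg.eigvals(W)))  # heuristic echo-state scaling
    return W_in, W

def run_reservoir(W_in, W, inputs):
    states = np.zeros((len(inputs), W.shape[0]))
    x = np.zeros(W.shape[0])
    for t, u in enumerate(inputs):
        x = np.tanh(W_in @ np.atleast_1d(u) + W @ x)
        states[t] = x
    return states

# Synthetic periodic signal standing in for a user's daily mobility pattern.
t = np.arange(2000)
u = np.sin(2 * np.pi * t / 50)
target = np.roll(u, -1)                 # predict the next value

W_in, W = make_esn(n_in=1, n_res=200)
S = run_reservoir(W_in, W, u)

# Ridge-regression readout trained on the first 1500 steps, tested on the rest.
lam = 1e-6
W_out = np.linalg.solve(S[:1500].T @ S[:1500] + lam * np.eye(200),
                        S[:1500].T @ target[:1500])
pred = S[1500:] @ W_out
print("test MSE:", np.mean((pred - target[1500:]) ** 2))
```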
Estimating the weight of metric minimum spanning trees in sublinear time
In this paper we present a sublinear-time (1+ε)-approximation randomized algorithm to estimate the weight of the minimum spanning tree of an n-point metric space. The running time of the algorithm is Õ(n/ε^O(1)). Since the full description of an n-point metric space is of size Θ(n²), the complexity of our algorithm is sublinear with respect to the input size. Our algorithm is almost optimal, as it is not possible to approximate the weight of the minimum spanning tree to within any factor in o(n) time. We also show that no deterministic algorithm can achieve such an approximation in o(n²) time. Furthermore, it has been previously shown that no o(n²)-time algorithm exists that returns a spanning tree whose weight is within a constant times the optimum.
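Sublinear MST-weight estimators in this line of work build on a classical identity for connected graphs with integer weights in {1, ..., W}: the MST weight equals n - W plus the sum, over i = 1 to W-1, of the number of connected components of the subgraph using only edges of weight at most i. The brute-force check below illustrates that identity only; it is not the metric-space algorithm of the paper, and the example graph is made up.

```python
def components(n, edges):
    """Count connected components via union-find."""
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return len({find(v) for v in range(n)})

def mst_weight_via_components(n, weighted_edges, W):
    """MST weight = n - W + sum_{i=1}^{W-1} c_i, where c_i is the number of
    connected components of the subgraph with edges of weight <= i."""
    total = n - W
    for i in range(1, W):
        total += components(n, [(u, v) for u, v, w in weighted_edges if w <= i])
    return total

def mst_weight_kruskal(n, weighted_edges):
    parent = list(range(n))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    weight = 0
    for u, v, w in sorted(weighted_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            weight += w
    return weight

# Small connected example with weights in {1, 2, 3}: both methods give 4.
edges = [(0, 1, 1), (1, 2, 2), (2, 3, 1), (3, 0, 3), (0, 2, 3)]
print(mst_weight_via_components(4, edges, W=3), "==", mst_weight_kruskal(4, edges))
```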
Probabilistic Spectral Sparsification In Sublinear Time
In this paper, we introduce a variant of spectral sparsification, called
probabilistic (ε, δ)-spectral sparsification. Roughly speaking, it preserves
the cut value of any cut with a 1 ± ε multiplicative error and an additive
error governed by δ. We show how to produce a sparse probabilistic
(ε, δ)-spectral sparsifier in time sublinear in the size of an unweighted
undirected graph. This gives the fastest known sublinear-time algorithms for
several cut problems on unweighted undirected graphs, such as:
- a sublinear-time approximation algorithm for the sparsest cut problem and the
balanced separator problem;
- a sublinear-time approximate minimum s-t cut algorithm with an additive
error;
- …
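A toy NumPy sketch of the generic principle behind cut-preserving sparsification by edge sampling: keep each edge independently with probability p and reweight it by 1/p, so every cut value is preserved in expectation. The random graph, the uniform sampling probability, and the cut chosen below are assumptions for illustration; this is not the probabilistic (ε, δ)-sparsifier construction or the sublinear-time procedure of the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Random unweighted undirected toy graph (Erdos-Renyi), edges as (u, v) pairs.
n, p_edge = 60, 0.3
A = np.triu(rng.random((n, n)) < p_edge, k=1)
edges = np.argwhere(A)

# Keep each edge independently with probability p, reweight kept edges by 1/p.
p = 0.2
kept = edges[rng.random(len(edges)) < p]
weight = 1.0 / p

def cut_value(edge_list, side, w=1.0):
    side = set(side)
    return sum(w for u, v in edge_list if (u in side) != (v in side))

S = rng.choice(n, size=n // 2, replace=False)
print("original cut:      ", cut_value(edges, S))
print("sparsifier estimate:", cut_value(kept, S, weight))
```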