356,190 research outputs found
Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
Randomized Dynamic Mode Decomposition
This paper presents a randomized algorithm for computing the near-optimal
low-rank dynamic mode decomposition (DMD). Randomized algorithms are emerging
techniques to compute low-rank matrix approximations at a fraction of the cost
of deterministic algorithms, easing the computational challenges arising in the
area of `big data'. The idea is to derive a small matrix from the
high-dimensional data, which is then used to efficiently compute the dynamic
modes and eigenvalues. The algorithm is presented in a modular probabilistic
framework, and the approximation quality can be controlled via oversampling and
power iterations. The effectiveness of the resulting randomized DMD algorithm
is demonstrated on several benchmark examples of increasing complexity,
providing an accurate and efficient approach to extract spatiotemporal coherent
structures from big data in a framework that scales with the intrinsic rank of
the data, rather than the ambient measurement dimension. For this work we
assume that the dynamics of the problem under consideration is evolving on a
low-dimensional subspace that is well characterized by a fast decaying singular
value spectrum
Improved Heterogeneous Distance Functions
Instance-based learning techniques typically handle continuous and linear
input values well, but often do not handle nominal input attributes
appropriately. The Value Difference Metric (VDM) was designed to find
reasonable distance values between nominal attribute values, but it largely
ignores continuous attributes, requiring discretization to map continuous
values into nominal values. This paper proposes three new heterogeneous
distance functions, called the Heterogeneous Value Difference Metric (HVDM),
the Interpolated Value Difference Metric (IVDM), and the Windowed Value
Difference Metric (WVDM). These new distance functions are designed to handle
applications with nominal attributes, continuous attributes, or both. In
experiments on 48 applications the new distance metrics achieve higher
classification accuracy on average than three previous distance functions on
those datasets that have both nominal and continuous attributes.Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this articl
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims for assisting the readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks.Comment: 46 pages, 22 fig
- …