Algorithmic patterns for H-matrices on many-core processors
In this work, we consider the reformulation of hierarchical (H-)
matrix algorithms for many-core processors with a model implementation on
graphics processing units (GPUs). H-matrices approximate specific
dense matrices, e.g., from discretized integral equations or kernel ridge
regression, leading to log-linear time complexity in dense matrix-vector
products. The parallelization of H-matrix operations on many-core
processors is difficult due to the complex nature of the underlying algorithms.
While previous algorithmic advances for many-core hardware focused on
accelerating existing H-matrix CPU implementations with many-core
processors, we aim to rely entirely on that processor type. As the main
contribution, we introduce the parallel algorithmic patterns needed
to map the full H-matrix construction and the fast matrix-vector
product to many-core hardware. Crucial ingredients here are space-filling
curves, parallel tree traversal, and batching of linear algebra operations. The
resulting model GPU implementation, hmglib, is, to the best of our
knowledge, the first entirely GPU-based open-source H-matrix library of
this kind. We conclude this work with an in-depth performance analysis and a
comparative performance study against a standard H-matrix library,
highlighting substantial speedups of our many-core parallel approach.
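Space-filling curves are listed above as a crucial ingredient. The abstract does not say which curve hmglib uses, so the following is only a minimal sketch assuming a 2D Z-order (Morton) curve: points are quantized to an integer grid, their coordinate bits are interleaved, and sorting by the resulting code tends to place spatially close points next to each other, which is the property a cluster tree built over contiguous index ranges relies on. The names `morton_code_2d` and `z_order_permutation` are hypothetical.

```python
import numpy as np


def morton_code_2d(ix: int, iy: int, bits: int = 16) -> int:
    """Interleave the bits of two non-negative integers (x on even bits, y on odd bits)."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


def z_order_permutation(points: np.ndarray, bits: int = 16) -> np.ndarray:
    """Return indices that sort 2D points along a Z-order (Morton) curve.

    Points are scaled into the integer grid [0, 2**bits - 1]^2 using their
    bounding box (an assumption of this sketch); nearby points then tend to
    receive nearby codes.
    """
    lo = points.min(axis=0)
    # Guard against a zero-width bounding box along an axis.
    span = np.maximum(points.max(axis=0) - lo, np.finfo(float).tiny)
    grid = np.floor((points - lo) / span * (2**bits - 1)).astype(np.int64)
    codes = np.array([morton_code_2d(int(x), int(y), bits) for x, y in grid], dtype=np.int64)
    # Stable sort so that points with equal codes keep their input order.
    return np.argsort(codes, kind="stable")
```

On a GPU the per-point code computation and the sort are both data-parallel, which is one plausible reason such curves suit many-core hardware; this Python version is sequential and only illustrates the ordering.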
Tensor-on-tensor regression
We propose a framework for the linear prediction of a multi-way array (i.e.,
a tensor) from another multi-way array of arbitrary dimension, using the
contracted tensor product. This framework generalizes several existing
approaches, including methods to predict a scalar outcome from a tensor, a
matrix from a matrix, or a tensor from a scalar. We describe an approach that
exploits the multi-way structure of both the predictors and the outcomes by
restricting the coefficients to have reduced CP-rank. We propose a general and
efficient algorithm for penalized least-squares estimation, which allows for a
ridge (L_2) penalty on the coefficients. The objective is shown to give the
mode of a Bayesian posterior, which motivates a Gibbs sampling algorithm for
inference. We illustrate the approach with an application to facial image data.
An R package is available at https://github.com/lockEF/MultiwayRegression
Comment: 33 pages, 3 figures
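To make the contracted tensor product concrete, here is a minimal NumPy sketch, assuming predictors X of shape n x P1 x ... x PL and outcomes Y of shape n x Q1 x ... x QM, with a coefficient tensor B of shape P1 x ... x PL x Q1 x ... x QM. The ridge fit below is the plain closed-form solution without the reduced CP-rank restriction, so it is not the algorithm from the paper or its R package; the function names are hypothetical.

```python
import numpy as np


def contracted_product(X: np.ndarray, B: np.ndarray, L: int) -> np.ndarray:
    """Contract the last L modes of X with the first L modes of B.

    X: n x P1 x ... x PL, B: P1 x ... x PL x Q1 x ... x QM,
    result: n x Q1 x ... x QM.
    """
    x_axes = list(range(X.ndim - L, X.ndim))
    b_axes = list(range(L))
    return np.tensordot(X, B, axes=(x_axes, b_axes))


def ridge_tensor_on_tensor(X: np.ndarray, Y: np.ndarray, lam: float) -> np.ndarray:
    """Unconstrained ridge estimate of B in Y ~ <X, B> (no CP-rank restriction)."""
    n = X.shape[0]
    p_shape, q_shape = X.shape[1:], Y.shape[1:]
    Xm = X.reshape(n, -1)  # n x prod(P), same C-order flattening tensordot uses
    Ym = Y.reshape(n, -1)  # n x prod(Q)
    P = Xm.shape[1]
    # Closed-form ridge solution (Xm^T Xm + lam I)^{-1} Xm^T Ym.
    Bm = np.linalg.solve(Xm.T @ Xm + lam * np.eye(P), Xm.T @ Ym)
    return Bm.reshape(*p_shape, *q_shape)
```

Flattening both sides reduces the problem to a matrix ridge regression with prod(P) x prod(Q) free coefficients; the CP-rank restriction described in the abstract is what keeps that count manageable when the tensors are large.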
Unconventional machine learning of genome-wide human cancer data
Recent advances in high-throughput genomic technologies coupled with
exponential increases in computer processing and memory have allowed us to
interrogate the complex aberrant molecular underpinnings of human disease from
a genome-wide perspective. While the deluge of genomic information is expected
to increase, a bottleneck in conventional high-performance computing is rapidly
approaching. Inspired in part by recent advances in physical quantum
processors, we evaluated several unconventional machine learning (ML)
strategies on actual human tumor data. Here we show for the first time the
efficacy of multiple annealing-based ML algorithms for classification of
high-dimensional, multi-omics human cancer data from the Cancer Genome Atlas.
To assess algorithm performance, we compared these classifiers to a variety of
standard ML methods. Our results indicate the feasibility of using
annealing-based ML to provide competitive classification of human cancer types
and associated molecular subtypes and superior performance with smaller
training datasets, thus providing compelling empirical evidence for the
potential future application of unconventional computing architectures in the
biomedical sciences.
Signal and data processing for machine olfaction and chemical sensing: A review
Signal and data processing are essential elements in electronic noses as well as in most chemical sensing instruments. The multivariate responses obtained by chemical sensor arrays require signal and data processing to carry out the fundamental tasks of odor identification (classification), concentration estimation (regression), and grouping of similar odors (clustering). In the last decade, important advances have shown that proper processing can improve the robustness of the instruments against diverse perturbations, such as environmental variables, background changes, and drift. This article reviews the advances made in recent years in signal and data processing for machine olfaction and chemical sensing.
Reciprocity Calibration for Massive MIMO: Proposal, Modeling and Validation
This paper presents a mutual coupling based calibration method for
time-division-duplex massive MIMO systems, which enables downlink precoding
based on uplink channel estimates. The entire calibration procedure is carried
out solely at the base station (BS) side by sounding all BS antenna pairs. An
Expectation-Maximization (EM) algorithm is derived, which processes the
measured channels in order to estimate calibration coefficients. The EM
algorithm outperforms current state-of-the-art narrow-band calibration schemes
in terms of mean squared error (MSE) and sum-rate capacity. Like its
predecessors, the EM algorithm is general: it is suitable not only for
calibrating a co-located massive MIMO BS but also for calibrating
multiple BSs in distributed MIMO systems.
The proposed method is validated with experimental evidence obtained from a
massive MIMO testbed. In addition, we treat the estimated narrow-band
calibration coefficients as a stochastic process across frequency, and study
the subspace of this process based on measurement data. Using the insights from
this study, we propose an estimator that exploits the structure of the process
in order to reduce the calibration error across frequency. A model for the
calibration error is also proposed based on the asymptotic properties of the
estimator, and is validated with measurement results. Comment: Submitted to IEEE Transactions on Wireless Communications,
21/Feb/201
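The calibration idea can be illustrated with a simpler estimator than the EM algorithm described above. This is a minimal least-squares sketch under stated assumptions: the measured channel when BS antenna i transmits and antenna j receives is modeled as Y[i, j] = t_i c_ij r_j with a reciprocal coupling c_ij = c_ji, and the calibration coefficients are taken as b_i = t_i / r_i (the paper may use a different convention). Every antenna pair then satisfies b_j Y[i, j] = b_i Y[j, i], and the sketch solves these equations in the least-squares sense with b_0 fixed to 1. The function name is hypothetical.

```python
import numpy as np


def ls_reciprocity_calibration(Y: np.ndarray) -> np.ndarray:
    """Least-squares relative calibration coefficients b_i = t_i / r_i (b_0 = 1).

    Y[i, j]: measured channel, antenna i transmitting, antenna j receiving,
    assumed to follow Y[i, j] = t_i * c_ij * r_j with c_ij = c_ji.
    Minimizes sum_{i<j} |b_j Y[i, j] - b_i Y[j, i]|^2 subject to b_0 = 1.
    """
    M = Y.shape[0]
    rows, rhs = [], []
    for i in range(M):
        for j in range(i + 1, M):
            row = np.zeros(M, dtype=complex)
            row[j] += Y[i, j]
            row[i] -= Y[j, i]
            rows.append(row[1:])  # coefficients of the unknowns b_1 .. b_{M-1}
            rhs.append(-row[0])   # the b_0 = 1 term moved to the right-hand side
    A = np.array(rows)
    b_rest, *_ = np.linalg.lstsq(A, np.array(rhs), rcond=None)
    return np.concatenate(([1.0 + 0j], b_rest))
```

With noiseless measurements this recovers t_i / r_i up to the common factor fixed by b_0; with noisy data it gives a baseline against which a statistically motivated estimator such as the paper's EM algorithm can be compared.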
A concept for a fuel-efficient flight planning aid for general aviation
A core equation for estimating fuel burn from path profile data was developed. This equation was used as a necessary ingredient in a dynamic program to define a fuel-efficient flight path. The resulting algorithm is oriented toward use in general aviation. The pilot provides a description of the desired ground track, standard aircraft parameters, and weather at selected waypoints. The algorithm then derives the fuel-efficient altitudes and velocities at the waypoints.
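The dynamic program mentioned above can be sketched as a shortest-path recursion over waypoints. This is a minimal illustration, not the original algorithm: the fuel burn equation is not given here, so the leg cost is a hypothetical user-supplied function `leg_fuel(k, s, s_next)`, and the candidate (altitude, airspeed) states at each waypoint are assumed to be discretized in advance.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, float]  # (altitude, airspeed) at a waypoint


def plan_fuel_efficient(
    options: Sequence[Sequence[State]],
    leg_fuel: Callable[[int, State, State], float],
) -> Tuple[float, List[State]]:
    """Minimize total fuel over a sequence of waypoints by dynamic programming.

    options[k]: candidate states at waypoint k (hypothetical discretization).
    leg_fuel(k, s, s_next): fuel for leg k from state s to state s_next
    (hypothetical stand-in for the fuel burn equation).
    Returns the minimum total fuel and one optimal sequence of states.
    """
    n = len(options)
    best = [0.0] * len(options[0])  # minimal fuel to reach each state at waypoint 0
    back: List[List[int]] = []      # back[k][j]: best predecessor of state j at waypoint k + 1
    for k in range(n - 1):
        new_best, new_back = [], []
        for s_next in options[k + 1]:
            costs = [best[i] + leg_fuel(k, s, s_next) for i, s in enumerate(options[k])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min])
            new_back.append(i_min)
        best = new_best
        back.append(new_back)
    # Trace the optimal path backwards from the cheapest final state.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path_idx = [j]
    for k in range(n - 2, -1, -1):
        j = back[k][j]
        path_idx.append(j)
    path_idx.reverse()
    return total, [options[k][i] for k, i in enumerate(path_idx)]
```

The work is proportional to the number of legs times the square of the number of candidate states per waypoint, which stays small for a coarse altitude and speed grid.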
autoAx: An Automatic Design Space Exploration and Circuit Building Methodology utilizing Libraries of Approximate Components
Approximate computing is an emerging paradigm for developing highly
energy-efficient computing systems such as various accelerators. In the
literature, many libraries of elementary approximate circuits have already been
proposed to simplify the design process of approximate accelerators. Because
these libraries contain from tens to thousands of approximate implementations
for a single arithmetic operation, it is intractable to find an optimal
combination of approximate circuits in the library even for an application
consisting of a few operations. An open problem is "how to effectively combine
circuits from these libraries to construct complex approximate accelerators".
This paper proposes a novel methodology for searching, selecting and combining
the most suitable approximate circuits from a set of available libraries to
generate an approximate accelerator for a given application. To enable fast
design space generation and exploration, the methodology utilizes machine
learning techniques to create computational models estimating the overall
quality of processing and hardware cost without performing full synthesis at
the accelerator level. Using the methodology, we construct hundreds of
approximate accelerators (for a Sobel edge detector) showing different but
relevant tradeoffs between the quality of processing and hardware cost and
identify a corresponding Pareto-frontier. Furthermore, when searching for
approximate implementations of a generic Gaussian filter consisting of 17
arithmetic operations, the proposed approach allows us to identify
highly important implementations among the possible
solutions in a few hours, while an exhaustive search would take four months on
a high-end processor. Comment: Accepted for publication at the Design Automation Conference 2019
(DAC'19), Las Vegas, Nevada, US
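Identifying the Pareto frontier mentioned above amounts to keeping only the non-dominated designs. A minimal sketch, assuming each candidate accelerator is summarized by an (error, hardware cost) pair where both values are to be minimized; in the methodology these values would come from the learned estimation models, which are not reproduced here. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def pareto_front(points: Sequence[Tuple[float, float]]) -> List[int]:
    """Indices of non-dominated (error, cost) pairs, both minimized.

    A point is dominated if another point is no worse in both objectives and
    strictly better in at least one. Exact duplicates keep only the first.
    The result is ordered by increasing error.
    """
    # Sort by error, breaking ties by cost, so each front point must beat
    # every earlier (lower-or-equal-error) point strictly on cost.
    order = sorted(range(len(points)), key=lambda i: (points[i][0], points[i][1]))
    front: List[int] = []
    best_cost = float("inf")
    for i in order:
        if points[i][1] < best_cost:
            front.append(i)
            best_cost = points[i][1]
    return front
```

Sorting first makes the scan O(N log N), which matters when hundreds or thousands of candidate accelerators are evaluated.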
Lecture notes on ridge regression
The linear regression model cannot be fitted to high-dimensional data, as the
high dimensionality brings about empirical non-identifiability. Penalized
regression overcomes this non-identifiability by augmentation of the loss
function by a penalty (i.e. a function of regression coefficients). The ridge
penalty is the sum of squared regression coefficients, giving rise to ridge
regression. Here many aspects of ridge regression are reviewed, e.g., its moments,
mean squared error, its equivalence to constrained estimation, and its relation
to Bayesian regression. Its behaviour and use are then illustrated in
simulation and on omics data. Subsequently, ridge regression is generalized to
allow for a more general penalty. The ridge penalization framework is then
translated to logistic regression and its properties are shown to carry over.
To contrast with ridge-penalized estimation, the final chapter introduces its lasso
counterpart.
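A minimal sketch of the ridge estimator discussed above, assuming centered data and no intercept: adding the penalty lam * ||beta||^2 to the squared loss gives the closed form (X^T X + lam I)^{-1} X^T y, which exists for any lam > 0 even when p > n, because X^T X + lam I is then positive definite. The equivalent form X^T (X X^T + lam I)^{-1} y solves an n x n system instead, which is cheaper when p is much larger than n, as with omics data. Function names are hypothetical.

```python
import numpy as np


def ridge_estimator(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge estimate (X^T X + lam I)^{-1} X^T y; solves a p x p system."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def ridge_estimator_dual(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Same estimate via X^T (X X^T + lam I)^{-1} y; solves an n x n system."""
    n = X.shape[0]
    return X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)
```

Both functions satisfy the penalized normal equations X^T (y - X beta) = lam * beta, and the norm of the estimate shrinks as lam grows.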