Convolutional Dictionary Learning through Tensor Factorization
Tensor methods have emerged as a powerful paradigm for consistent learning of
many latent variable models such as topic models, independent component
analysis and dictionary learning. Model parameters are estimated via CP
decomposition of the observed higher order input moments. However, in many
domains, additional invariances such as shift invariances exist, enforced via
models such as convolutional dictionary learning. In this paper, we develop
novel tensor decomposition algorithms for parameter estimation of convolutional
models. Our algorithm is based on the popular alternating least squares method,
but with efficient projections onto the space of stacked circulant matrices.
Our method is embarrassingly parallel and consists of simple operations such as
fast Fourier transforms and matrix multiplications. Our algorithm converges to
the dictionary much faster and more accurately than alternating
minimization over filters and activation maps.
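The projection step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single square block rather than the stacked circulant matrices of the paper, and it averages wrapped diagonals directly instead of using fast Fourier transforms. The function name `project_to_circulant` and the example matrices are hypothetical, chosen only for illustration.

```python
def project_to_circulant(matrix):
    """Return the circulant matrix closest to `matrix` in Frobenius norm.

    Entry (i, j) of a circulant matrix depends only on (i - j) mod n, so the
    projection replaces each entry by the mean of its wrapped diagonal.
    Assumption: `matrix` is a square list of lists (one block, not a stack).
    """
    n = len(matrix)
    diag_sum = [0.0] * n
    for i in range(n):
        for j in range(n):
            diag_sum[(i - j) % n] += matrix[i][j]
    diag_mean = [s / n for s in diag_sum]
    return [[diag_mean[(i - j) % n] for j in range(n)] for i in range(n)]


# A matrix that is already circulant should be left unchanged.
already_circulant = [[1.0, 2.0, 0.0],
                     [0.0, 1.0, 2.0],
                     [2.0, 0.0, 1.0]]
projected = project_to_circulant(already_circulant)

# An arbitrary matrix is mapped to one whose rows are cyclic shifts.
arbitrary = [[1.0, 2.0, 3.0],
             [4.0, 6.0, 6.0],
             [7.0, 8.0, 11.0]]
projected_arbitrary = project_to_circulant(arbitrary)
```

In an alternating least squares loop, a projection of this kind would be applied after each unconstrained least squares update so that the factor keeps its shift-invariant structure.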
Energy efficient clustering using the AMHC (adoptive multi-hop clustering) technique
IoT has attracted considerable attention in fields such as industrial applications, agriculture, monitoring, and surveillance, and wireless sensor networks (WSNs) have grown in parallel. A WSN is one of the primary components of IoT when it comes to sensing data in various environments. Clustering is one of the basic approaches to obtaining measurable performance in WSNs; several clustering algorithms aim at efficient data collection, data gathering, and routing. In this paper, a novel AMHC (Adaptive Multi-Hop Clustering) algorithm is proposed for the homogeneous model; its main aim is to improve efficiency, in particular energy efficiency. The algorithm consists of three stages: assembling, coupling, and discarding. The first stage assembles (maximum) independent sets, the second stage couples the independent sets, and the last stage discards superfluous nodes, which helps achieve higher efficiency. Since the algorithm is a coloring algorithm, different colors are used to color the nodes at the different stages. AMHC is then compared with an existing system that combines second-order data CC (Coupled Clustering) and Compressive-Projection PCA (Principal Component Analysis), and the results show that AMHC performs better on several parameters, such as energy efficiency, network lifetime, and number of rounds performed.
Hybrid implementation of the fastICA algorithm for high-density EEG using the capabilities of the Intel architecture and CUDA programming
High-density electroencephalographic (EEG) systems are used to study the human brain and its underlying behaviors. However, working with EEG data requires a well-cleaned signal, which is often obtained with independent component analysis (ICA) methods. The computation time of these algorithms grows with the amount of data. This article presents a hybrid implementation of the fastICA algorithm that uses parallel programming techniques (libraries and extensions of Intel processors, and CUDA programming), which significantly reduces execution time on selected architectures.
Dimensionality reduction using parallel ICA and its implementation on FPGA in hyperspectral image analysis
Hyperspectral images, although they provide abundant information about the object, also impose a high computational burden on data processing. This thesis studies the challenging problem of dimensionality reduction in Hyperspectral Image (HSI) analysis. Currently, there are two methods to reduce the dimension: band selection and feature extraction. This thesis presents a band selection technique based on Independent Component Analysis (ICA), an unsupervised signal separation algorithm. Given only the observations of hyperspectral images, the ICA-based band selection picks the independent bands which contain most of the spectral information of the original images. Due to the high volume of hyperspectral images, ICA-based band selection is a time-consuming process. This thesis develops a parallel ICA algorithm which divides the decorrelation process into internal decorrelation and external decorrelation, such that the computational burden can be distributed from a single processor to multiple processors and the ICA process can be run in parallel. Hardware implementation is always a faster, real-time solution for HSI analysis. To date, there have been few hardware designs for ICA-related processes. This thesis synthesizes the parallel ICA-based band selection on a Field Programmable Gate Array (FPGA), which is the best choice for moderate designs and fast implementations. Compared to other design syntheses, the synthesis presented in this thesis develops three reconfigurable ICA components for the purpose of reusability. In addition, this thesis demonstrates the relationship between the design and the capacity utilization of a single FPGA, then discusses the features of High Performance Reconfigurable Computing (HPRC) to accommodate large capacity and design requirements. Experiments are conducted on three data sets obtained from different sources. Experimental results show the effectiveness of the proposed ICA-based band selection, parallel ICA, and its synthesis on FPGA.
Fast Local Computation Algorithms
For input $x$, let $F(x)$ denote the set of outputs that are the "legal"
answers for a computational problem $F$. Suppose $x$ and members of $F(x)$ are
so large that there is not time to read them in their entirety. We propose a
model of local computation algorithms which, for a given input $x$,
support queries by a user to values of specified locations $y_i$ in a legal
output $y \in F(x)$. When more than one legal output $y$ exists for a given
$x$, the local computation algorithm should output in a way that is consistent
with at least one such $y$. Local computation algorithms are intended to
distill the common features of several concepts that have appeared in various
algorithmic subfields, including local distributed computation, local
algorithms, locally decodable codes, and local reconstruction.
We develop a technique, based on known constructions of small sample spaces
of $k$-wise independent random variables and Beck's analysis in his algorithmic
approach to the Lovász Local Lemma, which under certain conditions can be
applied to construct local computation algorithms that run in
polylogarithmic time and space. We apply this technique to maximal independent
set computations, scheduling radio network broadcasts, hypergraph coloring and
satisfying $k$-SAT formulas.
Comment: A preliminary version of this paper appeared in ICS 2011, pp. 223-23
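The query model described in this abstract can be made concrete with a small sketch for maximal independent sets. It does not use the paper's technique (limited-independence sample spaces and Beck's analysis) and carries no polylogarithmic guarantee. Instead it simulates the greedy algorithm under a fixed random vertex order, which is a simpler, well-known way to answer membership queries consistently. The names `make_mis_oracle` and `in_mis` and the example graph are hypothetical.

```python
import random


def make_mis_oracle(neighbors, seed=0):
    """Build a query function in_mis(v) for a graph given as an adjacency dict.

    Every vertex gets a fixed random rank. A vertex belongs to the set iff
    none of its lower-ranked neighbors does, which is the greedy maximal
    independent set for that rank order. Each query explores only the
    vertices it needs, and all answers agree with the same single set.
    """
    order = sorted(neighbors)
    random.Random(seed).shuffle(order)
    rank = {v: position for position, v in enumerate(order)}
    cache = {}

    def in_mis(v):
        if v not in cache:
            lower = sorted((u for u in neighbors[v] if rank[u] < rank[v]),
                           key=rank.get)
            # Ranks strictly decrease along the recursion, so it terminates.
            cache[v] = not any(in_mis(u) for u in lower)
        return cache[v]

    return in_mis


# Hypothetical example graph: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
in_mis = make_mis_oracle(path)
answers = {v: in_mis(v) for v in path}
```

A caller can ask about any single vertex without the full output ever being written down; the fixed seed is what keeps separate queries consistent with one legal output.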
Theoretically Efficient Parallel Graph Algorithms Can Be Fast and Scalable
There has been significant recent interest in parallel graph processing due
to the need to quickly analyze the large graphs available today. Many graph
codes have been designed for distributed memory or external memory. However,
today even the largest publicly-available real-world graph (the Hyperlink Web
graph with over 3.5 billion vertices and 128 billion edges) can fit in the
memory of a single commodity multicore server. Nevertheless, most experimental
work in the literature reports results on much smaller graphs, and the ones for
the Hyperlink graph use distributed or external memory. Therefore, it is
natural to ask whether we can efficiently solve a broad class of graph problems
on this graph in memory.
This paper shows that theoretically-efficient parallel graph algorithms can
scale to the largest publicly-available graphs using a single machine with a
terabyte of RAM, processing them in minutes. We give implementations of
theoretically-efficient parallel algorithms for 20 important graph problems. We
also present the optimizations and techniques that we used in our
implementations, which were crucial in enabling us to process these large
graphs quickly. We show that the running times of our implementations
outperform existing state-of-the-art implementations on the largest real-world
graphs. For many of the problems that we consider, this is the first time they
have been solved on graphs at this scale. We have made the implementations
developed in this work publicly available as the Graph-Based Benchmark Suite
(GBBS).
Comment: This is the full version of the paper appearing in the ACM Symposium
on Parallelism in Algorithms and Architectures (SPAA), 201
Fast Distributed Approximation for Max-Cut
Finding a maximum cut is a fundamental task in many computational settings.
Surprisingly, it has been insufficiently studied in the classic distributed
settings, where vertices communicate by synchronously sending messages to their
neighbors according to the underlying graph, known as the LOCAL or
CONGEST models. We amend this by obtaining almost optimal
algorithms for Max-Cut on a wide class of graphs in these models. In
particular, for any $\epsilon > 0$, we develop randomized approximation
algorithms achieving a ratio of $(1-\epsilon)$ to the optimum for Max-Cut on
bipartite graphs in the CONGEST model, and on general graphs in the
LOCAL model.
We further present efficient deterministic algorithms, including a
$1/3$-approximation for Max-Dicut in our models, thus improving the best known
(randomized) ratio of $1/4$. Our algorithms make non-trivial use of the greedy
approach of Buchbinder et al. (SIAM Journal on Computing, 2015) for maximizing
an unconstrained (non-monotone) submodular function, which may be of
independent interest.
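The greedy approach of Buchbinder et al. that this abstract builds on can be sketched for the undirected cut function. This is a sequential, centralized sketch of the deterministic "double greedy" rule, not the distributed algorithms of the paper. For nonnegative submodular functions the deterministic rule is known to reach at least one third of the optimum. The names `cut_value` and `double_greedy_cut` and the example graph are hypothetical.

```python
def cut_value(edges, side):
    """Number of edges with exactly one endpoint in the set `side`."""
    return sum((u in side) != (v in side) for u, v in edges)


def double_greedy_cut(vertices, edges):
    """Deterministic double greedy applied to the (submodular) cut function.

    `lower` grows from the empty set and `upper` shrinks from the full vertex
    set. Each vertex is either added to `lower` or removed from `upper`,
    whichever change gains more, so both sets coincide at the end.
    """
    lower, upper = set(), set(vertices)
    for v in vertices:
        gain_add = cut_value(edges, lower | {v}) - cut_value(edges, lower)
        gain_remove = cut_value(edges, upper - {v}) - cut_value(edges, upper)
        if gain_add >= gain_remove:
            lower.add(v)
        else:
            upper.discard(v)
    return lower


# Hypothetical example graph: a cycle on four vertices.
cycle_vertices = [0, 1, 2, 3]
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
chosen = double_greedy_cut(cycle_vertices, cycle_edges)
chosen_value = cut_value(cycle_edges, chosen)
```

Recomputing the cut value from scratch at every step keeps the sketch short; a practical version would update the gains incrementally from each vertex's neighborhood.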