557 research outputs found
Scalable Scientific Computing Algorithms Using MapReduce
Cloud computing systems, like MapReduce and Pregel, provide a scalable and fault tolerant environment for running computations at massive scale. However, these systems are designed primarily for data intensive computational tasks, while a large class of problems in
scientific computing and business analytics are computationally intensive (i.e., they require a lot of CPU in addition to I/O). In this thesis, we investigate the use of cloud computing systems, in particular MapReduce, for computationally intensive problems, focusing on two classic problems that arise in scienti c computing and also in analytics: maximum clique and matrix inversion.
The key contribution that enables us to e ectively use MapReduce to solve the maximum clique problem on dense graphs is a recursive partitioning method that partitions the graph into several subgraphs of similar size and running time complexity. After partitioning, the maximum cliques of the di erent partitions can be computed independently, and the computation is sped up using a branch and bound method. Our experiments show that our approach leads to good scalability, which is unachievable by other partitioning methods since they result in partitions of di erent sizes and hence lead to load imbalance. Our method is more scalable than an MPI algorithm, and is simpler and more fault tolerant.
For the matrix inversion problem, we show that a recursive block LU decomposition allows us to e ectively compute in parallel both the lower triangular (L) and upper triangular
(U) matrices using MapReduce. After computing the L and U matrices, their inverses are computed using MapReduce. The inverse of the original matrix, which is the product
of the inverses of the L and U matrices, is also obtained using MapReduce. Our technique is the rst matrix inversion technique that uses MapReduce. We show experimentally that our technique has good scalability, and it is simpler and more fault tolerant than MPI implementations such as ScaLAPACK
MLI: An API for Distributed Machine Learning
MLI is an Application Programming Interface designed to address the
challenges of building Machine Learn- ing algorithms in a distributed setting
based on data-centric computing. Its primary goal is to simplify the
development of high-performance, scalable, distributed algorithms. Our initial
results show that, relative to existing systems, this interface can be used to
build distributed implementations of a wide variety of common Machine Learning
algorithms with minimal complexity and highly competitive performance and
scalability
A new processing approach for reducing computational complexity in cloud-RAN mobile networks
Cloud computing is considered as one of the key drivers for the next generation of mobile
networks (e.g. 5G). This is combined with the dramatic expansion in mobile networks, involving millions
(or even billions) of subscribers with a greater number of current and future mobile applications
(e.g. IoT). Cloud Radio Access Network (C-RAN) architecture has been proposed as a novel concept to
gain the benefits of cloud computing as an efficient computing resource, to meet the requirements of future
cellular networks. However, the computational complexity of obtaining the channel state information in
the full-centralized C-RAN increases as the size of the network is scaled up, as a result of enlargement in
channel information matrices. To tackle this problem of complexity and latency, MapReduce framework
and fast matrix algorithms are proposed. This paper presents two levels of complexity reduction in the
process of estimating the channel information in cellular networks. The results illustrate that complexity
can be minimized from O(N3) to O((N/k)3), where N is the total number of RRHs and k is the number of
RRHs per group, by dividing the processing of RRHs into parallel groups and harnessing the MapReduce
parallel algorithm in order to process them. The second approach reduces the computation complexity
from O((N/k)3) to O((N/k)2:807) using the algorithms of fast matrix inversion. The reduction in complexity
and latency leads to a significant improvement in both the estimation time and in the scalability of
C-RAN networks
Evolving Large-Scale Data Stream Analytics based on Scalable PANFIS
Many distributed machine learning frameworks have recently been built to
speed up the large-scale data learning process. However, most distributed
machine learning used in these frameworks still uses an offline algorithm model
which cannot cope with the data stream problems. In fact, large-scale data are
mostly generated by the non-stationary data stream where its pattern evolves
over time. To address this problem, we propose a novel Evolving Large-scale
Data Stream Analytics framework based on a Scalable Parsimonious Network based
on Fuzzy Inference System (Scalable PANFIS), where the PANFIS evolving
algorithm is distributed over the worker nodes in the cloud to learn
large-scale data stream. Scalable PANFIS framework incorporates the active
learning (AL) strategy and two model fusion methods. The AL accelerates the
distributed learning process to generate an initial evolving large-scale data
stream model (initial model), whereas the two model fusion methods aggregate an
initial model to generate the final model. The final model represents the
update of current large-scale data knowledge which can be used to infer future
data. Extensive experiments on this framework are validated by measuring the
accuracy and running time of four combinations of Scalable PANFIS and other
Spark-based built in algorithms. The results indicate that Scalable PANFIS with
AL improves the training time to be almost two times faster than Scalable
PANFIS without AL. The results also show both rule merging and the voting
mechanisms yield similar accuracy in general among Scalable PANFIS algorithms
and they are generally better than Spark-based algorithms. In terms of running
time, the Scalable PANFIS training time outperforms all Spark-based algorithms
when classifying numerous benchmark datasets.Comment: 20 pages, 5 figure
- …