BigFCM: Fast, Precise and Scalable FCM on Hadoop
Clustering plays an important role in mining big data both as a modeling
technique and a preprocessing step in many data mining process implementations.
Fuzzy clustering provides more flexibility than non-fuzzy methods by allowing
each data record to belong to more than one cluster to some degree. However, a
serious challenge in fuzzy clustering is the lack of scalability. Massive
datasets in emerging fields such as geosciences, biology and networking
require high-performance parallel and distributed computation to solve
real-world problems. Although some clustering methods have already been adapted
to run on big data platforms, their execution time grows sharply for
large datasets. In this paper, a scalable Fuzzy C-Means (FCM) clustering method named
BigFCM is proposed and designed for the Hadoop distributed data platform. Based
on the map-reduce programming model, it exploits several mechanisms including
an efficient caching design to achieve several orders of magnitude reduction in
execution time. Extensive evaluation over multi-gigabyte datasets shows that
BigFCM is scalable while it preserves the quality of clustering.
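
To make the idea of partial cluster membership concrete, the following is a minimal sketch of the textbook, single-machine Fuzzy C-Means iteration. It is not the BigFCM design: it has no map-reduce stages and no caching, and the function name `fuzzy_c_means`, its parameters, and the stopping rule are assumptions chosen for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, max_iter=100, tol=1e-6, seed=0):
    """Textbook single-machine Fuzzy C-Means (illustrative, not BigFCM).

    points: list of equal-length tuples of floats.
    Returns (centers, memberships). memberships[j][i] is the degree to which
    point j belongs to cluster i, computed in the last membership update;
    each row sums to 1.
    """
    rng = random.Random(seed)
    # Assumption: initialise centers from randomly chosen data points.
    centers = [list(p) for p in rng.sample(points, n_clusters)]
    exponent = 2.0 / (m - 1.0)  # m > 1 is the fuzzifier
    dim = len(points[0])
    memberships = []

    for _ in range(max_iter):
        # Membership update: closer centers receive a higher degree.
        memberships = []
        for p in points:
            dists = [math.dist(p, c) for c in centers]
            if any(d == 0.0 for d in dists):
                # Point coincides with one or more centers: share it among them.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                memberships.append([h / sum(hits) for h in hits])
                continue
            memberships.append([
                1.0 / sum((d_i / d_k) ** exponent for d_k in dists)
                for d_i in dists
            ])

        # Center update: mean of all points, weighted by membership ** m.
        new_centers = []
        for i in range(n_clusters):
            weights = [row[i] ** m for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Stop once no center moves more than tol.
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break

    return centers, memberships
```

Every record contributes to every center in proportion to its membership, so each iteration is a full pass over the data. That per-iteration pass is the cost a distributed design has to spread across workers.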
Elastic Business Process Management: State of the Art and Open Challenges for BPM in the Cloud
With the advent of cloud computing, organizations are nowadays able to react
rapidly to changing demands for computational resources. Not only individual
applications can be hosted on virtual cloud infrastructures, but also complete
business processes. This allows the realization of so-called elastic processes,
i.e., processes which are carried out using elastic cloud resources. Despite
the manifold benefits of elastic processes, there is still a lack of solutions
supporting them.
In this paper, we identify the state of the art of elastic Business Process
Management with a focus on infrastructural challenges. We conceptualize an
architecture for an elastic Business Process Management System and discuss
existing work on scheduling, resource allocation, monitoring, decentralized
coordination, and state management for elastic processes. Furthermore, we
present two representative elastic Business Process Management Systems which
are intended to counter these challenges. Based on our findings, we identify
open issues and outline possible research directions for the realization of
elastic processes and elastic Business Process Management.
Comment: Please cite as: S. Schulte, C. Janiesch, S. Venugopal, I. Weber, and
P. Hoenisch (2015). Elastic Business Process Management: State of the Art and
Open Challenges for BPM in the Cloud. Future Generation Computer Systems,
Volume NN, Number N, NN-NN., http://dx.doi.org/10.1016/j.future.2014.09.00
A Short Survey on Data Clustering Algorithms
With rapidly increasing data, clustering algorithms are important tools for
data analytics in modern research. They have been successfully applied to a
wide range of domains; for instance, bioinformatics, speech recognition, and
financial analysis. Formally speaking, given a set of data instances, a
clustering algorithm is expected to divide the set into
subsets that maximize intra-subset similarity and inter-subset
dissimilarity, where a similarity measure is defined beforehand. In this work,
state-of-the-art clustering algorithms are reviewed from design concept to
methodology. Different clustering paradigms are discussed, as are advanced
clustering algorithms. After that, the existing clustering evaluation
metrics are reviewed. A summary with future insights is provided at the end.