Bivariate, Cluster and Suitability Analysis of NoSQL Solutions for Different Application Areas
Big data systems development is full of challenges in view of the variety of
application areas and domains that this technology promises to serve.
Typically, fundamental design decisions involved in big data systems design
include choosing appropriate storage and computing infrastructures. In this age
of heterogeneous systems that integrate different technologies for development
of an optimized solution to a specific real world problem, big data systems are
not an exception to any such rule. On the storage side, the primary concern of
any big data system is its storage infrastructure, and NoSQL is the technology
that fulfills these requirements. However, every big data application has variable data
characteristics and thus, the corresponding data fits into a different data
model. Moreover, the requirements of different applications vary on the basis
of budget and functionality. This paper presents a feature analysis of 80 NoSQL
solutions, elaborating on the criteria and points that a developer must
consider when making a choice. Bivariate analysis of the dataset created
for the identified NoSQL solutions was performed to establish relationships
between 9 features. Furthermore, cluster analysis of the dataset was used to
create categories of solutions to present a statistically supported
classification scheme. Finally, applications for different solutions were
reviewed and classified under domain-specific categories. Random forest
classification was used to determine the most relevant features for
applications and correspondingly a decision tree-based prediction model was
proposed, implemented and deployed in the form of a web application to
determine the suitability of a NoSQL solution for an application area.
Comment: arXiv admin note: substantial text overlap with arXiv:1904.1149
An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations
Cloud computing research involves a tremendous number of entities such as
users, applications, and virtual machines. Due to the limited access to and often
variable availability of such resources, researchers test their prototypes
against simulation environments, as opposed to real cloud
environments. Existing cloud simulation environments such as CloudSim and
EmuSim execute sequentially. A more advanced cloud simulation tool
could be created by extending them, leveraging the latest technologies as well as
the multi-core computers and clusters available in research
laboratories. While computing has been evolving with multi-core programming,
MapReduce paradigms, and middleware platforms, cloud and MapReduce simulations
still fail to exploit these developments themselves. This research develops
Cloud2Sim, which tries to fill the gap between the simulations and the actual
technology that they are trying to simulate.
First, Cloud2Sim provides a concurrent and distributed cloud simulator by
extending the CloudSim cloud simulator with the Hazelcast in-memory key-value store.
Second, it provides a quick assessment of the MapReduce implementations of
Hazelcast and Infinispan, adaptively distributing the execution to a cluster and
providing a means of simulating MapReduce executions. The dynamic scaler solution
scales out the cloud and MapReduce simulations to multiple nodes running
Hazelcast and Infinispan, based on load. The distributed execution model and
adaptive scaling solution could be leveraged as a general purpose auto scaler
middleware for a multi-tenanted deployment.
Comment: Thesis to obtain the Master of Science Degree in Information Systems
and Computer Engineering, Instituto Superior Tecnico, Universidade de Lisboa.
2014 September
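The load-based scaling idea mentioned in the Cloud2Sim abstract above can be illustrated with a minimal sketch. This is not Cloud2Sim's implementation and uses no Hazelcast or Infinispan API. The thresholds, the node limits, the scale-in behavior, and the names `ScalerConfig`, `next_node_count`, and `simulate` are all hypothetical, chosen only to show how a threshold rule might grow or shrink a cluster as load changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalerConfig:
    # All values are illustrative assumptions, not taken from Cloud2Sim.
    scale_out_above: float = 0.80  # add a node when load exceeds this fraction
    scale_in_below: float = 0.20   # remove a node when load drops below this
    min_nodes: int = 1
    max_nodes: int = 8


def next_node_count(current_nodes: int, load: float,
                    cfg: ScalerConfig = ScalerConfig()) -> int:
    """Return the node count after one scaling decision.

    `load` is a normalized utilization in [0, 1] (an assumption of this sketch).
    """
    if load > cfg.scale_out_above:
        return min(current_nodes + 1, cfg.max_nodes)
    if load < cfg.scale_in_below:
        return max(current_nodes - 1, cfg.min_nodes)
    return current_nodes


def simulate(loads, start_nodes: int = 1,
             cfg: ScalerConfig = ScalerConfig()) -> list:
    """Apply the scaling rule to a sequence of load samples."""
    nodes = start_nodes
    history = []
    for load in loads:
        nodes = next_node_count(nodes, load, cfg)
        history.append(nodes)
    return history


if __name__ == "__main__":
    # Two high-load samples scale out; three low-load samples scale back in.
    print(simulate([0.9, 0.95, 0.5, 0.1, 0.1, 0.1]))
```

In a real deployment the load signal would come from the running cluster, and a cooldown period between decisions would likely be needed to avoid oscillation. Both are left out of this sketch.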