2 research outputs found

    Bivariate, Cluster and Suitability Analysis of NoSQL Solutions for Different Application Areas

    Developing big data systems is challenging because of the variety of application areas and domains the technology is expected to serve. Fundamental design decisions typically include choosing appropriate storage and computing infrastructures. Like other heterogeneous systems that integrate different technologies to build an optimized solution to a specific real-world problem, big data systems are no exception. On the storage side, the primary concern is the storage infrastructure, and NoSQL is the technology that fulfills its requirements. However, every big data application has different data characteristics, so its data fits a different data model. Moreover, the requirements of different applications vary with budget and functionality. This paper presents a feature analysis of 80 NoSQL solutions, elaborating on the criteria and points that a developer must consider when making a choice. Bivariate analysis of a dataset created for the identified NoSQL solutions was performed to establish the relationships among 9 features. Furthermore, cluster analysis of the dataset was used to group the solutions into categories and present a statistically supported classification scheme. Finally, applications of the different solutions were reviewed and classified under domain-specific categories. Random forest classification was used to determine the most relevant features for applications, and a decision tree-based prediction model was correspondingly proposed, implemented, and deployed as a web application to determine the suitability of a NoSQL solution for an application area.
    Comment: arXiv admin note: substantial text overlap with arXiv:1904.1149

    An Elastic Middleware Platform for Concurrent and Distributed Cloud and MapReduce Simulations

    Cloud computing research involves a tremendous number of entities such as users, applications, and virtual machines. Because access to such resources is limited and their availability often varies, researchers test their prototypes against simulation environments rather than real cloud environments. Existing cloud simulation environments such as CloudSim and EmuSim execute sequentially; a more advanced cloud simulation tool could be created by extending them, leveraging the latest technologies as well as the multi-core computers and clusters available in research laboratories. While computing has been evolving with multi-core programming, MapReduce paradigms, and middleware platforms, cloud and MapReduce simulations still fail to exploit these developments themselves. This research develops Cloud2Sim, which tries to fill the gap between the simulations and the actual technology that they are trying to simulate. First, Cloud2Sim provides a concurrent and distributed cloud simulator by extending the CloudSim cloud simulator using the Hazelcast in-memory key-value store. It also provides a quick assessment of the MapReduce implementations of Hazelcast and Infinispan, adaptively distributing the execution to a cluster and providing a means of simulating MapReduce executions. The dynamic scaler solution scales out the cloud and MapReduce simulations to multiple nodes running Hazelcast and Infinispan, based on load. The distributed execution model and adaptive scaling solution could be leveraged as a general-purpose auto-scaler middleware for a multi-tenanted deployment.
    Comment: Thesis to obtain the Master of Science Degree in Information Systems and Computer Engineering, Instituto Superior Tecnico, Universidade de Lisboa. 2014 Septembe