Bivariate, Cluster and Suitability Analysis of NoSQL Solutions for Different Application Areas
Developing big data systems is challenging because of the variety of
application areas and domains that this technology promises to serve.
Fundamental decisions in big data system design typically include choosing
appropriate storage and computing infrastructures. In an age of heterogeneous
systems that integrate different technologies to build an optimized solution
to a specific real-world problem, big data systems are no exception. Where
storage is concerned, every big data system needs a storage infrastructure,
and NoSQL is the technology that fulfills its requirements. However, data
characteristics vary from one big data application to another, so each
application's data may fit a different data model. Moreover, application
requirements differ in budget and functionality. This paper presents a feature
analysis of 80 NoSQL solutions, elaborating on the criteria a developer must
consider when choosing among them. Bivariate analysis of a dataset created
for the identified NoSQL solutions was performed to establish relationships
among 9 features. Furthermore, cluster analysis of the dataset was used to
create categories of solutions to present a statistically supported
classification scheme. Finally, applications for different solutions were
reviewed and classified under domain-specific categories. Random forest
classification was used to determine the features most relevant to
applications, and a corresponding decision tree-based prediction model was
proposed, implemented and deployed as a web application that determines the
suitability of a NoSQL solution for an application area.
Comment: arXiv admin note: substantial text overlap with arXiv:1904.1149
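The abstract does not say which association measure the bivariate analysis used. For categorical features of NoSQL solutions (for example data model or license type), one common choice is Cramér's V, computed from the chi-square statistic of the two features' contingency table. The sketch below assumes that measure; the function name `cramers_v` and the sample feature values are hypothetical and are not taken from the paper's dataset.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences of equal length.

    Assumption: the uncorrected form sqrt(chi2 / (n * (min(r, c) - 1))),
    where r and c are the numbers of distinct values in xs and ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed cell counts of the contingency table
    rows = Counter(xs)            # marginal counts of the first feature
    cols = Counter(ys)            # marginal counts of the second feature

    # Pearson chi-square: sum over all cells of (observed - expected)^2 / expected.
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            observed = joint[(r, c)]  # Counter returns 0 for unseen pairs
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(rows), len(cols)) - 1
    if k == 0:
        return 0.0  # a constant feature cannot be associated with anything
    return sqrt(chi2 / (n * k))


if __name__ == "__main__":
    # Hypothetical features for four hypothetical NoSQL solutions.
    data_model = ["document", "document", "key-value", "key-value"]
    license_type = ["open", "open", "commercial", "commercial"]
    hosting = ["cloud", "on-premise", "cloud", "on-premise"]

    print(cramers_v(data_model, license_type))  # perfectly associated
    print(cramers_v(data_model, hosting))       # no association
```

Values range from 0 (no association) to 1 (one feature fully determines the other). With only a few dozen solutions per category, the uncorrected statistic tends to overstate association, so a bias-corrected variant may be preferable on real data.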