    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Fundamental decisions in big data system design typically include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies to provide an optimized solution to a specific real-world problem, big data systems are no exception. As far as the storage aspect of a big data system is concerned, the primary facet is the storage infrastructure, and NoSQL seems to be the right technology to fulfill its requirements. However, every big data application has different data characteristics, and the corresponding data therefore fits a different data model. This paper presents a feature and use-case analysis and comparison of the four main data models, namely document-oriented, key-value, graph, and wide-column. Moreover, a feature analysis of 80 NoSQL solutions is provided, elaborating on the criteria and points that a developer must consider when making a choice. Big data storage typically needs to communicate with the execution engine and with other processing and visualization technologies to create a comprehensive solution. This brings the second facet of big data storage, big data file formats, into the picture. The second half of the paper compares the advantages, shortcomings, and possible use cases of the big data file formats available for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage, and their challenges and future prospects are also discussed.
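To make the difference between the four data models more concrete, the sketch below stores the same small dataset in each model using plain Python structures. It is an illustration only: the customer/order data, field names, and the `orders_of` helper are hypothetical, and no particular NoSQL product's API is used or implied.

```python
from collections import defaultdict

# Hypothetical sample data: one customer and two orders (illustrative only).
customer = {"id": "c1", "name": "Ana"}
orders = [{"id": "o1", "total": 30.0}, {"id": "o2", "total": 12.5}]

# Document-oriented: the customer and its orders nested in one self-describing document.
document_store = {"c1": {**customer, "orders": orders}}

# Key-value: values are addressed only by key, so related data needs composite keys.
key_value_store = {"customer:c1": customer}
for o in orders:
    key_value_store[f"customer:c1:order:{o['id']}"] = o

# Wide-column: row key -> column family -> column qualifier -> value.
wide_column_store = defaultdict(lambda: defaultdict(dict))
wide_column_store["c1"]["profile"]["name"] = customer["name"]
for o in orders:
    wide_column_store["c1"]["orders"][o["id"]] = o["total"]

# Graph: entities become nodes, relationships become labelled edges.
nodes = {"c1": {"label": "Customer", **customer}}
edges = []
for o in orders:
    nodes[o["id"]] = {"label": "Order", **o}
    edges.append(("c1", "PLACED", o["id"]))


def orders_of(customer_id):
    """Follow PLACED edges out of a customer node (a graph-style query)."""
    return sorted(dst for src, rel, dst in edges if src == customer_id and rel == "PLACED")


if __name__ == "__main__":
    print(document_store["c1"]["orders"][1]["total"])  # 12.5
    print(sorted(k for k in key_value_store if k.startswith("customer:c1:order:")))
    print(wide_column_store["c1"]["orders"])  # {'o1': 30.0, 'o2': 12.5}
    print(orders_of("c1"))  # ['o1', 'o2']
```

The point of the comparison is access pattern rather than storage: the document model answers "everything about c1" in one lookup, the key-value model needs the key scheme to be designed up front, the wide-column model groups related columns per row, and the graph model makes relationships first-class and traversable.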

    Open Source Platforms for Big Data Analytics

    The concept of Big Data has had a great impact on the field of technology, particularly on the management and analysis of huge volumes of information. Nowadays, organizations see Big Data as an opportunity to manage and explore their data as fully as possible, with the objective of supporting decisions across their different operational areas. It is therefore necessary to analyse several concepts related to Big Data and Big Data Analytics, including definitions, characteristics, advantages, and challenges. Business Intelligence (BI) tools, together with the generation of knowledge, are fundamental to the process of decision-making and the transformation of information. By investigating today's Big Data platforms, current industrial practices, and related trends in the research world, it is possible to understand the impact of Big Data Analytics on small organizations. This research proposes solutions for micro, small, and medium enterprises (SMEs), which have a great impact on the Portuguese economy since they represent approximately 90% of the companies in Portugal. Open source platforms for Big Data Analytics offer a great opportunity for innovation in SMEs. This research work presents a comparative analysis of the features and functionalities of these platforms and the steps to be taken for a deeper comparative analysis. After the comparative analysis, we present an evaluation and selection of Big Data Analytics (BDA) platforms by using and adapting the Qualification and Selection of Open Source software (QSOS) method. This evaluation and selection resulted in two platforms being chosen for the empirical experiments and tests. The same dataset and the same hardware and software configuration were used on both open source BDA platforms. The comparison of the two platforms showed that HPCC Systems Platform is more efficient and reliable than Hortonworks Data Platform. In particular, Portuguese SMEs should consider BDA platforms as an opportunity to obtain competitive advantage, improve their processes, and consequently define an IT and business strategy. Finally, this is a research work on Big Data that, it is hoped, will serve as an invitation and motivation for new research.
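The QSOS-based selection step can be pictured as a weighted scoring exercise. The sketch below assumes the QSOS convention of scoring each criterion 0 (not covered), 1 (partially covered), or 2 (fully covered) and simplifies the selection to a weighted average; the criteria, weights, platform names, and scores are all hypothetical and are not the data or the adapted method used in this work.

```python
# Hypothetical criteria weights, reflecting one SME's assumed priorities.
WEIGHTS = {"documentation": 2, "community_activity": 1, "ease_of_deployment": 3, "scalability": 3}

# Hypothetical QSOS-style scores: 0 = not covered, 1 = partially covered, 2 = fully covered.
SCORES = {
    "Platform A": {"documentation": 2, "community_activity": 1, "ease_of_deployment": 1, "scalability": 2},
    "Platform B": {"documentation": 1, "community_activity": 2, "ease_of_deployment": 2, "scalability": 2},
    "Platform C": {"documentation": 1, "community_activity": 1, "ease_of_deployment": 0, "scalability": 1},
}


def weighted_score(scores, weights):
    """Weighted average of 0-2 criterion scores (a simplification of QSOS selection)."""
    for criterion in weights:
        if scores.get(criterion) not in (0, 1, 2):
            raise ValueError(f"missing or invalid score for {criterion!r}")
    total_weight = sum(weights.values())
    return sum(weights[c] * scores[c] for c in weights) / total_weight


def shortlist(all_scores, weights, k=2):
    """Rank platforms by weighted score and keep the top k for experimental testing."""
    ranked = sorted(all_scores, key=lambda p: weighted_score(all_scores[p], weights), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    for name, scores in SCORES.items():
        print(name, round(weighted_score(scores, WEIGHTS), 3))
    print(shortlist(SCORES, WEIGHTS))  # ['Platform B', 'Platform A']
```

Changing the weights is how an evaluator adapts the same scored criteria to different organizational requirements, which is why two SMEs can reach different shortlists from identical evaluation sheets.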

    Big Data in the Cloud: A Survey

    Big Data has become a hot topic across several business areas that require the storage and processing of huge volumes of data. Cloud computing supports Big Data by providing large storage and processing capabilities and by letting corporations consume resources in a pay-as-you-go model, which makes clouds the optimal environment for storing and processing huge quantities of data. Because it uses virtualized resources, the cloud can scale very easily, be highly available, and provide massive storage capacity and processing power. This paper surveys existing database models for storing and processing Big Data within a cloud environment. In particular, we detail the following traditional NoSQL databases: BigTable, Cassandra, DynamoDB, HBase, Hypertable, and MongoDB. The MapReduce framework and its extensions Apache Spark, HaLoop, and Twister are also analyzed, as are alternatives such as Apache Giraph, GraphLab, Pregel, and MapD (a novel platform that uses GPU processing to accelerate Big Data processing). Finally, we present two case studies that demonstrate the successful use of Big Data within cloud environments, and we discuss the challenges that must be addressed in the future.
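The MapReduce programming model surveyed here splits a job into a map phase, a shuffle that groups intermediate pairs by key, and a reduce phase. The sketch below is a single-process word count that shows those three phases in plain Python; it does not use Hadoop or Spark APIs, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(split: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # In a real cluster each split would be mapped by a different worker; here they run in sequence.
    intermediate = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["Big data in the cloud", "the cloud stores big data"]))
    # {'big': 2, 'cloud': 2, 'data': 2, 'in': 1, 'stores': 1, 'the': 2}
```

In an actual framework the map calls run in parallel across machines and the shuffle moves data over the network; iterative systems such as HaLoop, Twister, and Spark target the cost of repeating this cycle across many iterations.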