14 research outputs found

    Cassandra File System Over Hadoop Distributed File System

    Cassandra is an open-source distributed database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple data centers, with asynchronous masterless replication that allows low-latency operations for all clients. NoSQL data stores target unstructured data, which is dynamic by nature and a key focus area for "Big Data" research. New-generation data can prove costly and impractical to administer with SQL databases because of its lack of structure and its need for high scalability and elasticity. NoSQL data stores such as MongoDB and Cassandra provide a desirable platform for fast and efficient data queries. The Hadoop Distributed File System is one of many components and projects within the community Hadoop ecosystem. The Apache Hadoop project defines Hadoop-DFS as "the primary storage system which is used by Hadoop applications" that enables "reliable, extremely rapid computations". This paper provides a high-level overview of how Hadoop-style analytics (MapReduce, Pig, Mahout and Hive) can be run on data contained in Apache Cassandra without the need for Hadoop-DFS.

    Impact de la réplication sur la latence dans les bases de données distribuées et application à Cassandra

    Minimizing latency is a key challenge in distributed databases. Under certain conditions, replication can yield performance gains. We study theoretical bounds and propose a distributed load-balancing algorithm that proves very competitive for the distributed NoSQL database Cassandra.

    PPS-ADS: A Framework for Privacy-Preserved and Secured Distributed System Architecture for Handling Big Data

    The exponential expansion of Big Data in 7V’s (velocity, variety, veracity, value, variability and visualization) brings new challenges to the security, reliability, availability and privacy of these data sets. Traditional security techniques and algorithms fail to keep pace with data at this scale. This paper aims to improve the recently proposed Atrain Distributed System (ADS) by incorporating new features that address the end-to-end availability and security of big data in the distributed system. The paper also integrates the concept of Software Defined Networking (SDN) into ADS to effectively control and manage the routing of data items in the ADS. Data items are stored in the ADS on the basis of the type of data (structured or unstructured), the capacity of the distributed system (or coach) and the distance of the coach from the pilot computer (PC). To maintain data consistency and eliminate possible data loss, the concept of “forward positive” and “backward positive” acknowledgment is proposed. Furthermore, we have incorporated the “Twofish” cryptographic technique to encrypt the big data in the ADS. Issues such as “data ownership”, “data security”, “data privacy” and “data reliability” are pivotal when handling big data. The paper presents a framework for a privacy-preserved architecture for handling big data effectively.

    Benchmarking Scalability of NoSQL Databases for Geospatial Queries

    NoSQL databases provide an edge when it comes to dealing with big unstructured data. The flexibility, agility, and scalability offered by NoSQL databases become increasingly essential when dealing with geospatial data. The proliferation of geospatial applications has tremendously increased the variety, velocity, and volume of data that data stores must manage. These characteristics of big spatial data exceed the capabilities and anticipated use cases of relational databases. Because an extensive collection of NoSQL databases is now available, it becomes vital for organizations to make an informed decision. NoSQL database benchmarks provide system architects, who shoulder a considerable burden in selecting the right technology for their data stores, with a vital starting point and source of information. The major utility of these benchmarks is reproducing experiments on similar experimental data, which can verify and optimize the process of selecting an optimum tool for data management needs in the early phases of development. The goal of this research is to develop a benchmark that can compare the performance of NoSQL databases for querying complex geospatial data. We have analyzed the throughputs, latencies, and runtimes of MongoDB and Couchbase to identify the correct fit for our use case. In doing so, we have also demonstrated a systematic process that can be followed to make an optimum choice of datastore. This benchmark can be extended easily to any NoSQL database that supports geospatial querying.

    Big Data in the Cloud: A Survey

    Big Data has become a hot topic across several business areas that require the storage and processing of huge volumes of data. Cloud computing leverages Big Data by providing high storage and processing capabilities, and enables corporations to consume resources in a pay-as-you-go model, making clouds the optimal environment for storing and processing huge quantities of data. By using virtualized resources, the Cloud can scale very easily, be highly available and provide massive storage capacity and processing power. This paper surveys existing database models for storing and processing Big Data within a Cloud environment. In particular, we detail the following traditional NoSQL databases: BigTable, Cassandra, DynamoDB, HBase, Hypertable, and MongoDB. The MapReduce framework and its developments (Apache Spark, HaLoop, Twister) are also analyzed, along with alternatives such as Apache Giraph, GraphLab, Pregel and MapD, a novel platform that uses GPU processing to accelerate Big Data processing. Finally, we present two case studies that demonstrate the successful use of Big Data within Cloud environments and the challenges that must be addressed in the future.

    Big data and social media: A scientometrics analysis

    The purpose of this research is to investigate the status and evolution of scientific studies on the effect of social networks on big data and on the use of big data for modeling the behavior of social network users. This paper presents a comprehensive review of the studies associated with big data in social media. The study uses the Scopus database as its primary search engine and covers 2,000 highly cited articles over the period 2012-2019. The records are statistically analyzed and categorized according to different criteria. The findings show that research has grown exponentially since 2014 and that the trend has continued at relatively stable rates. Based on the survey, "decision support systems" is the keyword with the highest density, followed by "heuristics methods". Among the most cited articles, papers published by researchers in the United States have received the most citations (7548), followed by the United Kingdom (588) and China (543). Thematic analysis shows that the subject has largely remained an important and well-developed research field, and that better results may come from combining this research with “big data analytics” and “Twitter”, which are important but not yet well-developed topics in this field.

    Social media and e-commerce: A scientometrics analysis

    The purpose of this research is to investigate the status and evolution of scientific studies on the effect of social networks on e-commerce. The study examines a set of scientific publications by researchers worldwide, indexed in Scopus, using scientometric indicators. In total, 1926 articles were found, and the collected data were analyzed using quantitative and qualitative scientometric indicators with the bibliometrix R software package. The findings show that research has grown exponentially since 2009 and that the trend has continued at relatively stable rates. Thematic analysis shows that the subject is a significant but not well-developed research field. There is a high rate of cooperation, with a rich research network among institutions in the United States and in European and Asian countries. Studies also show that research interest in this area is prevalent in developed countries. In addition, the lack of studies in developing countries, especially in Africa, may be due to a lack of funds and complex analytical tools. Studying the global trend of research through scientometrics helps managers and researchers identify the countries and institutions with the greatest potential for scientific production, which allows them to develop their professions.

    Big Data: an exploration of research, technologies and application cases

    Big Data has become a worldwide trend and, although it still lacks a consensual scientific or academic definition, the market surrounding it and the associated research areas are expected to grow further every day. This paper reports a systematic review of the literature on Big Data, covering the state of the art in techniques and technologies associated with Big Data, which include data capture, processing, analysis and visualization. The characteristics, strengths, weaknesses and opportunities of some applications and models that involve Big Data, mainly in support of data modeling, analysis and data mining, are explored. Likewise, some future trends for the development of Big Data are introduced by defining the basic aspects, scope and importance of each one. The methodology used for the exploration applies two strategies: the first is a scientometric analysis, and the second is a categorization of documents by means of a web tool that supports the literature review process. As results, a synthesis and conclusions on the subject are obtained, and possible scenarios for research work in the field are proposed.