3,575 research outputs found

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

    TailoredRE: A Personalized Cloud-based Traffic Redundancy Elimination for Smartphones

    Get PDF
    The exceptional rise in usages of mobile devices such as smartphones and tablets has contributed to a massive increase in wireless network trac both Cellular (3G/4G/LTE) and WiFi. The unprecedented growth in wireless network trac not only strain the battery of the mobile devices but also bogs down the last-hop wireless access links. Interestingly, a signicant part of this data trac exhibits high level of redundancy in them due to repeated access of popular contents in the web. Hence, a good amount of research both in academia and in industries has studied, analyzed and designed diverse systems that attempt to eliminate redundancy in the network trac. Several of the existing Trac Redundancy Elimination (TRE) solutions either does not improve last-hop wireless access links or involves inecient use of compute resources from resource-constrained mobile devices. In this research, we propose TailoredRE, a personalized cloud-based trac redundancy elimination system. The main objective of TailoredRE is to tailor TRE mechanism such that TRE is performed against selected applications rather than application agnostically, thus improving eciency by avoiding caching of unnecessary data chunks. In our system, we leverage the rich resources of the cloud to conduct TRE by ooading most of the operational cost from the smartphones or mobile devices to its clones (proxies) available in the cloud. We cluster the multiple individual user clones in the cloud based on the factors of connectedness among users such as usage of similar applications, common interests in specic web contents etc., to improve the eciency of caching in the cloud. This thesis encompasses motivation, system design along with detailed analysis of the results obtained through simulation and real implementation of TailoredRE system

    Exploiting Data Mining Techniques for Broadcasting Data in Mobile Computing Environments

    Get PDF
    Cataloged from PDF version of article.Mobile computers can be equipped with wireless communication devices that enable users to access data services from any location. In wireless communication, the server-to-client (downlink) communication bandwidth is much higher than the client-to-server (uplink) communication bandwidth. This asymmetry makes the dissemination of data to client machines a desirable approach. However, dissemination of data by broadcasting may induce high access latency in case the number of broadcast data items is large. In this paper, we propose two methods aiming to reduce client access latency of broadcast data. Our methods are based on analyzing the broadcast history (i.e., the chronological sequence of items that have been requested by clients) using data mining techniques. With the first method, the data items in the broadcast disk are organized in such a way that the items requested subsequently are placed close to each other. The second method focuses on improving the cache hit ratio to be able to decrease the access latency. It enables clients to prefetch the data from the broadcast disk based on the rules extracted from previous data request patterns. The proposed methods are implemented on a Web log to estimate their effectiveness. It is shown through performance experiments that the proposed rule-based methods are effective in improving the system performance in terms of the average latency as well as the cache hit ratio of mobile clients
    corecore