13,815 research outputs found

    Storage Solutions for Big Data Systems: A Qualitative Study and Comparison

    Full text link
    Big data systems development is full of challenges in view of the variety of application areas and domains that this technology promises to serve. Typically, fundamental design decisions involved in big data systems design include choosing appropriate storage and computing infrastructures. In this age of heterogeneous systems that integrate different technologies for optimized solution to a specific real world problem, big data system are not an exception to any such rule. As far as the storage aspect of any big data system is concerned, the primary facet in this regard is a storage infrastructure and NoSQL seems to be the right technology that fulfills its requirements. However, every big data application has variable data characteristics and thus, the corresponding data fits into a different data model. This paper presents feature and use case analysis and comparison of the four main data models namely document oriented, key value, graph and wide column. Moreover, a feature analysis of 80 NoSQL solutions has been provided, elaborating on the criteria and points that a developer must consider while making a possible choice. Typically, big data storage needs to communicate with the execution engine and other processing and visualization technologies to create a comprehensive solution. This brings forth second facet of big data storage, big data file formats, into picture. The second half of the research paper compares the advantages, shortcomings and possible use cases of available big data file formats for Hadoop, which is the foundation for most big data computing technologies. Decentralized storage and blockchain are seen as the next generation of big data storage and its challenges and future prospects have also been discussed

    Data as a Service (DaaS) for sharing and processing of large data collections in the cloud

    Get PDF
    Data as a Service (DaaS) is among the latest kind of services being investigated in the Cloud computing community. The main aim of DaaS is to overcome limitations of state-of-the-art approaches in data technologies, according to which data is stored and accessed from repositories whose location is known and is relevant for sharing and processing. Besides limitations for the data sharing, current approaches also do not achieve to fully separate/decouple software services from data and thus impose limitations in inter-operability. In this paper we propose a DaaS approach for intelligent sharing and processing of large data collections with the aim of abstracting the data location (by making it relevant to the needs of sharing and accessing) and to fully decouple the data and its processing. The aim of our approach is to build a Cloud computing platform, offering DaaS to support large communities of users that need to share, access, and process the data for collectively building knowledge from data. We exemplify the approach from large data collections from health and biology domains.Peer ReviewedPostprint (author's final draft

    Semantic reasoning on the edge of internet of things

    Get PDF
    Abstract. The Internet of Things (IoT) is a paradigm where physical objects are connected with each other with identifying, sensing, networking and processing capabilities over the Internet. Millions of new devices will be added into IoT network thus generating huge amount of data. How to represent, store, interconnect, search, and organize information generated by IoT devices become a challenge. Semantic technologies could play an important role by encoding meaning into data to enable a computer system to possess knowledge and reasoning. The vast amount of devices and data are also challenges. Edge Computing reduces both network latency and resource consumptions by deploying services and distributing computing tasks from the core network to the edge. We recognize four challenges from IoT systems. First the centralized server may generate long latency because of physical distances. Second concern is that the resource-constrained IoT devices have limited computing ability in processing heavy tasks. Third, the data generated by heterogeneous devices can hardly be understood and utilized by other devices or systems. Our research focuses on these challenges and provide a solution based on Edge computing and semantic technologies. We utilize Edge computing and semantic reasoning into IoT. Edge computing distributes tasks to the reasoning devices, which we call the Edge nodes. They are close to the terminal devices and provide services. The newly added resources could balance the workload of the systems and improve the computing capability. We annotate meaning into the data with Resource Description Framework thus providing an approach for heterogeneous machines to understand and utilize the data. We use semantic reasoning as a general purpose intelligent processing method. The thesis work focuses on studying semantic reasoning performance in IoT system with Edge computing paradigm. We develop an Edge based IoT system with semantic technologies. The system deploys semantic reasoning services on Edge nodes. Based on IoT system, we design five experiments to evaluate the performance of the integrated IoT system. We demonstrate how could the Edge computing paradigm facilitate IoT in terms of data transforming, semantic reasoning and service experience. We analyze how to improve the performance by properly distributing the task for Cloud and Edge nodes. The thesis work result shows that the Edge computing could improve the performance of the semantic reasoning in IoT

    MOSDEN: A Scalable Mobile Collaborative Platform for Opportunistic Sensing Applications

    Get PDF
    Mobile smartphones along with embedded sensors have become an efficient enabler for various mobile applications including opportunistic sensing. The hi-tech advances in smartphones are opening up a world of possibilities. This paper proposes a mobile collaborative platform called MOSDEN that enables and supports opportunistic sensing at run time. MOSDEN captures and shares sensor data across multiple apps, smartphones and users. MOSDEN supports the emerging trend of separating sensors from application-specific processing, storing and sharing. MOSDEN promotes reuse and re-purposing of sensor data hence reducing the efforts in developing novel opportunistic sensing applications. MOSDEN has been implemented on Android-based smartphones and tablets. Experimental evaluations validate the scalability and energy efficiency of MOSDEN and its suitability towards real world applications. The results of evaluation and lessons learned are presented and discussed in this paper.Comment: Accepted to be published in Transactions on Collaborative Computing, 2014. arXiv admin note: substantial text overlap with arXiv:1310.405

    Active data-centric framework for data protection in cloud environment

    Get PDF
    Cloud computing is an emerging evolutionary computing model that provides highly scalable services over highspeed Internet on a pay-as-usage model. However, cloud-based solutions still have not been widely deployed in some sensitive areas, such as banking and healthcare. The lack of widespread development is related to users&rsquo; concern that their confidential data or privacy would leak out in the cloud&rsquo;s outsourced environment. To address this problem, we propose a novel active data-centric framework to ultimately improve the transparency and accountability of actual usage of the users&rsquo; data in cloud. Our data-centric framework emphasizes &ldquo;active&rdquo; feature which packages the raw data with active properties that enforce data usage with active defending and protection capability. To achieve the active scheme, we devise the Triggerable Data File Structure (TDFS). Moreover, we employ the zero-knowledge proof scheme to verify the request&rsquo;s identification without revealing any vital information. Our experimental outcomes demonstrate the efficiency, dependability, and scalability of our framework.<br /

    On Evaluating Commercial Cloud Services: A Systematic Review

    Full text link
    Background: Cloud Computing is increasingly booming in industry with many competing providers and services. Accordingly, evaluation of commercial Cloud services is necessary. However, the existing evaluation studies are relatively chaotic. There exists tremendous confusion and gap between practices and theory about Cloud services evaluation. Aim: To facilitate relieving the aforementioned chaos, this work aims to synthesize the existing evaluation implementations to outline the state-of-the-practice and also identify research opportunities in Cloud services evaluation. Method: Based on a conceptual evaluation model comprising six steps, the Systematic Literature Review (SLR) method was employed to collect relevant evidence to investigate the Cloud services evaluation step by step. Results: This SLR identified 82 relevant evaluation studies. The overall data collected from these studies essentially represent the current practical landscape of implementing Cloud services evaluation, and in turn can be reused to facilitate future evaluation work. Conclusions: Evaluation of commercial Cloud services has become a world-wide research topic. Some of the findings of this SLR identify several research gaps in the area of Cloud services evaluation (e.g., the Elasticity and Security evaluation of commercial Cloud services could be a long-term challenge), while some other findings suggest the trend of applying commercial Cloud services (e.g., compared with PaaS, IaaS seems more suitable for customers and is particularly important in industry). This SLR study itself also confirms some previous experiences and reveals new Evidence-Based Software Engineering (EBSE) lessons
    • …
    corecore