    A Comparative Study of Association Rule Mining Algorithms on Grid and Cloud Platform

    Association rule mining is a time-consuming process because it is both data intensive and computation intensive. In order to mine large volumes of data and to enhance the scalability and performance of existing sequential association rule mining algorithms, parallel and distributed algorithms have been developed. These traditional parallel and distributed algorithms assume a homogeneous platform and are not well suited to heterogeneous platforms such as grid and cloud. This calls for new algorithms that address data set partitioning and distribution, load balancing, and the optimization of communication and synchronization among processors in such heterogeneous systems. Grid and cloud are the emerging platforms for distributed data processing, and various association rule mining algorithms have been proposed for them. This survey article integrates a brief architectural overview of distributed systems with a comparative treatment of recent grid-based and cloud-based association rule mining algorithms. We differentiate the approaches developed on these architectures on the basis of data locality, programming paradigm, fault tolerance, communication cost, and the partitioning and distribution of data sets. Although the survey does not cover all algorithms, it can be very useful for new researchers working on distributed association rule mining algorithms. Comment: 8 pages, preprint
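
    One classic strategy for parallelizing an Apriori-style pass, count distribution, partitions the transactions across workers, counts candidate itemsets locally, and sums the counts in a single synchronization step. A minimal sketch, assuming the data is already partitioned (all names and data below are illustrative, not taken from any surveyed system):

```python
# Minimal sketch of "count distribution" for one Apriori pass.
# Assumes transactions are already partitioned across workers.
from collections import Counter
from itertools import combinations

def local_counts(partition, k):
    """Count candidate k-itemsets in one worker's local partition."""
    counts = Counter()
    for transaction in partition:
        for itemset in combinations(sorted(transaction), k):
            counts[itemset] += 1
    return counts

def global_frequent(partitions, k, min_support):
    """Sum per-partition counts (the synchronization step) and filter."""
    total = Counter()
    for part in partitions:          # in a real system this is a reduce/all-reduce
        total.update(local_counts(part, k))
    n = sum(len(p) for p in partitions)
    return {iset: c for iset, c in total.items() if c / n >= min_support}

partitions = [
    [{"bread", "milk"}, {"bread", "butter"}],   # worker 1's data
    [{"milk", "butter"}, {"bread", "milk"}],    # worker 2's data
]
print(global_frequent(partitions, 2, min_support=0.5))  # {('bread', 'milk'): 2}
```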

    IBBE-SGX: Cryptographic Group Access Control using Trusted Execution Environments

    While many cloud storage systems allow users to protect their data by means of encryption, only few support collaborative editing on that data. A major challenge in enabling such collaboration is the need to enforce cryptographic access control policies in a secure and efficient manner. In this paper, we introduce IBBE-SGX, a new cryptographic access control extension that is efficient in terms of both computation and storage, even when processing large and dynamic workloads of membership operations, while at the same time offering zero-knowledge guarantees. IBBE-SGX builds upon Identity-Based Broadcast Encryption (IBBE). We address IBBE's impracticality for cloud deployments by exploiting Intel Software Guard Extensions (SGX) to reduce its computational complexity. Moreover, we propose a group partitioning mechanism such that the computational cost of a membership update is bounded by a fixed partition size rather than the size of the whole group. We have implemented and evaluated our new access control extension. Results highlight that IBBE-SGX performs membership changes 1.2 orders of magnitude faster than the traditional approach of Hybrid Encryption (HE), producing group metadata that are 6 orders of magnitude smaller, while at the same time offering zero-knowledge guarantees.
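
    The partitioning idea can be sketched independently of the cryptography: members are split into fixed-size partitions, each under its own broadcast-encryption key, so revoking a member re-keys one partition rather than the whole group. In this illustrative Python sketch, _rekey() merely stands in for the actual IBBE operations, which are not shown:

```python
# Hedged sketch of fixed-size group partitioning: revocation cost is
# bounded by PARTITION_SIZE, not by the total group size.
import os

PARTITION_SIZE = 4  # illustrative constant bound on update cost

class PartitionedGroup:
    def __init__(self, members):
        self.partitions = [members[i:i + PARTITION_SIZE]
                           for i in range(0, len(members), PARTITION_SIZE)]
        self.keys = [self._rekey(p) for p in self.partitions]

    def _rekey(self, partition):
        # placeholder for the IBBE encryption of a fresh partition key
        return os.urandom(16)

    def revoke(self, member):
        for i, part in enumerate(self.partitions):
            if member in part:
                part.remove(member)
                self.keys[i] = self._rekey(part)  # O(PARTITION_SIZE) work
                return

group = PartitionedGroup([f"user{i}" for i in range(10)])
group.revoke("user3")   # touches one partition of at most 4 members
```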

    Consistency models in distributed systems: A survey on definitions, disciplines, challenges and applications

    The replication mechanism resolves some challenges of big data, such as data durability, data access, and fault tolerance. Yet replication itself gives rise to another challenge, known as consistency in distributed systems. Scalability and availability, the criteria on which replication is based in distributed systems, themselves require consistency. Consistency in distributed computing systems has been employed in three fields: system architecture, distributed databases, and distributed systems. Consistency models can be ordered by their applicability from strong to weak. Our goal is to propose a novel viewpoint on the different consistency models used in distributed systems. This research proposes two categorizations of consistency models. First, consistency models are categorized into three groups: data-centric, client-centric, and hybrid models. Each of these is then divided into three subcategories: traditional, extended, and novel consistency models. The concepts and procedures are expressed in mathematical terms, introduced in order to describe the models' behavior without implementation. Moreover, we survey different aspects of challenges with respect to consistency, i.e., availability, scalability, security, fault tolerance, latency, violation, and staleness, of which the latter two, violation and staleness, play the most pivotal roles in terms of consistency and trade-off balancing. Finally, the contribution of each consistency model and the growing need for them in distributed systems are investigated. Comment: 52 pages, 13 figures
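
    As a concrete instance of the client-centric family in such a taxonomy, a monotonic-reads guarantee can be checked over a single client's read log; a minimal sketch, assuming version-stamped reads (the log format is an assumption of this example, not a construct from the survey):

```python
# Illustrative check for monotonic reads: a client never observes an
# older version of a key after having seen a newer one.

def is_monotonic_reads(read_log):
    """read_log: list of (key, version) pairs observed by one client."""
    latest = {}
    for key, version in read_log:
        if version < latest.get(key, version):
            return False            # read went backwards: violation
        latest[key] = version
    return True

print(is_monotonic_reads([("x", 1), ("x", 2), ("x", 2)]))  # True
print(is_monotonic_reads([("x", 2), ("x", 1)]))            # False: stale read
```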

    Internet of Things: An Overview

    As technology advances and the number of smart devices continues to grow substantially, the need for ubiquitous, context-aware platforms that support interconnected, heterogeneous, and distributed networks of devices has given rise to what is referred to today as the Internet of Things. However, paving the path toward these objectives and making the IoT paradigm more tangible requires the integration and convergence of different knowledge and research domains, covering aspects from identification and communication to resource discovery and service integration. In this chapter, we aim to highlight research on topics including proposed architectures, security and privacy, and network communication means and protocols, and we conclude by providing future directions and open challenges facing IoT development. Comment: Keywords: Internet of Things; IoT; Web of Things; Cloud of Things

    FogStore: Toward a Distributed Data Store for Fog Computing

    Stateful applications and virtualized network functions (VNFs) can benefit from state externalization to increase their reliability, scalability, and interoperability. To keep and share the externalized state, distributed data stores (DDSs) are a powerful tool, allowing for the management of the classical trade-offs between consistency, availability, and partition tolerance. With the advent of Fog and Edge Computing, stateful applications and VNFs are pushed from the data centers toward the network edge. This poses new challenges for DDSs that are tailored to deployment in Cloud data centers. In this paper, we propose two novel design goals for DDSs tailored to Fog Computing: (1) fog-aware replica placement, and (2) context-sensitive differential consistency. To realize those design goals on top of existing DDSs, we propose the FogStore system. FogStore manages the needed adaptations in replica placement and consistency management transparently, so that existing DDSs can be plugged into the system. To show the benefits of FogStore, we perform a set of evaluations using the Yahoo Cloud Serving Benchmark. Comment: To appear in Proceedings of 2017 IEEE Fog World Congress (FWC '17)
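
    A minimal sketch of what fog-aware replica placement can look like under the assumptions of this abstract: replicas are chosen by proximity to the data's context of use rather than spread across distant data centers. The latency map and site names below are hypothetical:

```python
# Illustrative fog-aware placement: pick the k lowest-latency sites
# relative to the client's fog location.

def place_replicas(client_site, sites, latency_ms, k=3):
    """Return the k sites with lowest latency to the client's location."""
    return sorted(sites, key=lambda s: latency_ms[(client_site, s)])[:k]

sites = ["edge-a", "edge-b", "metro-1", "cloud-dc"]
latency_ms = {("edge-a", s): l
              for s, l in [("edge-a", 0), ("edge-b", 5),
                           ("metro-1", 20), ("cloud-dc", 80)]}
print(place_replicas("edge-a", sites, latency_ms))
# ['edge-a', 'edge-b', 'metro-1']
```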

    A Survey on Large Scale Metadata Server for Big Data Storage

    Big Data is defined as a high volume and variety of data with an exponential growth rate. Data are amalgamated to generate revenue, which results in large data silos; data are the oil of the modern IT industry, and they are growing at an exponential pace. The access mechanism of these data silos is defined by metadata. The metadata are decoupled from the data servers for various practical reasons, for instance, ease of maintenance, and are stored in a metadata server (MDS). Therefore, a study of the MDS is essential in designing a large-scale storage system. An MDS architecture must account for many parameters and depends on the demands and requirements of the storage system. Thus, MDSs are categorized in various ways depending on the underlying architecture and design methodology. This article surveys the various kinds of MDS architectures, designs, and methodologies. It emphasizes clustered MDSs (cMDS), reporting on a) Bloom filter-based MDS, b) Client-funded MDS, c) Geo-aware MDS, d) Cache-aware MDS, e) Load-aware MDS, f) Hash-based MDS, and g) Tree-based MDS. Additionally, the article presents the issues and challenges of MDSs for mammoth-sized data. Comment: Submitted to ACM for possible publication
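
    To illustrate the first of these categories: in a Bloom filter-based MDS, each metadata server advertises a compact Bloom filter of the paths it manages, and clients probe the filters to route a lookup; false positives are possible, but false negatives are not. A small self-contained sketch (sizes, hash choice, and server names are arbitrary):

```python
# Toy Bloom filter used for metadata-server routing.
import hashlib

class BloomFilter:
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _positions(self, item):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def maybe_contains(self, item):
        # false positives possible, no false negatives
        return all(self.bits >> p & 1 for p in self._positions(item))

filters = {"mds1": BloomFilter(), "mds2": BloomFilter()}
filters["mds1"].add("/data/experiment/run42")
candidates = [mds for mds, f in filters.items()
              if f.maybe_contains("/data/experiment/run42")]
print(candidates)  # ['mds1'] (plus any false positives)
```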

    A Security Reference Architecture for Blockchains

    Due to their attractive features, blockchains have become popular in recent years. They are full-stack systems in which security is a critical factor for success. The main focus of this work is to systematize knowledge about the security and privacy issues of blockchains. To this end, we propose a security reference architecture based on models that capture the stacked hierarchy of various threats (similar to the ISO/OSI hierarchy) as well as threat-risk assessment using ISO/IEC 15408. In contrast to previous surveys, we focus on categorizing security incidents based on their origins, and using the proposed architecture we present existing prevention and mitigation techniques. The scope of our work mainly covers aspects related to the decentralized nature of blockchains, while we mention common operational security issues and countermeasures only tangentially.

    The CTTC 5G end-to-end experimental platform: Integrating heterogeneous wireless/optical networks, distributed cloud, and IoT devices

    The Internet of Things (IoT) will facilitate a wide variety of applications in different domains, such as smart cities, smart grids, industrial automation (Industry 4.0), smart driving, assistance of the elderly, and home automation. Billions of heterogeneous smart devices with different application requirements will be connected to the networks and will generate huge aggregated volumes of data that will be processed in distributed cloud infrastructures. On the other hand, there is also a general trend to deploy functions as software (SW) instances in cloud infrastructures [e.g., network function virtualization (NFV) or mobile edge computing (MEC)]. Thus, the next generation of mobile networks, the fifth generation (5G), will need not only to develop new radio interfaces or waveforms to cope with the expected traffic growth but also to integrate heterogeneous networks from end to end (E2E) with distributed cloud resources to deliver E2E IoT and mobile services. This article presents the E2E 5G platform that is being developed by the Centre Tecnològic de Telecomunicacions de Catalunya (CTTC), the first known platform capable of reproducing such an ambitious scenario.

    Pilot-Data: An Abstraction for Distributed Data

    Scientific problems that depend on processing large amounts of data require overcoming challenges in multiple areas: managing large-scale data distribution, controlling the co-placement and scheduling of data with compute resources, and storing, transferring, and managing large volumes of data. Although there exist multiple approaches to addressing each of these challenges, an integrative approach is missing; furthermore, extending existing functionality or enabling interoperable capabilities remains difficult at best. We propose the concept of Pilot-Data to address the fundamental challenges of co-placement and scheduling of data and compute in heterogeneous and distributed environments, with interoperability and extensibility as first-order concerns. Pilot-Data is an extension of the Pilot-Job abstraction that supports the management of data in conjunction with compute tasks. Pilot-Data separates logical data units from physical storage, thereby providing the basis for efficient compute/data placement and scheduling. In this paper, we discuss the design and implementation of the Pilot-Data prototype, demonstrate its use by data-intensive applications on multiple production distributed cyberinfrastructures, and illustrate the advantages arising from the flexible execution modes enabled by Pilot-Data. Our experiments utilize an implementation of Pilot-Data in conjunction with a scalable Pilot-Job (BigJob) to establish the application performance that the use of Pilot-Data can enable. We demonstrate how the concept of Pilot-Data also provides the basis upon which to build tools and support capabilities, such as affinity, which in turn can be used for advanced data-compute co-placement and scheduling.
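
    The separation of logical data units from physical storage, together with the affinity capability mentioned above, can be pictured with a toy scheduler that co-places a task with a replica of its input data. All class and field names below are illustrative assumptions, not the actual Pilot-Data API:

```python
# Hedged sketch: a logical data unit tracks replicas across physical
# endpoints; a scheduler uses an affinity hint for data-compute co-placement.
from dataclasses import dataclass, field

@dataclass
class DataUnit:
    name: str
    replicas: list = field(default_factory=list)  # endpoints holding the data

@dataclass
class ComputeTask:
    name: str
    affinity: str  # the data unit this task prefers to run near

def schedule(task, units, pilots):
    """Place the task on a pilot that already holds its input data, if any."""
    unit = units[task.affinity]
    for pilot in pilots:
        if pilot in unit.replicas:
            return pilot           # data-compute co-placement
    return pilots[0]               # fall back to any available pilot

units = {"genome": DataUnit("genome", replicas=["cluster-a"])}
print(schedule(ComputeTask("align", affinity="genome"), units,
               ["cluster-b", "cluster-a"]))  # cluster-a
```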

    New Trends in Parallel and Distributed Simulation: from Many-Cores to Cloud Computing

    Recent advances in computing architectures and networking are bringing parallel computing systems to the masses, increasing the number of potential users of such systems. In particular, two important technological evolutions are happening at opposite ends of the computing spectrum: at the "small" scale, processors now include an increasing number of independent execution units (cores), to the point that a single CPU can be considered a parallel shared-memory computer; at the "large" scale, the Cloud Computing paradigm allows applications to scale by offering resources from a large pool on a pay-as-you-go basis. Multi-core processors and Clouds both require applications to be suitably modified to take advantage of the features they provide. In this paper, we analyze the state of the art of parallel and distributed simulation techniques and assess their applicability to multi-core architectures and Clouds. It turns out that most current approaches exhibit limitations in usability and adaptivity that may hinder their application to these new computing architectures. We propose an adaptive simulation mechanism, based on the multi-agent system paradigm, to partially address some of those limitations. While it is unlikely that a single approach will work well in both settings, we argue that the proposed adaptive mechanism has useful features that make it attractive both on a multi-core processor and in a Cloud system. These features include the ability to reduce communication costs by migrating simulation components, and support for adding (or removing) nodes of the execution architecture at runtime. We also show that, with the help of an additional support layer, parallel and distributed simulations can be executed on top of unreliable resources. Comment: Simulation Modelling Practice and Theory (SIMPAT), Elsevier, vol. 49 (December 2014)
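
    The component-migration feature can be illustrated with a toy heuristic: move a simulated entity to whichever node it exchanges the most messages with, so that its traffic becomes node-local. A hedged sketch (the threshold rule and node names are invented for the example):

```python
# Toy migration heuristic for reducing communication cost in a
# distributed simulation.
from collections import Counter

def choose_host(entity_host, msg_counts):
    """msg_counts: Counter mapping peer node -> messages exchanged with it."""
    if not msg_counts:
        return entity_host
    best, volume = msg_counts.most_common(1)[0]
    # migrate only if remote traffic dominates, to avoid thrashing
    if best != entity_host and volume > msg_counts[entity_host]:
        return best
    return entity_host

traffic = Counter({"node-1": 12, "node-2": 48})  # entity currently on node-1
print(choose_host("node-1", traffic))            # node-2: migrate
```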