
    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so as to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping provide an easy way for new practitioners to understand this complex area of research. Comment: 46 pages, 16 figures, Technical Report
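
    The taxonomy-to-system mapping described above can be pictured as a small classification table. Below is a minimal, hypothetical sketch (in Python) of such a mapping and of the "gap analysis" it enables; the category names and example systems are illustrative assumptions, not categories or systems taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationModel(Enum):
    """Illustrative replication categories; the paper's taxonomy is far richer."""
    CENTRALIZED = "centralized"
    DECENTRALIZED = "decentralized"


class TransportSecurity(Enum):
    NONE = "none"
    AUTHENTICATED = "authenticated"
    ENCRYPTED = "encrypted"


@dataclass
class DataGridSystem:
    """One row of a taxonomy-to-system mapping table."""
    name: str
    replication: ReplicationModel
    transport_security: TransportSecurity


# Hypothetical systems mapped to taxonomy categories (names are made up).
systems = [
    DataGridSystem("ExampleGridA", ReplicationModel.CENTRALIZED, TransportSecurity.ENCRYPTED),
    DataGridSystem("ExampleGridB", ReplicationModel.DECENTRALIZED, TransportSecurity.AUTHENTICATED),
]

# A "gap analysis" then reduces to listing category combinations no system covers.
covered = {(s.replication, s.transport_security) for s in systems}
gaps = [(r, t) for r in ReplicationModel for t in TransportSecurity if (r, t) not in covered]
print(f"Uncovered category combinations: {len(gaps)}")
```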

    Enhancing Scalability and Reliability in Semi-Decentralized Federated Learning With Blockchain: Trust Penalization and Asynchronous Functionality

    The paper presents an innovative approach to addressing the challenges of scalability and reliability in Distributed Federated Learning by leveraging the integration of blockchain technology. It focuses on enhancing the trustworthiness of participating nodes through a trust penalization mechanism while also enabling asynchronous functionality for efficient and robust model updates. By combining Semi-Decentralized Federated Learning with Blockchain (SDFL-B), the proposed system aims to create a fair, secure and transparent environment for collaborative machine learning without compromising data privacy. The research presents a comprehensive system architecture, methodologies, experimental results, and discussions that demonstrate the advantages of this novel approach in fostering scalable and reliable SDFL-B systems. Comment: To appear in 2023 IEEE Ubiquitous Computing, Electronics & Mobile Communication Conference (IEEE UEMCON).
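
    As a rough illustration of the two mechanisms named above, the sketch below combines a trust score that is penalized on rejected updates with a trust-weighted asynchronous aggregation step that ignores low-trust nodes. All names, thresholds and the weighting scheme are assumptions for illustration; the paper's actual SDFL-B design is not reproduced here.

```python
from dataclasses import dataclass, field


@dataclass
class NodeTrust:
    """Trust score for one participant; starting value and step sizes are assumptions."""
    node_id: str
    score: float = 1.0
    history: list = field(default_factory=list)

    def penalize(self, severity: float) -> None:
        """Reduce trust after a rejected or failed model update."""
        self.score = max(0.0, self.score - severity)
        self.history.append(("penalty", severity))

    def reward(self, amount: float = 0.05) -> None:
        """Slowly restore trust after accepted updates."""
        self.score = min(1.0, self.score + amount)
        self.history.append(("reward", amount))


def aggregate_async(updates, trust, min_trust=0.3):
    """Trust-weighted asynchronous aggregation: use whatever updates have arrived,
    weight each by the sender's trust, and skip nodes below a trust threshold."""
    accepted = {n: u for n, u in updates.items() if trust[n].score >= min_trust}
    if not accepted:
        return []
    total = sum(trust[n].score for n in accepted)
    dim = len(next(iter(accepted.values())))
    return [sum(trust[n].score * accepted[n][i] for n in accepted) / total
            for i in range(dim)]


# Example: node "b" has been penalized below the threshold, so only "a" contributes.
trust = {"a": NodeTrust("a"), "b": NodeTrust("b")}
trust["b"].penalize(0.8)
print(aggregate_async({"a": [1.0, 2.0], "b": [9.0, 9.0]}, trust))
```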

    CICM: A Collaborative Integrity Checking Blockchain Consensus Mechanism for Preserving the Originality of Data in the Cloud for Forensic Investigation

    The originality of data is essential for obtaining correct results from forensic analysis. Data may be analysed to resolve disputes or review incidents by finding trends in the dataset that give clues to the cause of an issue. Specially designed, foolproof protection of data integrity is required for forensic purposes. A Collaborative Integrity Checking Mechanism (CICM) for securing the chain of custody of data in a blockchain is proposed in this paper. Existing consensus mechanisms are fault-tolerant, allowing a threshold of faults. CICM avoids faults by using a transparent, 100% agreement process for validating the originality of data in a blockchain. A group of agreement actors checks and records the original status of data at its time of arrival, and acceptance is based on general agreement by all participants in the consensus process. The solution was tested against practical Byzantine fault tolerant (PBFT), Zyzzyva, and hybrid Byzantine fault tolerant (hBFT) mechanisms for efficacy in yielding correct results and for operational performance costs. A binomial distribution was used to examine CICM's efficacy. CICM recorded zero probability of failure, while the benchmarks recorded up to 8.44%. Throughput and latency were used to test operational performance costs. hBFT recorded the best performance among the benchmarks; CICM achieved 30.61% higher throughput and 21.47% lower latency than hBFT. In the robustness-against-faults tests, CICM performed better than hBFT, with 16.5% higher throughput and 14.93% lower latency in the worst-case fault scenario.
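
    The efficacy comparison above rests on a binomial model of validator faults. The sketch below shows how such a comparison could be computed: under an independent per-node fault probability p, a quorum-based mechanism can wrongly accept tampered data whenever enough validators are faulty to form a quorum, whereas a 100% agreement rule requires every validator to be faulty. The committee size, fault rate and quorum rule are illustrative assumptions, not the paper's experimental parameters.

```python
from math import comb


def p_wrong_accept(n: int, p_faulty: float, quorum: int) -> float:
    """Probability that at least `quorum` of `n` validators are faulty, i.e. that
    a colluding quorum could accept tampered data (independent-fault binomial model)."""
    return sum(comb(n, k) * p_faulty**k * (1 - p_faulty)**(n - k)
               for k in range(quorum, n + 1))


n, p = 10, 0.1                     # illustrative committee size and per-node fault rate
bft_quorum = 2 * (n // 3) + 1      # roughly a 2/3 quorum, as in BFT-style protocols
print("Quorum-based acceptance  :", p_wrong_accept(n, p, bft_quorum))
print("Unanimous (100%) agreement:", p_wrong_accept(n, p, n))   # equals p**n
```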

    HEC: Collaborative Research: SAM^2 Toolkit: Scalable and Adaptive Metadata Management for High-End Computing

    The increasing demand for exabyte-scale storage capacity by high-end computing applications requires a higher level of scalability and dependability than current file and storage systems provide. The proposal addresses file-systems research on metadata management for scalable cluster-based parallel and distributed file storage systems in the HEC environment. It aims to develop a scalable and adaptive metadata management (SAM2) toolkit to extend the features of, and fully leverage the peak performance promised by, state-of-the-art cluster-based parallel and distributed file storage systems used by the high-performance computing community. There is a large body of research on scaling data movement and management; however, the need to scale the attributes of cluster-based file systems and I/O, that is, metadata, has been underestimated. Understanding the characteristics of metadata traffic, and applying proper load-balancing, caching, prefetching and grouping mechanisms to metadata management accordingly, will lead to high scalability. It is anticipated that by plugging the scalable and adaptive metadata management components into state-of-the-art cluster-based parallel and distributed file storage systems, one could increase the performance of applications and file systems, and help translate the promise of high peak performance into real application performance improvements. The project involves the following components: 1. Develop multi-variable forecasting models to analyze and predict file metadata access patterns. 2. Develop scalable and adaptive file name mapping schemes using the duplicative Bloom filter array technique to enforce load balance and increase scalability. 3. Develop decentralized, locality-aware metadata grouping schemes to facilitate bulk metadata operations such as prefetching. 4. Develop an adaptive cache coherence protocol using a distributed shared object model for client-side and server-side metadata caching. 5. Prototype the SAM2 components in the state-of-the-art parallel virtual file system PVFS2 and a distributed storage data caching system, set up an experimental framework for a DOE CMS Tier 2 site at the University of Nebraska-Lincoln, and conduct benchmark, evaluation and validation studies.
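
    Component 2 above relies on a Bloom filter array for file-name-to-server mapping. The sketch below shows one plausible shape of that idea: each metadata server keeps a Bloom filter over the names it owns, and a lookup only contacts servers whose filter matches. The class names, filter sizes and placement rule are assumptions for illustration and do not reproduce the SAM2 design.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; size and hash count are illustrative, not tuned."""

    def __init__(self, m: int = 4096, k: int = 3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, key: str):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class MetadataServerArray:
    """Each metadata server keeps a Bloom filter over the file names it owns;
    a lookup only probes servers whose filter reports a (possible) match."""

    def __init__(self, n_servers: int = 4):
        self.filters = [BloomFilter() for _ in range(n_servers)]

    def place(self, path: str) -> int:
        server = hash(path) % len(self.filters)   # naive placement for load spreading
        self.filters[server].add(path)
        return server

    def candidates(self, path: str) -> list:
        return [i for i, f in enumerate(self.filters) if f.might_contain(path)]


array = MetadataServerArray()
owner = array.place("/home/alice/data.h5")
print(owner in array.candidates("/home/alice/data.h5"))   # True (no false negatives)
```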

    Overview of Caching Mechanisms to Improve Hadoop Performance

    In today's distributed computing environments, large amounts of data are generated from different resources at high velocity, rendering the data difficult to capture, manage, and process within existing relational databases. Hadoop is a tool for storing and processing large datasets in a parallel manner across a cluster of machines in a distributed environment. Hadoop brings many benefits such as flexibility, scalability, and high fault tolerance; however, it faces challenges in terms of data access time, I/O operations, and duplicate computations, which result in extra overhead, resource wastage, and poor performance. Many researchers have utilized caching mechanisms to tackle these challenges. For example, they have presented approaches to improve data access time, enhance the data locality rate, remove repetitive calculations, reduce the number of I/O operations, decrease job execution time, and increase resource efficiency. In the current study, we provide a comprehensive overview of caching strategies for improving Hadoop performance. Additionally, a novel classification is introduced based on cache utilization. Using this classification, we analyze the impact on Hadoop performance and discuss the advantages and disadvantages of each group. Finally, a novel hybrid approach called Hybrid Intelligent Cache (HIC), which combines the benefits of two methods from different groups, H-SVM-LRU and CLQLMRS, is presented. Experimental results show that our hybrid method achieves an average improvement of 31.2% in job execution time.
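
    As a concrete (and deliberately simplified) illustration of how caching removes repetitive calculations, the sketch below keys cached results by (input split, operation) and evicts with an LRU policy. It is only a generic example of the idea surveyed here; it does not implement H-SVM-LRU, CLQLMRS or the HIC method.

```python
from collections import OrderedDict


class JobResultCache:
    """LRU cache for intermediate results, keyed by (input split, operation).
    A generic sketch of avoiding duplicate computation, not H-SVM-LRU/CLQLMRS/HIC."""

    def __init__(self, capacity: int = 128):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, split_id: str, op: str):
        key = (split_id, op)
        if key in self.entries:
            self.entries.move_to_end(key)       # mark as most recently used
            return self.entries[key]
        return None                             # cache miss: caller recomputes

    def put(self, split_id: str, op: str, result) -> None:
        key = (split_id, op)
        self.entries[key] = result
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)    # evict the least recently used entry


cache = JobResultCache(capacity=2)
cache.put("split-001", "wordcount", {"hadoop": 42})
print(cache.get("split-001", "wordcount"))      # hit: no recomputation needed
```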

    D^2PS: A Dependable Data Provisioning Service in Multi-tenant Cloud Environment

    Software as a Service (SaaS) is a software delivery and business model widely used in Cloud computing. Instead of purchasing and maintaining a software suite permanently, customers only need to lease the software on demand. The domain of high-assurance distributed systems has focused greatly on the areas of fault tolerance and dependability. In a multi-tenant context, it is particularly important to store, manage and provision data services to customers in a highly efficient and dependable manner, owing to the large number of file operations involved in running such services. It is also desirable to allow a user group to share and cooperate (e.g., co-edit) on some specific data. In this paper we present a dependable data provisioning service in a multi-tenant Cloud environment. We describe a metadata management approach and leverage multiple replicated metadata caches to shorten file access time, with improved efficiency of data sharing. In order to reduce frequent data transmission and data access latency, we introduce a distributed cooperative disk cache mechanism that supports effective cache placement and pull-push cache synchronization. In addition, we use efficient component failover to enhance service dependability while avoiding the negative impact of system failures. Our experimental results show that our system can significantly reduce both unused data transmission and response latency. Specifically, over 50% of network transmission and operational latency can be saved for random reads, while 28.24% of network traffic and 25% of response latency can be reduced for random write operations. We believe that these findings demonstrate positive results in the right direction for resolving storage-related challenges in a multi-tenant Cloud environment.
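
    A minimal sketch of the pull-push idea mentioned above: writes push the new version to cooperating peers, while a read miss first pulls from a peer before falling back to the backing store. Class names, the versioning scheme and the peer list are assumptions for illustration, not the D^2PS implementation.

```python
class CooperativeCacheNode:
    """Sketch of pull-push synchronization between cooperating cache nodes:
    writes push the new version to peers; a read miss pulls from a peer
    before falling back to the backing store."""

    def __init__(self, node_id: str, backing_store: dict):
        self.node_id = node_id
        self.backing_store = backing_store
        self.local = {}        # key -> (version, data)
        self.peers = []        # other CooperativeCacheNode instances

    def write(self, key: str, data: bytes) -> None:
        version = self.local.get(key, (0, b""))[0] + 1
        self.local[key] = (version, data)
        self.backing_store[key] = (version, data)
        for peer in self.peers:                       # push: proactively update peers
            peer.receive_push(key, version, data)

    def receive_push(self, key: str, version: int, data: bytes) -> None:
        if self.local.get(key, (0, b""))[0] < version:
            self.local[key] = (version, data)

    def read(self, key: str) -> bytes:
        if key in self.local:
            return self.local[key][1]
        for peer in self.peers:                       # pull: ask cooperating peers first
            if key in peer.local:
                self.local[key] = peer.local[key]
                return self.local[key][1]
        self.local[key] = self.backing_store[key]     # last resort: the backing store
        return self.local[key][1]


store = {}
a, b = CooperativeCacheNode("a", store), CooperativeCacheNode("b", store)
a.peers, b.peers = [b], [a]
a.write("/tenant1/report.csv", b"v1")
print(b.read("/tenant1/report.csv"))                  # served from b's pushed copy
```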

    3rd EGEE User Forum

    We have organized this book as a sequence of chapters, each associated with an application or technical theme and introduced by an overview of the contents and a summary of the main conclusions from the Forum on that chapter's topic. The first chapter gathers all the plenary session keynote addresses, followed by a sequence of chapters covering the application-flavoured sessions. These are followed by chapters with a Computer Science and Grid Technology flavour. The final chapter covers the large number of practical demonstrations and posters exhibited at the Forum. Much of the work presented has a direct link to specific areas of Science, so we have created a Science Index, presented below. In addition, at the end of this book, we provide a complete list of the institutes and countries involved in the User Forum.

    BECA: A Blockchain-Based Edge Computing Architecture for Internet of Things Systems

    The scale of Internet of Things (IoT) systems has expanded in recent times and, in tandem with this, IoT solutions have developed symbiotic relationships with technologies such as edge computing. IoT solutions have leveraged edge computing to facilitate quick data retrieval, low-latency responses, and advanced computation, among other capabilities. However, in contrast with the benefits offered by edge computing, there are several drawbacks, such as centralized data storage, data ownership, privacy, data auditability, and security, which concern the IoT community. This study leveraged blockchain's inherent capabilities, including distributed storage, non-repudiation, privacy, security, and immutability, to provide a novel, advanced edge computing architecture for IoT systems. Specifically, this blockchain-based edge computing architecture addresses centralized data storage, data auditability, privacy, data ownership, and security. Following implementation, the solution was evaluated to quantify performance in terms of response time and resource utilization. The results show the viability of the proposed and implemented architecture, characterized by improved privacy, device data ownership, security, and data auditability, while implementing decentralized storage.
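
    The auditability and immutability properties referred to above can be illustrated with a minimal hash-chained log: each record links to the hash of its predecessor, so any retroactive change is detectable during an audit. This sketch is only an illustration of that property, not the BECA architecture or its consensus and storage layers.

```python
import hashlib
import json
import time


def block_hash(block: dict) -> str:
    """Deterministic SHA-256 over a block's contents, excluding its own hash field."""
    payload = json.dumps({k: v for k, v in block.items() if k != "hash"}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


class AuditChain:
    """Minimal hash-chained log of IoT device records; each block commits to its
    predecessor, so any retroactive modification breaks verification."""

    def __init__(self):
        genesis = {"index": 0, "timestamp": 0.0, "device_id": None,
                   "record": None, "prev_hash": "0" * 64}
        genesis["hash"] = block_hash(genesis)
        self.blocks = [genesis]

    def append(self, device_id: str, record: dict) -> dict:
        block = {"index": len(self.blocks), "timestamp": time.time(),
                 "device_id": device_id, "record": record,
                 "prev_hash": self.blocks[-1]["hash"]}
        block["hash"] = block_hash(block)
        self.blocks.append(block)
        return block

    def verify(self) -> bool:
        """Audit pass: recompute every hash and check each link to the previous block."""
        return all(cur["prev_hash"] == prev["hash"] and cur["hash"] == block_hash(cur)
                   for prev, cur in zip(self.blocks, self.blocks[1:]))


chain = AuditChain()
chain.append("sensor-17", {"temp_c": 21.4})
chain.blocks[1]["record"]["temp_c"] = 99.9     # tamper with a stored record
print(chain.verify())                           # False: the audit detects the change
```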

    Neuroimaging study designs, computational analyses and data provenance using the LONI pipeline.

    Modern computational neuroscience employs diverse software tools and multidisciplinary expertise to analyze heterogeneous brain data. The classical problems of gathering meaningful data, fitting specific models, and discovering appropriate analysis and visualization tools give way to a new class of computational challenges: management of large and incongruous data, integration and interoperability of computational resources, and data provenance. We designed, implemented and validated a new paradigm for addressing these challenges in the neuroimaging field. Our solution is based on the LONI Pipeline environment [3], [4], a graphical workflow environment for constructing and executing complex data processing protocols. We developed study-design, database and visual language programming functionalities within the LONI Pipeline that enable the construction of complete, elaborate and robust graphical workflows for analyzing neuroimaging and other data. These workflows facilitate open sharing and communication of data and metadata, concrete processing protocols, result validation, and study replication among different investigators and research groups. The LONI Pipeline features include a distributed grid-enabled infrastructure, a virtualized execution environment, efficient integration, data provenance, validation and distribution of new computational tools, automated data format conversion, and an intuitive graphical user interface. We demonstrate the new LONI Pipeline features using large-scale neuroimaging studies based on data from the International Consortium for Brain Mapping [5] and the Alzheimer's Disease Neuroimaging Initiative [6]. User guides, forums, instructions and downloads for the LONI Pipeline environment are available at http://pipeline.loni.ucla.edu.
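
    A tiny sketch of the provenance idea: a workflow is a sequence of named steps, and executing it records, for every derived value, which step produced it and from which inputs. The step names and the toy smoothing/threshold operations are illustrative assumptions, not LONI Pipeline modules or its actual provenance schema.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One workflow step: a name, a function, and the names of its inputs."""
    name: str
    func: Callable
    inputs: list = field(default_factory=list)


def run_workflow(steps, data):
    """Run steps in order and record a provenance entry for every derived value."""
    provenance = []
    for step in steps:
        args = [data[name] for name in step.inputs]
        data[step.name] = step.func(*args)
        provenance.append({"output": step.name, "derived_from": list(step.inputs)})
    return data, provenance


# Toy two-step "pipeline": smooth a 1-D signal, then threshold it into a mask.
steps = [
    Step("smoothed",
         lambda xs: [sum(xs[max(0, i - 1):i + 2]) / len(xs[max(0, i - 1):i + 2])
                     for i in range(len(xs))],
         ["raw"]),
    Step("mask", lambda xs: [x > 0.5 for x in xs], ["smoothed"]),
]
results, prov = run_workflow(steps, {"raw": [0.2, 0.9, 0.1, 0.8]})
print(prov)   # records that "mask" was derived from "smoothed", etc.
```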