    Crucial File Selection Strategy (CFSS) for Enhanced Download Response Time in Cloud Replication Environments

    Cloud computing is a large-scale platform for serving high volumes of data from many devices and technologies. Cloud tenants demand faster access to their data without disruption, so cloud providers struggle to ensure that every individual piece of data is secured and always accessible. An appropriate replication strategy capable of selecting essential data is therefore required in cloud replication environments. This paper proposes a Crucial File Selection Strategy (CFSS) to address poor response time in a cloud replication environment. Experiments are conducted with the CloudSim cloud simulator, and the results demonstrate improved replication performance. The resulting analytical graphs are discussed in detail: the proposed CFSS algorithm outperformed an existing algorithm, with a 10.47% improvement in average response time for multiple jobs per round.
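    The abstract does not describe how CFSS decides which files are "crucial", so the block below is not CFSS. It is a minimal sketch of a generic popularity-based selection heuristic, assuming that files are ranked by access count per unit of size and greedily packed into a fixed storage budget at the replica site. The function name, the access-log format, and the capacity parameter are all hypothetical.

```python
from collections import Counter


def select_files_to_replicate(access_log, file_sizes, capacity):
    """Generic greedy, popularity-based file selection (hypothetical; NOT CFSS).

    access_log: iterable of file IDs, one entry per access request (assumed format).
    file_sizes: dict mapping file ID -> size, in any consistent unit.
    capacity:   storage budget at the replica site, in the same unit.
    """
    counts = Counter(access_log)
    # Rank by accesses per unit of size so small, frequently requested files
    # come first; ties are broken by file ID to keep the result deterministic.
    ranked = sorted(counts, key=lambda f: (-counts[f] / file_sizes[f], f))
    selected, used = [], 0
    for f in ranked:
        size = file_sizes[f]
        # Greedily keep every file that still fits in the remaining budget.
        if used + size <= capacity:
            selected.append(f)
            used += size
    return selected


log = ["a", "b", "a", "c", "a", "b"]
sizes = {"a": 2, "b": 1, "c": 4}
print(select_files_to_replicate(log, sizes, capacity=3))  # ['b', 'a']
```

    Ranking by access density rather than raw access count is one design choice among many; a real strategy would also weigh factors such as network distance and replica staleness.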

    Trends in the Solution of Distributed Data Placement Problem

    Data placement for optimal performance is an old problem. For example, it was studied for the placement of relational data in distributed databases, with the goal of optimal query processing time. Heterogeneous distributed systems built from commodity processors evolved in response to storage and processing requirements of enormous scale. In such systems, reliability and availability are achieved through an appropriate level of data replication, and efficiency through suitable placement and processing techniques. The issues addressed are where to place which data, how many copies to keep, and how to propagate updates so as to maximize reliability, availability, and performance. In addition to processing costs, network parameters such as bandwidth limits, speed, and reliability must be considered. This paper surveys the state of the art in the published literature on these topics. We expect the placement problem to remain a research problem in the future as its parameters change; such changes will arise, for example, with the advance of mobile smartphones in both capability and applications.
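    The question of "how many copies to keep" can be illustrated with a standard availability calculation. Assuming each replica is independently available with probability p, data with k replicas is available with probability 1 - (1 - p)^k. The sketch below (a hypothetical helper, not taken from any surveyed paper) finds the smallest k that meets a target availability.

```python
def replicas_needed(p, target, max_replicas=100):
    """Smallest k such that 1 - (1 - p)**k >= target.

    Assumes replica failures are independent and every replica has the
    same availability p (a simplifying, hypothetical model).
    """
    if not (0 < p < 1 and 0 < target < 1):
        raise ValueError("p and target must lie strictly between 0 and 1")
    k, unavailable = 1, 1 - p  # probability that all k replicas are down
    while 1 - unavailable < target:
        k += 1
        unavailable *= 1 - p
        if k > max_replicas:
            raise ValueError("target not reachable within max_replicas")
    return k


# Nodes that are up 90% of the time: 2 copies give 0.99, 3 copies give 0.999.
print(replicas_needed(0.9, 0.995))  # 3
```

    Real placement must also account for correlated failures (for example, replicas on the same rack) and for the update-propagation cost of each extra copy, which this model ignores.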

    On-Demand Big Data Integration: A Hybrid ETL Approach for Reproducible Scientific Research

    Scientific research requires access to, analysis of, and sharing of data distributed across heterogeneous data sources at Internet scale. An eager ETL process constructs an integrated data repository as its first step, integrating and loading data in its entirety from the data sources. This bootstrapping is inefficient for scientific research that needs data from numerous, very large distributed data sources. A lazy ETL process loads only the metadata, but still does so eagerly. Lazy ETL bootstraps faster; however, queries on the integrated data repository of eager ETL run faster, because the entire data is available beforehand. In this paper, we propose a novel ETL approach for scientific data integration that hybridizes the eager and lazy ETL approaches and applies to both data and metadata. In this way, Hybrid ETL supports incremental integration and loading of metadata and data from the data sources. We enhance the hybrid ETL with a human-in-the-loop approach: selective data integration driven by user queries, and sharing of integrated data between users. We implement our hybrid ETL approach in a prototype platform, Obidos, and evaluate it in the context of data sharing for medical research. Obidos outperforms both the eager and lazy ETL approaches for scientific research data integration and sharing, through its selective loading of data and metadata, while storing the integrated data in a scalable integrated data repository.

    Comment: Pre-print submitted to the DMAH Special Issue of the Springer DAPD Journal.
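    The idea of incremental, query-driven loading of both metadata and data can be sketched as a repository that fetches a source's metadata only when that source is first queried, and fetches each record's data only when a query first selects it, then caches both for later queries and users. This is a minimal sketch under those assumptions, not Obidos; the class, its callbacks, and the example sources are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


class HybridRepository:
    """Hypothetical sketch of query-driven incremental loading (NOT Obidos)."""

    def __init__(self,
                 list_metadata: Callable[[str], Dict[str, dict]],
                 fetch_record: Callable[[str, str], bytes]):
        self._list_metadata = list_metadata   # source -> {record_id: metadata}
        self._fetch_record = fetch_record     # (source, record_id) -> data
        self.metadata: Dict[str, Dict[str, dict]] = {}  # filled per source on first use
        self.data: Dict[Tuple[str, str], bytes] = {}    # filled per record on first use
        self.metadata_loads = 0
        self.data_loads = 0

    def query(self, source: str, predicate: Callable[[dict], bool]) -> List[bytes]:
        # Metadata is loaded lazily, one source at a time.
        if source not in self.metadata:
            self.metadata[source] = self._list_metadata(source)
            self.metadata_loads += 1
        hits = sorted(rid for rid, meta in self.metadata[source].items() if predicate(meta))
        results = []
        for rid in hits:
            key = (source, rid)
            # Data is loaded only for records a query actually selects,
            # then cached so later queries (and other users) reuse it.
            if key not in self.data:
                self.data[key] = self._fetch_record(source, rid)
                self.data_loads += 1
            results.append(self.data[key])
        return results


catalog = {"pacs": {"r1": {"modality": "CT"}, "r2": {"modality": "MRI"}, "r3": {"modality": "CT"}}}
repo = HybridRepository(lambda s: catalog[s], lambda s, rid: rid.encode())
print(repo.query("pacs", lambda m: m["modality"] == "CT"))  # [b'r1', b'r3']
```

    A second identical query returns the cached records without touching the source, which is the trade-off the abstract describes: bootstrapping cost is deferred until data is actually needed.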

    Policy-based SLA storage management model for distributed data storage services

    There is high demand for storage-related services that support scientists in their research activities. Such services are expected to provide not only capacity but also features that allow more flexible and cost-efficient usage, including easy multiplatform data access, long-term data retention, and SLA-governed data access differentiated by performance and cost. This paper presents a policy-based SLA storage management model for distributed data storage services. The model supports automated management of distributed data aimed at QoS provisioning without strict resource reservation. Because meeting users' QoS requirements is a complex problem, the model takes a heuristic approach to solving it. The corresponding system architecture, metrics, and methods for SLA-focused storage management were developed and tested in a real, nationwide environment.
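    A policy-based, heuristic placement rule can be illustrated with a simple example: choose the cheapest storage tier whose latency and availability satisfy a user's SLA, with no resources reserved in advance. This is a minimal sketch, not the paper's model; the `Tier` and `SLA` types, their fields, and the sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Tier:
    name: str
    latency_ms: float      # typical access latency (hypothetical figure)
    availability: float    # fraction of time the tier is reachable
    cost_per_gb: float     # price per GB per month (hypothetical unit)


@dataclass(frozen=True)
class SLA:
    max_latency_ms: float
    min_availability: float


def place(sla: SLA, tiers: Iterable[Tier]) -> Optional[Tier]:
    """Cheapest tier meeting the SLA, or None if no tier qualifies."""
    eligible = [t for t in tiers
                if t.latency_ms <= sla.max_latency_ms
                and t.availability >= sla.min_availability]
    if not eligible:
        return None
    # Greedy cost-first policy; ties broken by name for determinism.
    return min(eligible, key=lambda t: (t.cost_per_gb, t.name))


TIERS = [Tier("ssd", 2, 0.999, 0.10),
         Tier("hdd", 15, 0.99, 0.03),
         Tier("tape", 60000, 0.999, 0.005)]
print(place(SLA(max_latency_ms=20, min_availability=0.99), TIERS).name)  # hdd
```

    A greedy rule like this is cheap to evaluate for every request, which is one reason heuristics are attractive when exact QoS optimization is too complex.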

    Scalable service for flexible access to personal content
