3,306 research outputs found

    Towards an accurate evaluation of deduplicated storage systems

    Deduplication has proven to be a valuable technique for eliminating duplicate data in backup and archival systems and is now being applied to new storage environments with distinct requirements and performance trade-offs. Namely, deduplication systems are now targeting large-scale cloud computing storage infrastructures holding unprecedented data volumes with a significant share of duplicate content. It is, however, hard to assess the usefulness of deduplication in particular settings and which techniques provide the best results. In fact, existing disk I/O benchmarks follow simplistic approaches for generating data content, leading to unrealistic amounts of duplicates that do not evaluate deduplication systems accurately. Moreover, deduplication systems now target heterogeneous storage environments, with specific duplication ratios, that benchmarks must also simulate. We address these issues with DEDISbench, a novel micro-benchmark for evaluating the disk I/O performance of block-based deduplication systems. As its main contribution, DEDISbench generates content following realistic duplicate content distributions extracted from real datasets. As a second contribution, we analyze and extract the duplicates found in three real storage systems, showing that DEDISbench can easily simulate several workloads. The usefulness of DEDISbench is shown by comparing it with the Bonnie++ and IOzone open-source disk I/O micro-benchmarks when assessing two open-source deduplication systems, Opendedup and Lessfs, using Ext4 as a baseline. Our results lead to novel insights into the performance of these file systems. This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT - Fundação para a Ciência e a Tecnologia (Portuguese Foundation for Science and Technology) within project RED FCOMP-01-0124-FEDER-010156 and by FCT Ph.D. scholarship SFRH-BD-71372-2010.
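
    To make the core idea concrete: instead of writing constant or purely random blocks, a duplicate-aware generator samples how often each block should recur from a distribution extracted from real data. The sketch below only illustrates that principle and is not DEDISbench's actual interface or algorithm; the names and the example distribution are hypothetical.

```python
import os
import random

def make_block_generator(duplicate_dist, block_size=4096, seed=42):
    """Yield disk blocks whose duplicate ratio follows an empirical distribution.

    duplicate_dist: list of (probability, group_size) pairs, where group_size
    is how many times a block of that class tends to recur in the real dataset.
    Hypothetical interface, for illustration only.
    """
    rng = random.Random(seed)
    pool = []  # previously emitted contents, candidates for duplication
    probs, sizes = zip(*duplicate_dist)
    while True:
        group = rng.choices(sizes, weights=probs)[0]
        if group > 1 and pool:
            yield rng.choice(pool)            # re-emit an existing block -> duplicate
        else:
            content = os.urandom(block_size)  # fresh, unique content
            pool.append(content)
            yield content

# Example: roughly 30% of written blocks belong to duplicate groups of size 8.
gen = make_block_generator([(0.7, 1), (0.3, 8)])
block = next(gen)
```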

    Enhancing the Accuracy of Synthetic File System Benchmarks

    File system benchmarking plays an essential part in assessing a file system’s performance. It is especially difficult to measure and study file system performance because it involves several layers of hardware and software. Furthermore, different systems have different workload characteristics, so while a file system may be optimized for one given workload it might not perform optimally under other types of workloads. Thus, it is imperative that the file system under study be examined with a workload equivalent to its production workload, to ensure that it is optimized according to its usage. The most widely used benchmarking method is synthetic benchmarking, due to its ease of use and flexibility. The flexibility of synthetic benchmarks allows system designers to produce a variety of different workloads that provide insight into how the file system will perform under slightly different conditions. The downside of synthetic workloads is that they produce generic workloads that do not have the same characteristics as production workloads. For instance, synthetic benchmarks do not take into consideration the effects of the cache, which can greatly impact the performance of the underlying file system. In addition, they do not model the variation in a given workload. This can lead to file systems that are not optimally designed for their usage. This work enhanced synthetic workload generation methods by taking into consideration how file system operations are satisfied by lower-level function calls. In addition, this work modeled the variations of the workload’s footprint when present. The first step in the methodology was to run a given workload and trace it with a tool called tracefs. The collected traces contained data on the file system operations and the lower-level function calls that satisfied these operations. The trace was then divided into chunks small enough that the workload characteristics of each chunk could be considered uniform. A configuration file modeling each chunk was then generated and supplied to a synthetic workload generator tool created by this work, called FileRunner. The workload definition for each chunk allowed FileRunner to generate a synthetic workload that produced the same workload footprint as the corresponding segment in the original workload. In other words, the synthetic workload exercised the lower-level function calls in the same way as the original workload. Furthermore, FileRunner generated a synthetic workload for each specified segment in the order in which they appeared in the trace, resulting in a final workload that mimicked the variation present in the original workload. The results indicated that the methodology can create a workload with a throughput within 10% of the original and with operation latencies, with the exception of create latencies, within the allowable 10% difference and in some cases within the 15% maximum allowable difference. The work was able to accurately model the I/O footprint: in some cases the difference was negligible, and in the worst case it was 2.49%.
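
    A rough sketch of the chunking step described above follows. The real traces come from tracefs and FileRunner consumes a richer configuration format; the record layout, chunking policy, and function names here are assumptions for illustration only.

```python
from collections import Counter

def chunk_trace(trace, chunk_seconds=10):
    """Split an ordered list of trace records into fixed-length time chunks.

    Each record is assumed to be a (timestamp, operation) tuple, e.g.
    (12.3, "read"); this is a stand-in for real tracefs output, whose
    format differs.
    """
    chunks = []
    current, start = [], None
    for ts, op in trace:
        if start is None:
            start = ts
        if ts - start >= chunk_seconds:
            chunks.append(current)
            current, start = [], ts
        current.append((ts, op))
    if current:
        chunks.append(current)
    return chunks

def chunk_profile(chunk):
    """Summarise one chunk as an operation mix that a generator could replay."""
    ops = Counter(op for _, op in chunk)
    total = sum(ops.values())
    return {op: count / total for op, count in ops.items()}

trace = [(0.1, "read"), (0.4, "write"), (11.0, "read"), (11.2, "read")]
profiles = [chunk_profile(c) for c in chunk_trace(trace)]
# e.g. [{'read': 0.5, 'write': 0.5}, {'read': 1.0}]
```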

    Data Management Strategies for Relative Quality of Service in Virtualised Storage Systems

    The amount of data managed by organisations continues to grow relentlessly. Driven by the high costs of maintaining multiple local storage systems, there is a well established trend towards storage consolidation using multi-tier Virtualised Storage Systems (VSSs). At the same time, storage infrastructures are increasingly subject to stringent Quality of Service (QoS) demands. Within a VSS, it is challenging to match desired QoS with delivered QoS, considering the latter can vary dramatically both across and within tiers. Manual efforts to achieve this match require extensive and ongoing human intervention. Automated efforts are based on workload analysis, which ignores the business importance of infrequently accessed data. This thesis presents our design, implementation and evaluation of data maintenance strategies in an enhanced version of the popular Linux Extended 3 Filesystem which features support for the elegant specification of QoS metadata while maintaining compatibility with stock kernels. Users and applications specify QoS requirements using a chmod-like interface. System administrators are provided with a character device kernel interface that allows for profiling of the QoS delivered by the underlying storage. We propose a novel score-based metric, together with associated visualisation resources, to evaluate the degree of QoS matching achieved by any given data layout. We also design and implement new inode and datablock allocation and migration strategies which exploit this metric in seeking to match the QoS attributes set by users and/or applications on files and directories with the QoS actually delivered by each of the filesystem’s block groups. To create realistic test filesystems we have included QoS metadata support in the Impressions benchmarking framework. The effectiveness of the resulting data layout in terms of QoS matching is evaluated using a special kernel module that is capable of inspecting detailed filesystem data on-the-fly. We show that our implementations of the proposed inode and datablock allocation strategies are capable of dramatically improving data placement with respect to QoS requirements when compared to the default allocators
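
    The abstract does not give the definition of the score-based metric, so the following is only a guess at its flavour: per-file penalties for mismatches between requested and delivered QoS levels, with under-provisioning weighted more heavily. All names, levels, and weights are illustrative, not the thesis's actual formulation.

```python
def qos_match_score(files, block_groups):
    """Score how well a data layout matches requested QoS.

    files: {path: {"qos": requested_level, "group": block_group_id}}
    block_groups: {block_group_id: delivered_level}
    Levels are small integers (e.g. 0 = archival tier, 3 = fastest tier).
    """
    penalty = 0
    for meta in files.values():
        requested = meta["qos"]
        delivered = block_groups[meta["group"]]
        # under-provisioning is penalised more heavily than over-provisioning
        if delivered < requested:
            penalty += 2 * (requested - delivered)
        else:
            penalty += delivered - requested
    return -penalty  # higher (closer to zero) means a better match

layout = {"/db/index": {"qos": 3, "group": 1}, "/logs/old": {"qos": 0, "group": 1}}
tiers = {1: 2}  # block group 1 sits on a mid-tier device
print(qos_match_score(layout, tiers))  # -4: one file under-, one over-provisioned
```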

    FLINT: A Platform for Federated Learning Integration

    Cross-device federated learning (FL) has been well-studied from algorithmic, system scalability, and training speed perspectives. Nonetheless, moving from centralized training to cross-device FL for millions or billions of devices presents many risks, including performance loss, developer inertia, poor user experience, and unexpected application failures. In addition, the corresponding infrastructure, development costs, and return on investment are difficult to estimate. In this paper, we present a device-cloud collaborative FL platform that integrates with an existing machine learning platform, providing tools to measure real-world constraints, assess infrastructure capabilities, evaluate model training performance, and estimate system resource requirements to responsibly bring FL into production. We also present a decision workflow that leverages the FL-integrated platform to comprehensively evaluate the trade-offs of cross-device FL and share our empirical evaluations of business-critical machine learning applications that impact hundreds of millions of users. Comment: Preprint for MLSys 202

    Weather responsive internal roof shading systems for existing long-span glazed roof over large naturally ventilated and air-conditioned pedestrian concourses in the tropics

    This research aims to optimize weather-responsive internal roof shading systems and to recommend design principles and guidelines for such systems. These systems would provide a better building-centric thermal environment and energy performance, while maintaining adequate levels of natural lighting, within existing long-span glazed roofs over large naturally ventilated and air-conditioned pedestrian concourses in the tropics. Two shading configurations, low-level and high-level shading, were tested for both the physical indoor environment and energy performance using dynamic thermal and lighting models on typical clear days and an overcast day in summer and winter respectively. The thermal performance of these test cases was assessed using internal surface temperatures, air temperatures, mean radiant temperatures and operative temperatures. The energy performance was examined using solar heat gain and cooling loads, and the visual performance using illuminance and daylight factors. The financial benefits of these remedial solutions were also assessed using standard economic analysis methods to provide recommendations on cost and payback periods. The predicted results for the large glazed naturally ventilated pedestrian concourse reveal that the internal roof shading device was very effective in reducing inner surface temperatures and consequently reducing radiant heat gain into the space. The low-level shadings are more effective than the high-level shadings in terms of energy, internal thermal and lighting performance; this configuration would reduce the solar heat gain in the large glazed pedestrian concourse space by two thirds. The predicted results for the large glazed air-conditioned pedestrian concourse reveal that only the low-level shading can improve the physical environment in terms of thermal, energy and lighting conditions; this configuration would significantly reduce the ground-floor heat gain and the inner surface temperatures. The buffer zone is a key reason that the low-level shadings perform better than the high-level shadings. For the naturally ventilated case, creating a naturally ventilated thermal buffer space is critical to the design of an effective internal roof shading system: the large void space between the glazed roof and the low-level shadings allows hot air to dissipate to the outdoors at a high level before it can enter the spaces below. For the air-conditioned case, the larger volume of air above the low-level shadings allows more heat to accumulate than the smaller volume above the high-level shadings. In addition, the high solar reflectance of the fabric decreases solar heat gain by reflecting a portion of the solar heat back out through the transparent roof, while some solar energy is also trapped within the air gap. According to the thermal environmental conditions required for comfort, expressed as the operative temperature recommended by ASHRAE (2004), both shading options in the large glazed naturally ventilated case could only ease thermal discomfort to some degree, whereas the low-level shading in the large glazed air-conditioned case goes a long way towards alleviating summer thermal discomfort. Nevertheless, the shadings could significantly reduce the internal surface temperatures, which are the main cause of radiant heat gain in the large glazed naturally ventilated and air-conditioned pedestrian concourses.
The visual performance results in both case studies reveal that the internal roof shading significantly reduced daylighting levels while maintaining an appropriate quality of light, according to the CIBSE recommendation, only on hot clear days. Retractable shading devices are therefore recommended to provide sun screening only when required, such as on clear summer days when solar gain is likely to result in overheating. With present interest rates of 4.85% in China and 1.35% in Thailand over a lifetime of 30 years, the investment in the shading system could be financially beneficial for both forms of long-span glazed roof over large pedestrian concourses, naturally ventilated and air-conditioned, since the NPV was greater than zero and the IRR exceeded the interest rate.
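
    The underlying cash flows are not reported in the abstract; a minimal sketch of the NPV test it refers to, with purely illustrative figures, might look like this:

```python
def npv(rate, cash_flows):
    """Net present value of yearly cash flows; cash_flows[0] is the initial
    (usually negative) investment at year 0."""
    return sum(cf / (1 + rate) ** year for year, cf in enumerate(cash_flows))

# Illustrative numbers only: an upfront shading cost of 100,000 followed by
# 30 years of annual energy savings of 8,000 (same currency units).
flows = [-100_000] + [8_000] * 30
print(npv(0.0485, flows) > 0)   # True -> pays back even at China's 4.85% rate
```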

    Digitalization of the product development process at Scania engine assembly

    Technology is constantly developing, and companies are striving to work in a more digital way. Scania CV AB is a world-leading company manufacturing buses and trucks for heavy transport applications. To maintain its competitive position in the market, the company has the ambition to make the product development process more digitalized. A goal is to implement a more simulation-based and drawing-free working method. This project has been carried out at the engine assembly department. The purpose of the thesis was to identify how parts of the product development process could become more digitalized. This included identifying the gap that will occur between the current working process and a more digital approach. Furthermore, it involved finding solutions for the gap and presenting the possible impacts of a digital working approach. The initial phase of the thesis was to find a suitable methodology for this type of study. The project proceeded with a literature study to gain deeper insight into the subjects covered. With a good foundation obtained, the empirical study could commence. The data for the empirical study were gathered mainly within Scania through interviews, observations and archive analyses. Based on this information, an analysis was carried out and the results were presented. A gap was identified describing deficient areas in the current digital environment. The working method Model Based Definition (MBD) and a software package called Industrial Path Solutions (IPS) are presented as solutions for the gap. Suggestions for how the working process should be modified to enable a more digital approach have also been presented. Impacts including cost savings, quality improvements, shorter lead times and ergonomic benefits have also been identified.

    Ukraine's Destination Image as Perceived by U.S. College Students

    The present study is an exploration of Ukraine's destination image as viewed by U.S. college students. The student market is a rapidly growing one and presents opportunities for emerging destinations such as Ukraine. A combination of qualitative and quantitative approaches was used to investigate the image of Ukraine and build a three-dimensional destination image model. Respondents were asked to answer three open-ended questions and to rate their level of agreement with pre-developed statements pertaining to Ukraine. Concepts conveyed by both methods were distributed along three continuums that comprise the destination image model. Implications for promotional and marketing efforts were suggested.