56,141 research outputs found

    Lustre, Hadoop, Accumulo

    Data processing systems impose multiple views on data as it is processed by the system. These views include spreadsheets, databases, matrices, and graphs. A wide variety of technologies can be used to store and process data through these different steps. The Lustre parallel file system, the Hadoop distributed file system, and the Accumulo database are all designed to address the largest and most challenging data storage problems. There have been many ad-hoc comparisons of these technologies. This paper describes the foundational principles of each technology, provides simple models for assessing their capabilities, and compares the various technologies on a hypothetical common cluster. These comparisons indicate that Lustre provides 2x more storage capacity, is less likely to lose data during 3 simultaneous drive failures, and provides higher bandwidth on general purpose workloads. Hadoop can provide 4x greater read bandwidth on special purpose workloads. Accumulo provides 10,000x lower latency on random lookups than either Lustre or Hadoop, but Accumulo's bulk bandwidth is 10x lower. Significant recent work has been done to enable mix-and-match solutions that allow Lustre, Hadoop, and Accumulo to be combined in different ways. Comment: 6 pages; accepted to IEEE High Performance Extreme Computing conference, Waltham, MA, 201

    Performance Measurements of Supercomputing and Cloud Storage Solutions

    Increasing amounts of data from varied sources, particularly in the fields of machine learning and graph analytics, are causing storage requirements to grow rapidly. A variety of technologies exist for storing and sharing these data, ranging from parallel file systems used by supercomputers to distributed block storage systems found in clouds. Relatively few comparative measurements exist to inform decisions about which storage systems are best suited for particular tasks. This work provides these measurements for two of the most popular storage technologies: Lustre and Amazon S3. Lustre is an open-source, high performance, parallel file system used by many of the largest supercomputers in the world. Amazon's Simple Storage Service, or S3, is part of the Amazon Web Services offering and provides a scalable, distributed option to store and retrieve data from anywhere on the Internet. Parallel processing is essential for achieving high performance on modern storage systems. The performance tests used span the gamut of parallel I/O scenarios, ranging from single-client, single-node Amazon S3 and Lustre performance to a large-scale, multi-client test designed to demonstrate the capabilities of a modern storage appliance under heavy load. These results show that, when parallel I/O is used correctly (i.e., many simultaneous read or write processes), full network bandwidth is achievable, ranging from 10 gigabits/s over a 10 GigE S3 connection to 0.35 terabits/s using Lustre on a 1200 port 10 GigE switch. These results demonstrate that S3 is well-suited to sharing vast quantities of data over the Internet, while Lustre is well-suited to processing large quantities of data locally. Comment: 5 pages, 4 figures, to appear in IEEE HPEC 201
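As a rough unit check on the two rates quoted in this abstract, the short Python sketch below converts them from bits per second to gigabytes per second and compares them. The helper name `bits_to_gigabytes_per_s` is hypothetical, and decimal (SI) prefixes are an assumption; the abstract does not say which convention the authors used.

```python
GIGA = 10**9  # assumed SI prefix: 1 gigabit = 10^9 bits


def bits_to_gigabytes_per_s(bits_per_s: int) -> float:
    """Convert a rate in bits/s to gigabytes/s (8 bits per byte, SI giga)."""
    return bits_per_s / 8 / GIGA


# Rates quoted in the abstract, expressed in bits/s.
s3_rate = 10 * GIGA        # 10 gigabits/s over a 10 GigE S3 connection
lustre_rate = 350 * GIGA   # 0.35 terabits/s, i.e. 350 gigabits/s, using Lustre

s3_gb_per_s = bits_to_gigabytes_per_s(s3_rate)
lustre_gb_per_s = bits_to_gigabytes_per_s(lustre_rate)
ratio = lustre_rate / s3_rate

print(f"S3:     {s3_gb_per_s} GB/s")
print(f"Lustre: {lustre_gb_per_s} GB/s")
print(f"Lustre / S3 aggregate ratio: {ratio}")
```

Under these assumptions the quoted figures correspond to 1.25 GB/s for the S3 connection and 43.75 GB/s for the Lustre test, a factor of 35 between the two aggregate rates.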

    Development of a Fabric Lustre Scale

    Fabric lustre is one of the attributes that affect the visual appearance of a fabric. It is the amount of specular light the fabric reflects. So far, there is no simple and satisfactory method for either the subjective or objective assessment of fabric lustre, since its measurement is complex. A series of experiments was conducted to develop a scale for the subjective measurement of fabric lustre. A number of woven fabric samples with varying lustre were assessed subjectively by trained assessors. A glossmeter was then used to measure the fabric samples objectively. A simple regression analysis technique was applied to relate the subjective to the objective lustre data, and the results indicated a high degree of agreement between them. The instrumental data were further used to construct a lustre scale, which was assessed statistically for its reliability using a larger fabric sample population. Furthermore, the lustre of the fabric samples was measured spectrophotometrically, and the results showed a good correlation between the delta Y values and the grade values of the physical lustre scale. Keywords: Fabric lustre, lustre scale, glossmeter, spectrophotomete

    The Need for Health and Community Resources in Monterey County

    The Planning Evaluation and Policy Unit (PEP) of the Monterey County Health Department focuses on three areas: facilitating the implementation of the Health Department Strategic Plan, aligning and monitoring the department's performance standards. An intern at PEP noticed the needs of Monterey County residents facing barriers when accessing health and community resources. Monterey County's population is 444,732, and more than half of the population are people of color. With such a diverse population, there are many barriers to consider when accessing health and community resources, such as language barriers and navigating health insurance. The consequences are a shorter life span, poor quality care, and little or no information, which can lead to mistrust in the community. This capstone project will demonstrate some successes and challenges faced in the community by interviewing agencies and organizations on their experiences. Based on the intern's findings, language barriers and community engagement play an important role in why many Monterey County residents face barriers when accessing health and community resources. The intern recommends that organizations and agencies do more community outreach to help engage community residents and build a connection

    LusRegTes: A Regression Testing Tool for Lustre Programs

    Lustre is a synchronous data-flow declarative language widely used for safety-critical applications (avionics, energy, transport...). In such applications, testing to detect errors in the system plays a crucial role. During development and maintenance, Lustre programs often evolve, so regression testing should be performed to detect bugs. In this paper, we present a tool for automatic regression testing of Lustre programs. We have defined an approach to generate test cases in regression testing of Lustre programs. In this approach, a Lustre program is represented by an operator network, then the set of paths is identified and the path activation conditions are symbolically computed for each version. Regression test cases are generated by comparing paths between versions. The approach was implemented in a tool, called LusRegTes, in order to automate the test process for Lustre programs
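The idea of comparing paths between versions can be illustrated with a minimal Python sketch. It is not the LusRegTes implementation: the data model (a path as a tuple of node names, an activation condition as a plain string), the function `select_paths_to_retest`, and the sample paths are all hypothetical, and conditions are compared textually rather than symbolically.

```python
from typing import Dict, List, Tuple

# Hypothetical model: a path is a tuple of node names in the operator
# network; its activation condition is kept as a plain string.
Path = Tuple[str, ...]


def select_paths_to_retest(
    old: Dict[Path, str], new: Dict[Path, str]
) -> Dict[str, List[Path]]:
    """Classify paths by how they differ between two program versions."""
    added = sorted(p for p in new if p not in old)
    removed = sorted(p for p in old if p not in new)
    # Same path in both versions, but a different activation condition.
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return {"added": added, "removed": removed, "changed": changed}


# Invented example data for two versions of a small program.
version_1 = {
    ("a", "and", "out"): "b",
    ("b", "and", "out"): "a",
    ("c", "pre", "out"): "true",
}
version_2 = {
    ("a", "and", "out"): "b",
    ("b", "and", "out"): "a and not c",
    ("d", "or", "out"): "not a",
}

diff = select_paths_to_retest(version_1, version_2)
for kind, paths in diff.items():
    print(kind, paths)
```

In this sketch, added and changed paths are the candidates for new regression test cases, while unchanged paths could reuse existing ones. A real tool would need to compare activation conditions semantically, which plain string equality does not do.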