
    RAID-2: Design and implementation of a large scale disk array controller

    We describe the implementation of a large scale disk array controller and subsystem incorporating over 100 high performance 3.5 inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19 inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three-year university implementation effort to construct a truly large scale system assembly.
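
    The controller's XOR calculation engine computes the parity that lets the array tolerate a failed drive. A minimal Python sketch of that operation (the arithmetic only, not the controller's hardware or firmware; the toy block sizes are illustrative):

        def xor_blocks(blocks: list[bytes]) -> bytes:
            """XOR a list of equal-sized data blocks byte by byte."""
            parity = bytearray(len(blocks[0]))
            for block in blocks:
                for i, b in enumerate(block):
                    parity[i] ^= b
            return bytes(parity)

        # A stripe of four data blocks (toy sizes; real stripes are sector-sized).
        stripe = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
        parity = xor_blocks(stripe)

        # If one block is lost, XOR of the surviving blocks and the parity recovers it.
        lost = stripe[2]
        recovered = xor_blocks([stripe[0], stripe[1], stripe[3], parity])
        assert recovered == lost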

    NASA Langley Research Center's distributed mass storage system

    There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at NASA LaRC is building such a system and expects to put it into production use by the end of 1993. This paper presents the design of the DMSS, some experiences in its development and use, and a performance analysis of its capabilities. The special features of this system are: (1) workstation class file servers running UniTree software; (2) third party I/O; (3) HIPPI network; (4) HIPPI/IPI3 disk array systems; (5) Storage Technology Corporation (STK) ACS 4400 automatic cartridge system; (6) CRAY Research Incorporated (CRI) CRAY Y-MP and CRAY-2 clients; (7) file server redundancy provision; and (8) a transition mechanism from the existing mass storage system to the DMSS.

    File Access Performance of Diskless Workstations

    This paper studies the performance of single-user workstations that access files remotely over a local area network. From the environmental, economic, and administrative points of view, workstations that are diskless or that have limited secondary storage are desirable at the present time. Even with changing technology, access to shared data will continue to be important. It is likely that some performance penalty must be paid for remote rather than local file access. Our objectives are to assess this penalty and to explore a number of design alternatives that can serve to minimize it. Our approach is to use the results of measurement experiments to parameterize queuing network performance models. These models are then used to assess performance under load and to evaluate design alternatives. The major conclusions of our study are: (1) A system of diskless workstations with a shared file server can have satisfactory performance. By this, we mean performance comparable to that of a local disk in the lightly loaded case, and the ability to support substantial numbers of client workstations without significant degradation. As with any shared facility, good design is necessary to minimize queuing delays under high load. (2) The key to efficiency is protocols that allow volume transfers at every interface (e.g., between client and server, and between disk and memory at the server) and at every level (e.g., between client and server at the level of logical request/response and at the level of local area network packet size). However, the benefits of volume transfers are limited to moderate sizes (8-16 kbytes) by several factors. (3) From a performance point of view, augmenting the capabilities of the shared file server may be more cost effective than augmenting the capabilities of the client workstations. (4) Network contention should not be a performance problem for a 10-Mbit network and 100 active workstations in a software development environment.
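
    The paper's approach of parameterizing queuing network models from measurements can be illustrated with exact Mean Value Analysis, a standard solution technique for closed queuing networks; the service demands and think time below are illustrative placeholders, not the paper's measured values:

        def mva(service_demands, think_time, n_clients):
            """Exact MVA for a single-class closed queuing network.

            Returns (throughput, response time) with n_clients workstations attached.
            """
            throughput, response = 0.0, 0.0
            queue_len = [0.0] * len(service_demands)          # mean queue length per service center
            for n in range(1, n_clients + 1):
                residence = [d * (1 + q) for d, q in zip(service_demands, queue_len)]
                response = sum(residence)                      # total time spent at the service centers
                throughput = n / (think_time + response)       # system throughput
                queue_len = [throughput * r for r in residence]
            return throughput, response

        # Hypothetical per-request demands (seconds) at network, server CPU, server disk.
        x, r = mva([0.002, 0.010, 0.025], think_time=5.0, n_clients=100)
        print(f"throughput={x:.1f} req/s, response={r * 1000:.1f} ms")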

    Cloud Storage Performance and Security Analysis with Hadoop and GridFTP

    Even though cloud servers have been around for a few years, most web hosts today have not yet moved to the cloud. If the purpose of a cloud server is to distribute and store files on the Internet, FTP servers predate the cloud by many years, and an FTP server is sufficient to distribute content on the Internet. Is it therefore worth shifting from an FTP server to a cloud server? Cloud storage providers promise high durability and availability for their users, and the ability to scale up storage easily can save users a great deal of money. However, do they provide higher performance and better security features? Hadoop is a very popular platform for cloud computing. It is free software under the Apache License, is written in Java, and supports large-scale data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. The Hadoop Distributed File System (HDFS) allows rapid data transfer at scales up to thousands of terabytes and keeps operating even when nodes fail. GridFTP supports high-speed data transfer for wide-area networks; it is based on FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and the enhancement of Hadoop's security features with Kerberos. Based on the data transfer performance and security features of HDFS and a GridFTP server, we can decide whether the GridFTP server should be replaced with HDFS. According to our experimental results, we conclude that the GridFTP server provides better throughput than HDFS and that Kerberos has minimal impact on HDFS performance. We propose a solution in which users first authenticate with HDFS and then transfer the file from the HDFS server to the client using GridFTP.
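
    The comparison ultimately rests on measuring transfer throughput for each system. A minimal sketch of that kind of timing harness; the transfer function here is a placeholder standing in for the real HDFS and GridFTP client calls used in the report:

        import time

        def measure_throughput(transfer, num_bytes: int) -> float:
            """Run transfer() once and return its throughput in MB/s."""
            start = time.perf_counter()
            transfer()
            elapsed = time.perf_counter() - start
            return (num_bytes / 1_000_000) / elapsed

        def placeholder_transfer():
            time.sleep(0.5)   # stand-in for an actual HDFS read or GridFTP get

        print(f"{measure_throughput(placeholder_transfer, 500_000_000):.1f} MB/s")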

    Design Architecture-Based on Web Server and Application Cluster in Cloud Environment

    The cloud has become a computational and storage solution for many data-centric organizations. The problem these organizations now face is searching that data efficiently: a framework is needed to distribute the work of searching and fetching across thousands of computers, because data in HDFS is scattered and takes considerable time to retrieve. The main idea is to build a web server into the map phase, using the Jetty web server, to provide a fast and efficient way of searching data in the MapReduce paradigm. For real-time processing on Hadoop, a searchable mechanism is implemented in HDFS by creating a multi-level index in the web server with multi-level index keys. The web server handles the traffic throughput, and web clustering technology improves application performance. To keep the per-node work down, the load balancer should automatically distribute load to newly added server nodes.
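
    One plausible shape for the multi-level index described above is a small top-level index that narrows a key to a block, whose own index then maps keys to file offsets. The structure, keys, and paths below are assumptions for illustration, not the paper's implementation:

        import bisect

        # Level 1: sorted (first key in block, block id) pairs, small enough to keep in memory.
        top_index = [("apple", 0), ("mango", 1), ("tiger", 2)]
        first_keys = [k for k, _ in top_index]

        # Level 2: per-block index of key -> (hdfs_path, byte_offset); values are hypothetical.
        block_index = {
            0: {"apple": ("/data/part-0", 0), "cat": ("/data/part-0", 4096)},
            1: {"mango": ("/data/part-1", 0), "pear": ("/data/part-1", 8192)},
            2: {"tiger": ("/data/part-2", 0), "zebra": ("/data/part-2", 2048)},
        }

        def lookup(key: str):
            """Locate the block whose key range covers `key`, then probe that block's index."""
            pos = bisect.bisect_right(first_keys, key) - 1
            if pos < 0:
                return None
            _, block_id = top_index[pos]
            return block_index[block_id].get(key)

        print(lookup("pear"))   # ('/data/part-1', 8192)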

    ๋ถ„์‚ฐ ํŒŒ์ผ ์‹œ์Šคํ…œ์—์„œ ๋…ธ๋“œ ๊ฐ„ ์ปจ์‹œ์Šคํ„ด์‹œ ์œ ์ง€ ์˜ค๋ฒ„ํ—ค๋“œ์˜ ์ œ๊ฑฐ๋ฅผ ์œ„ํ•œ per-core file allocation ๋ฐฉ๋ฒ•

    Master's thesis, Department of Computer Science and Engineering, College of Engineering, Seoul National University, August 2017. Advisor: ์—ผํ—Œ์˜.

    A Distributed File System (DFS) is a file system that allows access to multiple storage servers through a computer network. Modern DFSs offer a variety of functions such as load balancing, location transparency, high availability, and fault tolerance. Among them, fault tolerance is one of the most important functions required to protect data from server and disk failures. GlusterFS, a typical DFS, supports replication, which replicates data for fault tolerance and stores it on a separate server, and erasure coding, which stores parity on another server after encoding the data. In both schemes, the information about one piece of data is distributed across multiple server nodes, so data consistency must be maintained between those nodes. If consistency is not maintained, each server node ends up storing different contents for the same data, which destroys fault tolerance. GlusterFS therefore acquires a lock on all servers when performing each operation, because file operations can arrive interleaved at the server nodes and every file operation must be applied atomically across all of them. In the current GlusterFS implementation, however, even operations on the same file can execute in parallel on multiple io-threads and event-threads, which requires concurrency control and can add up to two extra round trips as well as overheads such as lock management. We therefore propose a method that maintains data consistency between server nodes without additional concurrency control: by always performing operations on the same file on the same core, the order of operations on a given file is preserved throughout the system. With this approach we achieve a mean 63% (up to 83%) performance improvement for random reads and a mean 60% (up to 69%) improvement for random writes.

    Contents: Chapter 1 Introduction; Chapter 2 Background & Motivation (2.1 Dispersed Volume; 2.2 The Maintenance of Data Consistency in Dispersed Volume; 2.3 Dispersed Volume Cluster Lock; 2.4 Structure of Current GlusterFS); Chapter 3 Design & Implementation; Chapter 4 Evaluation; Chapter 5 Conclusion; Bibliography; Abstract (in Korean).
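
    A minimal sketch of the per-core file allocation idea described in the abstract above: hash each file identifier to a fixed worker so that all operations on one file execute in arrival order on the same core, with no extra locking. The queue-and-thread structure here is a Python illustration, not GlusterFS code:

        import queue
        import threading

        NUM_CORES = 4
        work_queues = [queue.Queue() for _ in range(NUM_CORES)]

        def dispatch(file_id: str, op):
            """Route an operation to the worker that owns this file (stable within a run)."""
            work_queues[hash(file_id) % NUM_CORES].put(op)

        def worker(q: queue.Queue):
            while True:
                op = q.get()
                op()          # operations for one file run here back to back, in order
                q.task_done()

        for q in work_queues:
            threading.Thread(target=worker, args=(q,), daemon=True).start()

        # Two writes to the same file land on the same worker and keep their order.
        dispatch("vol/fileA", lambda: print("write A, part 1"))
        dispatch("vol/fileA", lambda: print("write A, part 2"))
        for q in work_queues:
            q.join()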