RAID-2: Design and implementation of a large scale disk array controller
We describe the implementation of a large-scale disk array controller and subsystem incorporating over 100 high-performance 3.5-inch disk drives. It is designed to provide 40 MB/s sustained performance and 40 GB capacity in three 19-inch racks. The array controller forms an integral part of a file server that attaches to a Gb/s local area network. The controller implements a high-bandwidth interconnect between an interleaved memory, an XOR calculation engine, the network interface (HIPPI), and the disk interfaces (SCSI). The system is now functionally operational, and we are tuning its performance. We review the design decisions, history, and lessons learned from this three-year university effort to implement a truly large-scale system assembly.
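The XOR calculation engine mentioned above computes the parity that lets the array survive a disk failure. A minimal Python sketch of that principle (the controller itself does this in hardware; the block contents here are illustrative):

```python
# Sketch of XOR parity as used by a disk-array XOR engine: any single lost
# data block can be rebuilt by XOR-ing the parity with the surviving blocks.

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

data = [b"disk0", b"disk1", b"disk2"]      # equal-length data blocks (assumed)
parity = xor_blocks(data)                  # written to the parity disk

# Simulate losing disk 1 and rebuilding it from parity + surviving disks.
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == data[1]
```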
NASA Langley Research Center's distributed mass storage system
There is a trend in institutions with high performance computing and data management requirements to explore mass storage systems with peripherals directly attached to a high speed network. The Distributed Mass Storage System (DMSS) Project at NASA LaRC is building such a system and expects to put it into production use by the end of 1993. This paper presents the design of the DMSS, some experiences in its development and use, and a performance analysis of its capabilities. The special features of this system are: (1) workstation-class file servers running UniTree software; (2) third-party I/O; (3) a HIPPI network; (4) HIPPI/IPI-3 disk array systems; (5) a Storage Technology Corporation (STK) ACS 4400 automatic cartridge system; (6) Cray Research Incorporated (CRI) CRAY Y-MP and CRAY-2 clients; (7) file server redundancy; and (8) a transition mechanism from the existing mass storage system to the DMSS.
File Access Performance of Diskless Workstations
This paper studies the performance of single-user workstations that access files remotely over a local area network. From the environmental, economic, and administrative points of view, workstations that are diskless or that have limited secondary storage are desirable at the present time. Even with changing technology, access to shared data will continue to be important. It is likely that some performance penalty must be paid for remote rather than local file access. Our objectives are to assess this penalty and to explore a number of design alternatives that can serve to minimize it. Our approach is to use the results of measurement experiments to parameterize queuing network performance models. These models then are used to assess performance under load and to evaluate design alternatives. The major conclusions of our study are: (1) A system of diskless workstations with a shared file server can have satisfactory performance. By this, we mean performance comparable to that of a local disk in the lightly loaded case, and the ability to support substantial numbers of client workstations without significant degradation. As with any shared facility, good design is necessary to minimize queuing delays under high load. (2) The key to efficiency is protocols that allow volume transfers at every interface (e.g., between client and server, and between disk and memory at the server) and at every level (e.g., between client and server at the level of logical request/response and at the level of local area network packet size). However, the benefits of volume transfers are limited to moderate sizes (8-16 kbytes) by several factors. (3) From a performance point of view, augmenting the capabilities of the shared file server may be more cost effective than augmenting the capabilities of the client workstations. (4) Network contention should not be a performance problem for a 10-Mbit network and 100 active workstations in a software development environment.
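The modeling approach described above, measurements feeding a queueing model that predicts behavior under load, can be illustrated with a single M/M/1 queue for the shared file server. This is a deliberately simplified sketch (the paper uses richer queueing network models), and the rates below are made-up, not the paper's measurements:

```python
# Minimal sketch: model the shared file server as an M/M/1 queue and see how
# mean response time grows with the number of active client workstations.
# Service rate and per-client request rate are illustrative assumptions.

def mm1_response_time(arrival_rate, service_rate):
    """Mean response time R = 1 / (mu - lambda) for an M/M/1 queue."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable: utilization >= 1")
    return 1.0 / (service_rate - arrival_rate)

service_rate = 100.0                 # requests/s the server completes (assumed)
for clients in (10, 50, 90):
    lam = clients * 1.0              # assume 1 request/s per active workstation
    r_ms = 1000 * mm1_response_time(lam, service_rate)
    print(f"{clients} clients -> mean response {r_ms:.1f} ms")
```

The qualitative lesson matches conclusion (1): response time is near the service time when lightly loaded and degrades sharply only as utilization approaches 1.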
Cloud Storage Performance and Security Analysis with Hadoop and GridFTP
Even though cloud servers have been around for a few years, most web hosts today have not yet moved to the cloud. If the purpose of a cloud server is to distribute and store files on the internet, FTP servers predate the cloud and are sufficient for distributing content. Is it therefore worthwhile to shift from an FTP server to a cloud server? Cloud storage providers promise high durability and availability, and the ability to scale up storage easily can save users a great deal of money. But do they also provide higher performance and better security? Hadoop is a very popular platform for cloud computing. It is free software under the Apache License, written in Java, and supports large-scale data processing in a distributed environment. Characteristics of Hadoop include partitioning of data, computing across thousands of hosts, and executing application computations in parallel. The Hadoop Distributed File System (HDFS) allows rapid transfer of data sets up to thousands of terabytes and is capable of operating even in the case of node failure. GridFTP supports high-speed data transfer over wide-area networks; it is based on FTP and features multiple data channels for parallel transfers. This report describes the technology behind HDFS and the enhancement of Hadoop's security with Kerberos. Based on the data transfer performance and security features of HDFS and a GridFTP server, we can decide whether the GridFTP server should be replaced with HDFS. According to our experimental results, we conclude that the GridFTP server provides better throughput than HDFS, and that Kerberos has minimal impact on HDFS performance. We propose a solution in which users first authenticate with HDFS and then transfer files from the HDFS server to the client using GridFTP.
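GridFTP's throughput advantage comes largely from its multiple parallel data channels. The idea can be sketched without GridFTP itself: split a transfer into N ranges and fetch them concurrently. Here the "transfer" is a local ranged read; a real GridFTP client would open N TCP streams instead, and all names below are illustrative:

```python
# Sketch of parallel data channels: divide a file into ranges and read them
# concurrently, then reassemble in offset order.
import concurrent.futures
import os
import tempfile

def fetch_range(path, offset, length):
    """One 'data channel': read a byte range of the file."""
    with open(path, "rb") as f:
        f.seek(offset)
        return offset, f.read(length)

def parallel_fetch(path, channels=4):
    size = os.path.getsize(path)
    chunk = -(-size // channels)  # ceiling division
    ranges = [(off, min(chunk, size - off)) for off in range(0, size, chunk)]
    with concurrent.futures.ThreadPoolExecutor(channels) as pool:
        parts = list(pool.map(lambda r: fetch_range(path, *r), ranges))
    return b"".join(data for _, data in sorted(parts))

payload = b"0123456789" * 1000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(payload)
result = parallel_fetch(tmp.name)
os.unlink(tmp.name)
assert result == payload
```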
Design Architecture-Based on Web Server and Application Cluster in Cloud Environment
The cloud has become a computational and storage solution for many data-centric organizations. The problem those organizations now face is searching that data efficiently: a framework is needed to distribute the work of searching and fetching across thousands of computers, because data in HDFS is scattered and takes a long time to retrieve. The central idea is to embed a web server (Jetty) in the map phase, giving a fast and efficient way to search data within the MapReduce paradigm. For real-time processing on Hadoop, a searchable mechanism is implemented in HDFS by building a multi-level index, with multi-level index keys, inside the web server. The web server handles the traffic throughput, and web clustering technology improves application performance. To keep per-node work down, the load balancer should automatically distribute load to newly added server nodes.
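The multi-level index described above can be sketched as a two-level map: a first-level key narrows the search to a second-level map, which points at the data's location. Using a key prefix as the first level and an HDFS-block-plus-offset tuple as the location are assumptions for illustration, not the paper's exact scheme:

```python
# Minimal sketch of a two-level index for fast key lookup.
class MultiLevelIndex:
    def __init__(self):
        self.top = {}                      # level 1: key prefix -> level-2 map

    def put(self, key, location):
        """Index a record's location under its key."""
        self.top.setdefault(key[:2], {})[key] = location

    def get(self, key):
        """Two hash lookups instead of scanning scattered HDFS data."""
        return self.top.get(key[:2], {}).get(key)

idx = MultiLevelIndex()
idx.put("user42", ("block_0007", 1024))    # hypothetical HDFS block + offset
assert idx.get("user42") == ("block_0007", 1024)
assert idx.get("user99") is None
```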
A per-core file allocation method for removing inter-node consistency maintenance overhead in distributed file systems
Thesis (M.S.) -- Dept. of Computer Science and Engineering, Seoul National University, August 2017.
A Distributed File System (DFS) is a file system that allows access to multiple
storage servers through a computer network. The modern DFS offers a variety
of functions such as load balancing, location transparency, high availability, and
fault tolerance. Among them, fault tolerance is one of the most important,
since it protects data from server and disk failures. GlusterFS, a representative
DFS, supports replication, which duplicates data for fault tolerance and stores
the copies on separate servers, and erasure coding, which encodes the data and
stores the parity on another server. In both schemes, it is essential to maintain
data consistency between the server nodes, because the information about one
piece of data is distributed across multiple server nodes. If data consistency is
not maintained, each server node stores different contents, which destroys the
fault tolerance. GlusterFS therefore acquires a lock on all servers when
performing each operation. This is necessary because file operations can arrive
interleaved at different server nodes, and every file operation must be applied
atomically across the entire set of server nodes. However, in the current
implementation of GlusterFS, operations even on the same file can run in
parallel on multiple io-threads and event-threads, so additional concurrency
control is required. This can cause up to two extra round trips as well as
overheads such as lock management. We therefore propose a method that maintains
data consistency between server nodes without additional concurrency control:
by always executing the operations on a given file on the same core, the order
of operations on that file is preserved across the whole system. With this
approach we achieved a mean 63% (and up to 83%) performance improvement in
randread, and a mean 60% (and up to 69%) improvement in randwrite.
Chapter 1 Introduction
Chapter 2 Background & Motivation
2.1 Dispersed Volume
2.2 The Maintenance of Data Consistency in Dispersed Volume
2.3 Dispersed Volume Cluster Lock
2.4 Structure of Current GlusterFS
Chapter 3 Design & Implementation
Chapter 4 Evaluation
Chapter 5 Conclusion
Bibliography
Abstract (in Korean)
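The per-core file allocation idea in the abstract above can be sketched as hashing each file ID to one worker thread (one per core), so all operations on the same file execute on the same core in arrival order and per-file state needs no locks. This is an illustrative sketch, not the thesis's GlusterFS implementation; the hash choice and op format are assumptions:

```python
# Sketch of per-core file allocation: route every operation on a file to the
# same per-core queue, preserving per-file operation order without locking.
import queue
import threading
import zlib

NCORES = 4
queues = [queue.Queue() for _ in range(NCORES)]
state = [dict() for _ in range(NCORES)]     # per-core file state, never shared

def core_of(file_id):
    """Deterministically map a file to one core."""
    return zlib.crc32(file_id.encode()) % NCORES

def worker(core):
    while True:
        op = queues[core].get()
        if op is None:                      # shutdown sentinel
            break
        file_id, data = op
        # Only this thread touches state[core], so no lock is needed and
        # operations on a given file apply in exactly their arrival order.
        state[core].setdefault(file_id, []).append(data)

threads = [threading.Thread(target=worker, args=(c,)) for c in range(NCORES)]
for t in threads:
    t.start()

for i in range(8):                          # all ops on "fileA" hit one queue
    queues[core_of("fileA")].put(("fileA", i))
for q in queues:
    q.put(None)
for t in threads:
    t.join()

assert state[core_of("fileA")]["fileA"] == list(range(8))
```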
- โฆ