11 research outputs found

    A framework for evolving grid computing systems.

    Grid computing was born in the 1990s, when researchers sought a way to share expensive computing resources and experimental equipment. It has become increasingly popular because it promotes the sharing of distributed resources that may be heterogeneous in nature, and it enables scientists and engineering professionals to solve large-scale computing problems. In practice, huge numbers of grid computing facilities are already distributed around the world, each created to serve a particular group of scientists, such as weather forecasters, or a group of users, such as stock markets. However, the need to extend the functionality of current grid systems motivates the consideration of grid evolution, which allows many disjoint grids to be combined into a single powerful grid operating as one vast computational resource, and allows grid environments to be flexible, to change and to evolve. The rationale for grid evolution is the current rapid and increasing advance of both software and hardware. Evolution means adding or removing capabilities: this research defines grid evolution as adding new functions and/or equipment and removing unusable resources that affect the performance of some nodes. This thesis presents a new technique for grid evolution that is seamless and operates at run time. Within grid computing, evolution is an integration of software and hardware and can be of two distinct types: internal evolution, which occurs inside the grid boundary by migrating special resources such as application software from node to node within the grid, and external evolution, which occurs between grids. This thesis develops a framework for grid evolution that insulates users from the complexities of grids.
This framework has at its core a resource broker together with a grid monitor to handle internal and external evolution, advance reservation, fault tolerance, monitoring of the grid environment, increased resource utilisation and high availability of grid resources. Grid evolution is triggered when the grid receives a job whose requirements do not exist on the required node. If the grid has all the requirements scattered across its nodes, internal evolution migrates the required resources to the required node in order to satisfy the job's requirements; if the grid does not have these resources, external evolution enables the grid either to collect them from other grids (permanent evolution) or to send the job to other grids for execution (just-in-time evolution). Finally, a simulation tool called EVOSim has been designed, developed and tested. It is written in Oracle 10g and was used to create four grids, each with a different setup, including different nodes, application software, data and policies. Experiments were conducted by submitting jobs to the grid at run time and then comparing the results and analysing the performance of grids that use the evolution approach against those that do not. The results of these experiments demonstrate that these features significantly improve the performance of grid environments and provide excellent scheduling results, with a decreasing number of rejected jobs.

    Evaluation of the 10 GbE links of Grid'5000

    The Grid5000 instrument is intended for the study of grid problems, solutions and software for large-scale distributed computing and storage. In 2006, Grid5000 acquired a virtual private network composed of 1 or 10 Gb/s access links and dedicated 10 Gb/s wavelengths in the DWDM infrastructure of RENATER 4. This report presents a study of the potential benefit of this infrastructure for distributed applications through a performance evaluation of TCP, the predominant protocol in these applications. The study first highlights the very significant impact of protocol parameterisation in this context and explains the low throughput observed both by the operator and by the users. The results obtained through appropriate calibration or the use of parallel flows are then presented. Finally, several configuration and behavioural anomalies of the infrastructure are reported.
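As a rough illustration of the kind of protocol parameterisation the report found decisive, the sketch below sizes TCP socket buffers to the bandwidth-delay product of a high-speed path. The helper names and default figures are assumptions for illustration; in practice the kernel may clamp the requested sizes to limits such as net.core.rmem_max, and modern kernels also autotune buffers:

```python
import socket

def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product: the number of bytes that must be in flight
    to keep a link of the given bandwidth full at the given round-trip time."""
    return int(bandwidth_bps * rtt_s / 8)

def tuned_socket(bandwidth_bps=10e9, rtt_s=0.01):
    """Create a TCP socket whose send/receive buffers match the BDP of,
    by default, a 10 Gb/s path with a 10 ms RTT (illustrative values)."""
    bdp = bdp_bytes(bandwidth_bps, rtt_s)
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Must be set before connecting for the window scaling to take effect;
    # the kernel may silently cap these values.
    s.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp)
    return s
```

For a 10 Gb/s link with a 10 ms RTT the BDP is 12.5 MB, far above common default buffer sizes, which is consistent with the low throughput the report attributes to unparameterised TCP. Parallel flows, the report's other remedy, sidestep the same limit by splitting the window requirement across several connections.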

    VLAM-G: Interactive Data Driven Workflow Engine for Grid-Enabled Resources


    Service-Oriented Ad Hoc Grid Computing

    The subject of this thesis is the design and implementation of an ad hoc Grid infrastructure. The vision of an ad hoc Grid further evolves conventional service-oriented Grid systems into a more robust, more flexible and more usable environment that remains standards-compliant and interoperable with other Grid systems. Much of the work in current Grid middleware systems focuses on providing transparent access to high performance computing (HPC) resources (e.g. clusters) in virtual organizations spanning multiple institutions. The ad hoc Grid vision presented in this thesis exceeds this view by combining classical Grid components with more flexible components and usage models, allowing the formation of an environment that combines dedicated HPC resources with a large number of personal computers forming a "Desktop Grid". Three examples from medical research, media research and mechanical engineering are presented as application scenarios for a service-oriented ad hoc Grid infrastructure. These sample applications are also used to derive requirements for the runtime environment and the development tools of such an ad hoc Grid environment. These requirements form the basis for the design and implementation of the Marburg ad hoc Grid Environment (MAGE) and the Grid Development Tools for Eclipse (GDT). MAGE is an implementation of a WSRF-compliant Grid middleware that satisfies the criteria for an ad hoc Grid middleware presented in the introduction to this thesis. GDT extends the popular Eclipse integrated development environment with components that support application development both for traditional service-oriented Grid middleware systems and for ad hoc Grid infrastructures such as MAGE. These development tools represent the first fully model-driven approach to Grid service development integrated with infrastructure management components in service-oriented Grid computing.
This thesis concludes with a quantitative discussion of the performance overhead imposed by the presented extensions to a service-oriented Grid middleware, as well as a discussion of the qualitative improvements gained by the overall solution; it also gives an outlook on future developments and areas for further research. One of these qualitative improvements is "hot deployment": the ability to install and remove Grid services in a running node without interruption to other active services on the same node. Hot deployment is introduced as a novelty in service-oriented Grid systems as a result of the research conducted for this thesis. It extends service-oriented Grid computing with a new paradigm, making the installation of individual application components a functional aspect of the application. This thesis further explores the idea of using peer-to-peer (P2P) networking for Grid computing by combining a general-purpose P2P framework with a standards-compliant Grid middleware. In previous work, the application of P2P systems was limited to replica location and the use of P2P index structures for discovery purposes. The work presented in this thesis also uses P2P networking to realize seamless communication across network barriers. Even though the web service standards were designed for the internet, the two-way communication requirement introduced by the WSRF standards, and particularly the notification pattern, is not well supported by them. This deficiency can be answered by mechanisms that are part of such general-purpose P2P communication frameworks. Existing security infrastructures for Grid systems focus on the protection of data during transmission and on access control to individual resources or the overall Grid environment. This thesis focuses on security issues within a single node of a dynamically changing service-oriented Grid environment.
To counter the security threats arising from the new capabilities of an ad hoc Grid, a number of novel isolation solutions are presented. These solutions address security issues and isolation at a fine-grained level, providing a range of applicable basic mechanisms ranging from lightweight system call interposition to complete para-virtualization of the operating system.
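The hot-deployment idea, installing and removing services in a running node without interrupting its other active services, can be illustrated with a minimal service-container sketch. This is not MAGE's actual API; the class and method names are invented for illustration:

```python
import threading

class ServiceContainer:
    """Toy container: services can be deployed and undeployed at run time
    while the remaining services keep answering requests."""

    def __init__(self):
        self._services = {}
        self._lock = threading.Lock()

    def deploy(self, name, handler):
        with self._lock:
            self._services[name] = handler   # new service visible immediately

    def undeploy(self, name):
        with self._lock:
            self._services.pop(name, None)   # other entries are untouched

    def invoke(self, name, *args):
        with self._lock:
            handler = self._services.get(name)
        if handler is None:
            raise LookupError(f"no such service: {name}")
        return handler(*args)                # runs outside the lock
```

The essential property, per the abstract, is that `deploy` and `undeploy` are ordinary operations on a live node rather than steps of a container restart, which is what makes component installation "a functional aspect of the application".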

    Policies for Web Services

    Web services are predominantly used to implement service-oriented architectures (SOA). However, there are several areas, such as temporal dimensions, real-time, streaming, or efficient and flexible file transfers, where web service functionality should be extended. These extensions can, for example, be achieved by using policies. Since there are often alternative ways to provide a piece of functionality (e.g., different protocols can be used to transfer data), the WS-Policy standard is especially useful for extending web services with policies: it allows policies to be created that state in general terms the properties under which a service is provided, and that explicitly express alternative properties. To extend the functionality of web services, two policies are introduced in this thesis: the Temporal Policy and the Communication Policy. The temporal policy is the foundation for adding temporal dimensions to a WS-Policy. It is not itself a WS-Policy but an independent policy language that describes temporal dimensions of, and dependencies between, temporal policies and WS-Policies. Switching of protocol dependencies, pricing of services, quality of service, and security are example areas for using a temporal policy. To describe the protocol dependencies of a service for streaming, real-time and file transfers, a communication policy can be used. The communication policy is a concrete WS-Policy: with it, a service can expose the protocols it depends on for communication after its invocation. A web service client thus knows the protocols required to support communication with the service, making it possible to evaluate beforehand whether an invocation of the service is reasonable. On top of the newly introduced policies, novel mechanisms and tools are provided to ease service use and enable flexible and efficient data handling.
Furthermore, the end user can be involved in the development process more easily. The Flex-SwA architecture, the first component in this thesis based on the newly introduced policies, implements the actual file transfer and streaming protocols that are described as dependencies in a communication policy. Several communication patterns support flexible handling of the communication, and a reference concept enables seamless message forwarding with reduced data movement. Based on the Flex-SwA implementation and the communication policy, it is possible to improve usability, especially in the area of service-oriented Grids, by integrating data transfers into an automatically generated web and Grid service client. The Web and Grid Service Browser is introduced in this thesis as such a generic client. It provides a familiar environment for using services by offering client generation as part of the browser. Data transfers are integrated directly into service invocation, without the user having to perform data transmissions explicitly. For multimedia MIME types, special plugins allow the consumption of multimedia data. To enable an end user to build applications that also leverage high performance computing resources, the Service-enabled Mashup Editor is presented, which lets the user combine popular web applications with web and Grid services. Again, the communication policy provides descriptive means for file transfers, and Flex-SwA's reference concept is used for data exchange. To show the applicability of these novel concepts, several use cases from the area of multimedia processing have been selected. Based on the temporal policy, the communication policy, Flex-SwA, the Web and Grid Service Browser, and the Service-enabled Mashup Editor, the development of a scalable service-oriented multimedia architecture is presented. The multimedia SOA offers, among other things, a face detection workflow, a video-on-demand service, and an audio resynthesis service.
More precisely, the video-on-demand service describes its dependency on a multicast protocol using a communication policy. A temporal policy is then used to describe a switch from one multicast protocol to another by changing the communication policy at the end of its validity period; the Service-enabled Mashup Editor serves as a client for the new multicast protocol after the switch. Flex-SwA is used to stream single frames from a frame decoder service to a face detection service (both part of the face detection workflow) and to transfer audio files, using the different Flex-SwA communication patterns, to an audio resynthesis service. The invocation of the face detection workflow and of the audio resynthesis service is realized with the Web and Grid Service Browser.
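The "evaluate beforehand whether an invocation is reasonable" step can be illustrated as a simple match of the client's supported protocols against the alternatives a communication policy declares. This is a hypothetical simplification of WS-Policy alternative matching, not the thesis's implementation; in WS-Policy terms, each alternative is a set of assertions that must all be satisfiable:

```python
def can_invoke(client_protocols, policy_alternatives):
    """Return True if the client supports every protocol of at least one
    policy alternative.

    client_protocols: protocols the client implements, e.g. {"gridftp", "http"}.
    policy_alternatives: list of protocol sets; each set is one alternative
    declared by the service's communication policy.
    """
    supported = set(client_protocols)
    return any(alternative <= supported for alternative in policy_alternatives)
```

A temporal policy, in this simplified picture, would amount to swapping `policy_alternatives` for a different list once the current policy's validity period ends, which is exactly the protocol-switch scenario of the video-on-demand use case.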

    Efficient I/O Management Techniques for Flash-Based High-Performance Computing Storage Systems

    Thesis (Ph.D.) -- Seoul National University Graduate School, Department of Electrical and Computer Engineering, College of Engineering, August 2020. Advisor: Hyeonsang Eom.
    Most I/O traffic in high performance computing (HPC) storage systems is dominated by checkpoints and the restarts of HPC applications. For such bursty I/O, new all-flash HPC storage systems with an integrated burst buffer (BB) and parallel file system (PFS) have been proposed. However, most of the distributed file systems (DFS) used to configure such storage systems provide a single connection between a compute node and a server node, which prevents users from utilizing the high I/O bandwidth an all-flash server node can provide. To provide multiple connections, a DFS must be modified to increase the number of sockets, which is an extremely difficult and time-consuming task owing to its complicated structure. Users can instead increase the number of daemons in the DFS to force additional connections without modifying the DFS, but because each daemon has its own mount point for its connection, multiple mount points appear on the compute nodes, and significant effort is required from users to distribute file I/O requests across them. In addition, to avoid access to a PFS composed of low-speed storage devices such as hard disks, dedicated BB allocation is preferred despite its severe underutilization. Such an allocation method may be inappropriate, however, because all-flash HPC storage systems speed up access to the PFS. To handle these problems, we propose an efficient user-transparent I/O management scheme for all-flash HPC storage systems. The first scheme, I/O transfer management, provides multiple connections between a compute node and a server node without additional effort from DFS developers or users. To do so, we modified the mount procedure and the I/O processing procedures in the virtual file system (VFS).
In the second scheme, data management between the BB and the PFS, a BB over-subscription allocation method is adopted to improve BB utilization. Unfortunately, this allocation method aggravates I/O interference and the demotion overhead from the BB to the PFS, degrading checkpoint and restart performance. To minimize this degradation, we developed an I/O scheduler and a new data management policy based on checkpoint and restart characteristics. To prove the effectiveness of the proposed schemes, we evaluated both the I/O transfer management scheme and the data management scheme between the BB and the PFS. The I/O transfer management scheme improves the write and read I/O throughput for checkpoint and restart by up to 6 and 3 times, respectively, over a DFS using the original kernel. With the data management scheme, BB utilization improves by at least 2.2-fold, and a more stable and higher checkpoint performance is guaranteed. In addition, we achieved up to a 96.4% hit ratio for restart requests on the BB and up to 3.1 times higher restart performance than existing methods.
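The effect of the I/O transfer management scheme, several connections hidden behind a single mount point, can be illustrated with a user-level sketch that stripes chunks of one request round-robin over multiple links. The thesis's scheme (Mulconn) works inside the VFS; the class and names below are purely illustrative:

```python
from itertools import cycle

class MultiConnWriter:
    """Toy model of striping one logical write over several connections,
    so a single client-visible stream can use the aggregate bandwidth of
    multiple compute-node/server-node links."""

    def __init__(self, connections, chunk_size=4 << 20):
        self._conns = cycle(connections)   # round-robin over the links
        self._chunk = chunk_size

    def write(self, data):
        chunks = 0
        for off in range(0, len(data), self._chunk):
            conn = next(self._conns)
            # Each chunk may travel over a different connection; the caller
            # still sees one write call against one "mount point".
            conn.send(data[off:off + self._chunk])
            chunks += 1
        return chunks  # number of chunks dispatched
```

A real implementation must also preserve ordering and offsets per file, which is presumably why the thesis modifies the VFS mount and I/O paths rather than scheduling in user space.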