
    AliEnFS - a Linux File System for the AliEn Grid Services

    Among the services offered by the AliEn (ALICE Environment, http://alien.cern.ch) Grid framework is a virtual file catalogue that allows transparent access to distributed data sets using various file transfer protocols. alienfs (AliEn File System) integrates the AliEn file catalogue as a new file system type into the Linux kernel using LUFS, a hybrid user-space file system framework (open source, http://lufs.sourceforge.net). LUFS uses a special kernel interface level called VFS (Virtual File System Switch) to communicate through a generalised file system interface with the AliEn file system daemon. The AliEn framework is used for authentication, catalogue browsing, file registration, and read/write transfer operations; a C++ API implements the generic file system operations. The goal of AliEnFS is to give users easy interactive access to a worldwide distributed virtual file system using familiar shell commands (e.g. cp, ls, rm). The paper discusses general aspects of Grid file systems, the AliEn implementation, and present and future developments of the AliEn Grid File System. Comment: 9 pages, 12 figures
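    The forwarding idea behind such a user-space file system can be sketched as follows: generic VFS-style operations (readdir, read) are translated into calls to a catalogue daemon. All class and method names below are illustrative, not LUFS's or AliEn's actual API.

```python
class CatalogueDaemon:
    """Stand-in for a file system daemon serving a virtual file
    catalogue (names are illustrative, not AliEn's API)."""
    def __init__(self, entries):
        self.entries = entries          # virtual path -> file content

    def list_dir(self, path):
        prefix = path.rstrip("/") + "/"
        return sorted({p[len(prefix):].split("/")[0]
                       for p in self.entries if p.startswith(prefix)})

    def read(self, path):
        return self.entries[path]


class UserSpaceFS:
    """Sketch of how a LUFS-style framework maps generic file
    system operations onto daemon calls."""
    def __init__(self, daemon):
        self.daemon = daemon

    def readdir(self, path):
        return self.daemon.list_dir(path)

    def read(self, path):
        return self.daemon.read(path)


daemon = CatalogueDaemon({"/alice/run1/data.root": b"...",
                          "/alice/run2/data.root": b"..."})
fs = UserSpaceFS(daemon)
print(fs.readdir("/alice"))   # ['run1', 'run2']
```

    Once such a mapping is mounted through the kernel's VFS layer, ordinary shell commands (cp, ls, rm) work on the virtual catalogue unchanged, which is exactly the interactive-access goal the abstract describes.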

    Optimizing the Memory Subsystem for Efficient System Resource Utilization in Data-Intensive Applications

    Ph.D. dissertation, Seoul National University, College of Engineering, Department of Electrical and Computer Engineering, August 2020. Advisor: Heon Y. Yeom. With explosive data growth, data-intensive applications such as relational databases and key-value stores have become increasingly popular across a variety of domains in recent years. To meet their growing performance demands, it is crucial to utilize memory resources efficiently and fully. However, general-purpose operating systems (OSs) are designed to provide system resources fairly, at system level, to all applications running on a system, and a single application may find it difficult to exploit the system's best performance because of this system-level fairness. For performance reasons, many data-intensive applications therefore re-implement mechanisms that OSs already provide, on the assumption that they know their data better than the OS does. Such applications can be greedily optimized for performance, but this may result in inefficient use of system resources. In this dissertation, we claim that simple OS support combined with minor application modifications can yield even higher application performance without sacrificing system-level resource utilization. We optimize and extend the OS memory subsystem to better support applications while addressing three memory-related issues in data-intensive applications. First, we introduce a memory-efficient cooperative caching approach between application and kernel buffers to address the double-caching problem, in which the same data resides in multiple layers. Second, we present a memory-efficient, transparent zero-copy read I/O scheme that avoids the performance interference caused by memory copying during I/O. Third, we propose a memory-efficient fork-based checkpointing mechanism for in-memory database systems that mitigates the memory-footprint problem of the existing fork-based checkpointing scheme, whose memory usage can grow incrementally (up to 2x) during checkpointing for update-intensive workloads. To show the effectiveness of our approach, we implement and evaluate our schemes on real multi-core systems. The experimental results demonstrate that our cooperative approach addresses these issues more effectively than existing non-cooperative approaches while delivering better performance in terms of transaction processing speed, I/O throughput, and memory footprint.
    Table of contents: Chapter 1, Introduction (motivation: importance of memory resources, problems; contributions; outline). Chapter 2, Background (Linux kernel memory management: page cache, page reclamation, page table and TLB shootdown, copy-on-write; Linux support for applications: fork, madvise, direct I/O, mmap). Chapter 3, Memory-Efficient Cooperative Caching (motivation; related work; design and implementation, including kernel support and migration to DBIO; evaluation with TPC-C and YCSB benchmarks; summary). Chapter 4, Memory-Efficient Zero-Copy I/O (motivation; related work on zero-copy I/O, TLB shootdown, and copy-on-write; design and implementation of z-READ, with TLB-shootdown and copy-on-write optimizations; evaluation; summary). Chapter 5, Memory-Efficient Fork-Based Checkpointing (motivation; related work; design and implementation, including OS support; evaluation; summary). Chapter 6, Conclusion. Abstract in Korean.
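    The baseline mechanism behind the third contribution can be sketched minimally: fork() gives the child a copy-on-write snapshot of the parent's memory, so the child can serialize a consistent checkpoint while the parent keeps applying updates. This illustrates the standard scheme whose memory footprint the dissertation reduces, not the proposed optimization itself.

```python
import json
import os
import tempfile

def checkpoint(state, path):
    """Fork-based checkpointing sketch: the child serializes the
    state as it was at fork() time; pages are shared copy-on-write
    until either side writes to them."""
    pid = os.fork()
    if pid == 0:
        # Child: its view of `state` is frozen at fork().
        with open(path, "w") as f:
            json.dump(state, f)
        os._exit(0)
    return pid

state = {"balance": 100}
path = tempfile.NamedTemporaryFile(delete=False).name
pid = checkpoint(state, path)
state["balance"] = 250          # update arrives during checkpointing
os.waitpid(pid, 0)              # wait for the snapshot to finish
with open(path) as f:
    snapshot = json.load(f)
print(snapshot["balance"])      # 100: the value at fork() time
```

    The footprint problem the abstract mentions is visible here: every parent write during the checkpoint copies a page, so an update-intensive workload can temporarily approach double the memory usage.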

    Using Khazana to support distributed application development

    Technical report. One of the most important services required by most distributed applications is some form of shared data management, e.g., a directory service manages shared directory entries while groupware manages shared documents. Each such application currently must implement its own data management mechanisms, because existing runtime systems are not flexible enough to support all distributed applications efficiently. For example, groupware can be efficiently supported by a distributed object system, while a distributed database would prefer a more low-level storage abstraction. The goal of Khazana is to provide programmers with configurable components that support the data management services required by a wide variety of distributed applications, including consistent caching, automated replication and migration of data, persistence, access control, and fault tolerance. It does so via a carefully designed set of interfaces that supports a hierarchy of data abstractions, ranging from flat data to C++/Java objects, and that gives programmers a great deal of control over how their data is managed. To demonstrate the effectiveness of our design, we report on our experience porting three applications to Khazana: a distributed file system, a distributed directory service, and a shared whiteboard.
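    The lowest abstraction level described above, flat data with automated replication layered on top, can be sketched as follows. The interface is illustrative and assumed for this sketch; it is not Khazana's actual API.

```python
class Region:
    """A flat shared-data region with automated replication, in the
    spirit of the component-based services described above (names
    and interfaces are illustrative, not Khazana's API)."""
    def __init__(self, size, replicas=1):
        self.data = bytearray(size)
        self.replicas = [bytearray(size) for _ in range(replicas)]

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload
        # Automated replication: propagate the update to every
        # replica. A real system would do this lazily or eagerly
        # depending on the configured consistency policy.
        for r in self.replicas:
            r[offset:offset + len(payload)] = payload

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])


region = Region(16, replicas=2)
region.write(0, b"hello")
print(region.read(0, 5))   # b'hello'
```

    Higher layers (directory entries, C++/Java objects) would then map their structures onto such regions, which is what makes one set of data-management components reusable across a file system, a directory service, and a whiteboard.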

    Interest-Based Access Control for Content Centric Networks (extended version)

    Content-Centric Networking (CCN) is an emerging network architecture designed to overcome limitations of the current IP-based Internet. One of the fundamental tenets of CCN is that data, or content, is a named and addressable entity in the network. Consumers request content by issuing interest messages with the desired content name. These interests are forwarded by routers to producers, and the resulting content object is returned and optionally cached at each router along the path. In-network caching makes it difficult to enforce access control policies on sensitive content outside of the producer, since routers only use interest information for forwarding decisions. To address this, we propose an Interest-Based Access Control (IBAC) scheme that enables access control enforcement using only information contained in interest messages, i.e., by making sensitive content names unpredictable to unauthorized parties. Our IBAC scheme supports both hash- and encryption-based name obfuscation. We address the problem of interest replay attacks by formulating a mutual trust framework between producers and consumers that enables routers to perform authorization checks when satisfying interests from their cache. We assess the computational, storage, and bandwidth overhead of each IBAC variant. Our design is flexible and allows producers to arbitrarily specify and enforce any type of access control on content, without having to deal with the problems of content encryption and key distribution. This is the first comprehensive design for CCN access control using only information contained in interest messages. Comment: 11 pages, 2 figures
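    The hash-based name obfuscation variant can be sketched as follows: a consumer holding a shared key can derive the routable name of a sensitive content object, while the name stays unpredictable to anyone without the key. The concrete construction here (HMAC-SHA256 over the name component) is an assumption for illustration, not necessarily the paper's exact choice.

```python
import hashlib
import hmac

def obfuscate_name(namespace, name, key):
    """Hash-based name obfuscation sketch: replace the sensitive
    name component with a keyed hash, so only key holders can
    predict (and therefore request) the content's routable name."""
    tag = hmac.new(key, name.encode(), hashlib.sha256).hexdigest()
    return f"{namespace}/{tag}"

authorized = obfuscate_name("/producer/private", "report.pdf", b"shared-key")
same = obfuscate_name("/producer/private", "report.pdf", b"shared-key")
other = obfuscate_name("/producer/private", "report.pdf", b"wrong-key")
print(authorized == same, authorized == other)   # True False
```

    Because routers forward on the obfuscated name like any other, no router changes are needed for enforcement; the replay problem the abstract raises arises exactly because such a name, once observed, can be reissued, which is what the mutual trust framework between producers and consumers addresses.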

    Dynamic Virtual Page-based Flash Translation Layer with Novel Hot Data Identification and Adaptive Parallelism Management

    Solid-state disks (SSDs) have been replacing traditional motor-driven hard disks in high-end storage devices over the past few decades. However, the impact of various inherent characteristics, such as out-of-place updates (requiring garbage collection (GC)) and limited endurance (requiring wear leveling), must be reduced to a large extent before that replacement is complete. Both GC and wear leveling fundamentally depend on hot data identification (HDI). In this paper, we propose a hot-data-aware flash translation layer architecture based on a dynamic virtual page (DVPFTL) to improve the performance and lifetime of NAND flash devices. First, we develop a generalized dual-layer HDI (DL-HDI) framework, composed of a cold-data pre-classifier and a hot-data post-identifier, which can efficiently track the frequency and recency of data accesses. Then, we design an adaptive parallelism manager (APM) that assigns clustered data chunks to distinct resident blocks in the SSD so as to prolong its endurance. Finally, experimental results from our SSD prototype indicate that the DVPFTL scheme reliably improves the parallelizability and endurance of NAND flash devices at reduced GC cost, compared with related work. Peer reviewed
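    The role HDI plays for GC and wear leveling can be sketched with a generic frequency-plus-recency classifier: a logical page is "hot" if it has been written often recently, so hot and cold pages can be placed in separate blocks. The decay-counter scheme below is an assumption for illustration, not the paper's DL-HDI design.

```python
from collections import defaultdict

class HotDataIdentifier:
    """Generic hot-data identifier sketch: per-page write counters
    with periodic halving, so both frequency and recency of
    accesses influence the hot/cold decision."""
    def __init__(self, threshold=3, decay_every=100):
        self.counts = defaultdict(int)
        self.threshold = threshold
        self.decay_every = decay_every
        self.accesses = 0

    def record_write(self, lpn):
        self.accesses += 1
        self.counts[lpn] += 1
        if self.accesses % self.decay_every == 0:
            # Aging: halve all counters so stale popularity fades
            # and recency matters, not just lifetime frequency.
            for page in self.counts:
                self.counts[page] //= 2

    def is_hot(self, lpn):
        return self.counts[lpn] >= self.threshold

hdi = HotDataIdentifier()
for _ in range(5):
    hdi.record_write(7)      # logical page 7 is updated often
hdi.record_write(42)         # page 42 is written once
print(hdi.is_hot(7), hdi.is_hot(42))   # True False
```

    Grouping pages this way pays off in GC (blocks holding only hot pages invalidate almost entirely and are cheap to reclaim) and in wear leveling (cold data can be steered to already-worn blocks).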