826 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Full text link
    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also helps to provide an easy way for new practitioners to understand this complex area of research.Comment: 46 pages, 16 figures, Technical Repor

    Ontwerp en evaluatie van content distributie netwerken voor multimediale streaming diensten.

    Get PDF
    Traditionele Internetgebaseerde diensten voor het verspreiden van bestanden, zoals Web browsen en het versturen van e-mails, worden aangeboden via één centrale server. Meer recente netwerkdiensten zoals interactieve digitale televisie of video-op-aanvraag vereisen echter hoge kwaliteitsgaranties (QoS), zoals een lage en constante netwerkvertraging, en verbruiken een aanzienlijke hoeveelheid bandbreedte op het netwerk. Architecturen met één centrale server kunnen deze garanties moeilijk bieden en voldoen daarom niet meer aan de hoge eisen van de volgende generatie multimediatoepassingen. In dit onderzoek worden daarom nieuwe netwerkarchitecturen bestudeerd, die een dergelijke dienstkwaliteit kunnen ondersteunen. Zowel peer-to-peer mechanismes, zoals bij het uitwisselen van muziekbestanden tussen eindgebruikers, als servergebaseerde oplossingen, zoals gedistribueerde caches en content distributie netwerken (CDN's), komen aan bod. Afhankelijk van de bestudeerde dienst en de gebruikte netwerktechnologieën en -architectuur, worden gecentraliseerde algoritmen voor netwerkontwerp voorgesteld. Deze algoritmen optimaliseren de plaatsing van de servers of netwerkcaches en bepalen de nodige capaciteit van de servers en netwerklinks. De dynamische plaatsing van de aangeboden bestanden in de verschillende netwerkelementen wordt aangepast aan de heersende staat van het netwerk en aan de variërende aanvraagpatronen van de eindgebruikers. Serverselectie, herroutering van aanvragen en het verspreiden van de belasting over het hele netwerk komen hierbij ook aan bod

    Distributed Redirection for the Globule Platform

    Get PDF

    dCache, scalable managed storage

    Get PDF
    End of 2007, the most challenging high energy physics experiment ever, the Large Hadron Collider(LHC)[9], at CERN, will start to produce a sustained stream of data in the order of 300MB/sec, equivalent to a stack of CDs as high as the Eiffel Tower once per week. This data is, while produced, distributed and persistently stored at several dozens of sites around the world, building the LHC data grid. The destination sites are expected to provide the necessary middle-ware, so called Storage Elements, offering standard protocols to receive the data and optionally store it at the site specific Tertiary Storage Systems. Beside its actual functionality, discussed subsequently, the Storage Element software has to be able to fit into a large variety of environments. They are known to range from sites providing a single storage box of some Tera Bytes of data and nearly no maintenance personnel up to Tier I sites with estimated disk storage capacities reaching into the Peta Byte area. Moreover, sites expected to store data permanently may want to use their already existing Hierarchical Storage Management (HSM) System to drive the robotics. This requires the Storage Element to be aware of HSM Systems and to be able to manage external file copies. The wide range of scalability, from the very small to the limits of affordable storage, is one of the primary goals of dCache, the Storage Element introduced in this presentation. By being strictly compliant to standard data transfer and control protocols, like gsiFtp, xRootd and the Storage Resource Manager protocol SRM, we are focusing on our second goal which is to make dCache available and useful beyond the borders of the High Energy Physics Community. Beside storing and preparing data for transfer, dCache provides a rich palette of functions to manage the available storage, as will be described subsequently. This includes replication of datasets on automated detection of busy storage components as well as optimization of access to tertiary storage systems

    Naming, Migration, and Replication for NFSv4

    Full text link
    In this paper, we discuss a global name space for NFSv4 and mechanisms for transparent migration and replication. By convention, any file or directory name beginning with /nfs on an NFS client is part of this shared global name space. Our system supports file system migration and replication through DNS resolution, provides directory migration and replication using built-in NFSv4 mechanisms, and supports read/write replication with precise consistency guarantees, small performance penalty, and good scaling. We implement these features with small extensions to the published NFSv4 protocol, and demonstrate a practical way to enhance network transparency and administerability of NFSv4 in wide area networks.http://deepblue.lib.umich.edu/bitstream/2027.42/107939/1/citi-tr-06-1.pd

    Partial Replica Location And Selection For Spatial Datasets

    Get PDF
    As the size of scientific datasets continues to grow, we will not be able to store enormous datasets on a single grid node, but must distribute them across many grid nodes. The implementation of partial or incomplete replicas, which represent only a subset of a larger dataset, has been an active topic of research. Partial Spatial Replicas extend this functionality to spatial data, allowing us to distribute a spatial dataset in pieces over several locations. We investigate solutions to the partial spatial replica selection problems. First, we describe and develop two designs for an Spatial Replica Location Service (SRLS), which must return the set of replicas that intersect with a query region. Integrating a relational database, a spatial data structure and grid computing software, we build a scalable solution that works well even for several million replicas. In our SRLS, we have improved performance by designing a R-tree structure in the backend database, and by aggregating several queries into one larger query, which reduces overhead. We also use the Morton Space-filling Curve during R-tree construction, which improves spatial locality. In addition, we describe R-tree Prefetching(RTP), which effectively utilizes the modern multi-processor architecture. Second, we present and implement a fast replica selection algorithm in which a set of partial replicas is chosen from a set of candidates so that retrieval performance is maximized. Using an R-tree based heuristic algorithm, we achieve O(n log n) complexity for this NP-complete problem. We describe a model for disk access performance that takes filesystem prefetching into account and is sufficiently accurate for spatial replica selection. Making a few simplifying assumptions, we present a fast replica selection algorithm for partial spatial replicas. The algorithm uses a greedy approach that attempts to maximize performance by choosing a collection of replica subsets that allow fast data retrieval by a client machine. Experiments show that the performance of the solution found by our algorithm is on average always at least 91% and 93.4% of the performance of the optimal solution in 4-node and 8-node tests respectively

    Handling Confidential Data on the Untrusted Cloud: An Agent-based Approach

    Get PDF
    Cloud computing allows shared computer and storage facilities to be used by a multitude of clients. While cloud management is centralized, the information resides in the cloud and information sharing can be implemented via off-the-shelf techniques for multiuser databases. Users, however, are very diffident for not having full control over their sensitive data. Untrusted database-as-a-server techniques are neither readily extendable to the cloud environment nor easily understandable by non-technical users. To solve this problem, we present an approach where agents share reserved data in a secure manner by the use of simple grant-and-revoke permissions on shared data.Comment: 7 pages, 9 figures, Cloud Computing 201
    corecore