10 research outputs found

    Recovery of a Digital Image Collection Through the SDSC/UMD/NARA Prototype Persistent Archive

    The San Diego Supercomputer Center (SDSC), the University of Maryland, and the National Archives and Records Administration (NARA) are collaborating on building a pilot persistent archive using and extending data grid and digital library technologies. The current prototype consists of node servers at SDSC, the University of Maryland, and NARA, connected through the Storage Request Broker (SRB) data grid middleware, and currently holds several terabytes of selected NARA collections. In particular, a historically important image collection that was on the verge of becoming inaccessible was fully restored and ingested into our pilot system. In this report, we describe the methodology behind our approach to fully restoring this image collection and the process used to ingest it into the prototype persistent archive. (UMIACS-TR-2003-105)
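    The restore-and-ingest methodology the report describes suggests a simple illustration. The Python sketch below walks a recovered collection, records a fixity checksum and size for each file, and writes a manifest that an archive node could register; the paths, manifest layout, and helper names are assumptions for illustration, not the SRB interface actually used.

```python
# Minimal sketch of a restore-and-ingest preparation step: checksum every
# recovered file and emit a manifest for registration into an archive.
# "restored_collection" and the manifest layout are hypothetical.
import hashlib
import json
from pathlib import Path

def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file so large scanned images do not exhaust memory."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(collection_root: Path) -> list[dict]:
    """One manifest entry per file: logical name, size, and checksum."""
    entries = []
    for path in sorted(collection_root.rglob("*")):
        if path.is_file():
            entries.append({
                "logical_name": str(path.relative_to(collection_root)),
                "size_bytes": path.stat().st_size,
                "md5": md5sum(path),
            })
    return entries

if __name__ == "__main__":
    root = Path("restored_collection")  # hypothetical local staging area
    manifest = build_manifest(root)
    Path("ingest_manifest.json").write_text(json.dumps(manifest, indent=2))
    print(f"Prepared {len(manifest)} files for ingest")
```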

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform by scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems in order to better understand their goals and methodology, which helps in evaluating their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping give new practitioners an easy way into this complex area of research. (Comment: 46 pages, 16 figures, Technical Report)
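    The mapping and gap-analysis exercise the paper describes can be illustrated with a toy data structure: each taxonomy dimension is a set of categories, each system is tagged with the categories it occupies, and any category no surveyed system covers surfaces as a gap. The dimension, category, and system labels below are invented for illustration and are not the paper's tables.

```python
# Toy taxonomy mapping and "gap analysis": categories with no covering
# system are candidate research gaps. All labels here are illustrative.
TAXONOMY = {
    "data_transport": {"ftp_like", "parallel_streams", "peer_to_peer"},
    "replication": {"static", "dynamic", "economic_model"},
    "scheduling": {"data_aware", "compute_only"},
}

SYSTEMS = {
    "GridFTP-based grid": {"data_transport": {"parallel_streams"},
                           "scheduling": {"compute_only"}},
    "P2P data grid": {"data_transport": {"peer_to_peer"},
                      "replication": {"dynamic"}},
}

def gap_analysis(taxonomy, systems):
    """Return, per dimension, the categories no surveyed system covers."""
    covered = {dim: set() for dim in taxonomy}
    for tags in systems.values():
        for dim, cats in tags.items():
            covered[dim] |= cats
    return {dim: taxonomy[dim] - covered[dim] for dim in taxonomy}

print(gap_analysis(TAXONOMY, SYSTEMS))
```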

    A Multiple Case Study Analysis of Digital Preservation Techniques across Government, Private, and Public Service Organizations

    The process of record keeping has evolved over time. As our technology advances, so does our ability to manage information. We have progressed from paper-based records to new digital techniques and formats for storing records. However, digital storage is not the Holy Grail answer to preservation and storage problems; it is confounded by problems of its own, including, but not limited to, a lack of standardization and legal guidance, proprietary formats, and the fragility of the digital medium. This research examines several organizations that are deeply involved in digital preservation and seeks to identify common practices and problems across the industry.
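    One concrete technique implied by the proprietary-format problem the study names is a format migration audit: scan holdings for at-risk formats and queue them for migration to open equivalents. The sketch below is a minimal, hypothetical version; the extension-to-format policy is an assumption for illustration, not a finding of the study.

```python
# Hypothetical format-migration audit: pair each file in a proprietary or
# legacy format with a preferred open target format.
from pathlib import Path

# Assumed policy: proprietary/legacy extension -> preferred open target.
MIGRATION_POLICY = {
    ".doc": ".odt",
    ".wpd": ".odt",   # WordPerfect
    ".bmp": ".png",
    ".tga": ".tiff",
}

def plan_migrations(root: Path) -> list[tuple[Path, str]]:
    """Pair each at-risk file with the open format it should migrate to."""
    plan = []
    for path in root.rglob("*"):
        target = MIGRATION_POLICY.get(path.suffix.lower())
        if path.is_file() and target:
            plan.append((path, target))
    return plan

for source, target in plan_migrations(Path("records")):
    print(f"migrate {source} -> {source.with_suffix(target).name}")
```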

    Development of a grid service for multi-objective design optimisation

    Emerging grid technology is receiving great attention from researchers and application domains that need computational and data capabilities to enhance performance and efficiency. Multi-Objective Design Optimisation (MODO) is computationally and data intensive, and the challenges grow with the adoption of evolutionary computing (EC) techniques, which produce multiple solutions in a single simulation run. Further challenges are the complexity of the mathematical models and the multidisciplinary involvement of experts, making MODO collaborative and interactive in nature. These challenges call for a problem solving environment (PSE) that can provide computational and optimisation resources to MODO experts as services. Current PSEs provide only the technical specifications of services, which are used by programmers, and lack service specifications for the designers who use the system to support design optimisation; there is a need for PSEs to have a service specification document that describes how services are provided to end users. Additionally, providing MODO resources as services enables designers to share, through service subscription, resources they do not themselves have. The aim of this research is to develop the specifications and architecture of a grid service for MODO. The specifications provide the service use cases that are used to build MODO services, and a service specification document is proposed that enables service providers to follow a defined process for delivering services to end users. In this research, the literature was reviewed and an industry survey conducted, followed by design, development, case studies and validation. Related PSEs in the literature and industry were studied to produce a service specification document that captures the process for grid service definition. This specification was used to develop a framework for MODO applications, and an architecture based on this framework was proposed and implemented as the DECGrid (Decision Engineering Centre Grid) prototype. Three real-life case studies were used to validate the prototype, and the results obtained compared favourably with those in the literature. Different scenarios for using the services among distributed design experts demonstrated computational synergy and efficient collaboration: the mathematical model building and optimisation services enabled designers to build models collaboratively using the collaboration service, helping designers without optimisation knowledge to perform optimisation. The key contributions of this research are the service specifications that support MODO, the framework that provides the process for defining the services, and the architecture used to implement the framework. The key limitations are the use of engineering design optimisation case studies only and the fact that the prototype has not been tested in industry. (EThOS - Electronic Theses Online Service, United Kingdom)
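    Since the abstract notes that evolutionary computing yields multiple candidate designs in a single run, a Pareto filter is the usual way such a population is reduced to its non-dominated designs. The minimal sketch below assumes minimisation objectives and made-up sample points; it illustrates the general technique, not DECGrid's optimisation service.

```python
# Pareto filtering for minimisation: keep only designs that no other
# design beats on every objective. Sample points are invented.
def dominates(a, b):
    """True if a is at least as good as b everywhere and better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(population):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in population
            if not any(dominates(q, p) for q in population if q is not p)]

# Two objectives to minimise, e.g. weight and cost of a design.
designs = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (2.5, 2.5)]
print(pareto_front(designs))  # (3.0, 4.0) is dominated by (2.0, 3.0)
```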

    A grid and cloud-based framework for high throughput bioinformatics

    Recent advances in genome sequencing technologies have unleashed a flood of new data. As a result, the computational analysis of bioinformatics data sets has been rapidly moving from a lab-based desktop computer environment to exhaustive analyses performed by large dedicated computing resources. Traditionally, large computational problems have been performed on dedicated clusters of high performance machines that are typically local to, and owned by, a particular institution. The current trend in Grid computing has seen institutions pooling their computational resources in order to offload excess computational work to remote locations during busy periods. In the last year or so, commercial Cloud computing initiatives have matured enough to offer a viable remote source of reliable computational power. Collections of idle desktop computers have also been used as a source of computational power in the form of ‘volunteer Grids’. The field of bioinformatics is highly dynamic, with new or updated versions of software tools and databases continually being developed. Several different tools and datasets must often be combined into a coherent, automated workflow or pipeline. While existing solutions are available for constructing workflows, there is a clear need for long-lived analyses consisting of many interconnected steps to be able to migrate among Grid and cloud computational resources dynamically. This project involved research into the principles underlying the design and architecture of flexible, high-throughput bioinformatics processes. Following extensive requirements gathering, a novel Grid-based platform, Microbase, has been implemented that is based on service-oriented architectures and peer-to-peer data transfer technology. This platform has been shown to be amenable to utilising a wide range of hardware, from commodity desktop computers to high-performance cloud infrastructure. The system has been shown to drastically reduce the bandwidth requirements of bioinformatics data distribution, and therefore reduces both the financial and computational costs associated with cloud computing. The system is inherently modular in nature, comprising a service-based notification system, a data storage system, a scheduler and a job manager. In keeping with e-Science principles, the modules can operate in physical isolation from one another, distributed across an intranet or the Internet. Moreover, since each module is loosely coupled via Web services, modules have the potential to be used in combination with external service-oriented components or in isolation as part of another system. In order to demonstrate the utility of such an open source system to the bioinformatics community, a pipeline of inter-connected bioinformatics applications was developed using the Microbase system to form a high throughput application for the comparative and visual analysis of microbial genomes. This application, the Automated Genome Analyser (AGA), has been developed to operate without user interaction. AGA exposes its results via Web services, which can be consumed by further analytical stages within Microbase or by external computational resources, or queried by users via an interactive genome browser. In addition to providing the necessary infrastructure for scalable Grid applications, a modular development framework has been provided, which simplifies the process of writing Grid applications.
    Microbase has been adopted by a number of projects ranging from comparative genomics to synthetic biology simulations. (EThOS - Electronic Theses Online Service, United Kingdom)
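    The loose coupling described above can be sketched as a small publish/subscribe notification bus in which pipeline stages subscribe to topics and publish results for downstream stages. Topic names and handlers below are invented; in a Microbase-style deployment each hop would cross a Web-service boundary rather than an in-process call, so this is an illustrative sketch, not the Microbase API.

```python
# Toy notification bus: stages never call each other directly; they only
# publish and subscribe to topics, so each module can run in isolation.
from collections import defaultdict

class NotificationBus:
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # In a Microbase-style system this hop would cross a Web-service
        # boundary; here it is an in-process call to keep the sketch small.
        for handler in self._subscribers[topic]:
            handler(message)

bus = NotificationBus()

def on_new_genome(msg):
    print(f"annotating {msg['genome']}")
    bus.publish("genome.annotated", {"genome": msg["genome"], "genes": 4321})

def on_annotated(msg):
    print(f"comparing {msg['genome']} ({msg['genes']} genes) against the rest")

bus.subscribe("genome.new", on_new_genome)
bus.subscribe("genome.annotated", on_annotated)
bus.publish("genome.new", {"genome": "E. coli K-12"})
```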

    Applications Development for the Computational Grid
