    Robo-line storage: Low latency, high capacity storage systems over geographically distributed networks

    Rapid advances in high performance computing are making possible more complete and accurate computer-based modeling of complex physical phenomena, such as weather front interactions, dynamics of chemical reactions, numerical aerodynamic analysis of airframes, and ocean-land-atmosphere interactions. Many of these 'grand challenge' applications are as demanding of the underlying storage system, in terms of their capacity and bandwidth requirements, as they are of the computational power of the processor. A global view of the Earth's ocean chlorophyll and land vegetation requires over 2 terabytes of raw satellite image data. In this paper, we describe our planned research program in high capacity, high bandwidth storage systems. The project has four overall goals. First, we will examine new methods for high capacity storage systems, made possible by low cost, small form factor magnetic and optical tape systems. Second, access to the storage system will be low latency and high bandwidth. To achieve this, we must interleave data transfer at all levels of the storage system, including devices, controllers, servers, and communications links. Latency will be reduced by extensive caching throughout the storage hierarchy. Third, we will provide effective management of a storage hierarchy, extending the techniques already developed for the Log Structured File System. Finally, we will construct a prototype high capacity file server, suitable for use on the National Research and Education Network (NREN). Such research must be a cornerstone of any coherent program in high performance computing and communications.
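    A minimal sketch of the caching idea in the second goal above, assuming a hypothetical LRU cache placed in front of a slow backing store (standing in for a robotic tape fetch); `HierarchyCache`, `capacity_blocks`, and `backing_read` are names introduced here for illustration, not part of the proposed system.

```python
from collections import OrderedDict

class HierarchyCache:
    """LRU cache fronting a slow backing store such as a tape library."""

    def __init__(self, capacity_blocks, backing_read):
        self.capacity = capacity_blocks
        self.backing_read = backing_read   # slow path, e.g. a robotic tape fetch
        self.blocks = OrderedDict()        # block_id -> bytes, in LRU order

    def read(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # hit: refresh recency
            return self.blocks[block_id]
        data = self.backing_read(block_id)      # miss: pay the tape latency once
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict least recently used block
        return data

# Usage: repeated reads of a hot block hit the cache instead of the tape.
cache = HierarchyCache(capacity_blocks=1024, backing_read=lambda b: bytes(4096))
first = cache.read(42)    # slow: fetched from the backing store
again = cache.read(42)    # fast: served from the cache
```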

    Multi-Terabyte EIDE Disk Arrays running Linux RAID5

    High-energy physics experiments are currently recording large amounts of data and in a few years will be recording prodigious quantities of data. New methods must be developed to handle this data and make analysis at universities possible. Grid Computing is one method; however, the data must be cached at the various Grid nodes. We examine some storage techniques that exploit recent developments in commodity hardware. Disk arrays using RAID level 5 (RAID-5) include both parity and striping. The striping improves access speed. The parity protects data in the event of a single disk failure, but not in the case of multiple disk failures. We report on tests of dual-processor Linux Software RAID-5 arrays and Hardware RAID-5 arrays using a 12-disk 3ware controller, in conjunction with 250 and 300 GB disks, for use in offline high-energy physics data analysis. The price of IDE disks is now less than $1/GB. These RAID-5 disk arrays can be scaled to sizes affordable to small institutions and used when fast random access at low cost is important.
    Comment: Talk from the 2004 Computing in High Energy and Nuclear Physics (CHEP04), Interlaken, Switzerland, 27th September - 1st October 2004, 4 pages, LaTeX, uses CHEP2004.cls. ID 47, Poster Session 2, Track
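    A minimal sketch of the RAID-5 parity mechanism described above: each stripe stores several data blocks plus one parity block formed by XORing them, so any single lost block can be rebuilt from the survivors. The stripe width and block contents are illustrative assumptions, not the layout used by the 3ware controller.

```python
def parity(blocks):
    """XOR equal-sized blocks together to form (or rebuild from) parity."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            out[i] ^= b
    return bytes(out)

# One stripe: three data blocks plus their parity block.
stripe = [b"AAAA", b"BBBB", b"CCCC"]
p = parity(stripe)

# Simulate a single disk failure (losing block 1): XOR the survivors
# with the parity block to reconstruct the missing data.
recovered = parity([stripe[0], stripe[2], p])
assert recovered == stripe[1]

# A second simultaneous failure leaves too few blocks to solve for both,
# which is why RAID-5 protects against only one lost disk per stripe.
```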

    Redundant Arrays of IDE Drives

    The next generation of high-energy physics experiments is expected to gather prodigious amounts of data. New methods must be developed to handle this data and make analysis at universities possible. We examine some techniques that use recent developments in commodity hardware. We test redundant arrays of integrated drive electronics (IDE) disk drives for use in offline high-energy physics data analysis. IDE redundant array of inexpensive disks (RAID) prices now equal the cost per terabyte of million-dollar tape robots! The arrays can be scaled to sizes affordable to institutions without robots and used when fast random access at low cost is important. We also explore three methods of moving data between sites: internet transfers, hot-pluggable IDE disks in FireWire cases, and writable digital video disks (DVD-R).
    Comment: Submitted to IEEE Transactions On Nuclear Science, for the 2001 IEEE Nuclear Science Symposium and Medical Imaging Conference, 8 pages, 1 figure, uses IEEEtran.cls. Revised March 19, 2002 and published August 200

    Does science need computer science?

    IBM Hursley Talks, Series 3. An afternoon of talks, to be held on Wednesday March 10 from 2:30pm in Bldg 35 Lecture Room A, arranged by the School of Chemistry in conjunction with IBM Hursley and the Combechem e-Science Project. The talks are aimed at science students (undergraduate and post-graduate) from across the faculty. This is the third series of talks we have organized, but the first time we have put them together in an afternoon. The talks are general in nature and knowledge of computer science is certainly not necessary. After the talks there will be an opportunity for a discussion with the lecturers from IBM.
    Does Science Need Computer Science? Chair and Moderator: Jeremy Frey, School of Chemistry.
    - 14:00 "Computer games for fun and profit" - Andrew Reynolds
    - 14:45 "Anyone for tennis? The science behind WIBMledon" - Matt Roberts
    - 15:30 Tea (Chemistry Foyer, Bldg 29, opposite Bldg 35)
    - 15:45 "Disk Drive physics from grandmothers to gigabytes" - Steve Legg
    - 16:35 "What could happen to your data?" - Nick Jones
    - 17:20 Panel Session, comprising the four IBM speakers and May Glover-Gunn (IBM)
    - 18:00 Reception

    Introduction to Multiprocessor I/O Architecture

    The computational performance of multiprocessors continues to improve by leaps and bounds, fueled in part by rapid improvements in processor and interconnection technology. I/O performance thus becomes ever more critical if it is not to become the bottleneck of overall system performance. In this paper we provide an introduction to I/O architectural issues in multiprocessors, with a focus on disk subsystems. While we discuss examples from actual architectures and provide pointers to interesting research in the literature, we do not attempt to provide a comprehensive survey. We concentrate on a study of the architectural design issues and the effects of different design alternatives.

    Storage media pipelining: Making good use of fine-grained media

    This paper proposes a new high-performance paradigm for accessing removable media such as tapes and especially magneto-optical disks. In high-performance computing, the striping of data across multiple devices is a common means of improving data transfer rates. Striping has been used very successfully for fixed magnetic disks, improving overall system reliability as well as throughput. It has also been proposed as a solution for providing improved bandwidth for tape and magneto-optical subsystems. However, striping of removable media has shortcomings, particularly in the areas of latency to data and restricted system configurations, and is suitable primarily for very large I/Os. We propose that for fine-grained media, an alternative access method, media pipelining, may be used to provide high bandwidth for large requests while retaining the flexibility to support concurrent small requests and different system configurations. Its principal drawback is high buffering requirements in the host computer or file server. This paper discusses the possible organization of such a system, including the hardware conditions under which it may be effective, and the flexibility of configuration. Its expected performance is discussed under varying workloads, including large single I/Os and numerous smaller ones. Finally, a specific system incorporating a high-transfer-rate magneto-optical disk drive and autochanger is discussed.
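    One way to picture the proposed media pipelining, as a sketch under assumed parameters: a pool of drives each reads the next unclaimed segment of a large request while the host buffers out-of-order completions and reassembles them in order, making the noted host-side buffering cost explicit. `num_drives`, `num_segments`, and `read_segment` are hypothetical stand-ins, not the paper's hardware configuration.

```python
import queue
import threading

def pipelined_read(num_segments, num_drives, read_segment):
    """Read consecutive media segments concurrently on a pool of drives."""
    todo = queue.Queue()
    for seg in range(num_segments):
        todo.put(seg)

    buffers = {}              # host-side buffering: the paradigm's main cost
    lock = threading.Lock()

    def drive_worker():
        # Each "drive" keeps pulling the next unread segment until none remain.
        while True:
            try:
                seg = todo.get_nowait()
            except queue.Empty:
                return
            data = read_segment(seg)    # e.g. mount cartridge, seek, transfer
            with lock:
                buffers[seg] = data

    drives = [threading.Thread(target=drive_worker) for _ in range(num_drives)]
    for d in drives:
        d.start()
    for d in drives:
        d.join()
    # Reassemble the large request in order from the buffered segments.
    return b"".join(buffers[s] for s in range(num_segments))

# Usage: two drives pipeline an eight-segment request.
data = pipelined_read(8, 2, read_segment=lambda s: bytes([s]) * 4)
assert len(data) == 32
```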

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems to better understand their goals and their methodology. This would help evaluate their applicability for solving similar problems. This taxonomy also provides a "gap analysis" of this area through which researchers can potentially identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping also help to provide an easy way for new practitioners to understand this complex area of research.
    Comment: 46 pages, 16 figures, Technical Report