27,169 research outputs found
Multi-Terabyte EIDE Disk Arrays running Linux RAID5
High-energy physics experiments are currently recording large amounts of data
and in a few years will be recording prodigious quantities of data. New methods
must be developed to handle this data and make analysis at universities
possible. Grid Computing is one method; however, the data must be cached at the
various Grid nodes. We examine some storage techniques that exploit recent
developments in commodity hardware. Disk arrays using RAID level 5 (RAID-5)
include both parity and striping. The striping improves access speed. The
parity protects data in the event of a single disk failure, but not in the case
of multiple disk failures.
We report on tests of dual-processor Linux Software RAID-5 arrays and
Hardware RAID-5 arrays using a 12-disk 3ware controller, in conjunction with
250 and 300 GB disks, for use in offline high-energy physics data analysis. The
price of IDE disks is now less than $1/GB. These RAID-5 disk arrays can be
scaled to sizes affordable to small institutions and used when fast random
access at low cost is important.Comment: Talk from the 2004 Computing in High Energy and Nuclear Physics
(CHEP04), Interlaken, Switzerland, 27th September - 1st October 2004, 4
pages, LaTeX, uses CHEP2004.cls. ID 47, Poster Session 2, Track
Fine-Grain Checkpointing with In-Cache-Line Logging
Non-Volatile Memory offers the possibility of implementing high-performance,
durable data structures. However, achieving performance comparable to
well-designed data structures in non-persistent (transient) memory is
difficult, primarily because of the cost of ensuring the order in which memory
writes reach NVM. Often, this requires flushing data to NVM and waiting a full
memory round-trip time.
In this paper, we introduce two new techniques: Fine-Grained Checkpointing,
which ensures a consistent, quickly recoverable data structure in NVM after a
system failure, and In-Cache-Line Logging, an undo-logging technique that
enables recovery of earlier state without requiring cache-line flushes in the
normal case. We implemented these techniques in the Masstree data structure,
making it persistent and demonstrating the ease of applying them to a highly
optimized system and their low (5.9-15.4\%) runtime overhead cost.Comment: In 2019 Architectural Support for Programming Languages and Operating
Systems (ASPLOS 19), April 13, 2019, Providence, RI, US
Data processing model for the CDF experiment
The data processing model for the CDF experiment is described. Data
processing reconstructs events from parallel data streams taken with different
combinations of physics event triggers and further splits the events into
datasets of specialized physics datasets. The design of the processing control
system faces strict requirements on bookkeeping records, which trace the status
of data files and event contents during processing and storage. The computing
architecture was updated to meet the mass data flow of the Run II data
collection, recently upgraded to a maximum rate of 40 MByte/sec. The data
processing facility consists of a large cluster of Linux computers with data
movement managed by the CDF data handling system to a multi-petaByte Enstore
tape library. The latest processing cycle has achieved a stable speed of 35
MByte/sec (3 TByte/day). It can be readily scaled by increasing CPU and
data-handling capacity as required.Comment: 12 pages, 10 figures, submitted to IEEE-TN
Recommended from our members
Computing infrastructure issues in distributed communications systems : a survey of operating system transport system architectures
The performance of distributed applications (such as file transfer, remote login, tele-conferencing, full-motion video, and scientific visualization) is influenced by several factors that interact in complex ways. In particular, application performance is significantly affected both by communication infrastructure factors and computing infrastructure factors. Several communication infrastructure factors include channel speed, bit-error rate, and congestion at intermediate switching nodes. Computing infrastructure factors include (among other things) both protocol processing activities (such as connection management, flow control, error detection, and retransmission) and general operating system factors (such as memory latency, CPU speed, interrupt and context switching overhead, process architecture, and message buffering). Due to a several orders of magnitude increase in network channel speed and an increase in application diversity, performance bottlenecks are shifting from the network factors to the transport system factors.This paper defines an abstraction called an "Operating System Transport System Architecture" (OSTSA) that is used to classify the major components and services in the computing infrastructure. End-to-end network protocols such as TCP, TP4, VMTP, XTP, and Delta-t typically run on general-purpose computers, where they utilize various operating system resources such as processors, virtual memory, and network controllers. The OSTSA provides services that integrate these resources to support distributed applications running on local and wide area networks.A taxonomy is presented to evaluate OSTSAs in terms of their support for protocol processing activities. We use this taxonomy to compare and contrast five general-purpose commercial and experimental operating systems including System V UNIX, BSD UNIX, the x-kernel, Choices, and Xinu
Data production models for the CDF experiment
The data production for the CDF experiment is conducted on a large Linux PC
farm designed to meet the needs of data collection at a maximum rate of 40
MByte/sec. We present two data production models that exploits advances in
computing and communication technology. The first production farm is a
centralized system that has achieved a stable data processing rate of
approximately 2 TByte per day. The recently upgraded farm is migrated to the
SAM (Sequential Access to data via Metadata) data handling system. The software
and hardware of the CDF production farms has been successful in providing large
computing and data throughput capacity to the experiment.Comment: 8 pages, 9 figures; presented at HPC Asia2005, Beijing, China, Nov 30
- Dec 3, 200
- …