
Domain Characteristics High Performance IO on Busy Systems

By Jay Lofstead, Qing Liu, Scott Klasky, Michael Booth, Ron Oldfield, Karsten Schwan and Matthew Wolf

Abstract

– Even with small per-process data volumes, aggregate data volumes are very large (10s of TB per output).
– Communication during IO can negatively impact performance.

Large Storage Systems
– 100s of storage targets must be managed to get performance.
– Shared use by analysis data preparation impacts other users.

Multi-user Systems
– Several large jobs run concurrently (internal).
– The file system may be shared across systems (external).
– Prep data in transit to aid downstream usage.

Platform Concerns

API performance on platform
– The best-performing IO API varies by platform.
– Some platforms lack a working implementation of an API (e.g., HDF-5), which requires selecting a different one.

File system characteristics vary
– Adjust the IO organization to meet system characteristics (stripe size/count, storage targets).
– Respond dynamically to variations in file system performance (adaptive IO techniques).

Annotate data to aid in analysis
– Generate characteristics for locating data (min, max).
– Index data with characteristics to aid in finding it.
– Use resilient formats to protect output dat
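
The abstract's idea of generating min/max characteristics and indexing data with them can be illustrated with a small sketch. This is not the authors' implementation; the names `BlockStats`, `build_index`, and `candidate_blocks` are hypothetical, and the in-memory lists merely stand in for blocks written by separate processes. It only shows how per-block min/max values let a reader skip blocks that cannot contain values in a requested range.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class BlockStats:
    """Hypothetical per-block characteristics stored alongside the data."""
    block_id: int
    minimum: float
    maximum: float


def build_index(blocks: Sequence[Sequence[float]]) -> List[BlockStats]:
    """Compute min/max for each non-empty block (assumed: one block per writer)."""
    return [
        BlockStats(block_id=i, minimum=min(b), maximum=max(b))
        for i, b in enumerate(blocks)
        if len(b) > 0
    ]


def candidate_blocks(index: List[BlockStats], low: float, high: float) -> List[int]:
    """Return ids of blocks whose [min, max] range overlaps [low, high]."""
    return [s.block_id for s in index if s.maximum >= low and s.minimum <= high]


# Illustrative data only: three small blocks standing in for per-process output.
blocks = [[0.1, 0.4, 0.3], [2.5, 3.1, 2.9], [0.9, 1.2, 1.0]]
index = build_index(blocks)
hits = candidate_blocks(index, 1.0, 3.0)
print(hits)
```

Under these assumptions, a query for values between 1.0 and 3.0 only needs to read the blocks listed in `hits`; the first block is skipped because its maximum is below the range.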

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.187.4815
Provided by: CiteSeerX