
Domain Characteristics High Performance IO on Busy Systems

By Jay Lofstead, Qing Liu, Scott Klasky, Michael Booth, Ron Oldfield, Karsten Schwan and Matthew Wolf

Abstract

– Even with small per-process data volumes, aggregate data volumes are very large (10s of TB per output).
– Communication during IO can negatively impact performance.

Large Storage Systems
– 100s of storage targets that must be managed to get performance.
– Shared use by analysis data preparation impacts other users.

Multi-user Systems
– Simultaneous large jobs run concurrently (internal).
– File system may be shared across systems (external).
– Prep data in transit to aid downstream usage.

Platform Concerns

API performance on platform
– The best-performing IO API varies by platform.
– Some platforms do not have a working implementation of an API, requiring the selection of a different choice (e.g., HDF-5).

File system characteristics vary
– Adjust the IO organization to meet system characteristics (stripe size/count, storage targets).
– Respond dynamically to variations in file system performance (adaptive IO techniques).

Annotate data to aid in analysis
– Generate characteristics for locating data (min, max).
– Index data with characteristics to aid in finding.
– Use resilient formats to protect output data.

Year: 2011
OAI identifier: oai:CiteSeerX.psu:10.1.1.187.4815
Provided by: CiteSeerX