Fueled by increasing processor speeds and high speed interconnection networks, advances in high performance computer architectures have allowed the development of increasingly complex large scale parallel systems. For computational scientists, programming these systems efficiently is a challenging task. Understanding the performance of their parallel applications i