Having built up Linux clusters to more than 1000 nodes over the past five
years, we already have practical experience confronting some of the LHC scale
computing challenges: scalability, automation, hardware diversity, security,
and rolling OS upgrades. This paper describes the tools and processes we have
implemented, working in close collaboration with the EDG project [1],
especially with the WP4 subtask, to improve the manageability of our clusters,
in particular in the areas of system installation, configuration, and
monitoring. In addition to the purely technical issues, providing shared
interactive and batch services which can adapt to meet the diverse and changing
requirements of our users is a significant challenge. We describe the
developments and tuning that we have introduced on our LSF based systems to
maximise both responsiveness to users and overall system utilisation. Finally,
this paper will describe the problems we are facing in enlarging our
heterogeneous Linux clusters, the progress we have made in dealing with the
current issues and the steps we are taking to gridify the clustersComment: 5 pages, Proceedings for the CHEP 2003 conference, La Jolla,
California, March 24 - 28, 200