Hiding the complexity: building a distributed ATLAS Tier-2 with a single resource interface using ARC middleware
Since their inception, grids for high energy physics have found data management to be the most challenging aspect of operations. This problem has generally been tackled by the experiment's data management framework controlling in fine detail the distribution of data around the grid and carefully brokering jobs to sites with co-located data. This approach, however, presents experiments with a difficult and complex system to manage, and it introduces a rigidity into the framework that is very far from the original conception of the grid.
In this paper we describe how the ScotGrid distributed Tier-2, which has sites in Glasgow, Edinburgh and Durham, was presented to ATLAS as a single, unified resource using the ARC middleware stack. In this model the ScotGrid 'data store' is hosted at Glasgow and presented as a single ATLAS storage resource. As jobs are taken from the ATLAS PanDA framework, they are dispatched to the computing cluster with the fastest response time. An ARC compute element at each site then asynchronously stages the data from the data store into a local cache hosted at each site. The job is then launched in the batch system and accesses data locally.
We discuss the merits of this system compared to other operational models, from the point of view of both the resource providers (sites) and the resource consumers (experiments), and consider the issues involved in transitioning to this model.
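The staging model described above can be summarised in a few lines. The sketch below is illustrative only: the paths, the thread-pool staging, and the plain file copy are hypothetical stand-ins for the real ARC CE machinery, which transfers data over grid protocols and manages its cache with locking and cleanup.

```python
# Illustrative sketch of the cache model: inputs are staged from the
# central data store into a site-local cache before the job starts, so
# the payload only ever does local I/O. DATA_STORE, LOCAL_CACHE and the
# shutil copy are hypothetical stand-ins for grid storage and transfers.

import shutil
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

DATA_STORE = Path("/datastore")       # hypothetical central store (Glasgow)
LOCAL_CACHE = Path("/var/arc/cache")  # hypothetical per-site cache

def stage_in(lfn: str) -> Path:
    """Copy one logical file into the local cache unless already present."""
    cached = LOCAL_CACHE / lfn
    if not cached.exists():
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(DATA_STORE / lfn, cached)
    return cached

def launch_in_batch_system(paths: list[Path]) -> None:
    print("job started with local inputs:", *paths)  # stand-in for bsub/qsub

def run_job(inputs: list[str]) -> None:
    # Stage all inputs concurrently; only then hand the job to the LRMS.
    with ThreadPoolExecutor(max_workers=4) as pool:
        local_paths = list(pool.map(stage_in, inputs))
    launch_in_batch_system(local_paths)

if __name__ == "__main__":
    run_job(["atlas/mc/evnt.0001.root", "atlas/mc/evnt.0002.root"])
```

Even in this toy form the key property is visible: the wide-area transfer happens asynchronously before the batch job starts, so the payload itself never blocks on remote I/O.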
Analysing I/O bottlenecks in LHC data analysis on grid storage resources
We describe recent I/O testing frameworks that we have developed and applied within the UK GridPP Collaboration, the ATLAS experiment and the DPM team, for a variety of distinct purposes. These include benchmarking vendor-supplied storage products, discovering scaling limits of SRM solutions, tuning of storage systems for experiment data analysis, evaluating file access protocols, and exploring I/O read patterns of experiment software and their underlying event data models. With multiple grid sites now dealing with petabytes of data, such studies are becoming essential. We describe how the tests build on, and improve, previous work, and contrast how the use cases differ. We also detail the results obtained and the implications for storage hardware, middleware and experiment software.
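As an illustration of the kind of read-pattern study mentioned above, the following sketch classifies a trace of read calls as sequential or seeking. The trace-capture step (e.g. from strace output or a ROOT I/O log) is assumed and not shown; the Read type and the statistics reported are our own choices, not the paper's.

```python
from dataclasses import dataclass

@dataclass
class Read:
    offset: int  # byte offset requested
    length: int  # bytes requested

def pattern_stats(trace: list[Read]) -> dict:
    """Classify each read relative to the previous file position."""
    sequential = backward = forward = 0
    pos = 0
    total_bytes = 0
    for r in trace:
        if r.offset == pos:
            sequential += 1  # continues exactly where we left off
        elif r.offset < pos:
            backward += 1    # backward seeks hurt spinning disks most
        else:
            forward += 1
        pos = r.offset + r.length
        total_bytes += r.length
    n = len(trace) or 1
    return {
        "reads": len(trace),
        "bytes": total_bytes,
        "sequential_frac": sequential / n,
        "forward_frac": forward / n,
        "backward_frac": backward / n,
    }

# Toy trace: two sequential reads, then one backward seek.
print(pattern_stats([Read(0, 4096), Read(4096, 4096), Read(1024, 512)]))
```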
Tuning grid storage resources for LHC data analysis
Grid Storage Resource Management (SRM) and local file-system solutions face significant challenges in supporting efficient analysis of the data now being produced at the Large Hadron Collider (LHC). We compare the performance of different storage technologies at UK grid sites, examining the effects of tuning and of recent improvements in the I/O patterns of experiment software. Results are presented both for live production systems and for technologies not currently in widespread use. Performance is studied using tests, including real LHC data analysis, which can be used to aid sites in deploying or optimising their storage configuration.
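One concrete example of the client-side tuning such tests exercise is ROOT's TTreeCache, which batches many small branch reads into fewer large vector reads. The sketch below assumes PyROOT is available; the file URL, tree name, and 30 MB cache size are placeholders, not values taken from the paper.

```python
# Requires PyROOT. The URL, tree name and cache size are placeholders.
import ROOT

f = ROOT.TFile.Open("root://example-se.example.ac.uk//atlas/aod/sample.root")
tree = f.Get("CollectionTree")

tree.SetCacheSize(30 * 1024 * 1024)  # 30 MB TTreeCache
tree.AddBranchToCache("*", True)     # cache all branches, subbranches too

for i in range(tree.GetEntries()):
    tree.GetEntry(i)                 # scattered branch reads now hit the cache

# TFile keeps simple I/O counters, useful for before/after comparisons.
print("bytes read:", f.GetBytesRead(), "read calls:", f.GetReadCalls())
```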
Multi-core job submission and grid resource scheduling for ATLAS AthenaMP
AthenaMP is the multi-core implementation of the ATLAS software framework and allows the efficient sharing of memory pages between multiple worker processes. It has now been validated for production and delivers a significant reduction in the overall application memory footprint with negligible CPU overhead. Before AthenaMP can be routinely run on the LHC Computing Grid, it must be determined how the computing resources available to ATLAS can best exploit the notable improvements delivered by switching to this multi-process model. A study into the effectiveness and scalability of AthenaMP in a production environment will be presented. Best practices for configuring the main LRMS implementations currently used by grid sites will be identified in the context of multi-core scheduling optimisation.
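The memory-sharing mechanism AthenaMP relies on can be illustrated with a stand-alone fork-based sketch (Unix only). This is an analogue, not AthenaMP itself: the framework forks its workers after Athena initialisation, whereas here a plain Python pool inherits one large read-only structure copy-on-write.

```python
# Unix-only: copy-on-write page sharing needs fork, not spawn.
import multiprocessing as mp

GEOMETRY = None  # stands in for detector geometry / conditions data

def worker(event_range):
    # Reads touch the parent's pages; pages are only copied on write.
    # (CPython reference counting does dirty some pages, so the sharing
    # is less perfect here than for AthenaMP's C++ objects.)
    return sum(GEOMETRY[i % len(GEOMETRY)] for i in event_range)

if __name__ == "__main__":
    mp.set_start_method("fork")
    GEOMETRY = list(range(10_000_000))  # large structure built once, pre-fork
    ranges = [range(k * 1000, (k + 1) * 1000) for k in range(4)]
    with mp.Pool(processes=4) as pool:
        print(pool.map(worker, ranges))
```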
Establishing Applicability of SSDs to LHC Tier-2 Hardware Configuration
Solid-state disk technologies are increasingly replacing high-speed hard disks as the storage technology in high-random-I/O environments. There are several potentially I/O-bound services within the typical LHC Tier-2: in the back-end, with the trend towards many-core architectures continuing, worker nodes running many single-threaded jobs and storage nodes delivering many simultaneous files can both exhibit I/O-limited efficiency. We estimate the effectiveness of affordable SSDs in the context of worker nodes, on a large Tier-2 production setup, using both low-level tools and real, I/O-intensive LHC data analysis jobs, comparing and contrasting with high-performance spinning-disk solutions. We consider the applicability of each solution in the context of its price/performance metrics, with an eye on the pragmatic issues facing Tier-2 provision and upgrades.
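A minimal version of the low-level testing referred to above is a random-read microbenchmark: issue many small reads at random offsets and report IOPS, once against an SSD-backed path and once against spinning disk. All paths, counts and sizes below are illustrative assumptions; for honest numbers the test file must greatly exceed RAM, or the page cache must be dropped between runs.

```python
# Unix-only (os.pread). Paths, read count and block size are illustrative.
import os
import random
import time

def random_read_iops(path: str, reads: int = 2000, block: int = 4096) -> float:
    """Time `reads` random block-sized reads and return reads per second."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        start = time.perf_counter()
        for _ in range(reads):
            os.pread(fd, block, random.randrange(0, size - block))
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return reads / elapsed

if __name__ == "__main__":
    # Compare e.g. a file on the SSD against one on a spinning-disk array.
    for path in ("/ssd/testfile", "/hdd/testfile"):
        print(path, f"{random_read_iops(path):.0f} IOPS")
```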
ScotGrid: Providing an Effective Distributed Tier-2 in the LHC Era
ScotGrid is a distributed Tier-2 centre in the UK with sites in Durham, Edinburgh and Glasgow. ScotGrid has undergone a huge expansion in hardware in anticipation of the LHC and now provides more than 4 MSI2k and 500 TB to the LHC VOs. Scaling up to this level of provision has brought many challenges to the Tier-2, and we show in this paper how we have adopted new methods of organising the centres, from fabric management and monitoring to remote management of sites and to management and operational procedures, to meet these challenges. We describe how we have coped with different operational models at the sites: the Glasgow and Durham sites are managed "in house", while resources at Edinburgh are managed as a central university resource. This required the adoption of a different fabric management model at Edinburgh and a special engagement with the cluster managers. Challenges also arose from the different job models of local and grid submission, which required special attention to resolve. We show how ScotGrid has successfully provided an infrastructure for ATLAS and LHCb Monte Carlo production. Special attention has been paid to ensuring that user analysis functions efficiently, which has required optimisation of local storage and networking to cope with its demands. Finally, although these Tier-2 resources are pledged to the whole VO, we have established close links with our local physics user communities as the best way to ensure that the Tier-2 functions effectively as a part of the LHC grid computing framework.
Herbicide-Resistant Crops: Utilities and Limitations for Herbicide-Resistant Weed Management
Since 1996, genetically modified herbicide-resistant (HR) crops, particularly glyphosate-resistant (GR) crops, have transformed the tactics that corn, soybean, and cotton growers use to manage weeds. The use of GR crops continues to grow, but weeds are adapting to the common practice of using only glyphosate to control weeds. Growers using only a single mode of action to manage weeds need to change to a more diverse array of herbicidal, mechanical, and cultural practices to maintain the effectiveness of glyphosate. Unfortunately, the introduction of GR crops and the high initial efficacy of glyphosate often lead to a decline in the use of other herbicide options and less investment by industry to discover new herbicide active ingredients. With some exceptions, most growers can still manage their weed problems with currently available selective and HR-crop-enabled herbicides. However, current crop management systems are in jeopardy given the pace at which weed populations are evolving glyphosate resistance. New HR crop technologies will expand the utility of currently available herbicides and enable new interim solutions for growers to manage HR weeds, but will not replace the long-term need to diversify weed management tactics and discover herbicides with new modes of action. This paper reviews the strengths and weaknesses of anticipated weed management options and the best management practices that growers need to implement in HR crops to maximize the long-term benefits of current technologies and reduce weed shifts to difficult-to-control and HR weeds.
Annotation Error in Public Databases: Misannotation of Molecular Function in Enzyme Superfamilies
Due to the rapid release of new data from genome sequencing projects, the majority of protein sequences in public databases have not been experimentally characterized; rather, sequences are annotated using computational analysis. The level and types of misannotation in large public databases are currently unknown and have not been analyzed in depth. We have investigated the misannotation levels for molecular function in four public protein sequence databases (UniProtKB/Swiss-Prot, GenBank NR, UniProtKB/TrEMBL, and KEGG) for a model set of 37 enzyme families for which extensive experimental information is available. The manually curated database Swiss-Prot shows the lowest annotation error levels (close to 0% for most families); the two other protein sequence databases (GenBank NR and TrEMBL) and the protein sequences in the KEGG pathways database exhibit similar and surprisingly high levels of misannotation that average 5%–63% across the six superfamilies studied. For 10 of the 37 families examined, the level of misannotation in one or more of these databases is >80%. Examination of the NR database over time shows that misannotation has increased from 1993 to 2005. The types of misannotation that were found fall into several categories, most associated with "overprediction" of molecular function. These results suggest that misannotation in enzyme superfamilies containing multiple families that catalyze different reactions is a larger problem than has been recognized. Strategies are suggested for addressing some of the systematic problems contributing to these high levels of misannotation.
A Roadmap for HEP Software and Computing R&D for the 2020s
Particle physics has an ambitious and broad experimental programme for the coming decades. This programme requires large investments in detector hardware, either to build new facilities and experiments or to upgrade existing ones. Similarly, it requires commensurate investment in the R&D of software to acquire, manage, process, and analyse the sheer amounts of data to be recorded. In planning for the HL-LHC in particular, it is critical that all of the collaborating stakeholders agree on the software goals and priorities, and that the efforts complement each other. In this spirit, this white paper describes the R&D activities required to prepare for this software upgrade.