Predicting dataset popularity for the CMS experiment
The CMS experiment at the LHC accelerator at CERN relies on its computing
infrastructure to stay at the frontier of High Energy Physics, searching for
new phenomena and making discoveries. Even though computing plays a significant
role in physics analysis, we rarely use its data to predict the behavior of the
system itself. Basic information about computing resources, user activities and
site utilization can be very useful for improving the throughput of the system
and its management. In this paper, we discuss a first CMS analysis of dataset
popularity based on CMS meta-data, which can be used as a model for dynamic data
placement and provides the foundation of a data-driven approach for the CMS
computing infrastructure.
Comment: Submitted to proceedings of the 17th International Workshop on Advanced
Computing and Analysis Techniques in Physics Research (ACAT
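The kind of meta-data-driven popularity analysis described above might be sketched as follows. This is a minimal illustration of aggregating per-dataset access counts into a ranking; the record layout and dataset names are invented for this example and are not the actual CMS schema.

```python
from collections import Counter

# Hypothetical access-log records: (dataset name, number of accesses).
# The layout is an assumption for illustration, not CMS's real meta-data.
access_log = [
    ("/Zmumu/Run2012/AOD", 120),
    ("/TTbar/Summer12/AODSIM", 45),
    ("/Zmumu/Run2012/AOD", 80),
    ("/MinBias/Summer12/GEN-SIM", 3),
]

def rank_datasets(records):
    """Aggregate accesses per dataset and rank by total popularity.

    A dynamic data placement policy could replicate the head of this
    ranking and retire the tail."""
    totals = Counter()
    for name, n_accesses in records:
        totals[name] += n_accesses
    return totals.most_common()

ranking = rank_datasets(access_log)
print(ranking[0])  # -> ('/Zmumu/Run2012/AOD', 200)
```

In practice such a ranking would be one feature among many (access recency, user counts, site locality) feeding a placement model, but the aggregation step is the common starting point.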
PhEDEx Data Service
The PhEDEx Data Service provides access to information from the central PhEDEx database, as well as certificate-authenticated managerial operations such as requesting the transfer or deletion of data. The Data Service is integrated with the SiteDB service for fine-grained access control, providing a safe and secure environment for operations. A plug-in architecture allows server-side modules to be developed rapidly and easily by anyone familiar with the schema, and can automatically return the data in a variety of formats for use by different client technologies. Using HTTP access via the Data Service instead of direct database connections makes it possible to build monitoring web pages with complex drill-down operations, suitable for debugging or presentation from many aspects. This will form the basis of the new PhEDEx website in the near future, as well as providing access to PhEDEx information and certificate-authenticated services for other CMS dataflow and workflow management tools such as CRAB, WMCore, DBS and the dashboard. A PhEDEx command-line client tool provides one-stop access to all the functions of the PhEDEx Data Service interactively, or for use in simple scripts that do not access the service directly. The client tool provides certificate-authenticated access to managerial functions, so all the functions of the PhEDEx Data Service are available to it. The tool can be expanded by plug-ins which can combine or extend the client-side manipulation of data from the Data Service, providing a powerful environment for manipulating data within PhEDEx.
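A client of such an HTTP data service could look roughly like the following. This is a hedged sketch of the HTTP-plus-JSON access pattern the abstract describes: the endpoint URL, query parameter, and response layout here are invented for illustration and are not the actual PhEDEx API.

```python
import io
import json
from urllib.request import urlopen

# Hypothetical base URL; the real service location and response schema
# may differ -- this only illustrates the HTTP + JSON client pattern.
BASE_URL = "https://example.cern.ch/datasvc/json/prod"

def fetch_replicas(dataset, opener=urlopen):
    """Query the data service for block replicas of a dataset.

    `opener` is injectable so the call can be exercised without a
    network connection (useful for testing client plug-ins)."""
    url = f"{BASE_URL}/blockreplicas?dataset={dataset}"
    with opener(url) as resp:
        return json.load(resp)

# Offline usage example with a canned response instead of a live server:
def fake_opener(url):
    body = json.dumps({"phedex": {"block": [{"name": "blockA", "replica": 3}]}})
    return io.BytesIO(body.encode())

reply = fetch_replicas("/Zmumu/Run2012/AOD", opener=fake_opener)
print(reply["phedex"]["block"][0]["name"])  # -> blockA
```

Because every format-specific detail lives behind one function, a client plug-in can post-process the returned structure without caring whether the server emitted JSON, XML, or another supported format.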
The Spring 2002 DAQ TDR Production
In Spring 2002 the CMS Production team produced a large sample of Monte Carlo events for the CMS DAQ TDR. This note documents the process by which those events were produced, with details of the tools, the architecture of the production machinery, and the individual sites that took part.
Resource Monitoring Tool for CMS production
A monitoring tool is described which not only tracks and recognises errors but also works together with a management system that is responsible for resource allocation. In cluster/grid computing, the resources of all accessible computers are at the disposal of end users. With that much power at hand, the responsibility of the software managing these resources also increases. Better utilization of resources requires that a monitoring system make the collected data persistent, so that the management system has not only up-to-date information but also a meaningful historical record. This database can then be consulted for finding the best available resources in a given scenario, and can also be used for understanding historical trends. The Resource Monitoring Tool, RMT, is such a tool, which caters for these needs. Its framework is designed in such a way that its potential can be enhanced easily by adding more modules.
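A persistent monitoring store of the kind described might be sketched as follows. The table layout and the selection rule are assumptions made for illustration, not RMT's actual design; the point is that persisted samples serve both the "best available resource" query and historical analysis.

```python
import sqlite3

# Hypothetical schema: one row per (host, timestamp) monitoring sample.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE samples
                (host TEXT, ts INTEGER, free_cpu REAL, free_mem_mb REAL)""")

def record(host, ts, free_cpu, free_mem_mb):
    """Persist one sample so current state AND history stay queryable."""
    conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 (host, ts, free_cpu, free_mem_mb))

def best_host(min_mem_mb):
    """Pick the host whose latest sample has the most free CPU among
    those meeting a memory requirement -- a stand-in selection rule."""
    rows = conn.execute(
        "SELECT host, free_cpu, free_mem_mb FROM samples s "
        "WHERE ts = (SELECT MAX(ts) FROM samples WHERE host = s.host)"
    ).fetchall()
    eligible = [(cpu, host) for host, cpu, mem in rows if mem >= min_mem_mb]
    return max(eligible)[1] if eligible else None

record("node1", 100, 0.9, 2048)
record("node2", 100, 0.7, 8192)
record("node1", 200, 0.2, 2048)  # node1 got busy; history is kept
print(best_host(min_mem_mb=1024))  # -> node2
```

The same table answers trend questions ("how loaded was node1 last week?") with ordinary range queries over `ts`, which is exactly why persistence pays off for the management system.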
Software packaging with DAR
One of the important tasks in distributed computing is to deliver software applications to the computing resources. The Distribution after Release (DAR) tool is used to package software applications for the world-wide event production by the CMS Collaboration. This presentation focuses on the concept of packaging applications based on the runtime environment. We discuss solutions for more effective software distribution based on two years' experience with DAR. Finally, we give an overview of the application distribution process and the interfaces to the CMS production tools.
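The idea of packaging based on the runtime environment can be illustrated with a minimal sketch: bundle the application tree together with a generated setup script that snapshots selected environment variables, so the archive is relocatable to a worker node. This is a toy stand-in for the concept, not DAR's real format or tooling.

```python
import os
import tarfile
import tempfile

def make_runtime_package(app_dir, env_vars, out_tar):
    """Bundle an application tree plus a snapshot of selected runtime
    environment variables into one relocatable tarball.

    A minimal sketch of runtime-environment packaging, not DAR itself."""
    # Generate a setup script that reproduces the captured environment.
    setup_lines = [f'export {name}="{os.environ.get(name, "")}"\n'
                   for name in env_vars]
    setup_path = os.path.join(tempfile.mkdtemp(), "setup.sh")
    with open(setup_path, "w") as fh:
        fh.writelines(setup_lines)
    with tarfile.open(out_tar, "w:gz") as tar:
        tar.add(app_dir, arcname="app")    # the application files
        tar.add(setup_path, arcname="setup.sh")  # the environment snapshot
```

On the receiving site, unpacking and sourcing `setup.sh` would recreate the captured environment before the application runs, which is the essence of shipping the runtime environment along with the release.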
Towards Managed Terabit/s Scientific Data Flows
Scientific collaborations on a global scale, such as the LHC experiments at CERN [1], rely today on the presence of high-performance, high-availability networks. In this paper we review the developments performed over the last several years on high-throughput applications, multilayer software-defined network path provisioning, path selection and load balancing methods, and the integration of these methods with the mainstream data transfer and management applications of CMS [2], one of the major LHC experiments. These developments are folded into a compact system capable of moving data among research sites at the 1 Terabit per second scale. Several aspects that went into the design and target different components of the system are presented, including: evaluation of 40 and 100 Gbps-capable hardware on both the network and server side, data movement applications, flow management, and the network-application interface leveraging advanced network services. We report on comparative results between several multi-path algorithms, the performance increase obtained using this approach, and present results from the related SC'13 demonstration.
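To make the multi-path load-balancing idea concrete, here is one generic heuristic: greedily assign each flow to the path with the most remaining headroom. The path names and capacities are invented, and this is only an illustrative baseline, not one of the specific algorithms compared in the paper.

```python
def assign_flows(paths, flows_gbps):
    """Greedy least-loaded multipath assignment: each flow goes to the
    path with the largest remaining capacity at assignment time.

    A generic illustrative heuristic, not the paper's algorithms."""
    cap = dict(paths)                       # path -> total capacity (Gbps)
    load = {name: 0.0 for name in cap}      # path -> assigned load (Gbps)
    placement = {}
    for i, flow in enumerate(flows_gbps):
        # Choose the path with the largest remaining headroom.
        best = max(cap, key=lambda p: cap[p] - load[p])
        load[best] += flow
        placement[f"flow{i}"] = best
    return placement, load

paths = [("path-A", 100.0), ("path-B", 40.0)]  # hypothetical links
placement, load = assign_flows(paths, [30.0, 30.0, 30.0, 20.0])
print(placement)  # the 20 Gbps flow spills onto path-B
```

Real deployments refine this with live telemetry (the measured, not nominal, headroom) and can re-balance long-lived flows mid-transfer, which is where the software-defined path provisioning described above comes in.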
MAGI: A Method for Metabolite Annotation and Gene Integration.
Metabolomics is a widely used technology for obtaining direct measures of metabolic activities from diverse biological systems. However, ambiguous metabolite identifications are a common challenge, and biochemical interpretation is often limited by incomplete and inaccurate genome-based predictions of enzyme activities (that is, gene annotations). Metabolite Annotation and Gene Integration (MAGI) generates a metabolite-gene association score using a biochemical reaction network. This is calculated by a method that emphasizes consensus between metabolites and genes via biochemical reactions. To demonstrate the potential of this method, we applied MAGI to integrate sequence data and metabolomics data collected from Streptomyces coelicolor A3(2), an extensively characterized bacterium that produces diverse secondary metabolites. Our findings suggest that coupling metabolomics and genomics data by scoring consensus between the two increases the quality of both metabolite identifications and gene annotations in this organism. MAGI also made biochemical predictions for poorly annotated genes that were consistent with the extensive literature on this important organism. This limited analysis suggests that using metabolomics data has the potential to improve annotations in sequenced organisms and also provides testable hypotheses for specific biochemical functions. MAGI is freely available for academic use both as an online tool at https://magi.nersc.gov and with source code available at https://github.com/biorack/magi.
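The consensus idea can be sketched as follows: a metabolite-gene pair scores highly only when a reaction links them and both the metabolite identification and the gene annotation are independently confident. The scoring formula, scores, and gene loci below are invented for illustration; MAGI's published scoring is more involved.

```python
def consensus_score(metabolite_scores, gene_scores, reactions):
    """Combine metabolite-ID confidence and gene-annotation confidence
    through a reaction network.

    A toy stand-in for MAGI's scoring, not the published method."""
    results = {}
    for met, gene in reactions:  # each reaction links a metabolite to a gene
        m = metabolite_scores.get(met, 0.0)
        g = gene_scores.get(gene, 0.0)
        # Geometric mean rewards consensus and punishes one-sided evidence:
        # a confident metabolite with a dubious gene (or vice versa) scores low.
        results[(met, gene)] = (m * g) ** 0.5
    return results

metabolite_scores = {"actinorhodin": 0.9, "unknown_mz_341": 0.2}
gene_scores = {"geneA": 0.8, "geneB": 0.1}  # hypothetical confidences
reactions = [("actinorhodin", "geneA"), ("unknown_mz_341", "geneB")]
scores = consensus_score(metabolite_scores, gene_scores, reactions)
print(max(scores, key=scores.get))  # -> ('actinorhodin', 'geneA')
```

The useful property is symmetry: the same consensus score that boosts a metabolite identification can also be read in the other direction, as evidence for (or against) the gene's predicted enzymatic function.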