Predicting dataset popularity for the CMS experiment
The CMS experiment at the LHC accelerator at CERN relies on its computing
infrastructure to stay at the frontier of High Energy Physics, searching for
new phenomena and making discoveries. Even though computing plays a significant
role in physics analysis, we rarely use its data to predict the behavior of the
system itself. Basic information about computing resources, user activities and
site utilization can be very useful for improving the throughput of the system
and its management. In this paper, we discuss a first CMS analysis of dataset
popularity based on CMS meta-data, which can be used as a model for dynamic data
placement and provides the foundation of a data-driven approach for the CMS
computing infrastructure.
Comment: Submitted to proceedings of the 17th International Workshop on Advanced
Computing and Analysis Techniques in Physics Research (ACAT
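The kind of meta-data-driven popularity analysis described above might be sketched as follows. This is a minimal illustration of aggregating per-dataset access counts into a ranking; the record layout and dataset names are invented for this example and are not the actual CMS schema.

```python
from collections import Counter

# Hypothetical access-log records: (dataset name, number of accesses).
# The layout is an assumption for illustration, not CMS's real meta-data.
access_log = [
    ("/Zmumu/Run2012/AOD", 120),
    ("/TTbar/Summer12/AODSIM", 45),
    ("/Zmumu/Run2012/AOD", 80),
    ("/MinBias/Summer12/GEN-SIM", 3),
]

def rank_datasets(records):
    """Aggregate accesses per dataset and rank by total popularity.

    A dynamic data placement policy could replicate the head of this
    ranking and retire the tail."""
    totals = Counter()
    for name, n_accesses in records:
        totals[name] += n_accesses
    return totals.most_common()

ranking = rank_datasets(access_log)
print(ranking[0])  # -> ('/Zmumu/Run2012/AOD', 200)
```

In practice such a ranking would be one feature among many (access recency, user counts, site locality) feeding a placement model, but the aggregation step is the common starting point.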
PhEDEx Data Service
The PhEDEx Data Service provides access to information from the central PhEDEx database, as well as certificate-authenticated managerial operations such as requesting the transfer or deletion of data. The Data Service is integrated with the SiteDB service for fine-grained access control, providing a safe and secure environment for operations. A plug-in architecture allows server-side modules to be developed rapidly and easily by anyone familiar with the schema, and can automatically return the data in a variety of formats for use by different client technologies. Using HTTP access via the Data Service instead of direct database connections makes it possible to build monitoring web pages with complex drill-down operations, suitable for debugging or presentation from many aspects. This will form the basis of the new PhEDEx website in the near future, as well as providing access to PhEDEx information and certificate-authenticated services for other CMS dataflow and workflow management tools such as CRAB, WMCore, DBS and the dashboard. A PhEDEx command-line client tool provides one-stop access to all the functions of the PhEDEx Data Service interactively, or for use in simple scripts that do not access the service directly. The client tool provides certificate-authenticated access to managerial functions, so all the functions of the PhEDEx Data Service are available to it. The tool can be expanded by plug-ins which can combine or extend the client-side manipulation of data from the Data Service, providing a powerful environment for manipulating data within PhEDEx.
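A client of such an HTTP data service could look roughly like the following. This is a hedged sketch of the HTTP-plus-JSON access pattern the abstract describes: the endpoint URL, query parameter, and response layout here are invented for illustration and are not the actual PhEDEx API.

```python
import io
import json
from urllib.request import urlopen

# Hypothetical base URL; the real service location and response schema
# may differ -- this only illustrates the HTTP + JSON client pattern.
BASE_URL = "https://example.cern.ch/datasvc/json/prod"

def fetch_replicas(dataset, opener=urlopen):
    """Query the data service for block replicas of a dataset.

    `opener` is injectable so the call can be exercised without a
    network connection (useful for testing client plug-ins)."""
    url = f"{BASE_URL}/blockreplicas?dataset={dataset}"
    with opener(url) as resp:
        return json.load(resp)

# Offline usage example with a canned response instead of a live server:
def fake_opener(url):
    body = json.dumps({"phedex": {"block": [{"name": "blockA", "replica": 3}]}})
    return io.BytesIO(body.encode())

reply = fetch_replicas("/Zmumu/Run2012/AOD", opener=fake_opener)
print(reply["phedex"]["block"][0]["name"])  # -> blockA
```

Because every format-specific detail lives behind one function, a client plug-in can post-process the returned structure without caring whether the server emitted JSON, XML, or another supported format.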
The Spring 2002 DAQ TDR Production
In Spring 2002 the CMS Production team produced a large sample of Monte Carlo events for the CMS DAQ TDR. This note documents the process by which those events were produced, with details of the tools, the architecture of the production machinery, and the individual sites that took part.
Resource Monitoring Tool for CMS production
A monitoring tool is described which not only tracks and recognises errors but also works together with a management system that is responsible for resource allocation. In cluster/grid computing, the resources of all accessible computers are at the disposal of end users. With that much power at hand, the responsibility of the software managing these resources also increases. Better utilization of resources requires that a monitoring system make the collected data persistent, so that the management system has not only up-to-date information but also a meaningful historical record. This database can then be consulted for finding the best available resources in a given scenario, and can also be used for understanding historical trends. The Resource Monitoring Tool, RMT, is such a tool, which caters for these needs. Its framework is designed in such a way that its potential can be enhanced easily by adding more modules.
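A persistent monitoring store of the kind described might be sketched as follows. The table layout and the selection rule are assumptions made for illustration, not RMT's actual design; the point is that persisted samples serve both the "best available resource" query and historical analysis.

```python
import sqlite3

# Hypothetical schema: one row per (host, timestamp) monitoring sample.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE samples
                (host TEXT, ts INTEGER, free_cpu REAL, free_mem_mb REAL)""")

def record(host, ts, free_cpu, free_mem_mb):
    """Persist one sample so current state AND history stay queryable."""
    conn.execute("INSERT INTO samples VALUES (?, ?, ?, ?)",
                 (host, ts, free_cpu, free_mem_mb))

def best_host(min_mem_mb):
    """Pick the host whose latest sample has the most free CPU among
    those meeting a memory requirement -- a stand-in selection rule."""
    rows = conn.execute(
        "SELECT host, free_cpu, free_mem_mb FROM samples s "
        "WHERE ts = (SELECT MAX(ts) FROM samples WHERE host = s.host)"
    ).fetchall()
    eligible = [(cpu, host) for host, cpu, mem in rows if mem >= min_mem_mb]
    return max(eligible)[1] if eligible else None

record("node1", 100, 0.9, 2048)
record("node2", 100, 0.7, 8192)
record("node1", 200, 0.2, 2048)  # node1 got busy; history is kept
print(best_host(min_mem_mb=1024))  # -> node2
```

The same table answers trend questions ("how loaded was node1 last week?") with ordinary range queries over `ts`, which is exactly why persistence pays off for the management system.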
Software packaging with DAR
One of the important tasks in distributed computing is to deliver software applications to the computing resources. The Distribution after Release (DAR) tool is used to package software applications for the world-wide event production by the CMS Collaboration. This presentation focuses on the concept of packaging applications based on the runtime environment. We discuss solutions for more effective software distribution based on two years' experience with DAR. Finally, we give an overview of the application distribution process and the interfaces to the CMS production tools.
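The idea of packaging based on the runtime environment can be illustrated with a minimal sketch: bundle the application tree together with a generated setup script that snapshots selected environment variables, so the archive is relocatable to a worker node. This is a toy stand-in for the concept, not DAR's real format or tooling.

```python
import os
import tarfile
import tempfile

def make_runtime_package(app_dir, env_vars, out_tar):
    """Bundle an application tree plus a snapshot of selected runtime
    environment variables into one relocatable tarball.

    A minimal sketch of runtime-environment packaging, not DAR itself."""
    # Generate a setup script that reproduces the captured environment.
    setup_lines = [f'export {name}="{os.environ.get(name, "")}"\n'
                   for name in env_vars]
    setup_path = os.path.join(tempfile.mkdtemp(), "setup.sh")
    with open(setup_path, "w") as fh:
        fh.writelines(setup_lines)
    with tarfile.open(out_tar, "w:gz") as tar:
        tar.add(app_dir, arcname="app")    # the application files
        tar.add(setup_path, arcname="setup.sh")  # the environment snapshot
```

On the receiving site, unpacking and sourcing `setup.sh` would recreate the captured environment before the application runs, which is the essence of shipping the runtime environment along with the release.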
Towards Managed Terabit/s Scientific Data Flows
Scientific collaborations on a global scale, such as the LHC experiments at CERN [1], rely today on the presence of high-performance, high-availability networks. In this paper we review the developments performed over the last several years on high-throughput applications, multilayer software-defined network path provisioning, path selection and load balancing methods, and the integration of these methods with the mainstream data transfer and management applications of CMS [2], one of the major LHC experiments. These developments are folded into a compact system capable of moving data among research sites at the 1 Terabit per second scale. Several aspects that went into the design and target different components of the system are presented, including: evaluation of 40 and 100 Gbps-capable hardware on both the network and server side, data movement applications, flow management, and the network-application interface leveraging advanced network services. We report on comparative results between several multi-path algorithms, the performance increase obtained using this approach, and present results from the related SC'13 demonstration.
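To make the multi-path load-balancing idea concrete, here is one generic heuristic: greedily assign each flow to the path with the most remaining headroom. The path names and capacities are invented, and this is only an illustrative baseline, not one of the specific algorithms compared in the paper.

```python
def assign_flows(paths, flows_gbps):
    """Greedy least-loaded multipath assignment: each flow goes to the
    path with the largest remaining capacity at assignment time.

    A generic illustrative heuristic, not the paper's algorithms."""
    cap = dict(paths)                       # path -> total capacity (Gbps)
    load = {name: 0.0 for name in cap}      # path -> assigned load (Gbps)
    placement = {}
    for i, flow in enumerate(flows_gbps):
        # Choose the path with the largest remaining headroom.
        best = max(cap, key=lambda p: cap[p] - load[p])
        load[best] += flow
        placement[f"flow{i}"] = best
    return placement, load

paths = [("path-A", 100.0), ("path-B", 40.0)]  # hypothetical links
placement, load = assign_flows(paths, [30.0, 30.0, 30.0, 20.0])
print(placement)  # the 20 Gbps flow spills onto path-B
```

Real deployments refine this with live telemetry (the measured, not nominal, headroom) and can re-balance long-lived flows mid-transfer, which is where the software-defined path provisioning described above comes in.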
MAGI: A Method for Metabolite Annotation and Gene Integration.
Metabolomics is a widely used technology for obtaining direct measures of metabolic activities from diverse biological systems. However, ambiguous metabolite identifications are a common challenge, and biochemical interpretation is often limited by incomplete and inaccurate genome-based predictions of enzyme activities (that is, gene annotations). Metabolite Annotation and Gene Integration (MAGI) generates a metabolite-gene association score using a biochemical reaction network. This is calculated by a method that emphasizes consensus between metabolites and genes via biochemical reactions. To demonstrate the potential of this method, we applied MAGI to integrate sequence data and metabolomics data collected from Streptomyces coelicolor A3(2), an extensively characterized bacterium that produces diverse secondary metabolites. Our findings suggest that coupling metabolomics and genomics data by scoring consensus between the two increases the quality of both metabolite identifications and gene annotations in this organism. MAGI also made biochemical predictions for poorly annotated genes that were consistent with the extensive literature on this important organism. This limited analysis suggests that using metabolomics data has the potential to improve annotations in sequenced organisms and also provides testable hypotheses for specific biochemical functions. MAGI is freely available for academic use both as an online tool at https://magi.nersc.gov and with source code available at https://github.com/biorack/magi.
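The consensus idea can be sketched as follows: a metabolite-gene pair scores highly only when a reaction links them and both the metabolite identification and the gene annotation are independently confident. The scoring formula, scores, and gene loci below are invented for illustration; MAGI's published scoring is more involved.

```python
def consensus_score(metabolite_scores, gene_scores, reactions):
    """Combine metabolite-ID confidence and gene-annotation confidence
    through a reaction network.

    A toy stand-in for MAGI's scoring, not the published method."""
    results = {}
    for met, gene in reactions:  # each reaction links a metabolite to a gene
        m = metabolite_scores.get(met, 0.0)
        g = gene_scores.get(gene, 0.0)
        # Geometric mean rewards consensus and punishes one-sided evidence:
        # a confident metabolite with a dubious gene (or vice versa) scores low.
        results[(met, gene)] = (m * g) ** 0.5
    return results

metabolite_scores = {"actinorhodin": 0.9, "unknown_mz_341": 0.2}
gene_scores = {"geneA": 0.8, "geneB": 0.1}  # hypothetical confidences
reactions = [("actinorhodin", "geneA"), ("unknown_mz_341", "geneB")]
scores = consensus_score(metabolite_scores, gene_scores, reactions)
print(max(scores, key=scores.get))  # -> ('actinorhodin', 'geneA')
```

The useful property is symmetry: the same consensus score that boosts a metabolite identification can also be read in the other direction, as evidence for (or against) the gene's predicted enzymatic function.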