32 research outputs found

    CMS dashboard task monitoring: A user-centric monitoring view

    We are now at a turning point in the CMS experiment, with people moving away from construction and focusing more intensely on physics analysis. This shift raises many challenging issues for the monitoring of user analysis. Physicists must be able to monitor the execution status, application-level and grid-level messages of their tasks, which may run at any site within the CMS Virtual Organisation. The CMS Dashboard Task Monitoring project provides this information to individual analysis users by collecting and exposing a user-centric set of information about submitted tasks, including the reason of failure, distribution by site and over time, consumed time and efficiency. The development was user-driven: physicists were invited to test the prototype in order to gather further requirements and identify weaknesses in the application.
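
    The user-centric aggregation described above (failures broken down by reason, distribution of jobs by site, consumed time and efficiency) can be sketched in a few lines. The record fields and the function name below are hypothetical illustrations under assumed inputs, not taken from the Dashboard code.

```python
from collections import defaultdict

def summarise_task(jobs):
    """Aggregate per-job records of one analysis task into a user-centric summary.

    Each record is a dict with hypothetical keys: 'site', 'status'
    ('success' or 'failed'), 'failure_reason', 'cpu_time' and 'wall_time'.
    """
    per_site = defaultdict(lambda: {"success": 0, "failed": 0})
    failure_reasons = defaultdict(int)
    cpu, wall = 0.0, 0.0

    for job in jobs:
        per_site[job["site"]][job["status"]] += 1
        if job["status"] == "failed":
            failure_reasons[job.get("failure_reason", "unknown")] += 1
        cpu += job.get("cpu_time", 0.0)
        wall += job.get("wall_time", 0.0)

    return {
        "jobs_by_site": dict(per_site),          # distribution by site
        "failure_reasons": dict(failure_reasons),  # reason of failure
        "cpu_efficiency": cpu / wall if wall else None,  # consumed time / efficiency
    }
```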

    High performance event-building in linux for LHCb

    Equivalent differentiated services for AODVng

    Evaluation of subfarm controllers candidates with an implementation of LHCb event-building

    This report summarises experimental results obtained by running an implementation of LHCb event-building on various candidates for the subfarm controllers (SFCs) of the LHCb data-acquisition network. We first describe the event-building implementation and then present the experimental results.

    Association rule mining on grid monitoring data to detect error sources

    Error handling is a crucial task in an infrastructure as complex as a grid. Several monitoring tools are in place that report failing grid jobs, including their exit codes. However, the exit codes do not always denote the actual fault that caused the job failure, and human time and knowledge are required to manually trace an error back to its underlying fault. We perform association rule mining on grid job monitoring data to automatically retrieve knowledge about the behaviour of grid components, taking dependencies between grid job characteristics into account. In this way, problematic grid components are located automatically and this information, expressed as association rules, is visualised in a web interface. This work reduces the time needed for fault recovery and improves a grid's reliability.
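
    A minimal sketch of this kind of association rule mining, assuming job records are encoded as sets of 'key=value' items and looking only for rules whose consequent is an exit code. The item encoding, thresholds and function name are assumptions for illustration, not the actual implementation behind the paper.

```python
from collections import Counter
from itertools import combinations

def mine_rules(jobs, min_support=0.05, min_confidence=0.8):
    """Illustrative association-rule mining over grid job monitoring records.

    Each job is a set of items such as {'site=SITE_A', 'ce=ce01', 'exit_code=1'}.
    Returns rules (antecedent -> 'exit_code=...') whose support and confidence
    exceed the thresholds, pointing at components that co-occur with errors.
    """
    if not jobs:
        return []

    n = len(jobs)
    itemset_counts = Counter()
    for job in jobs:
        items = frozenset(job)
        # Count all small itemsets (up to size 3) appearing in each job record.
        for size in (1, 2, 3):
            for subset in combinations(sorted(items), size):
                itemset_counts[frozenset(subset)] += 1

    rules = []
    for itemset, count in itemset_counts.items():
        support = count / n
        if support < min_support:
            continue
        errors = {i for i in itemset if i.startswith("exit_code=")}
        if len(errors) != 1:
            continue
        antecedent = itemset - errors
        if not antecedent:
            continue
        # confidence = support(antecedent ∪ consequent) / support(antecedent)
        confidence = count / itemset_counts[antecedent]
        if confidence >= min_confidence:
            rules.append((antecedent, next(iter(errors)), support, confidence))
    return rules
```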

    Grid reliability

    We offer a system to track the efficiency of different components of the Grid; it allows us to study the performance of both the WMS and the data transfers. At the moment, we have set up the different parts of the system for ALICE, ATLAS, CMS and LHCb. None of the components that we have developed are VO specific, so it would be very easy to deploy them for any other VO. Our main goal is to improve the reliability of the Grid. The main idea is to discover the different problems that have happened as soon as possible and to inform the responsible parties. Since we study the jobs and transfers issued by real users, we see the same problems that users see. In fact, we see even more problems than the end user does, since we also follow up the errors that Grid components can overcome by themselves (for instance, resubmitting a failed job to a different site). This kind of information is very useful to site and VO administrators: they can find out the efficiency of their sites and, in case of failures, the problems that they have to solve. The reports that we provide are also interesting for the COD, since the errors might not be VO specific. The whole system is based on studying the different actions that users perform, so the first and most important dependency is on monitoring systems. We interface with the Dashboard, which hides the differences between the heterogeneous sources of data (such as RGMA, ICXML or MonALISA). Another service that is very important for the effectiveness of Grid reliability is the submission and tracking of tickets, GGUS. This has already been tested with a manual procedure; since the result was very encouraging, we are working on ways of automating this interaction. The main problem that we have found so far is the lack of communication between the new gLite RB and RGMA: jobs that went through these resource brokers do not publish their status, making our task impossible. Another possible problem is the confidentiality of the data. To address this, we anonymise the jobs and transfers, since we are only interested in the different statuses that a job or transfer goes through.
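
    A rough sketch of the per-site reliability accounting described above, including the anonymisation of user identities and the tracking of errors that the Grid overcomes by itself through resubmission. The record fields and the function name are hypothetical, chosen only to illustrate the idea.

```python
from collections import defaultdict
import hashlib

def site_report(attempts):
    """Illustrative per-site reliability report from anonymised job attempts.

    Each attempt is a dict with hypothetical keys: 'site', 'user', 'final_status'
    ('done' or 'aborted') and 'resubmitted' (True when a Grid component retried
    the job at another site, an error the end user never sees).
    """
    report = defaultdict(lambda: {"done": 0, "aborted": 0, "hidden_retries": 0})
    for a in attempts:
        entry = report[a["site"]]
        entry[a["final_status"]] += 1
        if a.get("resubmitted"):
            entry["hidden_retries"] += 1
        # Anonymise the user identity before the record leaves the monitoring system.
        a["user"] = hashlib.sha256(a["user"].encode()).hexdigest()[:12]

    for site, entry in report.items():
        total = entry["done"] + entry["aborted"]
        entry["efficiency"] = entry["done"] / total if total else None
    return dict(report)
```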

    ARDA Dashboard Data Management Monitoring

    The ATLAS DDM (Distributed Data Management) system is responsible for the management and distribution of data across the different grid sites. The data is generated at CERN and has to be made available as fast as possible in a large number of centres for production purposes, and later in many other sites for end-user analysis. Monitoring data transfer activity and availability is an essential task both for site administrators and for end users doing analysis in their local centres. Data management on the grid depends on a complex set of services: file catalogues for file and file-location bookkeeping, transfer services for file movement, storage managers and others. In addition, there are several flavours of each of these components, tens of sites each managing a distinct installation (over 100 at the present time), and in some organisations data is seen and moved at a larger granularity than files, usually called datasets, which makes the successful use of the standard grid monitoring tools far from straightforward. The dashboard provides a unified view of the whole data management infrastructure, relying mostly on the ATLAS data management (DDM) system to collect the relevant information about dataset and file movement among the different sites, but also retrieving information from the grid fabric services where appropriate. This last point makes it an interesting tool for other communities that rely on the same lower-level grid services. Since the focus is on data management on the grid, the most relevant services for this area of the dashboard are the transfer services and the storage managers. It is essential that all information can be propagated easily and quickly to the dashboard service, either directly or via the DDM services, so that end users have an almost real-time view of their activities and production systems can rely on the system views provided by the monitoring. File transfer information is transient in most cases and is taken from the main transfer tool in use, the File Transfer Service (FTS). Storage and storage-space information lies in the Storage Resource Managers (SRM), which should provide a unique, implementation-independent view of the physical data and the available space. Information regarding file and system metadata is expected to be kept consistent everywhere, and any changes are expected to be propagated to the interested services such as the dashboard. We plan to extend the handling of errors coming from the different Grid services used by the ATLAS DDM system.
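
    The roll-up from file-level transfer states to a dataset-level, per-destination-site view could look roughly like the sketch below. The record fields and state names are assumptions for illustration and do not reflect the actual dashboard schema or the FTS/DDM message formats.

```python
from collections import defaultdict

def dataset_completion(transfers):
    """Illustrative roll-up of file-level transfer states into dataset-level views.

    Each transfer record carries hypothetical keys: 'dataset', 'dest_site' and
    'state' (e.g. 'done', 'active', 'failed'), as a dashboard might receive them
    from DDM callbacks or from the transfer service.
    """
    summary = defaultdict(lambda: defaultdict(int))
    for t in transfers:
        summary[(t["dataset"], t["dest_site"])][t["state"]] += 1

    view = {}
    for (dataset, site), states in summary.items():
        total = sum(states.values())
        view[(dataset, site)] = {
            "files_done": states.get("done", 0),
            "files_failed": states.get("failed", 0),
            "completion": states.get("done", 0) / total if total else None,
        }
    return view
```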