
    CERN openlab Whitepaper on Future IT Challenges in Scientific Research

    This whitepaper describes the major IT challenges in scientific research at CERN and at several other European and international research laboratories and projects. Each challenge is exemplified through a set of concrete use cases drawn from the requirements of large-scale scientific programs. The paper is based on contributions from many researchers and IT experts of the participating laboratories, as well as input from the existing CERN openlab industrial sponsors. The views expressed in this document are those of the individual contributors and do not necessarily reflect the views of their organisations and/or affiliates.

    ASCR/HEP Exascale Requirements Review Report

    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June 2015. The main conclusions are as follows.
    1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand on the 2025 timescale is at least two orders of magnitude beyond what is currently available, and in some cases greater.
    2) The growth rate of data produced by simulations is overwhelming the current ability of both facilities and researchers to store and analyze it. Additional resources and new techniques for data analysis are urgently needed.
    3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets.
    4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows.
    5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources, the experimental HEP program needs: a) an established long-term plan for access to ASCR computational and data resources; b) the ability to map workflows onto HPC resources; c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members; d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities; and e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.
    Comment: 77 pages, 13 figures; draft report, subject to further revision.

    NetJobs: A new approach to network monitoring for the Grid using Grid jobs

    With Grid computing, far-flung and disparate IT resources act as a single "virtual datacenter". Grid computing interfaces heterogeneous IT resources so that they are available when and where they are needed, and allows applications to be provisioned and capacity to be allocated among research and business groups that are geographically and organizationally dispersed. Building a high-availability Grid is held as the next goal to achieve: protecting against computer failures and site failures to avoid resource downtime and honor Service Level Agreements. Network monitoring has a key role in this challenge. This work concerns the design and prototype implementation of a new approach to network monitoring for the Grid, based on the use of scheduled Grid jobs. It was carried out within the Network Support task (SA2) of the Enabling Grids for E-sciencE (EGEE) project. This thesis is organized as follows.

    Chapter 1: Grid Computing. From the origins of Grid computing to the latest projects; the conceptual framework and the main features characterizing many kinds of popular Grids are presented.

    Chapter 2: The EGEE and EGI projects. This chapter describes the Enabling Grids for E-sciencE (EGEE) project and the European Grid Infrastructure (EGI). The EGEE project (2004-2010) was the flagship Grid infrastructure project of the EU. The third and last two-year phase of the project (started on 1 May 2008) was financed with a total budget of around 47 million euro, with a further estimated 50 million euro worth of computing resources contributed by the partners, and a total manpower of 9,000 person-months, of which over 4,500 were contributed by the partners from their own funding sources. At its close, EGEE represented a worldwide infrastructure of approximately 200,000 CPU cores, collaboratively hosted by more than 300 centres around the world. By the end of the project, around 13 million jobs were executed on the EGEE Grid each month. A new organization, EGI.eu, has since been created to continue the coordination and evolution of the European Grid Infrastructure (EGI) based on the EGEE Grid.

    Chapter 3: gLite Middleware. This chapter gives an overview of the gLite Grid middleware. gLite is the middleware stack for Grid computing used by the EGEE and EGI projects in a very large variety of scientific domains. Born from the collaborative efforts of more than 80 people in 12 different academic and industrial research centres as part of the EGEE project, gLite provides a complete set of services for building a production Grid infrastructure, and a framework for building Grid applications that tap into the power of distributed computing and storage resources across the Internet. The gLite services are currently adopted by more than 250 computing centres and used by more than 15,000 researchers in Europe and around the world.

    Chapter 4: Network Activity in EGEE/EGI. Grid infrastructures are distributed by nature, involving many sites, normally in different administrative domains. Individual sites are connected together by a network, which is therefore a critical part of the whole Grid infrastructure; without the network there is no Grid. Monitoring is a key component for the successful operation of any infrastructure, helping in the discovery and diagnosis of any problem which may arise. Network monitoring contributes to the day-to-day operations of the Grid by helping to answer specific questions from users and site administrators. This chapter discusses the efforts of EGEE and EGI in the Grid network domain.

    Chapter 5: Grid Network Monitoring based on Grid Jobs. NetJobs is a prototype of a lightweight solution for Grid network monitoring. A job-based approach has been used in order to prove the feasibility of this non-intrusive solution. It is currently configured to monitor eight production sites spread from Italy to France, but the method could be applied to the vast majority of Grid sites. The prototype provides coherent RTT, MTU, hop-count and achievable TCP bandwidth tests.
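    As a concrete illustration of the job-based probing idea, the sketch below shows the kind of script a Grid job could run on a worker node to collect RTT, hop-count and achievable TCP bandwidth measurements toward a peer site. This is a minimal hypothetical sketch, not the actual NetJobs code: the peer hostname and sink port are invented, the peer is assumed to run a discard-style TCP sink, and the script shells out to the standard Linux ping and traceroute utilities.

```python
#!/usr/bin/env python3
"""Hypothetical NetJobs-style probe: the kind of script a Grid job
could run on a worker node to measure RTT, hop count and achievable
TCP bandwidth toward a peer site. Not the actual NetJobs code."""
import re
import socket
import subprocess
import time


def measure_rtt(host, count=5):
    """Average RTT in ms, parsed from the system ping utility."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True).stdout
    # ping summary line looks like: "rtt min/avg/max/mdev = a/b/c/d ms"
    match = re.search(r"= [\d.]+/([\d.]+)/", out)
    return float(match.group(1)) if match else float("nan")


def count_hops(host):
    """Hop count taken from the last numbered line of traceroute."""
    out = subprocess.run(["traceroute", "-n", host],
                         capture_output=True, text=True).stdout
    hops = [l for l in out.splitlines() if l.strip() and l.split()[0].isdigit()]
    return int(hops[-1].split()[0]) if hops else -1


def tcp_throughput(host, port, duration=3.0):
    """Achievable TCP bandwidth in Mbit/s, streaming zeros to a peer
    assumed to run a discard-style sink on the given port."""
    chunk, sent = b"\0" * 65536, 0
    deadline = time.time() + duration
    with socket.create_connection((host, port)) as sock:
        while time.time() < deadline:
            sock.sendall(chunk)
            sent += len(chunk)
    return sent * 8 / duration / 1e6


if __name__ == "__main__":
    peer = "ce.example-site.example.org"  # invented peer site hostname
    print("RTT (ms):   ", measure_rtt(peer))
    print("hops:       ", count_hops(peer))
    print("TCP (Mb/s): ", tcp_throughput(peer, 9000))  # 9000: invented sink port
```

    In a NetJobs-like deployment, such a job's results would travel back through the ordinary Grid job output channel, which is consistent with the non-intrusive, job-based approach the abstract describes.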

    Management, Optimization and Evolution of the LHCb Online Network

    The LHCb experiment is one of the four large particle detectors running at the Large Hadron Collider (LHC) at CERN. It is a forward single-arm spectrometer dedicated to testing the Standard Model through precision measurements of Charge-Parity (CP) violation and rare decays in the b-quark sector. The LHCb experiment will operate at a luminosity of 2x10^32 cm^-2 s^-1, with a proton-proton bunch crossing rate of approximately 10 MHz. To select the interesting events, a two-level trigger scheme is applied: the first level trigger (L0) and the high level trigger (HLT). The L0 trigger is implemented in custom hardware, while the HLT is implemented in software running on the CPUs of the Event Filter Farm (EFF). The L0 trigger rate is defined at about 1 MHz, and the size of each event is about 35 kByte, so handling the resulting data rate (35 GByte/s) is a serious challenge. The Online system is a key part of the LHCb experiment, providing all the IT services. It consists of three major components: the Data Acquisition (DAQ) system, the Timing and Fast Control (TFC) system and the Experiment Control System (ECS). To provide these services, two large dedicated networks based on Gigabit Ethernet are deployed, one for DAQ and one for ECS, referred to collectively as the Online network. A large network needs sophisticated monitoring for its successful operation. Commercial network management systems are quite expensive and difficult to integrate into the LHCb ECS. A custom network monitoring system has therefore been implemented, based on the Supervisory Control And Data Acquisition (SCADA) system PVSS that is used by the LHCb ECS, making it a homogeneous part of the LHCb ECS. This thesis demonstrates how a large-scale network can be monitored and managed using tools originally made for industrial supervisory control. The thesis is organized as follows.

    Chapter 1 gives a brief introduction to the LHC and B physics at the LHC, then describes all the sub-detectors and the trigger and DAQ system of LHCb, from structure to performance.

    Chapter 2 first introduces the LHCb Online system and the dataflow, then focuses on the Online network design and its optimization.

    In Chapter 3, the SCADA system PVSS is introduced briefly; the architecture and implementation of the network monitoring system are then described in detail, including the front-end processes, the data communication and the supervisory layer.

    Chapter 4 first discusses packet sampling theory and one of the packet sampling mechanisms, sFlow, then demonstrates the application of sFlow to network troubleshooting, traffic monitoring and anomaly detection.

    In Chapter 5, the upgrade of the LHC and LHCb is introduced, the possible architecture of the DAQ is discussed, and two candidate internetworking technologies (high-speed Ethernet and InfiniBand) are compared in different respects for the DAQ. Three schemes based on 10 Gigabit Ethernet are presented and studied.

    Chapter 6 is a general summary of the thesis.
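    The packet-sampling material in Chapter 4 lends itself to a small numerical illustration. The Python sketch below shows how counts from 1-in-N packet sampling (the mechanism underlying sFlow) are scaled back into traffic estimates, together with the commonly quoted sFlow accuracy rule of thumb that a class seen in c samples carries a relative error of roughly 196*sqrt(1/c) percent at 95% confidence. The sampling rate, traffic volume and class share are made-up numbers; this is illustrative, not part of the monitoring system described in the thesis.

```python
"""Illustration of sFlow-style estimation: with 1-in-N packet sampling,
each sample stands for N packets, and a class seen in c samples has a
95% confidence relative error of roughly 196*sqrt(1/c) percent (the
usual sFlow accuracy rule of thumb). All numbers are made up."""
import math
import random

SAMPLING_RATE = 2048  # 1-in-N sampling, a typical switch setting


def estimate(class_samples):
    """Scaled packet-count estimate and its ~95% relative error (%)."""
    return class_samples * SAMPLING_RATE, 196.0 * math.sqrt(1.0 / class_samples)


# Simulate a link carrying 10 million packets, 30% of which belong to
# one traffic class (say, a single bulk transfer between two sites).
true_total, class_share = 10_000_000, 0.30
sampled = sum(1 for _ in range(true_total)
              if random.random() < class_share / SAMPLING_RATE)

est, err = estimate(sampled)
print(f"samples={sampled}  estimate={est:,.0f} packets  (+/- {err:.1f}%)")
print(f"true class size: {true_total * class_share:,.0f} packets")
```

    With these numbers, roughly 1,500 samples survive, giving about a 5% error band on a 3-million-packet class, which is why modest sampling rates suffice for traffic accounting and anomaly detection on high-rate links.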

    January-March 2007


    Development of a Low pT Muon LVL2 Trigger Algorithm with the ATLAS TileCal Detector

    This research report is framed in the commissioning phase of the ATLAS experiment. It is devoted to the TileCal muon identification algorithm, TileMuId, which makes use of the energy deposition in the calorimeter cells following projective patterns in η. The main purpose of this algorithm is to be used at the Level-2 trigger for low-pT muons. The implementation of the algorithm in the TileCal ROD DSP firmware is discussed, since the preprocessing at this stage can be used to save time at Level-2. The efficiency and the fraction of fakes expected with this ROD-based version of the algorithm are evaluated using different samples of Monte Carlo data. Results obtained by executing the algorithm online in the ROD DSPs during cosmics runs are also shown. Finally, the software implemented in the Athena framework related to this new version of the algorithm is summarized.
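    To make the idea behind the algorithm concrete, the toy Python sketch below flags an η tower as a muon candidate when the energy deposited in each of the three TileCal radial layers (A, BC, D) along the projective tower falls inside a per-layer window, the general pattern that TileMuId exploits. The energy windows, cell map and event content are all invented for illustration and do not reproduce the actual TileMuId cuts or the ROD DSP implementation.

```python
"""Toy sketch of a TileMuId-like selection: flag an eta tower as a muon
candidate when the energy deposited in each of the three TileCal radial
layers (A, BC, D) along the projective tower falls inside a per-layer
window. Windows, cell map and event content are invented."""

# Invented per-layer energy windows (GeV): enough deposited energy to be
# above noise, little enough to reject hadronic showers.
WINDOWS = {"A": (0.2, 2.0), "BC": (0.3, 3.0), "D": (0.1, 1.5)}


def find_muon_candidates(towers):
    """towers maps an eta index to {layer: energy in GeV}; returns the
    eta indices whose three layers all lie in their energy windows."""
    return [eta for eta, cells in towers.items()
            if all(lo <= cells.get(layer, 0.0) <= hi
                   for layer, (lo, hi) in WINDOWS.items())]


# Invented event: tower 3 shows a muon-like projective pattern,
# tower 7 is jet-like (too much energy in every layer).
event = {
    3: {"A": 0.5, "BC": 0.9, "D": 0.4},
    7: {"A": 6.2, "BC": 8.1, "D": 2.9},
}
print(find_muon_candidates(event))  # -> [3]
```

    The real algorithm runs over DSP-preprocessed cell energies in the RODs, but a windowed test along projective towers of this kind captures, at toy level, the selection pattern the abstract describes.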

    Performance analysis of a database caching system in a grid environment

    Master's thesis. Informatics Engineering. Faculty of Engineering. University of Porto. 200