Indications of early thermalization in relativistic heavy-ion collisions
The directed flow of particles emitted from the fireball created in a heavy-ion collision is shown to be a very sensitive measure of the pressure equilibration in the first 1 fm/c of the evolution. Performing a 3+1 dimensional relativistic hydrodynamic calculation with nonequilibrated longitudinal and transverse pressures, we show that the directed flow is strongly reduced if the pressure imbalance survives for even a short time. Transverse momentum spectra, elliptic flow, and interferometry correlation radii are not very sensitive to this early pressure anisotropy. Comparison with the data points toward a short equilibration time of the order of fm/c or less.
Scaling cloud-native Apache Spark on Kubernetes for workloads in external storages
The CERN Scalable Analytics Section currently offers shared YARN clusters to its users for monitoring, security, and experiment operations, as well as to other groups interested in processing data with Big Data technologies. YARN clusters with data in HDFS are, however, difficult to provision and complex to manage and resize. This poses new data and operational challenges for meeting future petabyte-scale physics data processing requirements. As of 2018, over 250 PB of physics data were stored in CERN's mass storage system, called EOS. The Hadoop-XRootD Connector allows data stored in CERN EOS to be read over the network, and CERN's on-premise private cloud, based on OpenStack, allows compute resources to be provisioned on demand. The emergence of technologies such as Containers-as-a-Service in OpenStack Magnum, together with support for Kubernetes as a native resource scheduler for Apache Spark, makes it possible to increase workflow reproducibility across different compute infrastructures through the use of containers, reduce the operational effort of maintaining compute clusters, increase resource utilization via elastic cloud resource provisioning, and share resources between different types of workloads using quotas and namespaces. This trades the data locality known from traditional Spark/YARN systems with data in HDFS for these operational features.
In the proposed architecture of cloud-managed Spark/Kubernetes with data stored in external storage systems such as EOS, Ceph S3, or Kafka, physicists and other CERN communities can spawn and resize Spark/Kubernetes clusters on demand, with fine-grained control over Spark applications. This work focuses on a Kubernetes CRD operator for idiomatically defining and running Apache Spark applications on Kubernetes, with automated scheduling and on-failure resubmission of long-running applications. The Spark Operator was introduced with the design principle of making Spark on Kubernetes easy to deploy, scale, and maintain, with usability similar to Spark/YARN. An analysis of concerns related to non-cluster-local persistent storage and memory handling has been performed.
The scalability of the architecture has been evaluated on a sustained workload, physics data reduction, with files in ROOT format stored in CERN's mass storage system EOS. A series of microbenchmarks has been performed to evaluate the properties of the architecture, such as performance compared to the state-of-the-art Spark/YARN cluster at CERN and scalability for long-running data processing jobs. Finally, Spark on Kubernetes workload use cases have been classified, and possible bottlenecks and requirements identified.
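The CRD-based approach described above amounts to declaring a Spark application as a Kubernetes resource that the operator schedules and resubmits on failure. The following manifest is a minimal illustrative sketch, assuming the spark-on-k8s-operator CRD schema; the image name, application class, EOS path, and resource sizes are placeholders, not values from this work.

```yaml
# Hypothetical SparkApplication manifest for a Spark CRD operator
# (sparkoperator.k8s.io schema assumed); all names and paths are placeholders.
apiVersion: sparkoperator.k8s.io/v1beta2
kind: SparkApplication
metadata:
  name: physics-data-reduction
  namespace: analytics
spec:
  type: Scala
  mode: cluster
  image: registry.example.com/spark-eos:3.0     # placeholder image bundling the Hadoop-XRootD connector
  mainClass: ch.cern.example.DataReduction      # hypothetical application class
  mainApplicationFile: local:///opt/app/data-reduction.jar
  arguments:
    - "root://eospublic.cern.ch//eos/path/to/dataset"   # placeholder EOS path
  sparkVersion: "3.0.0"
  restartPolicy:
    type: OnFailure            # automated resubmission of long-running applications
    onFailureRetries: 3
  driver:
    cores: 1
    memory: 2g
    serviceAccount: spark
  executor:
    instances: 10              # resized on demand, instead of a statically sized YARN cluster
    cores: 4
    memory: 8g
```

Declaring the application this way is what gives the fine-grained, on-demand control over cluster size mentioned above: changing `executor.instances` and reapplying the manifest resizes the workload without reprovisioning a cluster.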
HEPiX Spring 2019 Summary
The HEPiX forum brings together worldwide Information Technology staff, including system administrators, system engineers, and managers from the High Energy Physics and Nuclear Physics laboratories and institutes, to foster a learning and sharing experience between sites facing scientific computing and data challenges. Participating sites include BNL, CERN, DESY, FNAL, IHEP, IN2P3, INFN, JLAB, Nikhef, RAL, SLAC, TRIUMF, and many others. The HEPiX organization was formed in 1991, and its semi-annual meetings are an excellent source of information and sharing for IT experts in scientific computing.
Apache Spark usage and deployment models for scientific computing
This talk shares our recent experiences in providing a data analytics platform based on Apache Spark for High Energy Physics, the CERN accelerator logging system, and infrastructure monitoring. The Hadoop Service has started to expand its user base to researchers who want to perform analysis with big data technologies. Among many frameworks, Apache Spark is currently getting the most traction from various user communities, and new ways to deploy Spark, such as Apache Mesos or Spark on Kubernetes, have started to evolve rapidly. Meanwhile, notebook web applications such as Jupyter offer the ability to perform interactive data analytics and visualizations without the need to install additional software. CERN already provides a web platform, called SWAN (Service for Web-based ANalysis), where users can write and run their analyses in the form of notebooks, seamlessly accessing the data and software they need. The first part of the presentation covers several recent integrations and optimizations to the Apache Spark computing platform to enable HEP data processing and CERN accelerator logging system analytics. The optimizations and integrations include, but are not limited to, access to Kerberized resources, an XRootD connector enabling remote access to EOS storage, and integration with SWAN for interactive data analysis, thus forming a truly Unified Analytics Platform. The second part of the talk touches upon the evolution of the Apache Spark data analytics platform, in particular the recent work done to run Spark on Kubernetes on the virtualized and container-based infrastructure in OpenStack. This deployment model allows for elastic scaling of data analytics workloads, enabling efficient, on-demand utilization of resources in private or public clouds.
ScienceBox: Converging to Kubernetes containers in production for on-premise and hybrid clouds for CERNBox, SWAN, and EOS
Docker containers are the de facto standard for packaging, distributing, and running applications on cloud-based infrastructures. Commercial providers and private clouds expand their offering with container orchestration engines, making the management of resources and containerized applications tightly integrated. The Storage Group of CERN IT leverages container technologies to provide ScienceBox: an integrated software bundle with storage and computing services for general-purpose and scientific use. ScienceBox features distributed scalable storage, sync & share functionality, and a web-based data analysis service, and can be deployed on a single machine or scaled out across multiple servers. ScienceBox has proven helpful in different contexts, from High Energy Physics analysis to education for high schools, and has been successfully deployed on different cloud infrastructures and heterogeneous hardware.
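The single-machine deployment mode of such a bundle can be pictured as a small set of cooperating containers. The following docker-compose fragment is an illustrative sketch only: the service layout (EOS storage, CERNBox sync & share, SWAN analysis) follows the abstract, while the image names, ports, and dependency wiring are placeholders rather than the actual ScienceBox release.

```yaml
# Illustrative single-machine bundle of storage + sync&share + web analysis;
# image names and ports are hypothetical placeholders.
version: "3"
services:
  eos:                 # distributed scalable storage backend
    image: registry.example.com/eos:latest
  cernbox:             # sync & share front end backed by EOS
    image: registry.example.com/cernbox:latest
    depends_on:
      - eos
  swan:                # web-based data analysis (Jupyter-style notebooks)
    image: registry.example.com/swan:latest
    ports:
      - "443:443"
    depends_on:
      - cernbox
      - eos
```

Scaling out across multiple servers then amounts to moving the same containerized services from a single Compose host to an orchestration engine such as Kubernetes.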
Evolution of the Hadoop Platform and Ecosystem for High Energy Physics
The interest in using scalable data processing solutions based on the Apache Hadoop ecosystem is constantly growing in the High Energy Physics (HEP) community. This drives the need for increased reliability and availability of the central Hadoop service and underlying infrastructure provided to the community by the CERN IT department. This paper reports on the overall status of the Hadoop platform and the related Hadoop and Spark service at CERN, detailing recent enhancements and features introduced in many areas, including service configuration, availability, alerting, monitoring, and data protection, in order to meet the new requirements posed by the users' community.