Optimal statistical inference in the presence of systematic uncertainties using neural network optimization based on binned Poisson likelihoods with nuisance parameters
Data analysis in science, e.g., high-energy particle physics, is often
subject to an intractable likelihood if the observables and observations span a
high-dimensional input space. Typically the problem is solved by reducing the
dimensionality using feature engineering and histograms, whereby the latter
technique allows one to build the likelihood using Poisson statistics. However, in
the presence of systematic uncertainties represented by nuisance parameters in
the likelihood, the optimal dimensionality reduction with a minimal loss of
information about the parameters of interest is not known. This work presents a
novel strategy to construct the dimensionality reduction with neural networks
for feature engineering and a differentiable formulation of histograms so that
the full workflow can be optimized with the result of the statistical
inference, e.g., the variance of a parameter of interest, as objective. We
discuss how this approach yields a close-to-optimal estimate of the parameters of
interest, and we demonstrate the applicability of the technique
with a simple example based on pseudo-experiments and a more complex example
from high-energy particle physics.
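A minimal sketch of this idea is given below (PyTorch; not the authors' implementation). The soft Gaussian-kernel histogram, the toy signal and background data, the single linear nuisance parameter, and the network architecture are illustrative assumptions; the point is only that the variance of the parameter of interest, read off the inverse Hessian of the binned Poisson negative log-likelihood on an Asimov dataset, can serve directly as the training objective.

```python
# Minimal sketch, not the authors' implementation: a soft histogram of the NN
# output, a binned Poisson likelihood with signal strength mu (parameter of
# interest) and one nuisance parameter theta, and the variance of mu as the
# training objective. Toy data, network, binning and shift are assumptions.
import torch

torch.manual_seed(0)

def soft_histogram(y, edges, bandwidth=0.1):
    """Differentiable stand-in for a hard histogram: Gaussian-kernel binning."""
    centers = 0.5 * (edges[:-1] + edges[1:])
    weights = torch.exp(-0.5 * ((y[:, None] - centers) / bandwidth) ** 2)
    weights = weights / weights.sum(dim=1, keepdim=True)
    return weights.sum(dim=0)

def nll(mu, theta, h_sig, h_bkg, h_bkg_up, observed):
    """Binned Poisson negative log-likelihood with a Gaussian-constrained nuisance."""
    expected = mu * h_sig + h_bkg + theta * (h_bkg_up - h_bkg)
    expected = torch.clamp(expected, min=1e-6)
    return (expected - observed * torch.log(expected)).sum() + 0.5 * theta ** 2

# Toy inputs: signal, background, and a systematically shifted background.
x_sig = torch.randn(1000, 2) + 1.0
x_bkg = torch.randn(1000, 2) - 1.0
x_bkg_up = x_bkg + torch.tensor([0.3, 0.0])

net = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, 1), torch.nn.Sigmoid())
edges = torch.linspace(0.0, 1.0, 9)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for step in range(200):
    h_sig = soft_histogram(net(x_sig).squeeze(1), edges)
    h_bkg = soft_histogram(net(x_bkg).squeeze(1), edges)
    h_bkg_up = soft_histogram(net(x_bkg_up).squeeze(1), edges)
    observed = h_sig + h_bkg          # Asimov data: observation = expectation at mu=1, theta=0

    mu = torch.tensor(1.0, requires_grad=True)
    theta = torch.tensor(0.0, requires_grad=True)
    loss_nll = nll(mu, theta, h_sig, h_bkg, h_bkg_up, observed)
    grads = torch.autograd.grad(loss_nll, (mu, theta), create_graph=True)
    hessian = torch.stack([
        torch.stack(torch.autograd.grad(g, (mu, theta), create_graph=True))
        for g in grads
    ])
    var_mu = torch.inverse(hessian)[0, 0]   # variance of the parameter of interest
    opt.zero_grad()
    var_mu.backward()                       # optimize the NN end-to-end on this variance
    opt.step()
```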
Reducing the dependence of the neural network function to systematic uncertainties in the input space
Applications of neural networks to data analyses in natural sciences are
complicated by the fact that many inputs are subject to systematic
uncertainties. To control the dependence of the neural network function on
variations of the input space within these systematic uncertainties, several
methods have been proposed. In this work, we propose a new approach to training
the neural network by introducing penalties on the variation of the neural
network output directly in the loss function. This is achieved at the cost of
only a small number of additional hyperparameters. The approach can also be
pursued by treating all systematic variations in the form of statistical weights.
The proposed method is demonstrated with a simple example based on
pseudo-experiments and with a more complex example from high-energy particle
physics.
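A minimal sketch of such a penalty term is shown below (PyTorch; not the paper's exact prescription). The toy data, the systematic shift delta of the first input, and the single penalty weight lam, the only additional hyperparameter here, are illustrative assumptions.

```python
# Illustrative sketch: add a penalty to the usual classification loss that
# suppresses the change of the network output between nominal inputs and
# inputs shifted within one systematic variation.
import torch

net = torch.nn.Sequential(torch.nn.Linear(2, 16), torch.nn.ReLU(),
                          torch.nn.Linear(16, 1))
bce = torch.nn.BCEWithLogitsLoss()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x = torch.randn(256, 2)                       # nominal inputs (toy data)
y = (x.sum(dim=1) > 0).float().unsqueeze(1)   # toy labels
delta = torch.tensor([0.2, 0.0])              # assumed systematic shift of the first input
lam = 1.0                                     # penalty strength: one extra hyperparameter

for step in range(100):
    out_nom = net(x)
    out_up = net(x + delta)                   # "up" variation of the input space
    out_down = net(x - delta)                 # "down" variation
    penalty = ((out_up - out_nom) ** 2 + (out_down - out_nom) ** 2).mean()
    loss = bce(out_nom, y) + lam * penalty    # penalty enters the loss directly
    opt.zero_grad()
    loss.backward()
    opt.step()
```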
Identifying the relevant dependencies of the neural network response on characteristics of the input space
The relation between the input and output spaces of neural networks (NNs) is
investigated to identify those characteristics of the input space that have a
large influence on the output for a given task. For this purpose, the NN
function is decomposed into a Taylor expansion in each element of the input
space. The Taylor coefficients contain information about the sensitivity of the
NN response to the inputs. A metric is introduced that allows for the
identification of the characteristics that predominantly determine the performance of
the NN in solving a given task. Finally, the capability of this metric to
analyze the performance of the NN is evaluated based on a task common to data
analyses in high-energy particle physics experiments.
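As a first-order illustration of this idea (the network, data, and number of features below are placeholders, not the setup of the study), the mean absolute first-order Taylor coefficient, i.e. the gradient of the NN output with respect to each input element averaged over the data, already yields a ranking of the input characteristics:

```python
# Sketch of a first-order sensitivity metric: <|df/dx_i|> per input feature,
# averaged over a toy dataset. Higher values mark inputs that drive the response.
import torch

net = torch.nn.Sequential(torch.nn.Linear(3, 16), torch.nn.Tanh(),
                          torch.nn.Linear(16, 1), torch.nn.Sigmoid())
x = torch.randn(512, 3, requires_grad=True)    # placeholder inputs

f = net(x).sum()                 # summing lets one backward pass give per-event gradients
f.backward()
taylor_first_order = x.grad.abs().mean(dim=0)  # one sensitivity score per input element
ranking = torch.argsort(taylor_first_order, descending=True)
print(taylor_first_order, ranking)
```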
Performance of the bwHPC cluster in the production of μ → τ embedded events used for the prediction of background for H → ττ analyses
In high energy physics, a main challenge is the accurate prediction of background
events at a particle detector. These events are usually estimated by simulation.
As an alternative, data-driven methods use observed events to derive a background
prediction and are often less computationally expensive than simulation.
The lepton embedding method is a data-driven approach to estimate the
background from Z → ττ events for Higgs boson analyses in the same final state.
Z → μμ events recorded by the CMS experiment are selected; the muons are
removed from the event and replaced with simulated τ leptons with the same
kinematic properties as the removed muons. The resulting hybrid event provides
an improved description of pile-up and the underlying event compared to the simulation
of the full proton-proton collision. In this paper the production of these
hybrid events used by the CMS collaboration is described. The production relies
on the resources made available by the bwHPC project. The data used for this
purpose correspond to 65 million di-muon events collected in 2017 by CMS.
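The replacement step at the core of the method can be illustrated with the following toy sketch (plain Python, not CMS software; the event structure and all helper names are invented for illustration): the two muons of a selected di-muon event are removed and substituted by τ placeholders that inherit the muons' kinematic properties, while the rest of the event is kept from data.

```python
# Toy illustration of the embedding idea; real embedding simulates the full tau
# decay and operates on detector-level CMS events.
import copy

def is_dimuon_event(event):
    muons = [p for p in event["particles"] if abs(p["pdg_id"]) == 13]
    return len(muons) == 2

def embed_taus(event):
    hybrid = copy.deepcopy(event)
    muons = [p for p in hybrid["particles"] if abs(p["pdg_id"]) == 13]
    hybrid["particles"] = [p for p in hybrid["particles"] if p not in muons]
    for mu in muons:
        tau = dict(mu)                               # inherit pt, eta, phi from the removed muon
        tau["pdg_id"] = 15 if mu["pdg_id"] > 0 else -15
        # the real method would simulate the tau decay at this point
        hybrid["particles"].append(tau)
    return hybrid

data_event = {"particles": [
    {"pdg_id": 13,  "pt": 45.0, "eta": 0.3,  "phi": 1.2},
    {"pdg_id": -13, "pt": 38.0, "eta": -0.7, "phi": -2.0},
    {"pdg_id": 211, "pt": 5.0,  "eta": 1.5,  "phi": 0.4},   # underlying event / pile-up, kept from data
]}

if is_dimuon_event(data_event):
    hybrid_event = embed_taus(data_event)
    print(hybrid_event["particles"])
```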
Dynamic provisioning of a HEP computing infrastructure on a shared hybrid HPC system
Experiments in high-energy physics (HEP) rely on elaborate hardware, software and computing systems to sustain the high data rates necessary to study rare physics processes. The Institut für Experimentelle Kernphysik (EKP) at KIT is a member of the CMS and Belle II experiments, located at the LHC and the SuperKEKB accelerators, respectively. These detectors share the requirement that enormous amounts of measurement data must be processed and analyzed and that a comparable amount of simulated events is required to compare experimental results with theoretical predictions. Classical HEP computing centers are dedicated sites which support multiple experiments and have the required software pre-installed. Nowadays, funding agencies encourage research groups to participate in shared HPC cluster models, where scientists from different domains use the same hardware to increase synergies. This shared usage proves to be challenging for HEP groups, due to their specialized software setup, which includes a custom OS (often Scientific Linux), libraries and applications.
To overcome this hurdle, the EKP and the data center team of the University of Freiburg have developed a system to enable the HEP use case on a shared HPC cluster. To achieve this, an OpenStack-based virtualization layer is installed on top of a bare-metal cluster. While other user groups can run their batch jobs via the Moab workload manager directly on bare metal, HEP users can request virtual machines with a specialized machine image that contains a dedicated operating system and software stack. In contrast to similar installations, in this hybrid setup, no static partitioning of the cluster into a physical and a virtualized segment is required. As a unique feature, the placement of the virtual machine on the cluster nodes is scheduled by Moab and the job lifetime is coupled to the lifetime of the virtual machine. This allows for a seamless integration with the jobs sent by other user groups and honors the fairshare policies of the cluster. The developed thin integration layer between OpenStack and Moab can be adapted to other batch servers and virtualization systems, making the concept applicable to other cluster operators as well.
This contribution will report on the concept and implementation of an OpenStack-virtualized cluster used for HEP workflows. While the full cluster will be installed in spring 2016, a test-bed setup with 800 cores has been used to study the overall system performance, and dedicated HEP jobs were run in a virtualized environment over many weeks. Furthermore, the dynamic integration of the virtualized worker nodes, depending on the workload at the institute's computing system, will be described.
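The coupling of a batch job's lifetime to the lifetime of a virtual machine can be sketched as follows (using the openstacksdk Python client; the cloud entry name and the image, flavor, and network IDs are placeholders, and this is not the actual integration layer between Moab and OpenStack):

```python
# Illustrative sketch: the payload of a batch job boots a worker VM via the
# OpenStack API and tears it down when the job ends, so the VM lives exactly
# as long as the batch job that Moab scheduled onto the node.
import time
import openstack

conn = openstack.connect(cloud="nemo")          # assumed clouds.yaml entry

server = conn.compute.create_server(
    name="hep-worker",
    image_id="IMAGE_ID_OF_HEP_VM_IMAGE",        # placeholder: image with the HEP software stack
    flavor_id="FLAVOR_ID",                      # placeholder flavor
    networks=[{"uuid": "NETWORK_ID"}],          # placeholder network
)
server = conn.compute.wait_for_server(server)   # VM is up: the batch slot is now occupied

try:
    # While the VM runs HEP payloads, the batch job simply stays alive;
    # sleeping for the requested walltime stands in for that here.
    time.sleep(3600)
finally:
    # Job end (or abort) terminates the VM, honoring the cluster's scheduling policies.
    conn.compute.delete_server(server)
```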
Dynamic Resource Extension for Data Intensive Computing with Specialized Software Environments on HPC Systems
Modern High Energy Physics (HEP) requires large-scale processing of extensive
amounts of scientific data. The needed computing resources are currently
provided statically by HEP-specific computing centers. To increase the number
of available resources, for example to cover peak loads, the HEP computing development
team at KIT concentrates on the dynamic integration of additional
computing resources into the HEP infrastructure. Therefore, we developed ROCED,
a tool to dynamically request and integrate computing resources, including
resources at HPC centers and commercial cloud providers. Since these resources
usually do not support HEP software natively, we rely on virtualization and container
technologies, which allow us to run HEP workflows on these so-called
opportunistic resources. Additionally, we study the efficient processing of huge
amounts of data on a distributed infrastructure, where the data is usually stored
at HEP-specific data centers and is accessed remotely over the WAN. To optimize
the overall data throughput and to increase the CPU efficiency, we are currently
developing an automated caching system for frequently used data that is transparently
integrated into the distributed HEP computing infrastructure.
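The scale-up decision at the heart of such dynamic provisioning can be sketched as a simple control loop (all helper functions, thresholds, and constants below are placeholders; the actual tool, ROCED, talks to real batch systems and cloud or HPC provider APIs):

```python
# Schematic control loop, not ROCED itself: compare the demand from idle jobs
# with the currently integrated opportunistic resources and request or release
# virtual machines accordingly.
import time

CORES_PER_VM = 8      # placeholder machine size
MAX_VMS = 100         # placeholder quota at the opportunistic provider

def idle_jobs():      # placeholder: would query the batch system (e.g. HTCondor)
    return 420

def running_vms():    # placeholder: would query the cloud or HPC provider
    return 30

def request_vms(n):   # placeholder: would call the provider's API to boot VMs
    print(f"requesting {n} additional virtual machines")

def release_vms(n):   # placeholder: would drain and terminate idle machines
    print(f"releasing {n} idle virtual machines")

while True:
    demand = -(-idle_jobs() // CORES_PER_VM)   # ceiling division: VMs needed for the queue
    target = min(demand, MAX_VMS)
    current = running_vms()
    if target > current:
        request_vms(target - current)
    elif target < current:
        release_vms(current - target)
    time.sleep(60)                             # re-evaluate periodically
```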
Dynamic Virtualized Deployment of Particle Physics Environments on a High Performance Computing Cluster
The NEMO High Performance Computing Cluster at the University of Freiburg has
been made available to researchers of the ATLAS and CMS experiments. Users
access the cluster from external machines connected to the Worldwide LHC
Computing Grid (WLCG). This paper describes how the full software environment
of the WLCG is provided in a virtual machine image. The interplay between the
schedulers for NEMO and for the external clusters is coordinated through the
ROCED service. A cloud computing infrastructure is deployed at NEMO to
orchestrate the simultaneous usage by bare-metal and virtualized jobs. Through
the setup, resources are provided to users in a transparent, automated, and
on-demand way. The performance of the virtualized environment has been
evaluated for particle physics applications.