The Large Hadron Collider (LHC), which collides protons at an energy of 14
TeV, produces hundreds of exabytes of data per year, making it one of the
largest sources of data in the world today. At present it is not possible to
even transfer most of this data from the four main particle detectors at the
LHC to "offline" data facilities, much less to permanently store it for future
processing. For this reason the LHC detectors are equipped with real-time
analysis systems, called triggers, which process this volume of data and select
the most interesting proton-proton collisions. The LHC experiments' triggers reduce the data produced by the LHC by a factor of between one thousand and one hundred thousand, to tens of petabytes per year, allowing it to be stored economically and analysed further.
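As a quick sanity check of these numbers, the following back-of-envelope Python snippet reproduces the quoted scale; the 200 EB/year starting point is a rounded assumption for "hundreds of exabytes", not an official figure.

```python
# Order-of-magnitude check (assumed round numbers, not official ones):
# "hundreds of exabytes" cut down by factors of 10^3 to 10^5 should
# indeed land near tens of petabytes per year.
raw_eb_per_year = 200                      # assume ~200 EB/year of raw data
for factor in (1_000, 10_000, 100_000):    # trigger reduction factors
    stored_pb = raw_eb_per_year * 1_000 / factor   # 1 EB = 1000 PB
    print(f"1/{factor} reduction -> ~{stored_pb:,.0f} PB/year stored")
```

A reduction factor of around ten thousand is what lands the stored volume at tens of petabytes per year.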
The bulk of this data reduction is performed by custom electronics which ignore most of the data in making their decisions, and are therefore unable to exploit the most powerful known data-analysis strategies. I cover the present status of
real-time data analysis at the LHC, before explaining why the future upgrades
of the LHC experiments will increase the volume of data which can be sent off
the detector and into off-the-shelf data processing facilities (such as CPU or
GPU farms) to tens of exabytes per year. This development will simultaneously
enable a vast expansion of the physics programme of the LHC's detectors and
make it mandatory to develop and implement a new generation of real-time
multivariate analysis tools in order to fully exploit this new potential of the
LHC. I explain what work is ongoing in this direction and motivate why more
effort is needed in the coming years.
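To make the phrase "real-time multivariate analysis" concrete, here is a minimal, purely illustrative sketch of such a selection: a boosted decision tree scores synthetic events on two made-up features and keeps only the highest-scoring fraction, mimicking a trigger that must hold a fixed output rate. None of this is any experiment's actual trigger code; the features, rates, and choice of model are assumptions for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Synthetic "events": two illustrative features standing in for quantities
# a trigger might use (e.g. transverse momentum, vertex displacement).
# Interesting ("signal") events are rare and shifted relative to background.
background = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100_000, 2))
signal = rng.normal(loc=[2.5, 2.0], scale=1.0, size=(1_000, 2))
X = np.vstack([background, signal])
y = np.concatenate([np.zeros(len(background)), np.ones(len(signal))])

# Train a boosted decision tree -- the kind of multivariate selection a
# software trigger can run but fixed-function electronics cannot.
clf = GradientBoostingClassifier(n_estimators=50).fit(X, y)

# "Trigger decision": keep only the top-scoring ~0.1% of events, i.e. a
# 1/1000 output rate, and see how much of the rare signal survives.
scores = clf.predict_proba(X)[:, 1]
threshold = np.quantile(scores, 1 - 1e-3)
kept = scores > threshold
print(f"kept {kept.sum()} of {len(X)} events; "
      f"signal fraction among kept: {y[kept].mean():.0%}")
```

In a real system the classifier would be trained offline and only evaluated in the trigger, with the threshold tuned to the affordable output bandwidth.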
Comment: Contribution to the proceedings of the HEPML workshop, NIPS 2014. 20 pages, 5 figures.