The archive solution for distributed workflow management agents of the CMS experiment at LHC
The CMS experiment at the CERN LHC developed the Workflow Management Archive
system to persistently store unstructured framework job report documents
produced by distributed workflow management agents. In this paper we present
its architecture, implementation, deployment, and integration with the CMS and
CERN computing infrastructures, such as the central HDFS and Hadoop Spark clusters.
The system leverages modern technologies such as a document-oriented database
and the Hadoop ecosystem to provide the necessary flexibility to reliably
process, store, and aggregate on the order of 1M documents on a daily basis. We
describe the data transformation, the short and long term storage layers, the
query language, along with the aggregation pipeline developed to visualize
various performance metrics to assist CMS data operators in assessing the
performance of the CMS computing system.
Comment: This is a pre-print of an article published in Computing and Software
for Big Science. The final authenticated version is available online at:
https://doi.org/10.1007/s41781-018-0005-
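A minimal sketch of the kind of aggregation such a system enables, assuming framework job reports are stored as JSON documents on HDFS and processed with Spark; the HDFS path and field names (site, exitCode) are illustrative placeholders, not the WMArchive schema:

# Hedged sketch: aggregate JSON framework job reports with Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fwjr-aggregation").getOrCreate()

# Read one day's worth of job report documents (one JSON object per line).
fwjr = spark.read.json("hdfs:///path/to/fwjr/2018/06/01/*.json")

# Aggregate a simple performance metric: job counts and failure rate per site.
summary = (
    fwjr.groupBy("site")
        .agg(
            F.count("*").alias("jobs"),
            F.sum(F.when(F.col("exitCode") != 0, 1).otherwise(0)).alias("failed"),
        )
        .withColumn("failure_rate", F.col("failed") / F.col("jobs"))
)

summary.show()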
Gaining insight from large data volumes with ease
Efficient handling of large data volumes has become a necessity in today's
world. It is driven by the desire to gain more insight from the data and a
better understanding of user trends, which can be transformed into economic
incentives (profits, cost reduction, and optimization of data workflows and
pipelines). In this paper, we discuss how modern technologies are
transforming well established patterns in HEP communities. The new data insight
can be achieved by embracing Big Data tools for a variety of use-cases, from
analytics and monitoring to training Machine Learning models on a terabyte
scale. We provide concrete examples within the context of the CMS experiment,
where Big Data tools already play, or could play, a significant role in daily
operations.
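One illustrative (and assumed) pattern for the machine-learning use case mentioned above: use Spark to filter and down-sample a large dataset at scale, then train a conventional scikit-learn model on the manageable result. The path, column names, and sampling fraction are placeholders, not a CMS workflow:

# Hedged sketch: reduce a large dataset with Spark, then train locally.
from pyspark.sql import SparkSession
from sklearn.ensemble import RandomForestClassifier

spark = SparkSession.builder.appName("tb-scale-training").getOrCreate()

# Select and sample at scale, then collect a manageable training set.
df = (
    spark.read.parquet("hdfs:///path/to/monitoring/data")
         .select("feature_a", "feature_b", "label")
         .sample(fraction=0.001, seed=42)
         .toPandas()
)

model = RandomForestClassifier(n_estimators=100)
model.fit(df[["feature_a", "feature_b"]], df["label"])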
File-based data flow in the CMS Filter Farm
During the LHC Long Shutdown 1, the CMS Data Acquisition system underwent a partial redesign to replace obsolete network equipment, use more homogeneous switching technologies, and prepare the ground for future upgrades of the detector front-ends. The software and hardware infrastructure to provide input, execute the High Level Trigger (HLT) algorithms and deal with output data transport and storage has also been redesigned to be completely file-based. This approach provides additional decoupling between the HLT algorithms and the input and output data flow. All the metadata needed for bookkeeping of the data flow and the HLT process lifetimes are also generated in the form of small "documents" using the JSON encoding, by either services in the flow of the HLT execution (for rates etc.) or watchdog processes. These "files" can remain memory-resident or be written to disk if they are to be used in another part of the system (e.g. for aggregation of output data). We discuss how this redesign improves the robustness and flexibility of the CMS DAQ and the performance of the system currently being commissioned for the LHC Run 2.
National Science Foundation (U.S.); United States Department of Energy
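A hedged sketch of the micro-document idea, not CMS DAQ code: a service emits a small JSON bookkeeping document per luminosity section, which can live on a ramdisk (effectively memory-resident) or be written to disk for aggregation elsewhere. The directory layout, file naming, and fields are assumptions:

# Hedged sketch: one JSON bookkeeping "document" per luminosity section.
import json
import os
import time

def write_microdocument(run, lumisection, accepted, processed, outdir="/tmp/hltdata"):
    """Write a small JSON micro-document with per-lumisection rate information."""
    doc = {
        "run": run,
        "ls": lumisection,
        "events_processed": processed,
        "events_accepted": accepted,
        "timestamp": time.time(),
    }
    os.makedirs(outdir, exist_ok=True)
    # Pointing outdir at a ramdisk keeps the document effectively memory-resident.
    path = os.path.join(outdir, f"run{run:06d}_ls{lumisection:04d}_rates.jsn")
    with open(path, "w") as f:
        json.dump(doc, f)
    return path

write_microdocument(run=273158, lumisection=42, accepted=512, processed=10000)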
Pattern Reification as the Basis for Description-Driven Systems
One of the main factors driving object-oriented software development for
information systems is the requirement for systems to be tolerant to change. To
address this issue in designing systems, this paper proposes a pattern-based,
object-oriented, description-driven system (DDS) architecture as an extension
to the standard UML four-layer meta-model. A DDS architecture is proposed in
which aspects of both static and dynamic systems behavior can be captured via
descriptive models and meta-models. The proposed architecture embodies four
main elements - firstly, the adoption of a multi-layered meta-modeling
architecture and reflective meta-level architecture, secondly the
identification of four data modeling relationships that can be made explicit
such that they can be modified dynamically, thirdly the identification of five
design patterns which have emerged from practice and have proved essential in
providing reusable building blocks for data management, and fourthly the
encoding of the structural properties of the five design patterns by means of
one fundamental pattern, the Graph pattern. A practical example of this
philosophy, the CRISTAL project, is used to demonstrate the use of
description-driven data objects to handle system evolution.
Comment: 20 pages, 10 figures
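A minimal sketch of the description-driven idea, not CRISTAL code: instance objects are validated against separate, mutable description objects, and relationships are reified in a simple Graph so they can be modified at run time. Class and attribute names are purely illustrative:

# Hedged sketch of a description-driven system (DDS) in miniature.
class Description:
    """Meta-level object describing which properties an Item may carry."""
    def __init__(self, name, properties):
        self.name = name
        self.properties = set(properties)

class Item:
    """Instance-level object validated against its (mutable) Description."""
    def __init__(self, description):
        self.description = description
        self.values = {}

    def set(self, prop, value):
        if prop not in self.description.properties:
            raise KeyError(f"'{prop}' not allowed by description '{self.description.name}'")
        self.values[prop] = value

class Graph:
    """Graph pattern: relationships stored as data, so they can evolve."""
    def __init__(self):
        self.edges = {}

    def relate(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

# Evolving the system means editing descriptions, not recompiling classes.
part_desc = Description("Part", {"serial", "weight"})
part = Item(part_desc)
part.set("serial", "A-001")
part_desc.properties.add("colour")   # dynamic schema change at run time
part.set("colour", "red")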
Regional Coalitions for Healthcare Improvement: Definition, Lessons, and Prospects
Outlines how regional quality coalitions can collaborate to help deliver evidence-based healthcare; improve care processes; and measure, report, and reward results. Includes guidelines for starting and running a coalition and summaries of NRHI coalitions.
Digging Deeper for New Physics in the LHC Data
In this paper we describe a novel, model-independent technique of
"rectangular aggregations" for mining the LHC data for hints of new physics. A
typical (CMS) search now has hundreds of signal regions, which can obscure
potentially interesting anomalies. Applying our technique to the two CMS
jets+MET SUSY searches, we identify a set of previously overlooked excesses. Among these, four excesses survive tests of inter- and
intra-search compatibility, and two are especially interesting: they are
largely overlapping between the jets+MET searches and are characterized by low
jet multiplicity, zero b-jets, and low MET and HT. We find that resonant
color-triplet production decaying to a quark plus an invisible particle
provides an excellent fit to these two excesses and all other data -- including
the ATLAS jets+MET search, which actually sees a correlated excess. We discuss
the additional constraints coming from dijet resonance searches, monojet
searches and pair production. Based on these results, we believe the
widespread view that the LHC data contains no interesting excesses is greatly
exaggerated.
Comment: 31 pages + appendices, 14 figures, source code for recasted searches
attached as auxiliary material
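A toy illustration of the rectangular-aggregation idea, not the authors' code: scan every rectangular block of a small grid of signal-region bins, sum observed and expected counts, and flag the most significant block. The counts below are invented, and the significance is a crude Gaussian approximation rather than the proper Poisson/likelihood treatment:

# Hedged sketch: brute-force scan of rectangular aggregations over a bin grid.
import numpy as np

obs = np.array([[12,  8,  5,  3],
                [20, 15,  9,  4],
                [30, 22, 14,  6]])          # observed counts per (Njet, MET) bin
exp = np.array([[10,  7,  5,  2],
                [18, 14,  8,  3],
                [28, 20, 10,  4]], float)   # expected background per bin

best = (0.0, None)
rows, cols = obs.shape
for i1 in range(rows):
    for i2 in range(i1, rows):
        for j1 in range(cols):
            for j2 in range(j1, cols):
                o = obs[i1:i2 + 1, j1:j2 + 1].sum()
                b = exp[i1:i2 + 1, j1:j2 + 1].sum()
                z = (o - b) / np.sqrt(b)     # crude significance estimate
                if z > best[0]:
                    best = (z, (i1, i2, j1, j2))

print("most significant rectangle:", best)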
A scalable monitoring for the CMS Filter Farm based on elasticsearch
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
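A hedged sketch of the indexing and aggregation pattern described above, not the CMS implementation: a JSON monitoring document is indexed into a local Elasticsearch instance over the plain REST API, and a terms/avg aggregation summarizes rates per host. The index name, document fields, and cluster URL are assumptions:

# Hedged sketch: index a monitoring document and run an aggregation query.
import requests

ES = "http://localhost:9200"

# Index one per-process rate document (schema-free: new fields can be added later).
doc = {"host": "fu-c2f11-12-01", "process": "hltd", "ls": 42, "rate_hz": 950.0}
requests.post(f"{ES}/hlt-rates/_doc?refresh=true", json=doc).raise_for_status()

# "Central cluster"-style query: average HLT rate per host over all documents.
query = {
    "size": 0,
    "aggs": {"per_host": {"terms": {"field": "host.keyword"},
                          "aggs": {"avg_rate": {"avg": {"field": "rate_hz"}}}}},
}
resp = requests.post(f"{ES}/hlt-rates/_search", json=query)
print(resp.json()["aggregations"]["per_host"]["buckets"])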
Multi-Output Broadacre Agricultural Production: Estimating A Cost Function Using Quasi-Micro Farm Level Data From Australia
Existing econometric models for Australian broadacre agricultural production are few and have become dated. This paper estimates a multi-product restricted cost function using a unique quasi-micro farm level dataset from the Australian Agricultural and Grazing Industries Survey. Both the transcendental logarithmic and normalized quadratic functional forms are employed. Heteroskedasticity caused by the particular nature of the quasi-micro data is also assessed and accommodated. Allen partial elasticities of input substitution and own- and cross-price input demand elasticities are computed. The estimated demands for most production factors are inelastic to prices. Hired labour is responsive to own price and cropping input prices.
Keywords: Production Economics, Research Methods/Statistical Methods
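For illustration only, and not the paper's estimation on AAGIS data: a translog (transcendental logarithmic) cost function of this general form can be fit by OLS with heteroskedasticity-robust standard errors. The sketch below uses synthetic data with two input prices and one output; variable names and coefficients are invented:

# Hedged sketch: OLS fit of a simple translog cost function on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
lp1, lp2, ly = rng.normal(size=(3, n))     # log input prices and log output
lc = (1.0 + 0.4*lp1 + 0.6*lp2 + 0.8*ly
      + 0.05*lp1**2 + 0.03*lp2**2 + 0.02*lp1*lp2
      + rng.normal(scale=0.1, size=n))     # log cost with noise

X = sm.add_constant(np.column_stack([lp1, lp2, ly, lp1**2, lp2**2, lp1*lp2]))
fit = sm.OLS(lc, X).fit(cov_type="HC1")    # heteroskedasticity-robust SEs
print(fit.params)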
Fuzzy Content Mining for Targeted Advertisement
Content-targeted advertising systems are becoming an increasingly important part of the funding source of free web services. Highly efficient content analysis is the pivotal key to such a system. This project aims to establish a content analysis engine involving fuzzy logic that is able to automatically analyze real user-posted Web documents such as blog entries. Based on the analysis result, the system matches and retrieves the most appropriate Web advertisements. The focus and complexity lie in how to better estimate and acquire the keywords that represent a given Web document. A fuzzy Web mining concept is applied to jointly consider multiple factors of Web content. A Fuzzy Ranking System is established based on certain fuzzy (and some crisp) rules, fuzzy sets, and membership functions to select the best candidate keywords. Once it has obtained the keywords, the system retrieves corresponding advertisements from certain providers through Web services as matched advertisements, similarly to retrieving a product list from Amazon.com. In 87% of the cases, the results of this system match the accuracy of the Google AdWords system. Furthermore, this expandable system will also be a solid base for further research and development on this topic.
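A minimal sketch of the fuzzy-ranking idea, not the project's engine: each candidate keyword receives fuzzy membership grades for a few factors (frequency, position in the document) that a simple rule combines into a score. The membership shapes, thresholds, and combination rule are illustrative assumptions:

# Hedged sketch: fuzzy scoring of candidate keywords from a token list.
def high_frequency(count, doc_len):
    """Membership in 'frequent': ramps from 0 to 1 as relative frequency grows."""
    return min(1.0, (count / max(doc_len, 1)) / 0.02)

def early_position(first_index, doc_len):
    """Membership in 'appears early': 1 at the start of the document, 0 at the end."""
    return 1.0 - first_index / max(doc_len - 1, 1)

def score(keyword, tokens):
    count = tokens.count(keyword)
    if count == 0:
        return 0.0
    first = tokens.index(keyword)
    freq = high_frequency(count, len(tokens))
    # Fuzzy AND (min) of the two factors, with a small frequency-only floor (max).
    return max(0.1 * freq, min(freq, early_position(first, len(tokens))))

tokens = ("fuzzy logic helps match web advertisements to blog content "
          "about fuzzy ranking").split()
ranked = sorted(set(tokens), key=lambda w: score(w, tokens), reverse=True)
print(ranked[:3])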