The archive solution for distributed workflow management agents of the CMS experiment at LHC
The CMS experiment at the CERN LHC developed the Workflow Management Archive
system to persistently store unstructured framework job report documents
produced by distributed workflow management agents. In this paper we present
its architecture, implementation, deployment, and integration with the CMS and
CERN computing infrastructures, such as the central HDFS and Hadoop Spark clusters.
The system leverages modern technologies such as a document-oriented database
and the Hadoop ecosystem to provide the necessary flexibility to reliably
process, store, and aggregate on the order of 1M documents on a daily basis. We
describe the data transformation, the short and long term storage layers, the
query language, along with the aggregation pipeline developed to visualize
various performance metrics to assist CMS data operators in assessing the
performance of the CMS computing system.
Comment: This is a pre-print of an article published in Computing and Software
for Big Science. The final authenticated version is available online at:
https://doi.org/10.1007/s41781-018-0005-
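A minimal sketch of the kind of aggregation such a system enables, assuming framework job reports are stored as JSON documents on HDFS and processed with Spark; the HDFS path and field names (site, exitCode) are illustrative placeholders, not the WMArchive schema:

# Hedged sketch: aggregate JSON framework job reports with Spark.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fwjr-aggregation").getOrCreate()

# Read one day's worth of job report documents (one JSON object per line).
fwjr = spark.read.json("hdfs:///path/to/fwjr/2018/06/01/*.json")

# Aggregate a simple performance metric: job counts and failure rate per site.
summary = (
    fwjr.groupBy("site")
        .agg(
            F.count("*").alias("jobs"),
            F.sum(F.when(F.col("exitCode") != 0, 1).otherwise(0)).alias("failed"),
        )
        .withColumn("failure_rate", F.col("failed") / F.col("jobs"))
)

summary.show()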
Gaining insight from large data volumes with ease
Efficient handling of large data volumes has become a necessity in today's
world. It is driven by the desire to gain more insight from the data and a
better understanding of user trends, which can be transformed into economic
incentives (profits, cost reduction, and optimization of data workflows and
pipelines). In this paper, we discuss how modern technologies are
transforming well established patterns in HEP communities. The new data insight
can be achieved by embracing Big Data tools for a variety of use-cases, from
analytics and monitoring to training Machine Learning models on a terabyte
scale. We provide concrete examples within the context of the CMS experiment,
where Big Data tools already play, or could play, a significant role in daily
operations.
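One illustrative (and assumed) pattern for the machine-learning use case mentioned above: use Spark to filter and down-sample a large dataset at scale, then train a conventional scikit-learn model on the manageable result. The path, column names, and sampling fraction are placeholders, not a CMS workflow:

# Hedged sketch: reduce a large dataset with Spark, then train locally.
from pyspark.sql import SparkSession
from sklearn.ensemble import RandomForestClassifier

spark = SparkSession.builder.appName("tb-scale-training").getOrCreate()

# Select and sample at scale, then collect a manageable training set.
df = (
    spark.read.parquet("hdfs:///path/to/monitoring/data")
         .select("feature_a", "feature_b", "label")
         .sample(fraction=0.001, seed=42)
         .toPandas()
)

model = RandomForestClassifier(n_estimators=100)
model.fit(df[["feature_a", "feature_b"]], df["label"])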
File-based data flow in the CMS Filter Farm
During the LHC Long Shutdown 1, the CMS Data Acquisition system underwent a partial redesign to replace obsolete network equipment, use more homogeneous switching technologies, and prepare the ground for future upgrades of the detector front-ends. The software and hardware infrastructure to provide input, execute the High Level Trigger (HLT) algorithms and deal with output data transport and storage has also been redesigned to be completely file-based. This approach provides additional decoupling between the HLT algorithms and the input and output data flow. All the metadata needed for bookkeeping of the data flow and the HLT process lifetimes are also generated in the form of small "documents" using the JSON encoding, by either services in the flow of the HLT execution (for rates etc.) or watchdog processes. These "files" can remain memory-resident or be written to disk if they are to be used in another part of the system (e.g. for aggregation of output data). We discuss how this redesign improves the robustness and flexibility of the CMS DAQ and the performance of the system currently being commissioned for the LHC Run 2.
National Science Foundation (U.S.); United States Department of Energy
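A hedged sketch of the micro-document idea, not CMS DAQ code: a service emits a small JSON bookkeeping document per luminosity section, which can live on a ramdisk (effectively memory-resident) or be written to disk for aggregation elsewhere. The directory layout, file naming, and fields are assumptions:

# Hedged sketch: one JSON bookkeeping "document" per luminosity section.
import json
import os
import time

def write_microdocument(run, lumisection, accepted, processed, outdir="/tmp/hltdata"):
    """Write a small JSON micro-document with per-lumisection rate information."""
    doc = {
        "run": run,
        "ls": lumisection,
        "events_processed": processed,
        "events_accepted": accepted,
        "timestamp": time.time(),
    }
    os.makedirs(outdir, exist_ok=True)
    # Pointing outdir at a ramdisk keeps the document effectively memory-resident.
    path = os.path.join(outdir, f"run{run:06d}_ls{lumisection:04d}_rates.jsn")
    with open(path, "w") as f:
        json.dump(doc, f)
    return path

write_microdocument(run=273158, lumisection=42, accepted=512, processed=10000)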
Pattern Reification as the Basis for Description-Driven Systems
One of the main factors driving object-oriented software development for
information systems is the requirement for systems to be tolerant to change. To
address this issue in designing systems, this paper proposes a pattern-based,
object-oriented, description-driven system (DDS) architecture as an extension
to the standard UML four-layer meta-model. A DDS architecture is proposed in
which aspects of both static and dynamic systems behavior can be captured via
descriptive models and meta-models. The proposed architecture embodies four
main elements - firstly, the adoption of a multi-layered meta-modeling
architecture and reflective meta-level architecture, secondly the
identification of four data modeling relationships that can be made explicit
such that they can be modified dynamically, thirdly the identification of five
design patterns which have emerged from practice and have proved essential in
providing reusable building blocks for data management, and fourthly the
encoding of the structural properties of the five design patterns by means of
one fundamental pattern, the Graph pattern. A practical example of this
philosophy, the CRISTAL project, is used to demonstrate the use of
description-driven data objects to handle system evolution.
Comment: 20 pages, 10 figures
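A minimal sketch of the description-driven idea, not CRISTAL code: instance objects are validated against separate, mutable description objects, and relationships are reified in a simple Graph so they can be modified at run time. Class and attribute names are purely illustrative:

# Hedged sketch of a description-driven system (DDS) in miniature.
class Description:
    """Meta-level object describing which properties an Item may carry."""
    def __init__(self, name, properties):
        self.name = name
        self.properties = set(properties)

class Item:
    """Instance-level object validated against its (mutable) Description."""
    def __init__(self, description):
        self.description = description
        self.values = {}

    def set(self, prop, value):
        if prop not in self.description.properties:
            raise KeyError(f"'{prop}' not allowed by description '{self.description.name}'")
        self.values[prop] = value

class Graph:
    """Graph pattern: relationships stored as data, so they can evolve."""
    def __init__(self):
        self.edges = {}

    def relate(self, source, relation, target):
        self.edges.setdefault(source, []).append((relation, target))

# Evolving the system means editing descriptions, not recompiling classes.
part_desc = Description("Part", {"serial", "weight"})
part = Item(part_desc)
part.set("serial", "A-001")
part_desc.properties.add("colour")   # dynamic schema change at run time
part.set("colour", "red")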
Regional Coalitions for Healthcare Improvement: Definition, Lessons, and Prospects
Outlines how regional quality coalitions can collaborate to help deliver evidence-based healthcare; improve care processes; and measure, report, and reward results. Includes guidelines for starting and running a coalition and summaries of NRHI coalitions.
Digging Deeper for New Physics in the LHC Data
In this paper we describe a novel, model-independent technique of
"rectangular aggregations" for mining the LHC data for hints of new physics. A
typical (CMS) search now has hundreds of signal regions, which can obscure
potentially interesting anomalies. Applying our technique to the two CMS
jets+MET SUSY searches, we identify a set of previously overlooked excesses. Among these, four excesses survive tests of inter- and
intra-search compatibility, and two are especially interesting: they are
largely overlapping between the jets+MET searches and are characterized by low
jet multiplicity, zero b-jets, and low MET and HT. We find that resonant
color-triplet production decaying to a quark plus an invisible particle
provides an excellent fit to these two excesses and all other data -- including
the ATLAS jets+MET search, which actually sees a correlated excess. We discuss
the additional constraints coming from dijet resonance searches, monojet
searches and pair production. Based on these results, we believe the
widespread view that the LHC data contains no interesting excesses is greatly
exaggerated.
Comment: 31 pages + appendices, 14 figures, source code for recasted searches
attached as auxiliary material
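A toy illustration of the rectangular-aggregation idea, not the authors' code: scan every rectangular block of a small grid of signal-region bins, sum observed and expected counts, and flag the most significant block. The counts below are invented, and the significance is a crude Gaussian approximation rather than the proper Poisson/likelihood treatment:

# Hedged sketch: brute-force scan of rectangular aggregations over a bin grid.
import numpy as np

obs = np.array([[12,  8,  5,  3],
                [20, 15,  9,  4],
                [30, 22, 14,  6]])          # observed counts per (Njet, MET) bin
exp = np.array([[10,  7,  5,  2],
                [18, 14,  8,  3],
                [28, 20, 10,  4]], float)   # expected background per bin

best = (0.0, None)
rows, cols = obs.shape
for i1 in range(rows):
    for i2 in range(i1, rows):
        for j1 in range(cols):
            for j2 in range(j1, cols):
                o = obs[i1:i2 + 1, j1:j2 + 1].sum()
                b = exp[i1:i2 + 1, j1:j2 + 1].sum()
                z = (o - b) / np.sqrt(b)     # crude significance estimate
                if z > best[0]:
                    best = (z, (i1, i2, j1, j2))

print("most significant rectangle:", best)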
A scalable monitoring for the CMS Filter Farm based on elasticsearch
A flexible monitoring system has been designed for the CMS File-based Filter Farm making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
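A hedged sketch of the indexing and aggregation pattern described above, not the CMS implementation: a JSON monitoring document is indexed into a local Elasticsearch instance over the plain REST API, and a terms/avg aggregation summarizes rates per host. The index name, document fields, and cluster URL are assumptions:

# Hedged sketch: index a monitoring document and run an aggregation query.
import requests

ES = "http://localhost:9200"

# Index one per-process rate document (schema-free: new fields can be added later).
doc = {"host": "fu-c2f11-12-01", "process": "hltd", "ls": 42, "rate_hz": 950.0}
requests.post(f"{ES}/hlt-rates/_doc?refresh=true", json=doc).raise_for_status()

# "Central cluster"-style query: average HLT rate per host over all documents.
query = {
    "size": 0,
    "aggs": {"per_host": {"terms": {"field": "host.keyword"},
                          "aggs": {"avg_rate": {"avg": {"field": "rate_hz"}}}}},
}
resp = requests.post(f"{ES}/hlt-rates/_search", json=query)
print(resp.json()["aggregations"]["per_host"]["buckets"])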
Multi-Output Broadacre Agricultural Production: Estimating A Cost Function Using Quasi-Micro Farm Level Data From Australia
Existing econometric models for Australian broadacre agricultural production are few and have become dated. This paper estimates a multi-product restricted cost function using a unique quasi-micro farm level dataset from the Australian Agricultural and Grazing Industries Survey. Both the transcendental logarithmic and normalized quadratic functional forms are employed. Heteroskedasticity caused by the particular nature of the quasi-micro data is also assessed and accommodated. Allen partial elasticities of input substitution and own- and cross-price input demand elasticities are computed. The estimated demands for most production factors are inelastic to prices. Hired labour is responsive to own price and cropping input prices.
Keywords: Production Economics, Research Methods/Statistical Methods
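For illustration only, and not the paper's estimation on AAGIS data: a translog (transcendental logarithmic) cost function of this general form can be fit by OLS with heteroskedasticity-robust standard errors. The sketch below uses synthetic data with two input prices and one output; variable names and coefficients are invented:

# Hedged sketch: OLS fit of a simple translog cost function on synthetic data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
lp1, lp2, ly = rng.normal(size=(3, n))     # log input prices and log output
lc = (1.0 + 0.4*lp1 + 0.6*lp2 + 0.8*ly
      + 0.05*lp1**2 + 0.03*lp2**2 + 0.02*lp1*lp2
      + rng.normal(scale=0.1, size=n))     # log cost with noise

X = sm.add_constant(np.column_stack([lp1, lp2, ly, lp1**2, lp2**2, lp1*lp2]))
fit = sm.OLS(lc, X).fit(cov_type="HC1")    # heteroskedasticity-robust SEs
print(fit.params)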
Fuzzy Content Mining for Targeted Advertisement
Content-targeted advertising systems are becoming an increasingly important part of the funding source of free web services. Highly efficient content analysis is the pivotal key to such a system. This project aims to establish a content analysis engine involving fuzzy logic that is able to automatically analyze real user-posted Web documents such as blog entries. Based on the analysis result, the system matches and retrieves the most appropriate Web advertisements. The focus and complexity lie in how to better estimate and acquire the keywords that represent a given Web document. A fuzzy Web mining concept is applied to jointly consider multiple factors of Web content. A Fuzzy Ranking System is established based on certain fuzzy (and some crisp) rules, fuzzy sets, and membership functions to select the best candidate keywords. Once it has obtained the keywords, the system retrieves corresponding advertisements from certain providers through Web services as matched advertisements, similarly to retrieving a product list from Amazon.com. In 87% of the cases, the results of this system match the accuracy of the Google AdWords system. Furthermore, this expandable system will also be a solid base for further research and development on this topic.
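A minimal sketch of the fuzzy-ranking idea, not the project's engine: each candidate keyword receives fuzzy membership grades for a few factors (frequency, position in the document) that a simple rule combines into a score. The membership shapes, thresholds, and combination rule are illustrative assumptions:

# Hedged sketch: fuzzy scoring of candidate keywords from a token list.
def high_frequency(count, doc_len):
    """Membership in 'frequent': ramps from 0 to 1 as relative frequency grows."""
    return min(1.0, (count / max(doc_len, 1)) / 0.02)

def early_position(first_index, doc_len):
    """Membership in 'appears early': 1 at the start of the document, 0 at the end."""
    return 1.0 - first_index / max(doc_len - 1, 1)

def score(keyword, tokens):
    count = tokens.count(keyword)
    if count == 0:
        return 0.0
    first = tokens.index(keyword)
    freq = high_frequency(count, len(tokens))
    # Fuzzy AND (min) of the two factors, with a small frequency-only floor (max).
    return max(0.1 * freq, min(freq, early_position(first, len(tokens))))

tokens = ("fuzzy logic helps match web advertisements to blog content "
          "about fuzzy ranking").split()
ranked = sorted(set(tokens), key=lambda w: score(w, tokens), reverse=True)
print(ranked[:3])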