18,572 research outputs found

    The archive solution for distributed workflow management agents of the CMS experiment at LHC

    Full text link
    The CMS experiment at the CERN LHC developed the Workflow Management Archive system to persistently store unstructured framework job report documents produced by distributed workflow management agents. In this paper we present its architecture, implementation, deployment, and integration with the CMS and CERN computing infrastructures, such as the central HDFS and Hadoop Spark clusters. The system leverages modern technologies such as a document-oriented database and the Hadoop ecosystem to provide the necessary flexibility to reliably process, store, and aggregate $\mathcal{O}(1M)$ documents on a daily basis. We describe the data transformation, the short- and long-term storage layers, and the query language, along with the aggregation pipeline developed to visualize various performance metrics to assist CMS data operators in assessing the performance of the CMS computing system. Comment: This is a pre-print of an article published in Computing and Software for Big Science. The final authenticated version is available online at: https://doi.org/10.1007/s41781-018-0005-
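    The abstract describes a pipeline in which JSON framework job reports land on HDFS and are aggregated with Spark. A minimal, hedged sketch of that kind of daily aggregation is given below; the HDFS path and the field names (`site`, `exitCode`, `wallClockTime`) are assumptions for illustration, not the actual WMArchive schema or deployment layout.

```python
# Hedged sketch: daily aggregation of JSON job-report documents with PySpark.
# The HDFS path and field names are illustrative assumptions, not the real
# WMArchive schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fwjr-daily-aggregation").getOrCreate()

# Read one day's worth of framework job reports stored as JSON on HDFS.
reports = spark.read.json("hdfs:///data/fwjr/2018/06/01/*.json")

# Aggregate a few performance metrics per site and exit code.
summary = (
    reports.groupBy("site", "exitCode")
           .agg(F.count("*").alias("njobs"),
                F.avg("wallClockTime").alias("avg_wallclock"))
           .orderBy(F.desc("njobs"))
)

summary.show(20, truncate=False)
```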

    Gaining insight from large data volumes with ease

    Get PDF
    Efficient handling of large data volumes has become a necessity in today's world. It is driven by the desire to get more insight from the data and to gain a better understanding of user trends, which can be transformed into economic incentives (profits, cost reduction, and various optimizations of data workflows and pipelines). In this paper, we discuss how modern technologies are transforming well-established patterns in HEP communities. New data insight can be achieved by embracing Big Data tools for a variety of use cases, from analytics and monitoring to training Machine Learning models on a terabyte scale. We provide concrete examples within the context of the CMS experiment, where Big Data tools are already playing or will play a significant role in daily operations.
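    The terabyte-scale training use case mentioned above can be illustrated with a short, hedged Spark ML sketch; the Parquet path, feature columns, and binary label are invented for illustration and do not correspond to any specific CMS dataset.

```python
# Hedged sketch: training a simple model with Spark ML on data too large for a
# single machine. Path, column names, and the binary label are assumptions.
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.classification import LogisticRegression

spark = SparkSession.builder.appName("tb-scale-training").getOrCreate()

# Parquet keeps the columnar layout needed to scan large volumes efficiently.
df = spark.read.parquet("hdfs:///data/monitoring/features.parquet")

assembler = VectorAssembler(
    inputCols=["cpu_time", "read_bytes", "num_files"],  # assumed feature names
    outputCol="features",
)
train = assembler.transform(df).select("features", "label")  # assumed 0/1 label

model = LogisticRegression(maxIter=20).fit(train)
print("training AUC:", model.summary.areaUnderROC)
```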

    File-based data flow in the CMS Filter Farm

    Get PDF
    During the LHC Long Shutdown 1, the CMS Data Acquisition system underwent a partial redesign to replace obsolete network equipment, use more homogeneous switching technologies, and prepare the ground for future upgrades of the detector front-ends. The software and hardware infrastructure to provide input, execute the High Level Trigger (HLT) algorithms, and deal with output data transport and storage has also been redesigned to be completely file-based. This approach provides additional decoupling between the HLT algorithms and the input and output data flow. All the metadata needed for bookkeeping of the data flow and the HLT process lifetimes are also generated in the form of small "documents" using the JSON encoding, by either services in the flow of the HLT execution (for rates etc.) or watchdog processes. These "files" can remain memory-resident or be written to disk if they are to be used in another part of the system (e.g. for aggregation of output data). We discuss how this redesign improves the robustness and flexibility of the CMS DAQ and the performance of the system currently being commissioned for the LHC Run 2. National Science Foundation (U.S.); United States Department of Energy.
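    As a hedged illustration of the file-based bookkeeping described above, the sketch below writes a small JSON "document" with per-process rate metadata and shows how a separate aggregator could merge several of them; the field names and directory layout are assumptions, not the actual CMS DAQ file formats.

```python
# Hedged sketch of file-based bookkeeping with small JSON documents.
# Field names and directory layout are illustrative assumptions only.
import json
import glob
import os

def write_rate_document(run_dir, process_id, events_accepted, events_processed):
    """Write one per-process JSON 'document' describing HLT rates."""
    doc = {
        "pid": process_id,
        "accepted": events_accepted,
        "processed": events_processed,
    }
    path = os.path.join(run_dir, f"rates_pid{process_id}.jsn")
    with open(path, "w") as f:
        json.dump(doc, f)
    return path

def aggregate_rate_documents(run_dir):
    """Merge all per-process documents into one summary, as an aggregator might."""
    totals = {"accepted": 0, "processed": 0}
    for path in glob.glob(os.path.join(run_dir, "rates_pid*.jsn")):
        with open(path) as f:
            doc = json.load(f)
        totals["accepted"] += doc["accepted"]
        totals["processed"] += doc["processed"]
    return totals
```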

    Pattern Reification as the Basis for Description-Driven Systems

    Full text link
    One of the main factors driving object-oriented software development for information systems is the requirement for systems to be tolerant to change. To address this issue in designing systems, this paper proposes a pattern-based, object-oriented, description-driven system (DDS) architecture as an extension to the standard UML four-layer meta-model. A DDS architecture is proposed in which aspects of both static and dynamic system behavior can be captured via descriptive models and meta-models. The proposed architecture embodies four main elements: firstly, the adoption of a multi-layered meta-modeling architecture and a reflective meta-level architecture; secondly, the identification of four data modeling relationships that can be made explicit such that they can be modified dynamically; thirdly, the identification of five design patterns which have emerged from practice and have proved essential in providing reusable building blocks for data management; and fourthly, the encoding of the structural properties of the five design patterns by means of one fundamental pattern, the Graph pattern. A practical example of this philosophy, the CRISTAL project, is used to demonstrate the use of description-driven data objects to handle system evolution. Comment: 20 pages, 10 figures.
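    A minimal, hedged sketch of the description-driven idea: an item's structure is captured by a separate, runtime-modifiable description object rather than hard-coded in a class. The class and attribute names here are invented for illustration and do not reproduce the CRISTAL design.

```python
# Hedged sketch of a description-driven data object: the "description" is
# itself an object that can be changed at runtime, so instances adapt without
# code changes. Names are illustrative, not the CRISTAL implementation.
class ItemDescription:
    """Meta-level object: describes which attributes an item may carry."""
    def __init__(self, name, attributes):
        self.name = name
        self.attributes = set(attributes)

    def add_attribute(self, attribute):
        # Evolving the description at runtime changes what items accept.
        self.attributes.add(attribute)

class Item:
    """Base-level object whose structure is governed by its description."""
    def __init__(self, description):
        self.description = description
        self.values = {}

    def set(self, attribute, value):
        if attribute not in self.description.attributes:
            raise KeyError(f"{attribute!r} not allowed by {self.description.name}")
        self.values[attribute] = value

# Usage: the schema evolves without touching the Item class.
desc = ItemDescription("DetectorPart", ["serial", "material"])
part = Item(desc)
part.set("serial", "A-123")
desc.add_attribute("weight_kg")   # dynamic change at the meta-level
part.set("weight_kg", 4.2)
```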

    Regional Coalitions for Healthcare Improvement: Definition, Lessons, and Prospects

    Get PDF
    Outlines how regional quality coalitions can collaborate to help deliver evidence-based healthcare; improve care processes; and measure, report, and reward results. Includes guidelines for starting and running a coalition and summaries of NRHI coalitions.

    Digging Deeper for New Physics in the LHC Data

    Full text link
    In this paper we describe a novel, model-independent technique of "rectangular aggregations" for mining the LHC data for hints of new physics. A typical (CMS) search now has hundreds of signal regions, which can obscure potentially interesting anomalies. Applying our technique to the two CMS jets+MET SUSY searches, we identify a set of previously overlooked $\sim 3\sigma$ excesses. Among these, four excesses survive tests of inter- and intra-search compatibility, and two are especially interesting: they are largely overlapping between the jets+MET searches and are characterized by low jet multiplicity, zero $b$-jets, and low MET and $H_T$. We find that resonant color-triplet production decaying to a quark plus an invisible particle provides an excellent fit to these two excesses and all other data -- including the ATLAS jets+MET search, which actually sees a correlated excess. We discuss the additional constraints coming from dijet resonance searches, monojet searches, and pair production. Based on these results, we believe the widespread view that the LHC data contains no interesting excesses is greatly exaggerated. Comment: 31 pages + appendices, 14 figures, source code for recast searches attached as auxiliary material.
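    A hedged sketch of the kind of computation behind "rectangular aggregations": signal-region bins laid out on a grid are combined over rectangular windows, and the significance of each combined excess is estimated. The grid layout, the simple Gaussian z-score approximation, and all numbers below are assumptions for illustration, not the statistical treatment used in the paper.

```python
# Hedged sketch: scan rectangular windows over a grid of signal-region bins
# and rank combined excesses. The z-score approximation and the toy numbers
# are illustrative assumptions, not the paper's actual procedure.
import math

# Toy 3x3 grid of (observed, expected, uncertainty) per signal region,
# indexed e.g. by jet multiplicity (rows) and an HT bin (columns).
grid = [
    [(12, 8.0, 2.0), (15, 10.0, 2.5), (7, 6.0, 1.5)],
    [(20, 13.0, 3.0), (25, 16.0, 3.5), (9, 8.5, 2.0)],
    [(5, 4.5, 1.0),  (6, 5.0, 1.2),  (3, 3.2, 0.8)],
]

def window_significance(r0, r1, c0, c1):
    """Approximate z-score of the excess summed over a rectangular window."""
    obs = exp = var = 0.0
    for r in range(r0, r1 + 1):
        for c in range(c0, c1 + 1):
            o, e, s = grid[r][c]
            obs += o
            exp += e
            var += s * s + e      # background uncertainty plus Poisson term
    return (obs - exp) / math.sqrt(var)

# Enumerate all rectangles and keep the most significant aggregations.
results = []
n_rows, n_cols = len(grid), len(grid[0])
for r0 in range(n_rows):
    for r1 in range(r0, n_rows):
        for c0 in range(n_cols):
            for c1 in range(c0, n_cols):
                results.append(((r0, r1, c0, c1),
                                window_significance(r0, r1, c0, c1)))

for rect, z in sorted(results, key=lambda x: -x[1])[:3]:
    print(f"rows {rect[0]}-{rect[1]}, cols {rect[2]}-{rect[3]}: z = {z:.2f}")
```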

    A scalable monitoring for the CMS Filter Farm based on elasticsearch

    Get PDF
    A flexible monitoring system has been designed for the CMS File-based Filter Farm, making use of modern data mining and analytics components. All the metadata and monitoring information concerning data flow and execution of the HLT are generated locally in the form of small documents using the JSON encoding. These documents are indexed into a hierarchy of elasticsearch (es) clusters along with process and system log information. Elasticsearch is a search server based on Apache Lucene. It provides a distributed, multitenant-capable search and aggregation engine. Since es is schema-free, any new information can be added seamlessly and the unstructured information can be queried in non-predetermined ways. The leaf es clusters consist of the very same nodes that form the Filter Farm, thus providing natural horizontal scaling. A separate "central" es cluster is used to collect and index aggregated information. The fine-grained information, all the way down to individual processes, remains available in the leaf clusters. The central es cluster provides quasi-real-time high-level monitoring information to any kind of client. Historical data can be retrieved to analyse past problems or correlate them with external information. We discuss the design and performance of this system in the context of the CMS DAQ commissioning for LHC Run 2.
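    A hedged sketch of how such JSON monitoring documents might be indexed and then aggregated across the farm, assuming the elasticsearch-py 8.x client; the cluster URL, index name, and field names are invented for illustration and do not correspond to the actual CMS monitoring schema.

```python
# Hedged sketch: index a small JSON monitoring document and run an aggregation,
# assuming the elasticsearch-py 8.x client. URL, index, and fields are
# illustrative assumptions, not the real CMS monitoring layout.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed leaf/central cluster URL

# One per-process monitoring document, as produced locally on a farm node.
doc = {"host": "fu-c2e34-10-01", "pid": 4242, "rate_hz": 95.3, "run": 305112}
es.index(index="hlt-monitoring", document=doc)

# Central-style aggregation: average HLT rate per host for one run.
resp = es.search(
    index="hlt-monitoring",
    size=0,
    query={"term": {"run": 305112}},
    aggs={"per_host": {"terms": {"field": "host.keyword"},
                       "aggs": {"avg_rate": {"avg": {"field": "rate_hz"}}}}},
)
for bucket in resp["aggregations"]["per_host"]["buckets"]:
    print(bucket["key"], bucket["avg_rate"]["value"])
```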

    Multi-Output Broadacre Agricultural Production: Estimating A Cost Function Using Quasi-Micro Farm Level Data From Australia

    Get PDF
    Existing econometric models for Australian broadacre agricultural production are few and have become dated. This paper estimates a multi-product restricted cost function using a unique quasi-micro farm-level dataset from the Australian Agricultural and Grazing Industries Survey. Both the transcendental logarithmic and normalized quadratic functional forms are employed. Heteroskedasticity caused by the particular nature of the quasi-micro data is also assessed and accommodated. Allen partial elasticities of input substitution and own- and cross-price input demand elasticities are computed. The estimated demands for most production factors are inelastic with respect to prices. Hired labour is responsive to its own price and to cropping input prices. (Production Economics; Research Methods/Statistical Methods)
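    As a hedged illustration of the first functional form mentioned above, a generic multi-product translog (transcendental logarithmic) cost function can be written as below; the symbols are the standard textbook ones ($C$ for cost, $p_i$ for input prices, $y_m$ for outputs), and the exact specification, fixed factors, and restrictions used in the paper may differ.

```latex
\ln C = \alpha_0
      + \sum_i \alpha_i \ln p_i
      + \sum_m \beta_m \ln y_m
      + \tfrac{1}{2}\sum_i \sum_j \gamma_{ij} \ln p_i \ln p_j
      + \tfrac{1}{2}\sum_m \sum_n \delta_{mn} \ln y_m \ln y_n
      + \sum_i \sum_m \rho_{im} \ln p_i \ln y_m
```

    Linear homogeneity in input prices is typically imposed via $\sum_i \alpha_i = 1$, $\sum_i \gamma_{ij} = 0$, and $\sum_i \rho_{im} = 0$, together with the symmetry conditions $\gamma_{ij} = \gamma_{ji}$ and $\delta_{mn} = \delta_{nm}$; Shephard's lemma then yields the cost-share equations from which Allen partial elasticities such as those reported in the abstract are usually derived.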

    Fuzzy Content Mining for Targeted Advertisement

    Get PDF
    Content-targeted advertising systems are becoming an increasingly important part of the funding source of free web services. Highly efficient content analysis is the pivotal component of such a system. This project aims to establish a content analysis engine, involving fuzzy logic, that is able to automatically analyze real user-posted Web documents such as blog entries. Based on the analysis result, the system matches and retrieves the most appropriate Web advertisements. The focus and complexity lie in how to better estimate and acquire the keywords that represent a given Web document. A fuzzy Web mining concept is applied to synthetically consider multiple factors of Web content. A Fuzzy Ranking System is established, based on certain fuzzy (and some crisp) rules, fuzzy sets, and membership functions, to select the best candidate keywords. Once it has obtained the keywords, the system retrieves corresponding advertisements from certain providers through Web services as matched advertisements, similarly to retrieving a product list from Amazon.com. In 87% of the cases, the results of this system match the accuracy of the Google AdWords system. Furthermore, this expandable system will also be a solid base for further research and development on this topic.
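    A hedged, much-simplified sketch of the fuzzy keyword-ranking idea: each candidate term receives membership values for a few features (here, term frequency and position in the document), and a fuzzy rule combines them into a ranking score. The membership functions, features, and rule are invented for illustration and are not the project's actual rule base.

```python
# Hedged sketch of fuzzy keyword ranking: membership functions over simple
# term features, combined with a min() fuzzy conjunction. Features,
# memberships, and the rule are illustrative assumptions only.
from collections import Counter

def freq_membership(count, total_terms):
    """Degree to which a term is 'frequent' in the document (linear ramp)."""
    return min(1.0, count / max(1, total_terms * 0.02))  # assumed 2% saturation

def position_membership(first_index, total_terms):
    """Degree to which a term appears 'early' in the document."""
    return 1.0 - (first_index / max(1, total_terms))

def rank_keywords(text, top_k=5):
    terms = text.lower().split()
    counts = Counter(terms)
    # First-occurrence index per term (reversed so earliest index wins).
    first = {t: i for i, t in reversed(list(enumerate(terms)))}
    scores = {}
    for term, count in counts.items():
        # Assumed fuzzy rule: keyword IF frequent AND early -> min() conjunction.
        scores[term] = min(freq_membership(count, len(terms)),
                           position_membership(first[term], len(terms)))
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

print(rank_keywords("fuzzy logic ranks keywords for fuzzy content analysis of fuzzy web pages"))
```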