
    McRunjob: A High Energy Physics Workflow Planner for Grid Production Processing

    McRunjob is a powerful grid workflow manager used to manage the generation of large numbers of production processing jobs in High Energy Physics. In use at both the DZero and CMS experiments, McRunjob has managed large Monte Carlo production processing since 1999 and is being extended to regular production processing for analysis and reconstruction. Described at CHEP 2001, McRunjob converts core metadata into jobs submittable in a variety of environments. The powerful core metadata description language includes methods for converting the metadata into persistent forms, job descriptions, multi-step workflows, and data provenance information. The language allows for structure in the metadata through full expressions, namespaces, functional dependencies, site-specific parameters in a grid environment, and ontological definitions. It also has simple control structures for the parallelisation of large jobs. McRunjob features a modular design which allows for easy extension to new job description languages or new application-level tasks. (Comment: CHEP 2003 serial number TUCT00)
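    The metadata-to-jobs expansion the abstract describes can be sketched as follows. This is an illustrative assumption about how such a planner might work, not the actual McRunjob API; the function, parameter, and script names are invented for the example:

```python
# Hypothetical sketch (not the actual McRunjob API): expanding one core
# metadata description into per-slice job scripts, one job per parallel
# slice, as the abstract's "parallelization of large jobs" suggests.

def render_jobs(metadata, n_slices):
    """Expand one metadata description into a list of job scripts."""
    jobs = []
    for slice_id in range(n_slices):
        params = dict(metadata, slice=slice_id)
        # Render each metadata key as an environment variable, then
        # append an (invented) production command for this slice.
        script = "\n".join(
            f"export {key.upper()}={value}" for key, value in sorted(params.items())
        )
        jobs.append(script + f"\n./run_production --slice {slice_id}")
    return jobs

jobs = render_jobs({"experiment": "DZero", "events": 10000}, n_slices=3)
print(len(jobs))                # 3
print(jobs[0].splitlines()[0])  # export EVENTS=10000
```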

    What's in a name? Exploiting URIs to enrich provenance explanations in plain English

    Provenance allows decision-makers to evaluate the importance of pieces of data. PROV is the standardised model of provenance for use on the web, particularly suited to situations where data is generated by systems under distributed control, such as in coalition operations. If human decision-makers are to make effective use of provenance data, they need to understand it, and this work establishes techniques for explaining PROV graphs to human users in natural English. In this paper, we demonstrate the potential of exploiting the linguistic information informally encoded in the URIs that denote provenance data resources to generate more natural English explanations of provenance. We show how this additional linguistic information allows us to generate richer, more readable explanation texts, thus enabling better decision-making and increasing the value of pre-existing provenance data.
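    The URI-exploitation idea can be illustrated with a short sketch. The splitting heuristics below are an assumption about one plausible implementation (local-name extraction plus camelCase and underscore splitting), not the paper's actual code:

```python
import re

def uri_to_phrase(uri):
    """Extract the local name of a URI and split it into English words."""
    # Take the fragment after the last '#' or '/'.
    local = re.split(r"[#/]", uri.rstrip("/"))[-1]
    # Split on underscores and hyphens, then on camelCase boundaries.
    words = []
    for part in re.split(r"[_\-]", local):
        words.extend(re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", part))
    return " ".join(w.lower() for w in words)

print(uri_to_phrase("http://example.org/prov#temperatureReading"))
# temperature reading
print(uri_to_phrase("http://example.org/was_derived_from"))
# was derived from
```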

    Modus: a Datalog dialect for building container images

    Containers help share and deploy software by packaging it with all its dependencies. Tools like Docker or Kubernetes spawn containers from images as specified by a build system's language, such as Dockerfile. A build system takes many parameters to build an image, including OS and application versions. These build parameters can interact: setting one can restrict another. Dockerfile lacks support for reifying and constraining these interactions, thus forcing developers to write a build script per workflow. As a result, developers have resorted to creating ad-hoc solutions such as templates or domain-specific frameworks that harm performance and complicate maintenance because they are verbose and mix languages. To address this problem, we introduce Modus, a Datalog dialect for building container images. Modus' key insight is that container definitions naturally map to proof trees of Horn clauses. In these trees, container configurations correspond to logical facts, build instructions correspond to logic rules, and the build tree is computed as the minimal proof of the Datalog query specifying the target image. Modus relies on Datalog's expressivity to specify complex workflows with concision and facilitate automatic parallelisation. We evaluated Modus by porting the build systems of six popular Docker Hub images to Modus. Modus reduced the code size by 20.1% compared to the ad-hoc solutions used, while imposing a negligible performance overhead and preserving the original image size and image efficiency. We also provide a detailed analysis of porting the OpenJDK image build system to Modus.
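    The mapping of configurations to facts and build instructions to rules can be illustrated with a minimal sketch. This is plain propositional forward chaining in Python, not Modus syntax, and the fact and rule names are invented for illustration:

```python
# Minimal sketch of the Datalog idea behind Modus (not Modus syntax):
# container configurations are facts, build instructions are rules, and
# the target image is whatever the query can derive.

def derive(facts, rules):
    """Naive bottom-up (forward-chaining) evaluation for variable-free
    rules: apply every rule whose body holds until nothing new appears."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if set(body) <= facts and head not in facts:
                facts.add(head)
                changed = True
    return facts

# Invented example: build an "openjdk" image from a base OS and a JDK.
facts = {("base", "ubuntu:20.04"), ("jdk", "11")}
rules = [
    ([("base", "ubuntu:20.04")], ("layer", "os")),
    ([("layer", "os"), ("jdk", "11")], ("layer", "jdk")),
    ([("layer", "jdk")], ("image", "openjdk")),
]
print(("image", "openjdk") in derive(facts, rules))  # True
```

    Real Datalog (and Modus) additionally supports variables in rules, which is what lets one rule cover a whole family of build parameters; the sketch above keeps everything ground for brevity.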

    Workflow Engineering in Materials Design within the BATTERY 2030+ Project

    In recent years, modeling and simulation of materials have become indispensable complements to experiments in materials design. High-throughput simulations increasingly aid researchers in selecting the most promising materials for experimental studies and provide insights inaccessible to experiment. However, this often requires multiple simulation tools to meet the modeling goal. As a result, methods and tools are needed to enable extensive-scale simulations with streamlined execution of all tasks within a complex simulation protocol, including the transfer and adaptation of data between calculations. These methods should allow rapid prototyping of new protocols and proper documentation of the process. Here an overview of the benefits and challenges of workflow engineering in virtual materials design is presented, along with a selection of prominent scientific workflow frameworks used for the research in the BATTERY 2030+ project. Their strengths and weaknesses are discussed, as well as a selection of use cases in which workflow frameworks significantly contributed to the respective studies.

    Towards the domain agnostic generation of natural language explanations from provenance graphs for casual users

    As more systems become PROV-enabled, there will be a corresponding increase in the need to communicate provenance data directly to users. Whilst there are a number of existing methods for doing this (formally, diagrammatically, and textually), there are currently no application-generic techniques for generating linguistic explanations of provenance. The principal reason for this is that a certain amount of linguistic information is required to transform a provenance graph, such as in PROV, into a textual explanation, and if this information is not available as an annotation, this transformation is presently not possible. In this paper, we describe how we have adapted the common 'consensus' architecture from the field of natural language generation to achieve this graph transformation, resulting in the novel PROVglish architecture. We then present an approach to garnering the necessary linguistic information from a PROV dataset, which involves exploiting the linguistic information informally encoded in the URIs denoting provenance resources. We finish by detailing an evaluation undertaken to assess the effectiveness of this approach to lexicalisation, demonstrating a significant improvement in terms of fluency, comprehensibility, and grammatical correctness.
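    The final, surface-realisation stage of such a pipeline can be sketched as follows. The relation names echo PROV terms, but the templates and function are hypothetical simplifications for illustration, not the PROVglish implementation:

```python
# Illustrative sketch: realising a single provenance edge as an English
# sentence, once the resources' URIs have already been lexicalised into
# noun phrases. The templates are invented simplifications.

TEMPLATES = {
    "wasDerivedFrom": "The {subj} was derived from the {obj}.",
    "wasGeneratedBy": "The {subj} was generated by the {obj}.",
    "used": "The {subj} used the {obj}.",
}

def realise(relation, subj_phrase, obj_phrase):
    """Fill the template for one provenance relation with two noun phrases."""
    return TEMPLATES[relation].format(subj=subj_phrase, obj=obj_phrase)

print(realise("wasDerivedFrom", "summary report", "sensor reading"))
# The summary report was derived from the sensor reading.
```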

    Why Are Conversational Assistants Still Black Boxes? The Case For Transparency

    Much has been written about privacy in the context of conversational and voice assistants. Yet, there have been remarkably few developments in terms of the actual privacy offered by these devices. But how much of this is due to the technical and design limitations of speech as an interaction modality? In this paper, we set out to reframe the discussion on why commercial conversational assistants do not offer meaningful privacy and transparency by demonstrating how they could. By instrumenting the open-source voice assistant Mycroft to capture audit trails for data access, we demonstrate how such functionality could be integrated into major commercial assistants such as Alexa and Google Assistant. We show that this problem can be solved with existing technology and open standards and is thus fundamentally a business decision rather than a technical limitation. (Comment: To appear in the Proceedings of the 2023 ACM Conference on Conversational User Interfaces (CUI '23))
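    The audit-trail idea can be sketched as follows. The class, method, and field names here are illustrative assumptions, not Mycroft's actual instrumentation:

```python
# Hedged sketch (not Mycroft's actual API): recording an audit-trail
# entry whenever a skill accesses user data, then exporting the trail so
# users can inspect what was accessed, by whom, and why.

import json
import time

class AuditTrail:
    def __init__(self):
        self.entries = []

    def record(self, skill, data_type, purpose):
        """Log one data-access event with a timestamp."""
        self.entries.append({
            "timestamp": time.time(),
            "skill": skill,
            "data_type": data_type,
            "purpose": purpose,
        })

    def export(self):
        """Serialise the trail for user inspection or an external auditor."""
        return json.dumps(self.entries, indent=2)

trail = AuditTrail()
trail.record("weather", "location", "local forecast")
print(len(trail.entries))  # 1
```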