6 research outputs found

    Autonomic Execution of Computational Workflows

    This paper describes the application of an autonomic paradigm to manage the complexity of software systems such as computational workflows. To demonstrate our approach, the workflow and the services comprising it are treated as managed resources controlled by hierarchically organized autonomic managers. By applying service-oriented software engineering principles, in particular enterprise integration patterns, we have developed a scalable, agile, self-healing environment for execution of dynamic, data-driven workflows which are capable of assuring scientific fidelity despite unavoidable faults and without human intervention.

    Giving RSEs a Larger Stage through the Better Scientific Software Fellowship

    The Better Scientific Software Fellowship (BSSwF) was launched in 2018 to foster and promote practices, processes, and tools to improve developer productivity and software sustainability of scientific codes. BSSwF's vision is to grow the community with practitioners, leaders, mentors, and consultants to increase the visibility of scientific software production and sustainability. Over the last five years, many fellowship recipients and honorable mentions have identified as research software engineers (RSEs). This paper provides case studies from several of the program's participants to illustrate some of the diverse ways BSSwF has benefited both the RSE and scientific communities. In an environment where the contributions of RSEs are too often undervalued, we believe that programs such as BSSwF can be a valuable means to recognize and encourage community members to step outside of their regular commitments and expand on their work, collaborations and ideas for a larger audience. Comment: submitted to Computing in Science & Engineering (CiSE), Special Issue on the Future of Research Software Engineers in the U

    Analyzing and Evaluating the Resilience of Scheduling Scientific Applications on High Performance Computing Systems using a Simulation-based Methodology

    Large scale systems provide a powerful computing platform for solving large and complex scientific applications. However, the inherent complexity, heterogeneity, wide distribution, and dynamism of the computing environments can lead to performance degradation of the scientific applications executing on these computing systems. Load imbalance arising from a variety of sources such as application, algorithmic, and systemic variations is one of the major contributors to their performance degradation. In general, load balancing is achieved via scheduling. Moreover, frequently occurring resource failures drastically affect the execution of applications running on high performance computing systems. Therefore, the study of deploying support for integrated scheduling and fault-tolerance mechanisms for guaranteeing that applications deployed on computing systems are resilient to failures becomes of paramount importance. Recently, several research initiatives have started to address the issue of resilience. However, the major focus of these efforts was geared more toward achieving system level resilience with less emphasis on achieving resilience at the application level. Therefore, it is increasingly important to extend the concept of resilience to the scheduling techniques at the application level for establishing a holistic approach that addresses the performability of these applications on high performance computing systems. This can be achieved by developing a comprehensive modeling framework that can be used to evaluate the resiliency of such techniques on heterogeneous computing systems for assessing the impact of failures as well as workloads in an integrated way. This dissertation presents an experimental methodology based on discrete event simulation for the analysis and the evaluation of the resilience of scheduling scientific applications on high performance computing systems. 
With the aid of the methodology, a wide class of dependencies between the application and the computing system is captured within a deterministic model for quantifying the performance impact expected from changes in application and system characteristics. The results obtained by employing the proposed simulation-based performance prediction framework enabled an introspective design and investigation of scheduling heuristics, to reason about how best to optimize various, often antagonistic, objectives such as minimizing application makespan and maximizing reliability.

    Exploring latent weight factors and global information for food-oriented cross-modal retrieval

    Food-oriented cross-modal retrieval aims to retrieve relevant recipes given food images or vice versa. The modality semantic gap between recipes and food images (text and image modalities) is the main challenge. Though several studies have been introduced to bridge this gap, they still suffer from two major limitations: 1) Simple embedding concatenation can capture only simple interactions, rather than complex interactions, between different recipe components. 2) Image feature extraction based on convolutional neural networks considers only the local features of an image and ignores its global features, as well as the interactions between different extracted features. This paper proposes a novel method based on Latent Component Weight Factors and Global Information (LCWF-GI) to learn robust recipe and image representations for food-oriented cross-modal retrieval. The proposed method integrates the textual embeddings of different recipe components into a compact embedding that represents the recipes using latent component-specific weight factors. A transformer encoder is utilised to capture the intra-modality interactions and the importance of different extracted image features for enhanced image representations. Finally, the bi-directional triplet loss is used to perform retrieval learning. Experimental results on the Recipe 1M dataset show that our LCWF-GI method achieves competent improvements.