23 research outputs found

    Unveiling User Behavior on Summit Login Nodes as a User

    Full text link
    We observe and analyze usage of the login nodes of the leadership-class Summit supercomputer from the perspective of an ordinary user -- not a system administrator -- by periodically sampling user activities (job queues, running processes, etc.) for two full years (2020-2021). Our findings unveil key usage patterns that evidence misuse of the system, including gaming the policies, impairing I/O performance, and using login nodes as a sole computing resource. Our analysis highlights observed patterns for the execution of complex computations (workflows), which are key for processing large-scale applications. Comment: International Conference on Computational Science (ICCS), 202

    WfBench: Automated Generation of Scientific Workflow Benchmarks

    Full text link
    The prevalence of scientific workflows with high computational demands calls for their execution on various distributed computing platforms, including large-scale leadership-class high-performance computing (HPC) clusters. To handle the deployment, monitoring, and optimization of workflow executions, many workflow systems have been developed over the past decade. There is a need for workflow benchmarks that can be used to evaluate the performance of workflow systems on current and future software stacks and hardware platforms. We present a generator of realistic workflow benchmark specifications that can be translated into benchmark code to be executed with current workflow systems. Our approach generates workflow tasks with arbitrary performance characteristics (CPU, memory, and I/O usage) and with realistic task dependency structures based on those seen in production workflows. We present experimental results that show that our approach generates benchmarks that are representative of production workflows, and conduct a case study to demonstrate the use and usefulness of our generated benchmarks to evaluate the performance of workflow systems under different configuration scenarios.
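The abstract describes generating workflow tasks with arbitrary CPU, memory, and I/O characteristics plus realistic dependency structures. A minimal sketch of what such a generated specification might look like is below; all field names and the fork-join dependency shape are illustrative assumptions, not WfBench's actual output format:

```python
import json
import random

def generate_workflow(num_tasks=6, seed=0):
    """Generate a toy benchmark spec: tasks with synthetic CPU, memory,
    and I/O demands, arranged in a fork-join dependency structure."""
    rng = random.Random(seed)
    tasks = []
    for i in range(num_tasks):
        tasks.append({
            "id": f"task_{i}",
            "cpu_work": rng.randint(10**6, 10**8),     # abstract work units
            "memory_mb": rng.choice([256, 512, 1024]),
            "io_read_mb": rng.randint(1, 100),
            "io_write_mb": rng.randint(1, 100),
            # fork-join shape: inner tasks depend on the entry task,
            # the exit task depends on all inner tasks
            "parents": [] if i == 0 else (
                ["task_0"] if i < num_tasks - 1
                else [f"task_{j}" for j in range(1, num_tasks - 1)]),
        })
    return {"name": "synthetic_forkjoin", "tasks": tasks}

# A spec like this could then be translated into benchmark code
# (e.g., stress kernels sized by cpu_work and io_*_mb) for a
# particular workflow system.
spec = generate_workflow()
print(json.dumps(spec, indent=2))
```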

    Summary of the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1)

    Get PDF
    Challenges related to the development, deployment, and maintenance of reusable software for science are becoming a growing concern. Many scientists' research increasingly depends on the quality and availability of the software upon which their work is built. To highlight some of these issues and share experiences, the First Workshop on Sustainable Software for Science: Practice and Experiences (WSSSPE1) was held in November 2013 in conjunction with the SC13 Conference. The workshop featured keynote presentations and a large number (54) of solicited extended abstracts that were grouped into three themes and presented via panels. A set of collaborative notes of the presentations and discussion was taken during the workshop. Unique perspectives were captured about issues such as comprehensive documentation, development and deployment practices, software licenses, and career paths for developers. Attribution systems that account for evidence of software contribution and impact were also discussed. These include mechanisms such as Digital Object Identifiers, publication of "software papers", and the use of online systems, for example source code repositories like GitHub. This paper summarizes the issues and shared experiences that were discussed, including cross-cutting issues and use cases. It joins a nascent literature seeking to understand what drives software work in science, and how it is impacted by the reward systems of science. These incentives can determine the extent to which developers are motivated to build software for the long term, for the use of others, and whether to work collaboratively or separately. It also explores community building, leadership, and dynamics in relation to successful scientific software.

    Discovering RNA-Protein Interactome by Using Chemical Context Profiling of the RNA-Protein Interface

    Get PDF
    RNA-protein (RNP) interactions generally are required for RNA function. At least 5% of human genes code for RNA-binding proteins. Whereas many approaches can identify the RNA partners for a specific protein, finding the protein partners for a specific RNA is difficult. We present a machine-learning method that scores a protein's binding potential for an RNA structure by utilizing the chemical context profiles of the interface from known RNP structures. Our approach is applicable even when only a single RNP structure is available. We examined 801 mammalian proteins and find that 37 (4.6%) potentially bind transfer RNA (tRNA). Most are enzymes involved in cellular processes unrelated to translation and were not known to interact with RNA. We experimentally tested six positive and three negative predictions for tRNA binding in vivo, and all nine predictions were correct. Our computational approach provides a powerful complement to experiments in discovering new RNPs.
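The abstract's core idea is scoring a candidate protein against chemical context profiles derived from known RNP interfaces. A minimal sketch of that scoring step is below, using cosine similarity between profile vectors; the profile values and the similarity measure are illustrative assumptions, not the paper's actual features or model:

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical chemical-context profile of a known tRNA-binding
# interface (e.g., normalized frequencies of atom-type contacts).
REFERENCE_PROFILE = [0.30, 0.10, 0.25, 0.20, 0.15]

def binding_score(candidate_profile, reference=REFERENCE_PROFILE):
    """Score a candidate protein's interface profile against the
    reference RNP profile; higher means more tRNA-binding-like."""
    return cosine(candidate_profile, reference)
```

In practice a threshold on such a score (or a trained classifier over many profiles) would separate predicted binders from non-binders.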

    Data-intensive scientific workflows (representations of parallelism and enactment on distributed systems)

    No full text
    Porting data-intensive applications to large-scale distributed computing infrastructures is not trivial. Bridging the gap between an application and its workflow expression poses challenges at several levels. At the end-user level, the challenge is to express the application's logic and data-flow requirements from a non-technical domain. At the infrastructure level, the challenge is to port the application so that the underlying distributed resources are exploited as fully as possible. Workflows enable distributed application deployment by formally representing the application's components, their interconnections, and the data flows among them. However, workflow languages and their enactment engines need enhancements to meet these challenges: they must facilitate concise expression of parallelism, data combinations, and higher-level data structures in a coherent fashion. This thesis targets these requirements. Driven by use cases from the medical image processing domain, it develops several strategies to express the asynchronous, maximally parallel execution of complex workflows through a concise language and an enactor interfaced with large-scale distributed computing infrastructures. The main contributions of this research are: a) a rich workflow language with two representations, validated by fruitful experimental results from enacting medical image processing workflows on the European Grid Infrastructure (EGI); and b) an extension of an existing workflow environment (Taverna) to interface with grid computing infrastructures.

    Bibliometric Survey on Supply Chain in Healthcare using Artificial Intelligence

    Get PDF
    With the increasing demand for supply chains in the service sector, new techniques have become essential. Given the latest emerging technologies, a bibliometric analysis of supply chain management (SCM) in the healthcare sector has become crucial. This paper presents an analysis of supply chain research in the service sector using artificial intelligence techniques. The main aim of the analysis is to map the use of these technologies in healthcare supply chain management using databases such as SCOPUS, Google Scholar, and ResearchGate, and software tools such as Gephi and GSP Visualizer. The bibliometric analysis shows that India ranks 4th in publishing documents on the healthcare supply chain and artificial intelligence, after the US, China, and the UK. The most prominent keywords are supply chain management and the healthcare sector; artificial intelligence is another vital keyword for this study, which applies to all domains.

    A data-driven workflow language for grids based on array programming principles

    Get PDF
    Different scientific workflow languages have been developed to help programmers design complex data analysis procedures. However, little effort has been invested in comparing and finding a common root for existing approaches. This work is motivated by the search for a scientific workflow language which coherently integrates different aspects of distributed computing. The language proposed is data-driven, to ease the expression of parallel flows. It leverages array programming principles to ease the design of data-intensive applications. It provides a rich set of control structures and iteration strategies while avoiding unnecessary programming constructs. It allows programmers to express a wide set of applications in a compact framework.
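The abstract mentions iteration strategies as a key ingredient of the array-programming approach. A minimal sketch of two classic strategies from data-driven workflow languages, pairwise ("dot") and Cartesian ("cross") iteration over input arrays, is below; the function names are illustrative, not the language's actual constructs:

```python
from itertools import product

def dot_iteration(task, xs, ys):
    """Pairwise ('dot') iteration: the task consumes inputs
    element-by-element; each invocation is independent, hence
    trivially parallelizable."""
    return [task(x, y) for x, y in zip(xs, ys)]

def cross_iteration(task, xs, ys):
    """Cartesian ('cross') iteration: the task is applied to every
    combination of inputs, yielding len(xs) * len(ys) invocations."""
    return [task(x, y) for x, y in product(xs, ys)]
```

In a data-driven engine, each invocation produced by either strategy would be fired as soon as its input items are available, which is how the array semantics translate into parallelism without explicit loop constructs.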

    Workflow-based comparison of two Distributed Computing Infrastructures

    Get PDF
    Porting applications to Distributed Computing Infrastructures (DCIs) is eased by the use of workflow abstractions. Yet, estimating the impact of the execution DCI on application performance is difficult due to the heterogeneity of the available resources, middleware, and operation models. This paper describes a workflow-based experimental method to acquire objective performance comparison criteria when dealing with completely different DCIs. Experiments were conducted on the European EGI and the French Grid'5000 infrastructures to highlight raw performance variations and identify their causes. The results obtained also show that it is possible to conduct experiments on a production infrastructure with reproducibility similar to that of an experimental platform.