4,410 research outputs found

    Single system image: A survey

    Single system image is a computing paradigm where a number of distributed computing resources are aggregated and presented via an interface that maintains the illusion of interaction with a single system. This approach encompasses decades of research using a broad variety of techniques at varying levels of abstraction, from custom hardware and distributed hypervisors to specialized operating system kernels and user-level tools. Existing classification schemes for SSI technologies are reviewed, and an updated classification scheme is proposed. A survey of implementation techniques is provided along with relevant examples. Notable deployments are examined, and insights gained from hands-on experience are summarized. Issues affecting the adoption of kernel-level SSI are identified and discussed in the context of technology adoption literature.

    Centralised monitoring and alerting solution for complex information management infrastructure

    Internship report presented as a partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics. Monitoring and alerting have received increasing attention in recent years, driven on one side by the growth of the information generated and managed, and on the other by the increasing power and capacity of computing hardware. Monitoring solutions have been tightly coupled to the software being monitored, which makes a broad view of the performance of all the interlinked services complex and ineffective to obtain. In this work, a monitoring and alerting solution based on Prometheus is developed. Metrics are continuously collected from a variety of services and organized for visualization and monitoring at several levels of precision. Different exporters for various systems were evaluated, and many of them were enhanced. Alerting logic that responds to detected performance problems and functional irregularities has been developed and implemented with Alertmanager. Additional software was developed for recording and visualizing current and past alerts; it also serves as a debugging tool for the alert configuration. Visualization is implemented in Grafana with several dashboards. All tools and software packages used to implement this monitoring and alerting solution are open source and free to use.

    Low‐latency Java communication devices on RDMA‐enabled networks

    This is the peer-reviewed version of the following article: Expósito, R. R., Taboada, G. L., Ramos, S., Touriño, J., & Doallo, R. (2015). Low‐latency Java communication devices on RDMA‐enabled networks. Concurrency and Computation: Practice and Experience, 27(17), 4852-4879, which has been published in final form at https://doi.org/10.1002/cpe.3473. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.
    [Abstract] Providing high‐performance inter‐node communication is a key capability for running high performance computing applications efficiently on parallel architectures. Current system deployments aggregate a significant number of cores interconnected via advanced networking hardware with Remote Direct Memory Access (RDMA) mechanisms that enable zero‐copy and kernel‐bypass features. The use of Java for parallel programming is becoming more promising thanks to some useful characteristics of the language, particularly its built‐in multithreading support, portability, ease of learning, and high productivity, along with the continuous increase in the performance of the Java virtual machine. However, current parallel Java applications generally suffer from inefficient communication middleware, mainly based on protocols with high communication overhead that do not take full advantage of RDMA‐enabled networks. This paper presents efficient low‐level Java communication devices that overcome these constraints by fully exploiting the underlying RDMA hardware, providing low‐latency and high‐bandwidth communications for parallel Java applications. The performance evaluation conducted on representative RDMA networks and parallel systems has shown significant point‐to‐point performance increases compared with previous Java communication middleware, yielding up to a 40% improvement in application‐level performance on 4096 cores of a Cray XE6 supercomputer.
    Funding: Ministerio de Economía y Competitividad (TIN2013-42148-P); Xunta de Galicia (GRC2013/055); Ministerio de Educación y Ciencia (AP2010-434)

    Scheduling in virtual infrastructure

    Different methods have been proposed to dynamically provide execution environments for scientific applications that hide the complexity of the underlying distributed and heterogeneous infrastructures. Recently, virtualization has emerged as a promising technology for providing such environments. Virtualization abstracts away the details of physical hardware and provides virtualized resources for high-level scientific applications. It offers a cost-effective and flexible way to use and manage computing resources. Such an abstraction is appealing in Grid computing and Cloud computing for better matching jobs (applications) to computational resources. This work applies the virtualization concept to the Condor dynamic resource management system by using the Condor Virtual Universe to harvest existing virtual computing resources to their maximum utility. It allows existing computing resources to be dynamically provisioned at run time by users based on application requirements, instead of statically at design time, thereby laying the basis for efficient use of the available resources.