Single system image: A survey
Single system image is a computing paradigm in which a number of distributed computing resources are aggregated and presented via an interface that maintains the illusion of interaction with a single system. This approach encompasses decades of research using a broad variety of techniques at varying levels of abstraction, from custom hardware and distributed hypervisors to specialized operating system kernels and user-level tools. Existing classification schemes for SSI technologies are reviewed, and an updated classification scheme is proposed. A survey of implementation techniques is provided along with relevant examples. Notable deployments are examined and insights gained from hands-on experience are summarized. Issues affecting the adoption of kernel-level SSI are identified and discussed in the context of technology adoption literature.
Centralised monitoring and alerting solution for complex information management infrastructure
Internship Report presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics
Monitoring and alerting have received increasing attention in recent years, driven by the growth of the information being generated and managed on one side and the increasing power and capacity of computing hardware on the other. Monitoring solutions have traditionally been tightly coupled to the software being monitored, making a broad view of the performance of all the interlinked services too complex and ineffective. In this work, a monitoring and alerting solution based on Prometheus is developed. Metrics from a variety of services are collected continuously and organized for visualization and monitoring at several levels of detail. Exporters for various systems were evaluated and many of them enhanced. Alerting logic responding to detected performance problems and functional irregularities has been developed and implemented with Alertmanager. An additional tool was developed for recording and visualizing current and past alerts, which also serves as a debugging aid for the alert configuration. Visualization is implemented in Grafana with several dashboards. All tools and software packages used to implement this monitoring and alerting solution are open source and free to use.
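The alerting logic described above resembles Prometheus-style rules that fire only after a condition has held for a configured duration (the `for:` clause in a rule definition). The following Python sketch illustrates that idea only; all class and parameter names are hypothetical and not taken from the report:

```python
# Hypothetical sketch of "alert after condition holds for a duration" logic,
# similar in spirit to a Prometheus alerting rule with a `for:` clause.
from dataclasses import dataclass


@dataclass
class PendingAlert:
    since: float  # timestamp at which the condition first became true


class ThresholdAlert:
    """Fires only after `value > threshold` has held for `for_seconds`."""

    def __init__(self, threshold: float, for_seconds: float):
        self.threshold = threshold
        self.for_seconds = for_seconds
        self.pending = None  # no violation observed yet

    def evaluate(self, now: float, value: float) -> bool:
        if value <= self.threshold:
            self.pending = None          # condition cleared; reset pending state
            return False
        if self.pending is None:
            self.pending = PendingAlert(since=now)
        # Fire only once the condition has been continuously true long enough.
        return now - self.pending.since >= self.for_seconds


alert = ThresholdAlert(threshold=0.9, for_seconds=300)
print(alert.evaluate(0, 0.95))    # pending, not yet firing -> False
print(alert.evaluate(200, 0.95))  # still within the 5-minute window -> False
print(alert.evaluate(300, 0.95))  # condition held for 5 minutes -> True
```

The reset on a cleared condition mirrors how a pending Prometheus alert returns to the inactive state when its expression stops matching.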
JavaFlow : a Java DataFlow Machine
The JavaFlow, a Java DataFlow Machine, is a machine design concept implementing a Java Virtual Machine aimed at addressing technology roadmap issues along with the ability to effectively utilize and manage very large numbers of processing cores. Specific design challenges addressed include: design complexity, through a common set of repeatable structures; low power, through the ability to power off unused circuits and sections of the chip; clock propagation and wire limits, by using locality to bring data to processing elements and a Globally Asynchronous Locally Synchronous (GALS) design; and reliability, by allowing portions of the design to be bypassed in case of failures. A DataFlow architecture is used with multiple heterogeneous networks to connect processing elements, each capable of executing a single Java ByteCode instruction. Whole methods are cached in this DataFlow fabric, and the networks plus distributed intelligence are used for their management and execution. A mesh network is used for the DataFlow transfers; two ordered networks are used for management and control-flow mapping; and multiple high-speed rings are used to access the storage subsystem and a controlling General Purpose Processor (GPP). Analysis of benchmarks demonstrates the potential of this design concept. The design process began by analyzing SPEC JVM benchmarks, which identified a small number of methods contributing a significant percentage of the overall ByteCode operations. Additional analysis established static instruction mixes to prioritize the types of processing elements used in the DataFlow fabric. The overall objective of the machine is to provide multi-threading performance for Java methods deployed to this DataFlow fabric. With advances in technology, it is envisioned that from 1,000 to 10,000 cores/instructions could be deployed and managed using this structure. A DataFlow fabric of this size would allow all the key methods from the SPEC benchmarks to be resident.
A baseline configuration is defined with a compressed dataflow structure and then compared to multiple configurations of instruction assignments and clock relationships. Using a series of methods from the SPEC benchmark running independently, IPC (Instructions Per Cycle) performance of the sparsely populated heterogeneous structure is 40% of the baseline. The average ratio of instructions to required nodes is 3.5. Innovative solutions to the loading and management of Java methods, along with the translation from control flow to DataFlow structure, are demonstrated.
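The execution model underlying such a design is the classic dataflow firing rule: an instruction node executes as soon as all of its input operands have arrived, rather than when a program counter reaches it. The following Python sketch illustrates that general rule only; all names are illustrative assumptions, not part of the actual JavaFlow design:

```python
# Minimal sketch of the dataflow firing rule: a node holds operand slots and
# "fires" (executes its operation) only once every slot has received a value.
class DataflowNode:
    def __init__(self, op, n_inputs, consumers):
        self.op = op                  # function applied when the node fires
        self.n_inputs = n_inputs
        self.operands = {}            # slot index -> arrived operand value
        self.consumers = consumers    # list of (node, slot) receiving the result

    def receive(self, slot, value, ready):
        self.operands[slot] = value
        if len(self.operands) == self.n_inputs:
            ready.append(self)        # all operands present: node may fire

    def fire(self, ready):
        result = self.op(*(self.operands[i] for i in range(self.n_inputs)))
        self.operands = {}            # consume operands
        for node, slot in self.consumers:
            node.receive(slot, result, ready)
        return result


# (a + b) * c expressed as a two-node dataflow graph
results = []
mul = DataflowNode(lambda x, y: results.append(x * y) or x * y, 2, [])
add = DataflowNode(lambda x, y: x + y, 2, [(mul, 0)])

ready = []
add.receive(0, 2, ready)   # a = 2
add.receive(1, 3, ready)   # b = 3 -> add becomes ready
mul.receive(1, 4, ready)   # c = 4
while ready:
    ready.pop().fire(ready)
print(results)  # [20]
```

In the machine concept above, this readiness tracking is distributed across the fabric's networks rather than held in a central worklist; the sketch only shows the operand-arrival semantics.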
Low‐latency Java communication devices on RDMA‐enabled networks
This is the peer reviewed version of the following article: Expósito, R. R., Taboada, G. L., Ramos, S., Touriño, J., & Doallo, R. (2015). Low‐latency Java communication devices on RDMA‐enabled networks. Concurrency and Computation: Practice and Experience, 27(17), 4852-4879., which has been published in final form at https://doi.org/10.1002/cpe.3473. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Use of Self-Archived Versions.[Abstract] Providing high‐performance inter‐node communication is a key capability for running high performance computing applications efficiently on parallel architectures. In fact, current system deployments aggregate a significant number of cores interconnected via advanced networking hardware with Remote Direct Memory Access (RDMA) mechanisms that enable zero‐copy and kernel‐bypass features. The use of Java for parallel programming is becoming more promising thanks to some useful characteristics of this language, particularly its built‐in multithreading support, portability, ease of learning, and high productivity, along with the continuous increase in the performance of the Java Virtual Machine. However, current parallel Java applications generally suffer from inefficient communication middleware, mainly based on protocols with high communication overhead that do not take full advantage of RDMA‐enabled networks. This paper presents efficient low‐level Java communication devices that overcome these constraints by fully exploiting the underlying RDMA hardware, providing low‐latency and high‐bandwidth communications for parallel Java applications.
The performance evaluation conducted on representative RDMA networks and parallel systems has shown significant point‐to‐point performance increases compared with previous Java communication middleware, achieving up to 40% improvement in application‐level performance on 4096 cores of a Cray XE6 supercomputer.
Ministerio de Economía y Competitividad; TIN2013-42148-P. Xunta de Galicia; GRC2013/055. Ministerio de Educación y Ciencia; AP2010-434
Scheduling in virtual infrastructure
For the execution of scientific applications, different methods have been proposed to dynamically provide execution environments that hide the complexity of the underlying distributed and heterogeneous infrastructures. Recently, virtualization has emerged as a promising technology for providing such environments. Virtualization abstracts away the details of the physical hardware and provides virtualized resources for high-level scientific applications. It offers a cost-effective and flexible way to use and manage computing resources, and such an abstraction is appealing in Grid computing and Cloud computing for better matching jobs (applications) to computational resources. This work applies the virtualization concept to the Condor dynamic resource management system, using the Condor Virtual Universe to harvest the existing virtual computing resources to their maximum utility. It allows existing computing resources to be provisioned dynamically at run time by users based on application requirements, instead of statically at design time, thereby laying the basis for efficient use of the available resources.