285 research outputs found

    A Self-managed Mesos Cluster for Data Analytics with QoS Guarantees

    Full text link
    [EN] This article describes the development of an automated configuration of a software platform for Data Analytics that supports horizontal and vertical elasticity to guarantee meeting a specific deadline. It specifies all the components, software dependencies and configurations required to build up the cluster, and analyses the deployment times of different instances, as well as the horizontal and vertical elasticity. The approach followed builds up self-managed hybrid clusters that can deal with different workloads and network requirements. The article describes the structure of the recipes, points out to public repositories where the code is available and discusses the limitations of the approach as well as the results of several experiments.The work presented in this article has been partially funded by a research grant from the regional government of the Comunitat Valenciana (Spain), co-funded by the European Union ERDF funds (European Regional Development Fund) of the Comunitat Valenciana 2014-2020, with reference IDIFEDER/2018/032 (High-Performance Algorithms for the Modelling, Simulation and early Detection of diseases in Personalized Medicine). The authors would also like to thank the Spanish "Ministerio de Economia, Industria y Competitividad" for the project "BigCLOE" with reference number TIN2016-79951-R.López-Huguet, S.; Pérez-González, AM.; Calatrava Arroyo, A.; Alfonso Laguna, CD.; Caballer Fernández, M.; Moltó, G.; Blanquer Espert, I. (2019). A Self-managed Mesos Cluster for Data Analytics with QoS Guarantees. Future Generation Computer Systems. 96:449-461. https://doi.org/10.1016/j.future.2019.02.047S4494619

    Dynamic Scheduling of MapReduce Shuffle under Bandwidth Constraints

    Get PDF
    Whether it is for e-science or business, the amount of data produced every year is growing at a high rate. Managing and processing those data raises new challenges. MapReduce is one answer to the need for scalable tools able to handle the amount of data. It imposes a general structure of computation and let the implementation perform its optimizations. During the computation, there is a phase called Shuffle where every node sends a possibly large amount of data to every other node. This report proposes and evaluates six algorithms to improve data transfers during the Shuffle phase under bandwidth constraints.Que ce soit pour l’e-science ou pour les affaires, la quantité de données produites chaque année augmente à une vitesse vertigineuse. Gérer et traiter ces données soulève de nouveaux défis. MapReduce est l’une des réponses aux besoins d’outils qui passent à l’échelle et capables de gérer ces volumes de données. Il impose une structure générale de calcul et laisse l’implémentation effectuer ses optimisations. Durant l’une des phases du calcul appelée Shuffle, tous les nœuds envoient des données potentiellement grosses à tous les autres nœuds. Ce rapport propose et évalue six algorithmes pour améliorer le transfert des données durant cette phase de Shuffle sous des contraintes de bande passante

    Distributed evolutionary algorithms and their models: A survey of the state-of-the-art

    Get PDF
    The increasing complexity of real-world optimization problems raises new challenges to evolutionary computation. Responding to these challenges, distributed evolutionary computation has received considerable attention over the past decade. This article provides a comprehensive survey of the state-of-the-art distributed evolutionary algorithms and models, which have been classified into two groups according to their task division mechanism. Population-distributed models are presented with master-slave, island, cellular, hierarchical, and pool architectures, which parallelize an evolution task at population, individual, or operation levels. Dimension-distributed models include coevolution and multi-agent models, which focus on dimension reduction. Insights into the models, such as synchronization, homogeneity, communication, topology, speedup, advantages and disadvantages are also presented and discussed. The study of these models helps guide future development of different and/or improved algorithms. Also highlighted are recent hotspots in this area, including the cloud and MapReduce-based implementations, GPU and CUDA-based implementations, distributed evolutionary multiobjective optimization, and real-world applications. Further, a number of future research directions have been discussed, with a conclusion that the development of distributed evolutionary computation will continue to flourish

    Energy-efficient Transitional Near-* Computing

    Get PDF
    Studies have shown that communication networks, devices accessing the Internet, and data centers account for 4.6% of the worldwide electricity consumption. Although data centers, core network equipment, and mobile devices are getting more energy-efficient, the amount of data that is being processed, transferred, and stored is vastly increasing. Recent computer paradigms, such as fog and edge computing, try to improve this situation by processing data near the user, the network, the devices, and the data itself. In this thesis, these trends are summarized under the new term near-* or near-everything computing. Furthermore, a novel paradigm designed to increase the energy efficiency of near-* computing is proposed: transitional computing. It transfers multi-mechanism transitions, a recently developed paradigm for a highly adaptable future Internet, from the field of communication systems to computing systems. Moreover, three types of novel transitions are introduced to achieve gains in energy efficiency in near-* environments, spanning from private Infrastructure-as-a-Service (IaaS) clouds, Software-defined Wireless Networks (SDWNs) at the edge of the network, Disruption-Tolerant Information-Centric Networks (DTN-ICNs) involving mobile devices, sensors, edge devices as well as programmable components on a mobile System-on-a-Chip (SoC). Finally, the novel idea of transitional near-* computing for emergency response applications is presented to assist rescuers and affected persons during an emergency event or a disaster, although connections to cloud services and social networks might be disturbed by network outages, and network bandwidth and battery power of mobile devices might be limited

    Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing

    Full text link
    Tesis por compendio[ES] Las aplicaciones científicas implican generalmente una carga computacional variable y no predecible a la que las instituciones deben hacer frente variando dinámicamente la asignación de recursos en función de las distintas necesidades computacionales. Las aplicaciones científicas pueden necesitar grandes requisitos. Por ejemplo, una gran cantidad de recursos computacionales para el procesado de numerosos trabajos independientes (High Throughput Computing o HTC) o recursos de alto rendimiento para la resolución de un problema individual (High Performance Computing o HPC). Los recursos computacionales necesarios en este tipo de aplicaciones suelen acarrear un coste muy alto que puede exceder la disponibilidad de los recursos de la institución o estos pueden no adaptarse correctamente a las necesidades de las aplicaciones científicas, especialmente en el caso de infraestructuras preparadas para la ejecución de aplicaciones de HPC. De hecho, es posible que las diferentes partes de una aplicación necesiten distintos tipos de recursos computacionales. Actualmente las plataformas de servicios en la nube se han convertido en una solución eficiente para satisfacer la demanda de las aplicaciones HTC, ya que proporcionan un abanico de recursos computacionales accesibles bajo demanda. Por esta razón, se ha producido un incremento en la cantidad de clouds híbridos, los cuales son una combinación de infraestructuras alojadas en servicios en la nube y en las propias instituciones (on-premise). Dado que las aplicaciones pueden ser procesadas en distintas infraestructuras, actualmente la portabilidad de las aplicaciones se ha convertido en un aspecto clave. Probablemente, las tecnologías de contenedores son la tecnología más popular para la entrega de aplicaciones gracias a que permiten reproducibilidad, trazabilidad, versionado, aislamiento y portabilidad. El objetivo de la tesis es proporcionar una arquitectura y una serie de servicios para proveer infraestructuras elásticas híbridas de procesamiento que puedan dar respuesta a las diferentes cargas de trabajo. Para ello, se ha considerado la utilización de elasticidad vertical y horizontal desarrollando una prueba de concepto para proporcionar elasticidad vertical y se ha diseñado una arquitectura cloud elástica de procesamiento de Análisis de Datos. Después, se ha trabajo en una arquitectura cloud de recursos heterogéneos de procesamiento de imágenes médicas que proporciona distintas colas de procesamiento para trabajos con diferentes requisitos. Esta arquitectura ha estado enmarcada en una colaboración con la empresa QUIBIM. En la última parte de la tesis, se ha evolucionado esta arquitectura para diseñar e implementar un cloud elástico, multi-site y multi-tenant para el procesamiento de imágenes médicas en el marco del proyecto europeo PRIMAGE. Esta arquitectura utiliza un almacenamiento distribuido integrando servicios externos para la autenticación y la autorización basados en OpenID Connect (OIDC). Para ello, se ha desarrollado la herramienta kube-authorizer que, de manera automatizada y a partir de la información obtenida en el proceso de autenticación, proporciona el control de acceso a los recursos de la infraestructura de procesamiento mediante la creación de las políticas y roles. Finalmente, se ha desarrollado otra herramienta, hpc-connector, que permite la integración de infraestructuras de procesamiento HPC en infraestructuras cloud sin necesitar realizar cambios en la infraestructura HPC ni en la arquitectura cloud. Cabe destacar que, durante la realización de esta tesis, se han utilizado distintas tecnologías de gestión de trabajos y de contenedores de código abierto, se han desarrollado herramientas y componentes de código abierto y se han implementado recetas para la configuración automatizada de las distintas arquitecturas diseñadas desde la perspectiva DevOps.[CA] Les aplicacions científiques impliquen generalment una càrrega computacional variable i no predictible a què les institucions han de fer front variant dinàmicament l'assignació de recursos en funció de les diferents necessitats computacionals. Les aplicacions científiques poden necessitar grans requisits. Per exemple, una gran quantitat de recursos computacionals per al processament de nombrosos treballs independents (High Throughput Computing o HTC) o recursos d'alt rendiment per a la resolució d'un problema individual (High Performance Computing o HPC). Els recursos computacionals necessaris en aquest tipus d'aplicacions solen comportar un cost molt elevat que pot excedir la disponibilitat dels recursos de la institució o aquests poden no adaptar-se correctament a les necessitats de les aplicacions científiques, especialment en el cas d'infraestructures preparades per a l'avaluació d'aplicacions d'HPC. De fet, és possible que les diferents parts d'una aplicació necessiten diferents tipus de recursos computacionals. Actualment les plataformes de servicis al núvol han esdevingut una solució eficient per satisfer la demanda de les aplicacions HTC, ja que proporcionen un ventall de recursos computacionals accessibles a demanda. Per aquest motiu, s'ha produït un increment de la quantitat de clouds híbrids, els quals són una combinació d'infraestructures allotjades a servicis en el núvol i a les mateixes institucions (on-premise). Donat que les aplicacions poden ser processades en diferents infraestructures, actualment la portabilitat de les aplicacions s'ha convertit en un aspecte clau. Probablement, les tecnologies de contenidors són la tecnologia més popular per a l'entrega d'aplicacions gràcies al fet que permeten reproductibilitat, traçabilitat, versionat, aïllament i portabilitat. L'objectiu de la tesi és proporcionar una arquitectura i una sèrie de servicis per proveir infraestructures elàstiques híbrides de processament que puguen donar resposta a les diferents càrregues de treball. Per a això, s'ha considerat la utilització d'elasticitat vertical i horitzontal desenvolupant una prova de concepte per proporcionar elasticitat vertical i s'ha dissenyat una arquitectura cloud elàstica de processament d'Anàlisi de Dades. Després, s'ha treballat en una arquitectura cloud de recursos heterogenis de processament d'imatges mèdiques que proporciona distintes cues de processament per a treballs amb diferents requisits. Aquesta arquitectura ha estat emmarcada en una col·laboració amb l'empresa QUIBIM. En l'última part de la tesi, s'ha evolucionat aquesta arquitectura per dissenyar i implementar un cloud elàstic, multi-site i multi-tenant per al processament d'imatges mèdiques en el marc del projecte europeu PRIMAGE. Aquesta arquitectura utilitza un emmagatzemament integrant servicis externs per a l'autenticació i autorització basats en OpenID Connect (OIDC). Per a això, s'ha desenvolupat la ferramenta kube-authorizer que, de manera automatitzada i a partir de la informació obtinguda en el procés d'autenticació, proporciona el control d'accés als recursos de la infraestructura de processament mitjançant la creació de les polítiques i rols. Finalment, s'ha desenvolupat una altra ferramenta, hpc-connector, que permet la integració d'infraestructures de processament HPC en infraestructures cloud sense necessitat de realitzar canvis en la infraestructura HPC ni en l'arquitectura cloud. Es pot destacar que, durant la realització d'aquesta tesi, s'han utilitzat diferents tecnologies de gestió de treballs i de contenidors de codi obert, s'han desenvolupat ferramentes i components de codi obert, i s'han implementat receptes per a la configuració automatitzada de les distintes arquitectures dissenyades des de la perspectiva DevOps.[EN] Scientific applications generally imply a variable and an unpredictable computational workload that institutions must address by dynamically adjusting the allocation of resources to their different computational needs. Scientific applications could require a high capacity, e.g. the concurrent usage of computational resources for processing several independent jobs (High Throughput Computing or HTC) or a high capability by means of using high-performance resources for solving complex problems (High Performance Computing or HPC). The computational resources required in this type of applications usually have a very high cost that may exceed the availability of the institution's resources or they are may not be successfully adapted to the scientific applications, especially in the case of infrastructures prepared for the execution of HPC applications. Indeed, it is possible that the different parts that compose an application require different type of computational resources. Nowadays, cloud service platforms have become an efficient solution to meet the need of HTC applications as they provide a wide range of computing resources accessible on demand. For this reason, the number of hybrid computational infrastructures has increased during the last years. The hybrid computation infrastructures are the combination of infrastructures hosted in cloud platforms and the computation resources hosted in the institutions, which are named on-premise infrastructures. As scientific applications can be processed on different infrastructures, the application delivery has become a key issue. Nowadays, containers are probably the most popular technology for application delivery as they ease reproducibility, traceability, versioning, isolation, and portability. The main objective of this thesis is to provide an architecture and a set of services to build up hybrid processing infrastructures that fit the need of different workloads. Hence, the thesis considered aspects such as elasticity and federation. The use of vertical and horizontal elasticity by developing a proof of concept to provide vertical elasticity on top of an elastic cloud architecture for data analytics. Afterwards, an elastic cloud architecture comprising heterogeneous computational resources has been implemented for medical imaging processing using multiple processing queues for jobs with different requirements. The development of this architecture has been framed in a collaboration with a company called QUIBIM. In the last part of the thesis, the previous work has been evolved to design and implement an elastic, multi-site and multi-tenant cloud architecture for medical image processing has been designed in the framework of a European project PRIMAGE. This architecture uses a storage integrating external services for the authentication and authorization based on OpenID Connect (OIDC). The tool kube-authorizer has been developed to provide access control to the resources of the processing infrastructure in an automatic way from the information obtained in the authentication process, by creating policies and roles. Finally, another tool, hpc-connector, has been developed to enable the integration of HPC processing infrastructures into cloud infrastructures without requiring modifications in both infrastructures, cloud and HPC. It should be noted that, during the realization of this thesis, different contributions to open source container and job management technologies have been performed by developing open source tools and components and configuration recipes for the automated configuration of the different architectures designed from the DevOps perspective. The results obtained support the feasibility of the vertical elasticity combined with the horizontal elasticity to implement QoS policies based on a deadline, as well as the feasibility of the federated authentication model to combine public and on-premise clouds.López Huguet, S. (2021). Elastic, Interoperable and Container-based Cloud Infrastructures for High Performance Computing [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/172327TESISCompendi

    Designing Human-Centered Collective Intelligence

    Get PDF
    Human-Centered Collective Intelligence (HCCI) is an emergent research area that seeks to bring together major research areas like machine learning, statistical modeling, information retrieval, market research, and software engineering to address challenges pertaining to deriving intelligent insights and solutions through the collaboration of several intelligent sensors, devices and data sources. An archetypal contextual CI scenario might be concerned with deriving affect-driven intelligence through multimodal emotion detection sources in a bid to determine the likability of one movie trailer over another. On the other hand, the key tenets to designing robust and evolutionary software and infrastructure architecture models to address cross-cutting quality concerns is of keen interest in the “Cloud” age of today. Some of the key quality concerns of interest in CI scenarios span the gamut of security and privacy, scalability, performance, fault-tolerance, and reliability. I present recent advances in CI system design with a focus on highlighting optimal solutions for the aforementioned cross-cutting concerns. I also describe a number of design challenges and a framework that I have determined to be critical to designing CI systems. With inspiration from machine learning, computational advertising, ubiquitous computing, and sociable robotics, this literature incorporates theories and concepts from various viewpoints to empower the collective intelligence engine, ZOEI, to discover affective state and emotional intent across multiple mediums. The discerned affective state is used in recommender systems among others to support content personalization. I dive into the design of optimal architectures that allow humans and intelligent systems to work collectively to solve complex problems. I present an evaluation of various studies that leverage the ZOEI framework to design collective intelligence
    corecore