14,423 research outputs found

    Computing at massive scale: Scalability and dependability challenges

    Get PDF
    Large-scale Cloud systems and big data analytics frameworks are now widely used for practical services and applications. However, with the increase of data volume, together with the heterogeneity of workloads and resources, and the dynamic nature of massive user requests, the uncertainties and complexity of resource management and service provisioning increase dramatically, often resulting in poor resource utilization, vulnerable system dependability, and user-perceived performance degradations. In this paper we report our latest understanding of the current and future challenges in this particular area, and discuss both existing and potential solutions to the problems, especially those concerned with system efficiency, scalability and dependability. We first introduce a data-driven analysis methodology for characterizing the resource and workload patterns and tracing performance bottlenecks in a massive-scale distributed computing environment. We then examine and analyze several fundamental challenges and the solutions we are developing to tackle them, including for example incremental but decentralized resource scheduling, incremental messaging communication, rapid system failover, and request handling parallelism. We integrate these solutions with our data analysis methodology in order to establish an engineering approach that facilitates the optimization, tuning and verification of massive-scale distributed systems. We aim to develop and offer innovative methods and mechanisms for future computing platforms that will provide strong support for new big data and IoE (Internet of Everything) applications

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Full text link
    Wireless sensor networks (WSNs) consist of autonomous and resource-limited devices. The devices cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of randomness in the monitored environments. For long service time and low maintenance cost, WSNs require adaptive and robust methods to address data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security challenges. In these problems, sensor nodes are to make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool to develop adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs

    Integration of tools for the Design and Assessment of High-Performance, Highly Reliable Computing Systems (DAHPHRS), phase 1

    Get PDF
    Systems for Space Defense Initiative (SDI) space applications typically require both high performance and very high reliability. These requirements present the systems engineer evaluating such systems with the extremely difficult problem of conducting performance and reliability trade-offs over large design spaces. A controlled development process supported by appropriate automated tools must be used to assure that the system will meet design objectives. This report describes an investigation of methods, tools, and techniques necessary to support performance and reliability modeling for SDI systems development. Models of the JPL Hypercubes, the Encore Multimax, and the C.S. Draper Lab Fault-Tolerant Parallel Processor (FTPP) parallel-computing architectures using candidate SDI weapons-to-target assignment algorithms as workloads were built and analyzed as a means of identifying the necessary system models, how the models interact, and what experiments and analyses should be performed. As a result of this effort, weaknesses in the existing methods and tools were revealed and capabilities that will be required for both individual tools and an integrated toolset were identified

    Topology-aware transmission scheduling for distributed highway traffic monitoring wireless sensor networks

    Get PDF
    Wireless sensor networks have been deployed along highways for traffic monitoring. The thesis studies a set of transmission scheduling methods for optimizing network throughput, message transfer delay, and energy efficiency. Today\u27s traffic monitoring systems are centrally managed. Several studies have envisioned the advantages of distributed traffic management techniques. The thesis is based on previously proposed hierarchical sensor network architecture, for which the routing and transmission scheduling methods are derived. Wireless sensor networks have a lifetime limited by battery energy of the sensors. The thesis proposes to assign schedules for nodes to transmit and receive packets and turning off their radios during other times to save energy. The schedules are assigned to minimize the end-to-end packet delivery latency and maximize the network throughput. Conflict-free transmission slots are assigned to sensors along road segments leading to a common intersection based on locally discovered topology. The slot assignment adopts a heuristic that rotates among segments, assigns closest possible slots to neighboring nodes in a pipelined fashion, and exploits radio capture effects when possible. Based on the single-intersection approach, centralized and distributed multi-intersection scheduling methods are proposed to resolve conflicts among nodes belonging to different intersections. The centralized approach designates a controller as the leader to collect topology information of a set of contiguous intersections and assign schedules using the same single-intersection algorithm. The distributed approach has each intersection determine its own schedule independently and then exchange the topology information and schedules with its adjacent intersections to resolve conflicts locally. Based on simulation studies in ns-2, the centralized approach achieves better performance, while the distributed approach tries to approach the centralized performance at much lower communication costs. A communication cost analysis is performed to assess the trade-off between the centralized and distributed approaches

    A Study of Application-awareness in Software-defined Data Center Networks

    Get PDF
    A data center (DC) has been a fundamental infrastructure for academia and industry for many years. Applications in DC have diverse requirements on communication. There are huge demands on data center network (DCN) control frameworks (CFs) for coordinating communication traffic. Simultaneously satisfying all demands is difficult and inefficient using existing traditional network devices and protocols. Recently, the agile software-defined Networking (SDN) is introduced to DCN for speeding up the development of the DCNCF. Application-awareness preserves the application semantics including the collective goals of communications. Previous works have illustrated that application-aware DCNCFs can much more efficiently allocate network resources by explicitly considering applications needs. A transfer application task level application-aware software-defined DCNCF (SDDCNCF) for OpenFlow software-defined DCN (SDDCN) for big data exchange is designed. The SDDCNCF achieves application-aware load balancing, short average transfer application task completion time, and high link utilization. The SDDCNCF is immediately deployable on SDDCN which consists of OpenFlow 1.3 switches. The Big Data Research Integration with Cyberinfrastructure for LSU (BIC-LSU) project adopts the SDDCNCF to construct a 40Gb/s high-speed storage area network to efficiently transfer big data for accelerating big data related researches at Louisiana State University. On the basis of the success of BIC-LSU, a coflow level application-aware SD- DCNCF for OpenFlow-based storage area networks, MinCOF, is designed. MinCOF incorporates all desirable features of existing coflow scheduling and routing frame- works and requires minimal changes on hosts. To avoid the architectural limitation of the OpenFlow SDN implementation, a coflow level application-aware SDDCNCF using fast packet processing library, Coflourish, is designed. Coflourish exploits congestion feedback assistances from SDN switches in the DCN to schedule coflows and can smoothly co-exist with arbitrary applications in a shared DCN. Coflourish is implemented using the fast packet processing library on an SDN switch, Open vSwitch with DPDK. Simulation and experiment results indicate that Coflourish effectively shortens average application completion time

    Application Driven MOdels for Resource Management in Cloud Environments

    Get PDF
    El despliegue y la ejecución de aplicaciones de gran escala en sistemas distribuidos con unos parametros de Calidad de Servicio adecuados necesita gestionar de manera eficiente los recursos computacionales. Para desacoplar los requirimientos funcionales y los no funcionales (u operacionales) de dichas aplicaciones, se puede distinguir dos niveles de abstracción: i) el nivel funcional, que contempla aquellos requerimientos relacionados con funcionalidades de la aplicación; y ii) el nivel operacional, que depende del sistema distribuido donde se despliegue y garantizará aquellos parámetros relacionados con la Calidad del Servicio, disponibilidad, tolerancia a fallos y coste económico, entre otros. De entre las diferentes alternativas del nivel operacional, en la presente tesis se contempla un entorno cloud basado en la virtualización de contenedores, como puede ofrecer Kubernetes.El uso de modelos para el diseño de aplicaciones en ambos niveles permite garantizar que dichos requerimientos sean satisfechos. Según la complejidad del modelo que describa la aplicación, o el conocimiento que el nivel operacional tenga de ella, se diferencian tres tipos de aplicaciones: i) aplicaciones dirigidas por el modelo, como es el caso de la simulación de eventos discretos, donde el propio modelo, por ejemplo Redes de Petri de Alto Nivel, describen la aplicación; ii) aplicaciones dirigidas por los datos, como es el caso de la ejecución de analíticas sobre Data Stream; y iii) aplicaciones dirigidas por el sistema, donde el nivel operacional rige el despliegue al considerarlas como una caja negra.En la presente tesis doctoral, se propone el uso de un scheduler específico para cada tipo de aplicación y modelo, con ejemplos concretos, de manera que el cliente de la infraestructura pueda utilizar información del modelo descriptivo y del modelo operacional. Esta solución permite rellenar el hueco conceptual entre ambos niveles. De esta manera, se proponen diferentes métodos y técnicas para desplegar diferentes aplicaciones: una simulación de un sistema de Vehículos Eléctricos descrita a través de Redes de Petri; procesado de algoritmos sobre un grafo que llega siguiendo el paradigma Data Stream; y el propio sistema operacional como sujeto de estudio.En este último caso de estudio, se ha analizado cómo determinados parámetros del nivel operacional (por ejemplo, la agrupación de contenedores, o la compartición de recursos entre contenedores alojados en una misma máquina) tienen un impacto en las prestaciones. Para analizar dicho impacto, se propone un modelo formal de una infrastructura operacional concreta (Kubernetes). Por último, se propone una metodología para construir índices de interferencia para caracterizar aplicaciones y estimar la degradación de prestaciones incurrida cuando dos contenedores son desplegados y ejecutados juntos. Estos índices modelan cómo los recursos del nivel operacional son usados por las applicaciones. Esto supone que el nivel operacional maneja información cercana a la aplicación y le permite tomar mejores decisiones de despliegue y distribución.<br /

    Scheduling in Transactional Memory Systems: Models, Algorithms, and Evaluations

    Get PDF
    Transactional memory provides an alternative synchronization mechanism that removes many limitations of traditional lock-based synchronization so that concurrent program writing is easier than lock-based code in modern multicore architectures. The fundamental module in a transactional memory system is the transaction which represents a sequence of read and write operations that are performed atomically to a set of shared resources; transactions may conflict if they access the same shared resources. A transaction scheduling algorithm is used to handle these transaction conflicts and schedule appropriately the transactions. In this dissertation, we study transaction scheduling problem in several systems that differ through the variation of the intra-core communication cost in accessing shared resources. Symmetric communication costs imply tightly-coupled systems, asymmetric communication costs imply large-scale distributed systems, and partially asymmetric communication costs imply non-uniform memory access systems. We made several theoretical contributions providing tight, near-tight, and/or impossibility results on three different performance evaluation metrics: execution time, communication cost, and load, for any transaction scheduling algorithm. We then complement these theoretical results by experimental evaluations, whenever possible, showing their benefits in practical scenarios. To the best of our knowledge, the contributions of this dissertation are either the first of their kind or significant improvements over the best previously known results
    corecore