    Armada: a Parallel I/O Framework for Computational Grids

    High-performance computing increasingly occurs on “computational grids” composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single “virtual” computer. One of the great challenges for this environment is to provide efficient access to data that is distributed across remote data servers in a grid. In this paper, we describe our solution, a framework we call Armada. Armada allows applications to flexibly compose modules to access their data, and to place those modules at appropriate hosts within the grid to reduce network traffic.
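    As a concrete illustration of this composition-and-placement idea, here is a minimal, hypothetical Python sketch; it is not the Armada API, and the module and host names are invented. Filtering is placed at the data server so that less data crosses the wide-area network.

```python
# Hypothetical sketch of composing data-access modules into a pipeline and
# assigning each stage to a grid host; names are illustrative, not Armada's API.
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Module:
    name: str
    transform: Callable[[Iterable], Iterable]  # stream-in, stream-out processing
    host: str = "unassigned"                   # grid host chosen at deployment time


@dataclass
class Pipeline:
    stages: List[Module] = field(default_factory=list)

    def then(self, module: Module) -> "Pipeline":
        self.stages.append(module)
        return self

    def place(self, assignment: dict) -> None:
        """Assign each module to a host, e.g. filters near the data server."""
        for stage in self.stages:
            stage.host = assignment.get(stage.name, stage.host)

    def run(self, source: Iterable) -> list:
        data = source
        for stage in self.stages:
            data = stage.transform(data)
        return list(data)


# Example: decode records at the data server, filter early to cut network
# traffic, then reformat at the compute site.
pipeline = (Pipeline()
            .then(Module("decode", lambda rows: (r.strip() for r in rows)))
            .then(Module("filter", lambda rows: (r for r in rows if "ERR" not in r)))
            .then(Module("reformat", lambda rows: (r.upper() for r in rows))))
pipeline.place({"decode": "data-server", "filter": "data-server",
                "reformat": "compute-node"})
print(pipeline.run(["ok 1\n", "ERR 2\n", "ok 3\n"]))  # -> ['OK 1', 'OK 3']
```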

    Co-designing reliability and performance for datacenter memory

    Memory is one of the key components that affect the reliability and performance of datacenter servers. Memory in today’s servers is organized and shared in several ways to provide performant and efficient access to data: cache hierarchies in multi-core chips reduce access latency, non-uniform memory access (NUMA) in multi-socket servers improves scalability, and disaggregation increases memory capacity. In all these organizations, hardware coherence protocols are used to keep the shared memory consistent and to implicitly move data to the requesting cores. This thesis aims to provide fault tolerance against newer failure models in the memory organization of datacenter servers. While designing for improved reliability, it also explores solutions that enhance application performance; the solutions build on modern coherence protocols to achieve both properties.
    First, we observe that DRAM memory-system failure rates have increased, demanding stronger forms of memory reliability. To combat this, the thesis proposes Dvé, a hardware-driven replication mechanism in which data blocks are replicated across two different memory controllers in a cache-coherent NUMA system. Each data block is accompanied by a code with strong error-detection capabilities, so that when an error is detected, correction is performed using the replica. Dvé’s organization offers two independent points of access to data, which enables (a) strong error correction that can recover from a range of faults affecting any component of the memory and (b) higher performance, by providing another, nearer point of memory access. Dvé’s coherent replication keeps the replicas in sync for reliability and also provides coherent access to read replicas during fault-free operation for improved performance. Dvé can provide these benefits flexibly, on demand, at runtime.
    Next, we observe that the coherence protocol itself needs to be hardened against failures. Memory in datacenter servers is being disaggregated from the compute servers into dedicated memory servers, driven by standards like CXL. CXL specifies the coherence protocol semantics for compute servers to access and cache data from a shared region of the disaggregated memory. However, the CXL specification lacks the fault tolerance necessary to operate at inter-server scale within the datacenter. Compute servers can fail or become unresponsive, so it is important that the coherence protocol remain available in the presence of such failures. The thesis proposes Āpta, a CXL-based shared disaggregated memory system that keeps cached data consistent without compromising availability in the face of compute-server failures. Āpta architects a high-performance, fault-tolerant, object-granular memory server that significantly improves performance for stateless function-as-a-service (FaaS) datacenter applications.
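    As a rough software analogue of the Dvé idea (a sketch only; Dvé itself is a hardware mechanism, and all names below are invented), the following Python snippet pairs each block with a strong error-detection code, serves reads from the nearer replica, and corrects a detected error from the copy held by a second controller.

```python
# Hypothetical software analogue of coherent replication with error detection:
# each block stores data plus a strong checksum; a read verifies the checksum
# and falls back to (and repairs from) the replica when corruption is detected.
import hashlib


def checksum(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # stands in for a strong error-detection code


class MemoryController:
    def __init__(self):
        self.blocks = {}  # address -> (data, checksum)

    def write(self, addr: int, data: bytes) -> None:
        self.blocks[addr] = (data, checksum(data))

    def read(self, addr: int):
        return self.blocks[addr]


class ReplicatedMemory:
    """Writes go to both controllers; reads prefer the nearer one."""

    def __init__(self, near: MemoryController, far: MemoryController):
        self.near, self.far = near, far

    def write(self, addr: int, data: bytes) -> None:
        self.near.write(addr, data)
        self.far.write(addr, data)

    def read(self, addr: int) -> bytes:
        data, code = self.near.read(addr)
        if checksum(data) == code:
            return data                      # fault-free fast path on the nearer replica
        good, good_code = self.far.read(addr)
        assert checksum(good) == good_code, "both replicas corrupted"
        self.near.write(addr, good)          # repair the faulty copy
        return good


near, far = MemoryController(), MemoryController()
mem = ReplicatedMemory(near, far)
mem.write(0x10, b"hello")
near.blocks[0x10] = (b"hexlo", near.blocks[0x10][1])  # inject a fault in the near copy
print(mem.read(0x10))  # -> b'hello', corrected from the replica
```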

    An Expressive Stateful Aspect Language

    Stateful aspects can react to a program execution; they support modular implementations of several crosscutting concerns such as error detection, security, event handling, and debugging. However, most proposed stateful aspect languages have been tailored to address one particular concern, and most of these languages differ in their pattern languages and semantics. As a consequence, developers need to tweak aspect definitions in contortive ways, or create new specialized stateful aspect languages altogether, if their specific needs are not supported. In this paper, we describe ESA, an expressive stateful aspect language in which the pattern language is Turing-complete and patterns themselves are reusable, composable, first-class values. In addition, the core semantic elements of every aspect in ESA are open to customization. We describe ESA in a typed functional language and use this description to develop a concrete, practical implementation of ESA for JavaScript. With this implementation, we illustrate the expressiveness of ESA in action, with examples from diverse scenarios and by expressing the semantics of existing stateful aspect languages.
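    A minimal sketch, not ESA itself, of the central idea that stateful patterns can be ordinary, reusable, composable values: here a pattern is just a function from (state, event) to (matched, next state), and an aspect runs its advice whenever the pattern matches the event stream. All names are illustrative.

```python
# Minimal sketch (not ESA) of stateful patterns as first-class values: a pattern
# is a function (state, event) -> (matched?, next_state), so patterns can be
# stored, passed around, and composed like any other value.
def match(name):
    """Pattern that matches a single event with the given name."""
    def step(state, event):
        return (event == name, state)
    return step


def seq(first, second):
    """Match `first`, then later `second` (simplified: sub-pattern state not threaded)."""
    def step(state, event):
        phase = state.get("phase", 0)
        if phase == 0:
            hit, _ = first({}, event)
            return (False, {"phase": 1} if hit else state)
        hit, _ = second({}, event)
        return (hit, {"phase": 0} if hit else state)
    return step


def aspect(pattern, advice):
    """Run `advice` whenever the stateful pattern matches the observed event stream."""
    state = {}
    def observe(event):
        nonlocal state
        matched, state = pattern(state, event)
        if matched:
            advice(event)
    return observe


# Reuse and composition: the same pattern value could be shared by several aspects.
open_then_error = seq(match("open"), match("error"))
monitor = aspect(open_then_error, lambda e: print("error after open"))
for event in ["open", "error", "read", "open", "close"]:
    monitor(event)
```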

    Self-Adaptive Decentralized Monitoring in Software-Defined Networks

    The Software-Defined Networking (SDN) paradigm allows network management solutions to automatically and frequently reconfigure network resources. When developing SDN-based management architectures, it is of paramount importance to design a monitoring system that can provide timely and consistent updates to heterogeneous management applications. To support such applications operating under low-latency requirements, the monitoring system should scale with increasing network size and provide precise network views with minimal overhead on the available resources. In this paper, we present a novel, self-adaptive, decentralized framework for resource monitoring in SDN. Our framework enables accurate statistics to be collected with limited burden on the network resources. This is realized through a self-tuning, adaptive monitoring mechanism that automatically adjusts its settings based on the traffic dynamics. We evaluate our proposal on a realistic use-case scenario in which a content distribution service and an on-demand gaming platform are deployed within an ISP network. The results show that the proposed framework reduces monitoring latencies, thus enabling shorter reconfiguration control loops. In addition, the proposed adaptive monitoring method achieves significant gains in monitoring overhead while preserving the performance of the services considered.
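    The sketch below illustrates the general self-tuning idea under assumptions of our own; it is not the paper's mechanism, and the thresholds and the poll_port_counters stand-in are invented. The polling interval shrinks when successive samples change quickly and grows when traffic is stable, trading accuracy against monitoring overhead.

```python
# Hypothetical self-tuning monitor: the polling interval shrinks when successive
# samples change a lot (dynamic traffic) and grows when they are stable, trading
# accuracy against monitoring overhead. Thresholds are illustrative.
import random
import time


def poll_port_counters() -> float:
    """Stand-in for an SDN statistics request (e.g. per-port byte counts)."""
    return random.gauss(100.0, 20.0)


def adaptive_monitor(steps=20, min_interval=1.0, max_interval=30.0, tolerance=0.1):
    interval = 5.0
    previous = poll_port_counters()
    for _ in range(steps):
        time.sleep(0)  # replace with time.sleep(interval) on a real controller
        sample = poll_port_counters()
        change = abs(sample - previous) / max(abs(previous), 1e-9)
        if change > tolerance:
            interval = max(min_interval, interval / 2)    # traffic is dynamic: poll faster
        else:
            interval = min(max_interval, interval * 1.5)  # traffic is stable: back off
        print(f"sample={sample:7.2f}  change={change:6.2%}  next interval={interval:5.1f}s")
        previous = sample


adaptive_monitor()
```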

    Logs and Models in Engineering Complex Embedded Production Software Systems

    A Machine Learning Enhanced Scheme for Intelligent Network Management

    Versatile networking services strongly influence daily life, while their number and diversity make network systems highly complex. Network scale and complexity grow with the increase in infrastructure equipment, network functions, and network slices, and with the evolution of the underlying architecture. The conventional approach is to maintain such large and complex platforms through manual administration, which makes effective and insightful management troublesome. A feasible and promising alternative is to extract insightful information from the large volumes of network data produced. The goal of this thesis is to use learning-based algorithms from the machine learning community to discover valuable knowledge in substantial network data, directly supporting intelligent management and maintenance. The thesis focuses on two schemes: network anomaly detection and root-cause localization, and critical traffic resource control and optimization.
    First, abundant network data carry informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract, formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features and fuses the extracted features for network anomaly and fault detection; the representation learning uses word2vec-based embedding techniques for semantic expression. Fault and anomaly detection, however, only reveals that events have occurred without identifying their root causes, so fault localization is needed to narrow down the source of systemic anomalies. The extracted features are turned into anomaly degrees and coupled with an importance-ranking method to highlight the locations of anomalies in the network; two ranking modes, instantiated with PageRank and with operation errors, jointly highlight the locations of latent issues.
    Beyond fault and anomaly detection, network traffic engineering manages communication and computation resources to optimize the efficiency of data transfer. Especially when traffic is constrained by communication conditions, a proactive path-planning scheme helps take efficient traffic-control actions. A learning-based traffic-planning algorithm is therefore proposed, based on a sequence-to-sequence model, to discover reasonable hidden paths from abundant traffic history over a Software-Defined Network architecture. Finally, traffic engineering based purely on empirical data is likely to yield stale and sub-optimal solutions, or even make matters worse; a resilient mechanism is required to adapt network flows to the context of a dynamic environment. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding that takes network resource status into account and shows a promising performance improvement.
    In summary, the proposed anomaly-processing framework strengthens analysis and diagnosis for network system administrators by combining fault detection with root-cause localization, and the learning-based traffic engineering supports flow management from experience data, pointing to a promising direction for flexible traffic adjustment in ever-changing environments.
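    As one hedged illustration of the PageRank-style ranking step (a sketch with invented component names and scores, not the thesis implementation), the snippet below uses anomaly scores as the restart vector of a personalized PageRank walk over a service dependency graph, so components that many anomalous components depend on rise toward the top of the ranking.

```python
# Hypothetical root-cause ranking sketch: anomaly scores (as would come from an
# embedding-based detector) personalize a PageRank-style walk over the service
# dependency graph; components that anomalous components depend on rank high.
def personalized_pagerank(edges, anomaly, damping=0.85, iterations=50):
    nodes = sorted({n for e in edges for n in e})
    out = {n: [d for s, d in edges if s == n] for n in nodes}
    total = sum(anomaly.values()) or 1.0
    restart = {n: anomaly.get(n, 0.0) / total for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new = {n: (1 - damping) * restart[n] for n in nodes}
        for n in nodes:
            targets = out[n] or nodes          # dangling nodes spread rank evenly
            share = damping * rank[n] / len(targets)
            for t in targets:
                new[t] += share
        rank = new
    return sorted(rank.items(), key=lambda kv: kv[1], reverse=True)


# Service dependency edges (caller -> callee) and per-component anomaly scores;
# all values here are made up for illustration.
edges = [("frontend", "api"), ("api", "db"), ("api", "cache"), ("batch", "db")]
anomaly = {"frontend": 0.7, "api": 0.9, "batch": 0.6, "db": 0.1, "cache": 0.0}
for component, score in personalized_pagerank(edges, anomaly):
    print(f"{component:8s} {score:.3f}")
```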

    Turning the shelves: empirical findings and space syntax analyses of two virtual supermarket variations

    The spatial structure of a virtual supermarket was systematically varied to investigate human behavior and cognitive processes in unusual building configurations. The study builds upon experiments in a regular supermarket, which serve as a baseline case. In a between-participant design, a total of 41 participants completed a search task in two different virtual supermarket environments. For 21 participants, the supermarket shelves were turned towards them at a 45° angle when entering the store, giving high visual access to product categories and products. For 20 participants, the shelves were turned in exactly the opposite direction, restricting visual access and hindering a quick overview of where goods are located. The observed differences in search performance between the two conditions are analyzed using space syntax analyses and comparisons of environmental features with participants’ actual search path trajectories.

    Efficient I/O for Computational Grid Applications

    High-performance computing increasingly occurs on computational grids composed of heterogeneous and geographically distributed systems of computers, networks, and storage devices that collectively act as a single virtual computer. A key challenge in this environment is to provide efficient access to data distributed across remote data servers. This dissertation explores some of the issues associated with I/O for wide-area distributed computing and describes an I/O system, called Armada, with the following features: a framework to allow application and dataset providers to flexibly compose graphs of processing modules that describe the distribution, application interfaces, and processing required of the dataset before or after computation; an algorithm to restructure application graphs to increase parallelism and to improve network performance in a wide-area network; and a hierarchical graph-partitioning scheme that deploys components of the application graph in a way that is both beneficial to the application and sensitive to the administrative policies of the different administrative domains. Experiments show that applications using Armada perform well in both low- and high-bandwidth environments, and that our approach does an exceptional job of hiding the network latency inherent in grid computing.
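    A small, hypothetical sketch of the placement objective behind the graph-partitioning step (illustrative only; the module names, data volumes, and greedy pass are not the dissertation's algorithm): modules pinned to the domains that own them stay put, and the remaining modules are placed so that the heaviest edges stay within one administrative domain.

```python
# Hypothetical placement sketch: given an application graph with bytes moved on
# each edge and some modules pinned to the domains that own them, greedily place
# the remaining modules so heavy edges stay inside one administrative domain.
edges = {                               # (module, module): bytes moved between them
    ("reader", "filter"): 10_000,
    ("filter", "reduce"): 500,
    ("reduce", "app"): 100,
}
pinned = {"reader": "storage-domain", "app": "compute-domain"}   # fixed endpoints
domains = ["storage-domain", "compute-domain"]


def greedy_place(edges, pinned, domains):
    placement = dict(pinned)
    unplaced = sorted({m for edge in edges for m in edge} - set(pinned))
    for module in unplaced:
        # Pick the domain that minimizes traffic on edges whose other endpoint
        # is already placed; still-unplaced neighbours are ignored for now.
        def cost(domain):
            return sum(volume for (a, b), volume in edges.items()
                       if module in (a, b)
                       and placement.get(b if a == module else a, domain) != domain)
        placement[module] = min(domains, key=cost)
    return placement


placement = greedy_place(edges, pinned, domains)
cross = sum(v for (a, b), v in edges.items() if placement[a] != placement[b])
print(placement, "cross-domain bytes:", cross)
```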