68 research outputs found

    Data availability in challenging networking environments in presence of failures

    Get PDF
    This Doctoral thesis presents research on improving data availability in challenging networking environments where failures frequently occur. The thesis discusses the data retrieval and transfer mechanisms in challenging networks such as the Grid and the delay-tolerant networking (DTN). The Grid concept has gained adaptation as a solution to high-performance computing challenges that are faced in international research collaborations. Challenging networking is a novel research area in communications. The first part of the thesis introduces the challenges of data availability in environment where resources are scarce. The focus is especially on the challenges faced in the Grid and in the challenging networking scenarios. A literature overview is given to explain the most important research findings and the state of the standardization work in the field. The experimental part of the thesis consists of eight scientific publications and explains how they contribute to research in the field. Focus in on explaining how data transfer mechanisms have been improved from the application and networking layer points of views. Experimental methods for the Grid scenarios comprise of running a newly developed storage application on the existing research infrastructure. A network simulator is extended for the experimentation with challenging networking mechanisms in a network formed by mobile users. The simulator enables to investigate network behavior with a large number of nodes, and with conditions that are difficult to re-instantiate. As a result, recommendations are given for data retrieval and transfer design for the Grid and mobile networks. These recommendations can guide both system architects and application developers in their work. In the case of the Grid research, the results give first indications on the applicability of the erasure correcting codes for data storage and retrieval with the existing Grid data storage tools. In the case of the challenging networks, the results show how an application-aware communication approach can be used to improve data retrieval and communications. Recommendations are presented to enable efficient transfer and management of data items that are large compared to available resources

    A multi-tier cached I/O architecture for massively parallel supercomputers

    Get PDF
    Recent advances in storage technologies and high performance interconnects have made possible in the last years to build, more and more potent storage systems that serve thousands of nodes. The majority of storage systems of clusters and supercomputers from Top 500 list are managed by one of three scalable parallel file systems: GPFS, PVFS, and Lustre. Most large-scale scientific parallel applications are written in Message Passing Interface (MPI), which has become the de-facto standard for scalable distributed memory machines. One part of the MPI standard is related to I/O and has among its main goals the portability and efficiency of file system accesses. All of the above mentioned parallel file systems may be accessed also through the MPI-IO interface. The I/O access patterns of scientific parallel applications often consist of accesses to a large number of small, non-contiguous pieces of data. For small file accesses the performance is dominated by the latency of network transfers and disks. Parallel scientific applications lead to interleaved file access patterns with high interprocess spatial locality at the I/O nodes. Additionally, scientific applications exhibit repetitive behaviour when a loop or a function with loops issues I/O requests. When I/O access patterns are repetitive, caching and prefetching can effectively mask their access latency. These characteristics of the access patterns motivated several researchers to propose parallel I/O optimizations both at library and file system levels. However, these optimizations are not always integrated across different layers in the systems. In this dissertation we propose a novel generic parallel I/O architecture for clusters and supercomputers. Our design is aimed at large-scale parallel architectures with thousands of compute nodes. Besides acting as middleware for existing parallel file systems, our architecture provides on-line virtualization of storage resources. Another objective of this thesis is to factor out the common parallel I/O functionality from clusters and supercomputers in generic modules in order to facilitate porting of scientific applications across these platforms. Our solution is based on a multi-tier cache architecture, collective I/O, and asynchronous data staging strategies hiding the latency of data transfer between cache tiers. The thesis targets to reduce the file access latency perceived by the data-intensive parallel scientific applications by multi-layer asynchronous data transfers. In order to accomplish this objective, our techniques leverage the multi-core architectures by overlapping computation with communication and I/O in parallel threads. Prototypes of our solutions have been deployed on both clusters and Blue Gene supercomputers. Performance evaluation shows that the combination of collective strategies with overlapping of computation, communication, and I/O may bring a substantial performance benefit for access patterns common for parallel scientific applications.-----------------------------------------------------------------------------------------------------------------------------En los últimos años se ha observado un incremento sustancial de la cantidad de datos producidos por las aplicaciones científicas paralelas y de la necesidad de almacenar estos datos de forma persistente. Los sistemas de ficheros paralelos como PVFS, Lustre y GPFS han ofrecido una solución escalable para esta demanda creciente de almacenamiento. La mayoría de las aplicaciones científicas son escritas haciendo uso de la interfaz de paso de mensajes (MPI), que se ha convertido en un estándar de-facto de programación para las arquitecturas de memoria distribuida. Las aplicaciones paralelas que usan MPI pueden acceder a los sistemas de ficheros paralelos a través de la interfaz ofrecida por MPI-IO. Los patrones de acceso de las aplicaciones científicas paralelas consisten en un gran número de accesos pequeños y no contiguos. Para tamaños de acceso pequeños, el rendimiento viene limitado por la latencia de las transferencias de red y disco. Además, las aplicaciones científicas llevan a cabo accesos con una alta localidad espacial entre los distintos procesos en los nodos de E/S. Adicionalmente, las aplicaciones científicas presentan típicamente un comportamiento repetitivo. Cuando los patrones de acceso de E/S son repetitivos, técnicas como escritura demorada y lectura adelantada pueden enmascarar de forma eficiente las latencias de los accesos de E/S. Estas características han motivado a muchos investigadores en proponer optimizaciones de E/S tanto a nivel de biblioteca como a nivel del sistema de ficheros. Sin embargo, actualmente estas optimizaciones no se integran siempre a través de las distintas capas del sistema. El objetivo principal de esta tesis es proponer una nueva arquitectura genérica de E/S paralela para clusters y supercomputadores. Nuestra solución está basada en una arquitectura de caches en varias capas, una técnica de E/S colectiva y estrategias de acceso asíncronas que ocultan la latencia de transferencia de datos entre las distintas capas de caches. Nuestro diseño está dirigido a arquitecturas paralelas escalables con miles de nodos de cómputo. Además de actuar como middleware para los sistemas de ficheros paralelos existentes, nuestra arquitectura debe proporcionar virtualización on-line de los recursos de almacenamiento. Otro de los objeticos marcados para esta tesis es la factorización de las funcionalidades comunes en clusters y supercomputadores, en módulos genéricos que faciliten el despliegue de las aplicaciones científicas a través de estas plataformas. Se han desplegado distintos prototipos de nuestras soluciones tanto en clusters como en supercomputadores. Las evaluaciones de rendimiento demuestran que gracias a la combicación de las estratégias colectivas de E/S y del solapamiento de computación, comunicación y E/S, se puede obtener una sustancial mejora del rendimiento en los patrones de acceso anteriormente descritos, muy comunes en las aplicaciones paralelas de caracter científico

    Systems-compatible Incentives

    Get PDF
    Originally, the Internet was a technological playground, a collaborative endeavor among researchers who shared the common goal of achieving communication. Self-interest used not to be a concern, but the motivations of the Internet's participants have broadened. Today, the Internet consists of millions of commercial entities and nearly 2 billion users, who often have conflicting goals. For example, while Facebook gives users the illusion of access control, users do not have the ability to control how the personal data they upload is shared or sold by Facebook. Even in BitTorrent, where all users seemingly have the same motivation of downloading a file as quickly as possible, users can subvert the protocol to download more quickly without giving their fair share. These examples demonstrate that protocols that are merely technologically proficient are not enough. Successful networked systems must account for potentially competing interests. In this dissertation, I demonstrate how to build systems that give users incentives to follow the systems' protocols. To achieve incentive-compatible systems, I apply mechanisms from game theory and auction theory to protocol design. This approach has been considered in prior literature, but unfortunately has resulted in few real, deployed systems with incentives to cooperate. I identify the primary challenge in applying mechanism design and game theory to large-scale systems: the goals and assumptions of economic mechanisms often do not match those of networked systems. For example, while auction theory may assume a centralized clearing house, there is no analog in a decentralized system seeking to avoid single points of failure or centralized policies. Similarly, game theory often assumes that each player is able to observe everyone else's actions, or at the very least know how many other players there are, but maintaining perfect system-wide information is impossible in most systems. In other words, not all incentive mechanisms are systems-compatible. The main contribution of this dissertation is the design, implementation, and evaluation of various systems-compatible incentive mechanisms and their application to a wide range of deployable systems. These systems include BitTorrent, which is used to distribute a large file to a large number of downloaders, PeerWise, which leverages user cooperation to achieve lower latencies in Internet routing, and Hoodnets, a new system I present that allows users to share their cellular data access to obtain greater bandwidth on their mobile devices. Each of these systems represents a different point in the design space of systems-compatible incentives. Taken together, along with their implementations and evaluations, these systems demonstrate that systems-compatibility is crucial in achieving practical incentives in real systems. I present design principles outlining how to achieve systems-compatible incentives, which may serve an even broader range of systems than considered herein. I conclude this dissertation with what I consider to be the most important open problems in aligning the competing interests of the Internet's participants

    Systems-Level Support for Mobile Device Connectivity.

    Full text link
    The rise of handheld computing devices has inspired a great deal of research aimed at addressing the unique problems posed by their mobile, "always-on" nature. In order to help mobile devices navigate a complex world of overlapping, uneven public wireless coverage, one must be mindful of the distinction between nomadic usage and true mobility. Accordingly, systems research must move beyond simply optimizing for a set of local conditions (e.g., finding the best access point for a laptop user in a stationary location) to considering the "derivative of connectivity" when network conditions are constantly in flux. This dissertation presents a new paradigm for networking support on mobile devices. This project has several complementary aspects. As devices encounter network connectivity our system both evaluates the application-level quality of WiFi access points and updates a device-centric mobility model. Together, this mobility model and AP quality database yield "connectivity forecasts," which let applications optimize not just for current network conditions but for the expected big picture to come. Results of a prototype deployment in several cities shows that considering the application-level quality of APs (rather than just signal strength) significantly boosts the success rate of finding a usable access point. Furthermore, this dissertation shows how connectivity forecasts---even with minimal model training time---allow several applications commonly found on mobile devices to reap significant benefits, such as extended battery life. Mobile devices are often within range of multiple connectivity options, however, and choosing just one therefore ignores potential connectivity. This dissertation describes a virtual link layer for Linux, called Juggler, that uses one network card to simultaneously associate with many WiFi APs, ad hoc groups or mesh networks. The results show how Juggler can boost effective bandwidth by striping data across multiple APs, enable seamless 802.11 handoff by preemptively associating with the "next" AP before the current one become unusable, and maintain a modest side-channel to the user's personal area network or mesh network without impacting foreground bandwidth to infrastructure.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/61718/1/tonynich_1.pd

    Architecture independent environment for developing engineering software on MIMD computers

    Get PDF
    Engineers are constantly faced with solving problems of increasing complexity and detail. Multiple Instruction stream Multiple Data stream (MIMD) computers have been developed to overcome the performance limitations of serial computers. The hardware architectures of MIMD computers vary considerably and are much more sophisticated than serial computers. Developing large scale software for a variety of MIMD computers is difficult and expensive. There is a need to provide tools that facilitate programming these machines. First, the issues that must be considered to develop those tools are examined. The two main areas of concern were architecture independence and data management. Architecture independent software facilitates software portability and improves the longevity and utility of the software product. It provides some form of insurance for the investment of time and effort that goes into developing the software. The management of data is a crucial aspect of solving large engineering problems. It must be considered in light of the new hardware organizations that are available. Second, the functional design and implementation of a software environment that facilitates developing architecture independent software for large engineering applications are described. The topics of discussion include: a description of the model that supports the development of architecture independent software; identifying and exploiting concurrency within the application program; data coherence; engineering data base and memory management

    Accelerating Network Communication and I/O in Scientific High Performance Computing Environments

    Get PDF
    High performance computing has become one of the major drivers behind technology inventions and science discoveries. Originally driven through the increase of operating frequencies and technology scaling, a recent slowdown in this evolution has led to the development of multi-core architectures, which are supported by accelerator devices such as graphics processing units (GPUs). With the upcoming exascale era, the overall power consumption and the gap between compute capabilities and I/O bandwidth have become major challenges. Nowadays, the system performance is dominated by the time spent in communication and I/O, which highly depends on the capabilities of the network interface. In order to cope with the extreme concurrency and heterogeneity of future systems, the software ecosystem of the interconnect needs to be carefully tuned to excel in reliability, programmability, and usability. This work identifies and addresses three major gaps in today's interconnect software systems. The I/O gap describes the disparity in operating speeds between the computing capabilities and second storage tiers. The communication gap is introduced through the communication overhead needed to synchronize distributed large-scale applications and the mixed workload. The last gap is the so called concurrency gap, which is introduced through the extreme concurrency and the inflicted learning curve posed to scientific application developers to exploit the hardware capabilities. The first contribution is the introduction of the network-attached accelerator approach, which moves accelerators into a "stand-alone" cluster connected through the Extoll interconnect. The novel communication architecture enables the direct accelerators communication without any host interactions and an optimal application-to-compute-resources mapping. The effectiveness of this approach is evaluated for two classes of accelerators: Intel Xeon Phi coprocessors and NVIDIA GPUs. The next contribution comprises the design, implementation, and evaluation of the support of legacy codes and protocols over the Extoll interconnect technology. By providing TCP/IP protocol support over Extoll, it is shown that the performance benefits of the interconnect can be fully leveraged by a broader range of applications, including the seamless support of legacy codes. The third contribution is twofold. First, a comprehensive analysis of the Lustre networking protocol semantics and interfaces is presented. Afterwards, these insights are utilized to map the LNET protocol semantics onto the Extoll networking technology. The result is a fully functional Lustre network driver for Extoll. An initial performance evaluation demonstrates promising bandwidth and message rate results. The last contribution comprises the design, implementation, and evaluation of two easy-to-use load balancing frameworks, which transparently distribute the I/O workload across all available storage system components. The solutions maximize the parallelization and throughput of file I/O. The frameworks are evaluated on the Titan supercomputing systems for three I/O interfaces. For example for large-scale application runs, POSIX I/O and MPI-IO can be improved by up to 50% on a per job basis, while HDF5 shows performance improvements of up to 32%

    Interactivity And User-heterogeneity In On Demand Broadcast Video

    Get PDF
    Video-On-Demand (VOD) has appeared as an important technology for many multimedia applications such as news on demand, digital libraries, home entertainment, and distance learning. In its simplest form, delivery of a video stream requires a dedicated channel for each video session. This scheme is very expensive and non-scalable. To preserve server bandwidth, many users can share a channel using multicast. Two types of multicast have been considered. In a non-periodic multicast setting, users make video requests to the server; and it serves them according to some scheduling policy. In a periodic broadcast environment, the server does not wait for service requests. It broadcasts a video cyclically, e.g., a new stream of the same video is started every t seconds. Although, this type of approach does not guarantee true VOD, the worst service latency experienced by any client is less than t seconds. A distinct advantage of this approach is that it can serve a very large community of users using minimal server bandwidth. In VOD System it is desirable to provide the user with the video-cassette-recorder-like (VCR) capabilities such as fast-forwarding a video or jumping to a specific frame. This issue in the broadcast framework is addressed, where each video and its interactive version are broadcast repeatedly on the network. Existing techniques rely on data prefetching as the mechanism to provide this functionality. This approach provides limited usability since the prefetching rate cannot keep up with typical fast-forward speeds. In the same environment, end users might have access to different bandwidth capabilities at different times. Current periodic broadcast schemes, do not take advantage of high-bandwidth capabilities, nor do they adapt to the low-bandwidth limitation of the receivers. A heterogeneous technique is presented that can adapt to a range of receiving bandwidth capability. Given a server bandwidth and a range of different client bandwidths, users employing the proposed technique will choose either to use their full reception bandwidth capability and therefore accessing the video at a very short time, or using part or enough reception bandwidth at the expense of a longer access latency
    • …
    corecore