203 research outputs found

    Adaptive and secured resource management in distributed and Internet systems

    Get PDF
    The effectiveness of computer system resource management has been always determined by two major factors: (1) workload demands and management objectives, (2) the updates of the computer technology. These two factors are dynamically changing, and resource management systems must be timely adaptive to the changes. This dissertation attempts to address several important and related resource management issues.;We first study memory system utilization in centralized servers by improving memory performance of sorting algorithms, which provides fundamental understanding on memory system organizations and its performance optimizations for data-intensive workloads. to reduce different types of cache misses, we restructure the mergesort and quicksort algorithms by integrating tiling, padding, and buffering techniques and by repartitioning the data set. Our study shows substantial performance improvements from our new methods.;We have further extended the work to improve load sharing for utilizing global memory resources in distributed systems. Aiming at reducing the memory resource contention caused by page faults and I/O activities, we have developed and examined load sharing policies by considering effective usage of global memory in addition to CPU load balancing in both homogeneous and heterogeneous clusters.;Extending our research from clusters to Internet systems, we have further investigated memory and storage utilizations in Web caching systems. We have proposed several novel management schemes to restructure and decentralize the existing caching system by exploiting data locality at different levels of the global memory hierarchy and by effectively sharing data objects among the clients and their proxy caches.;Data integrity and communication anonymity issues are raised from our decentralized Web caching system design, which are also security concerns for general peer-to-peer systems. We propose an integrity protocol to ensure data integrity, and several protocols to achieve mutual communication anonymity between an information requester and a provider.;The potential impact and contributions of this dissertation are briefly stated as follows: (1) two major research topics identified in this dissertation are fundamentally important for the growth and development of information technology, and will continue to be demanding topics for a long term. (2) Our proposed cache-effective sorting methods bridge a serious gap between analytical complexity of algorithms and their execution complexity in practice due to the increasingly deep memory hierarchy in computer systems. This approach can also be used to improve memory performance at different levels of the memory hierarchy, such as I/O and file systems. (3) Our load sharing principle of giving a high priority to the requests of data accesses in memory and I/Os timely adapts the technology changes and effectively responds to the increasing demand of data-intensive applications. (4) Our proposed decentralized Web caching framework and its resource management schemes present a comprehensive case study to examine the P2P model. Our results and experiences can be used for related and further studies in distributed computing. (5) The proposed data integrity and communication anonymity protocols address limits and weaknesses of existing ones, and place a solid foundation for us to continue our work in this important area

    Performance Evaluation of Ad Hoc Routing in a Swarm of Autonomous Aerial Vehicles

    Get PDF
    This thesis investigates the performance of three mobile ad hoc routing protocols in the context of a swarm of autonomous unmanned aerial vehicles (UAVs). It is proposed that a wireless network of nodes having an average of 5.1774 log n neighbors, where n is the total number of nodes in the network, has a high probability of having no partitions. By decreasing transmission range while ensuring network connectivity, and implementing multi-hop routing between nodes, spatial multiplexing is exploited whereby multiple pairs of nodes simultaneously transmit on the same channel. The proposal is evaluated using the Greedy Perimeter Stateless Routing (GPSR), Optimized Link State Routing (OLSR), and Ad hoc On-demand Distance Vector (AODV) routing protocols in the context of a swarm of UAVs using the OPNET network simulation tool. The first-known implementation of GPSR in OPNET is constructed, and routing performance is observed when routing protocol, number of nodes, transmission range, and traffic workload are varied. Performance is evaluated based on proportion of packets successfully delivered, average packet hop count, and average end-to-end delay of packets received. Results indicate that the routing protocol choice has a significant impact on routing performance. While GPSR successfully delivers 50% more packets than OLSR, and experiences a 53% smaller end-to-end delay than AODV when routing packets in a swarm of UAVs, increasing transmission range and using direct transmission to destination nodes with no routing results in a level of performance not achieved using any of the routing protocols evaluated

    Analytical Modeling of High Performance Reconfigurable Computers: Prediction and Analysis of System Performance.

    Get PDF
    The use of a network of shared, heterogeneous workstations each harboring a Reconfigurable Computing (RC) system offers high performance users an inexpensive platform for a wide range of computationally demanding problems. However, effectively using the full potential of these systems can be challenging without the knowledge of the system’s performance characteristics. While some performance models exist for shared, heterogeneous workstations, none thus far account for the addition of Reconfigurable Computing systems. This dissertation develops and validates an analytic performance modeling methodology for a class of fork-join algorithms executing on a High Performance Reconfigurable Computing (HPRC) platform. The model includes the effects of the reconfigurable device, application load imbalance, background user load, basic message passing communication, and processor heterogeneity. Three fork-join class of applications, a Boolean Satisfiability Solver, a Matrix-Vector Multiplication algorithm, and an Advanced Encryption Standard algorithm are used to validate the model with homogeneous and simulated heterogeneous workstations. A synthetic load is used to validate the model under various loading conditions including simulating heterogeneity by making some workstations appear slower than others by the use of background loading. The performance modeling methodology proves to be accurate in characterizing the effects of reconfigurable devices, application load imbalance, background user load and heterogeneity for applications running on shared, homogeneous and heterogeneous HPRC resources. The model error in all cases was found to be less than five percent for application runtimes greater than thirty seconds and less than fifteen percent for runtimes less than thirty seconds. The performance modeling methodology enables us to characterize applications running on shared HPRC resources. Cost functions are used to impose system usage policies and the results of vii the modeling methodology are utilized to find the optimal (or near-optimal) set of workstations to use for a given application. The usage policies investigated include determining the computational costs for the workstations and balancing the priority of the background user load with the parallel application. The applications studied fall within the Master-Worker paradigm and are well suited for a grid computing approach. A method for using NetSolve, a grid middleware, with the model and cost functions is introduced whereby users can produce optimal workstation sets and schedules for Master-Worker applications running on shared HPRC resources

    Bandwidth-aware distributed ad-hoc grids in deployed wireless sensor networks

    Get PDF
    Nowadays, cost effective sensor networks can be deployed as a result of a plethora of recent engineering advances in wireless technology, storage miniaturisation, consolidated microprocessor design, and sensing technologies. Whilst sensor systems are becoming relatively cheap to deploy, two issues arise in their typical realisations: (i) the types of low-cost sensors often employed are capable of limited resolution and tend to produce noisy data; (ii) network bandwidths are relatively low and the energetic costs of using the radio to communicate are relatively high. To reduce the transmission of unnecessary data, there is a strong argument for performing local computation. However, this can require greater computational capacity than is available on a single low-power processor. Traditionally, such a problem has been addressed by using load balancing: fragmenting processes into tasks and distributing them amongst the least loaded nodes. However, the act of distributing tasks, and any subsequent communication between them, imposes a geographically defined load on the network. Because of the shared broadcast nature of the radio channels and MAC layers in common use, any communication within an area will be slowed by additional traffic, delaying the computation and reporting that relied on the availability of the network. In this dissertation, we explore the tradeoff between the distribution of computation, needed to enhance the computational abilities of networks of resource-constrained nodes, and the creation of network traffic that results from that distribution. We devise an application-independent distribution paradigm and a set of load distribution algorithms to allow computationally intensive applications to be collaboratively computed on resource-constrained devices. Then, we empirically investigate the effects of network traffic information on the distribution performance. We thus devise bandwidth-aware task offload mechanisms that, combining both nodes computational capabilities and local network conditions, investigate the impacts of making informed offload decisions on system performance. The highly deployment-specific nature of radio communication means that simulations that are capable of producing validated, high-quality, results are extremely hard to construct. Consequently, to produce meaningful results, our experiments have used empirical analysis based on a network of motes located at UCL, running a variety of I/O-bound, CPU-bound and mixed tasks. Using this setup, we have established that even relatively simple load sharing algorithms can improve performance over a range of different artificially generated scenarios, with more or less timely contextual information. In addition, we have taken a realistic application, based on location estimation, and implemented that across the same network with results that support the conclusions drawn from the artificially generated traffic

    Multilevel Parallel Communications

    Get PDF
    The research reported in this thesis investigates the use of parallelism at multiple levels to realize high-speed networks that offer advantages in throughput, cost, reliability, and flexibility over alternative approaches. This research specifically considers use of parallelism at two levels: the upper level and the lower level. At the upper level, N protocol processors perform functions included in the transport and network layers. At the lower level, M channels provide data and physical layer functions. The resulting system provides very high bandwidth to an application. A key concept of this research is the use of replicated channels to provide a single, high bandwidth channel to a single application. The parallelism provided by the network is transparent to communicating applications, thus differentiating this strategy from schemes that provide a collection of disjoint channels between applications on different nodes. Another innovative aspect of this research is that parallelism is exploited at multiple layers of the network to provide high throughput not only at the physical layer, but also at upper protocol layers. Schedulers are used to distribute data from a single stream to multiple channels and to merge data from multiple channels to reconstruct a single coherent stream. High throughput is possible by providing the combined bandwidth of multiple channels to a single source and destination through use of parallelism at multiple protocol layers. This strategy is cost effective since systems can be built using standard technologies that benefit from the economies of a broad applications base. The exotic and revolutionary components needed in non-parallel approaches to build high speed networks are not required. The replicated channels can be used to achieve high reliability as well. Multilevel parallelism is flexible since the degree of parallelism provided at any level can be matched to protocol processing demands and application requirements

    Efficient Communication Acceleration for Next-Gen Scale-up Deep Learning Training Platforms

    Full text link
    Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators (e.g., GPU/TPU) via fast, customized interconnects. As the size of DL models and the compute efficiency of the accelerators has continued to increase, there has also been a corresponding steady increase in the bandwidth of these interconnects.Systems today provide 100s of gigabytes (GBs) of inter-connect bandwidth via a mix of solutions such as Multi-Chip packaging modules (MCM) and proprietary interconnects(e.g., NVlink) that together from the scale-up network of accelerators. However, as we identify in this work, a significant portion of this bandwidth goes under-utilized. This is because(i) using compute cores for executing collective operations such as all-reduce decreases overall compute efficiency, and(ii) there is memory bandwidth contention between the accesses for arithmetic operations vs those for collectives, and(iii) there are significant internal bus congestions that increase the latency of communication operations. To address this challenge, we propose a novel microarchitecture, calledAccelerator Collectives Engine(ACE), forDL collective communication offload. ACE is a smart net-work interface (NIC) tuned to cope with the high-bandwidth and low latency requirements of scale-up networks and is able to efficiently drive the various scale-up network systems(e.g. switch-based or point-to-point topologies). We evaluate the benefits of the ACE with micro-benchmarks (e.g. single collective performance) and popular DL models using an end-to-end DL training simulator. For modern DL workloads, ACE on average increases the net-work bandwidth utilization by 1.97X, resulting in 2.71X and 1.44X speedup in iteration time for ResNet-50 and GNMT, respectively

    Survey of Transportation of Adaptive Multimedia Streaming service in Internet

    Full text link
    [DE] World Wide Web is the greatest boon towards the technological advancement of modern era. Using the benefits of Internet globally, anywhere and anytime, users can avail the benefits of accessing live and on demand video services. The streaming media systems such as YouTube, Netflix, and Apple Music are reining the multimedia world with frequent popularity among users. A key concern of quality perceived for video streaming applications over Internet is the Quality of Experience (QoE) that users go through. Due to changing network conditions, bit rate and initial delay and the multimedia file freezes or provide poor video quality to the end users, researchers across industry and academia are explored HTTP Adaptive Streaming (HAS), which split the video content into multiple segments and offer the clients at varying qualities. The video player at the client side plays a vital role in buffer management and choosing the appropriate bit rate for each such segment of video to be transmitted. A higher bit rate transmitted video pauses in between whereas, a lower bit rate video lacks in quality, requiring a tradeoff between them. The need of the hour was to adaptively varying the bit rate and video quality to match the transmission media conditions. Further, The main aim of this paper is to give an overview on the state of the art HAS techniques across multimedia and networking domains. A detailed survey was conducted to analyze challenges and solutions in adaptive streaming algorithms, QoE, network protocols, buffering and etc. It also focuses on various challenges on QoE influence factors in a fluctuating network condition, which are often ignored in present HAS methodologies. Furthermore, this survey will enable network and multimedia researchers a fair amount of understanding about the latest happenings of adaptive streaming and the necessary improvements that can be incorporated in future developments.Abdullah, MTA.; Lloret, J.; Canovas Solbes, A.; GarcĂ­a-GarcĂ­a, L. (2017). Survey of Transportation of Adaptive Multimedia Streaming service in Internet. Network Protocols and Algorithms. 9(1-2):85-125. doi:10.5296/npa.v9i1-2.12412S8512591-

    Dynamic load balancing of parallel road traffic simulation

    Get PDF
    The objective of this research was to investigate, develop and evaluate dynamic load-balancing strategies for parallel execution of microscopic road traffic simulations. Urban road traffic simulation presents irregular, and dynamically varying distributed computational load for a parallel processor system. The dynamic nature of road traffic simulation systems lead to uneven load distribution during simulation, even for a system that starts off with even load distributions. Load balancing is a potential way of achieving improved performance by reallocating work from highly loaded processors to lightly loaded processors leading to a reduction in the overall computational time. In dynamic load balancing, workloads are adjusted continually or periodically throughout the computation. In this thesis load balancing strategies were evaluated and some load balancing policies developed. A load index and a profitability determination algorithms were developed. These were used to enhance two load balancing algorithms. One of the algorithms exhibits local communications and distributed load evaluation between the neighbour partitions (diffusion algorithm) and the other algorithm exhibits both local and global communications while the decision making is centralized (MaS algorithm). The enhanced algorithms were implemented and synthesized with a research parallel traffic simulation. The performance of the research parallel traffic simulator, optimized with the two modified dynamic load balancing strategies were studied
    • …
    corecore