99 research outputs found

    Performance investigation of a document retrieval system on a voice-data integrated token ring local area network

    Get PDF
    Interest in integrating voice and data on local computer networks has been rising, and much research has been devoted to exploring techniques that are implementable using existing standards. This research focuses on the design issues in implementing a document retrieval system on a token ring network. The presence of both voice and data traffic complicates the protocol design, because the performance requirements of the two traffic types differ: voice creates stream traffic on a network, whereas data traffic is bursty, and voice packets must be delivered within a limited time interval, whereas data traffic emphasizes error-free delivery. The necessity of such a system and its technological feasibility with off-the-shelf components have prompted this study. A possible solution is discussed in this dissertation. During the course of this research, the time-consuming nature of simulation experiments created a need for efficient simulation techniques; thus, as a byproduct of the initial goal of protocol design, an approximate version of regenerative simulation was developed and is discussed here in detail. Lastly, the modeling difficulties encountered in forming an analytical model are listed, and a performance analysis of the subsystems of interest is given.
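
    The abstract does not reproduce the dissertation's approximate variant; as background, the classic regenerative estimator it builds on can be sketched in a few lines. This is a minimal illustration assuming an M/M/1 queue, where every arrival to an empty system marks a regeneration point; the function name and parameters are invented for illustration.

```python
import random

def mm1_regenerative(lam=0.5, mu=1.0, cycles=4000, seed=1):
    """Classic regenerative estimate of the steady-state mean number in an
    M/M/1 system: each arrival to an empty system starts an i.i.d. cycle,
    and E[n] is the ratio of expected area under n(t) per cycle to the
    expected cycle length."""
    rng = random.Random(seed)
    area = length = 0.0
    for _ in range(cycles):
        t, n = 0.0, 1              # cycle opens with an arrival to an empty system
        while n > 0:               # busy period: events occur at rate lam + mu
            rate = lam + mu
            dt = rng.expovariate(rate)
            area += n * dt         # accumulate area under n(t)
            t += dt
            n += 1 if rng.random() < lam / rate else -1
        t += rng.expovariate(lam)  # idle period until the next arrival (n = 0)
        length += t
    return area / length           # ratio estimator; true value is rho/(1-rho)

est = mm1_regenerative()           # rho = 0.5, so the true mean is 1.0
```

    Confidence intervals come for free with this scheme, since per-cycle areas and lengths are i.i.d.; the appeal of approximate variants is cutting the number of cycles needed for a given precision.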

    Datacenter Architectures for the Microservices Era

    Full text link
    Modern internet services are shifting away from single-binary, monolithic services toward numerous loosely-coupled microservices that interact via Remote Procedure Calls (RPCs), to improve the programmability, reliability, manageability, and scalability of cloud services. Computer system designers face many new challenges with microservice-based architectures, as individual RPCs/tasks last only a few microseconds in most microservices. In this dissertation, I seek to address the most notable challenges that arise from the dissimilarities between modern microservice-based and classic monolithic cloud services, and to design novel server architectures and runtime systems that enable efficient execution of µs-scale microservices on modern hardware. In the first part of my dissertation, I address the problem of Killer Microseconds, which refers to µs-scale “holes” in CPU schedules caused by stalls to access fast I/O devices or brief idle times between requests in high-throughput µs-scale microservices. Whereas modern computing platforms can efficiently hide ns-scale and ms-scale stalls through micro-architectural techniques and OS context switching, they lack efficient support for hiding the latency of µs-scale stalls. In chapter II, I propose Duplexity, a heterogeneous server architecture that employs aggressive multithreading to hide the latency of killer microseconds without sacrificing the Quality-of-Service (QoS) of latency-sensitive microservices. Duplexity achieves 1.9× higher core utilization and 2.7× lower iso-throughput 99th-percentile tail latency over an SMT-based server design, on average. In chapters III-IV, I comprehensively investigate the problem of tail latency in the context of microservices and address multiple aspects of it. First, in chapter III, I characterize the tail latency behavior of microservices and provide general guidelines for optimizing computer systems from a queuing perspective to minimize tail latency.
Queuing is a major contributor to end-to-end tail latency, wherein nominal tasks are enqueued behind rare, long ones due to Head-of-Line (HoL) blocking. Next, in chapter IV, I introduce Q-Zilla, a scheduling framework to tackle tail latency from a queuing perspective, and CoreZilla, a microarchitectural instantiation of the framework. Q-Zilla is composed of the ServerQueue Decoupled Size-Interval Task Assignment (SQD-SITA) scheduling algorithm and the Express-lane Simultaneous Multithreading (ESMT) microarchitecture, which together address HoL blocking by providing an “express-lane” for short tasks, protecting them from queuing behind rare, long ones. By combining the ESMT microarchitecture and the SQD-SITA scheduling algorithm, CoreZilla improves tail latency over a conventional SMT core with 2, 4, and 8 contexts by 2.25×, 3.23×, and 4.38×, on average, respectively, and outperforms a theoretical 32-core scale-up organization by 12%, on average, with 8 contexts. Finally, in chapters V-VI, I investigate the tail latency problem of microservices from a cluster-level, rather than server-level, perspective. Whereas Service Level Objectives (SLOs) define end-to-end latency targets for the entire service to ensure user satisfaction, with microservice-based applications it is unclear how to scale individual microservices when end-to-end SLOs are violated or resources are left underutilized. I introduce Parslo, an analytical framework for partial SLO allocation in virtualized cloud microservices. Parslo takes a microservice graph as input and employs a gradient descent-based approach to allocate “partial SLOs” to different microservice nodes, enabling independent auto-scaling of individual microservices. Parslo achieves the optimal solution, minimizing the total cost for the entire service deployment, and is applicable to general microservice graphs.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/167978/1/miramir_1.pd
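
    The size-interval idea behind the express-lane remedy for HoL blocking can be illustrated with a toy two-server model. This is a sketch of the general technique, not of SQD-SITA or CoreZilla themselves; the trace, the 1% heavy-tail mix, and the size threshold are invented for illustration.

```python
import random

def short_task_p99(tasks, assign):
    """Two single-server FIFO queues; assign(size) picks a queue for each
    task. Returns the 99th-percentile sojourn time of the *short* tasks,
    which is where HoL blocking shows up."""
    free = [0.0, 0.0]                   # next-free time of each server
    lat = []
    for t, size in tasks:               # tasks sorted by arrival time t
        q = assign(size)
        start = max(free[q], t)         # wait behind earlier work in queue q
        free[q] = start + size
        if size <= 1.0:
            lat.append(free[q] - t)
    lat.sort()
    return lat[int(0.99 * len(lat))]

gen = random.Random(0)
# one arrival every 1.5 time units; 1% of tasks are 100x longer (heavy tail)
tasks = [(i * 1.5, 100.0 if gen.random() < 0.01 else 1.0)
         for i in range(10_000)]

pick = random.Random(1)
p99_mixed = short_task_p99(tasks, lambda s: pick.randrange(2))     # sizes mixed
p99_sita  = short_task_p99(tasks, lambda s: 0 if s <= 1.0 else 1)  # express lane
```

    With mixed queues, short tasks occasionally land behind a 100-unit task and inherit its latency; reserving one server as an express lane keeps their tail flat even though both configurations serve the identical trace.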

    Analysis of buffer allocations in time-dependent and stochastic flow lines

    Full text link
    This thesis reviews and classifies the literature on the Buffer Allocation Problem under steady-state conditions and on performance evaluation approaches for queueing systems with time-dependent parameters. Subsequently, new performance evaluation approaches are developed. Finally, a local search algorithm for the derivation of time-dependent buffer allocations is proposed. The algorithm is based on numerically observed monotonicity properties of the system performance in the time-dependent buffer allocations. Numerical examples illustrate that time-dependent buffer allocations represent an adequate way of minimizing the average WIP in the flow line while achieving a desired service level.
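
    The general shape of such a local search can be sketched for the steady-state case. This is a minimal illustration with an invented four-station line and simulated throughput as the objective; the thesis's time-dependent allocations and WIP/service-level objective are not reproduced here.

```python
import random

def throughput(buf, svc):
    """Departure-time recursion for a serial flow line with blocking-after-
    service. svc[j][i] is job j's service time at station i; buf[i] is the
    buffer capacity between stations i and i+1."""
    n, m = len(svc), len(svc[0])
    dep = [[0.0] * (n + 1) for _ in range(m)]   # dep[i][j], jobs 1-based
    for j in range(1, n + 1):
        up = 0.0                                # first station is never starved
        for i in range(m):
            done = max(up, dep[i][j - 1]) + svc[j - 1][i]
            if i < m - 1:                       # blocked until a downstream
                k = j - buf[i] - 1              # buffer slot frees up
                if k >= 1:
                    done = max(done, dep[i + 1][k])
            dep[i][j] = up = done
    return n / dep[m - 1][n]

def local_search(total, svc):
    """Hill-climb over allocations of `total` buffer slots, moving one slot
    at a time between locations while throughput keeps improving."""
    m = len(svc[0])
    buf = [total // (m - 1)] * (m - 1)
    buf[0] += total - sum(buf)                  # start from an even spread
    best = throughput(buf, svc)
    improved = True
    while improved:
        improved = False
        for a in range(m - 1):
            for b in range(m - 1):
                if a == b or buf[a] == 0:
                    continue
                cand = buf[:]
                cand[a] -= 1; cand[b] += 1
                t = throughput(cand, svc)
                if t > best:
                    buf, best, improved = cand, t, True
    return buf, best

rng = random.Random(7)
rates = [1.0, 0.7, 1.0, 1.2]                    # station 2 is the bottleneck
svc = [[rng.expovariate(r) for r in rates] for _ in range(2000)]
alloc, tput = local_search(6, svc)
```

    The search typically shifts slots toward the bottleneck, which is the monotone behavior the thesis exploits; the time-dependent version repeats this reasoning per time interval.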

    Telecommunications Networks

    Get PDF
    This book guides readers through the basics of rapidly emerging networks to more advanced concepts and future expectations of telecommunications networks. It identifies and examines the most pressing research issues in telecommunications, and it contains chapters written by leading researchers, academics, and industry professionals. Telecommunications Networks - Current Status and Future Trends covers surveys of recent publications that investigate key areas of interest such as IMS, eTOM, 3G/4G, optimization problems, modeling, simulation, quality of service, etc. This book, which is suitable for both PhD and master's students, is organized into six sections: New Generation Networks, Quality of Services, Sensor Networks, Telecommunications, Traffic Engineering and Routing.

    Architectural Enhancements for Data Transport in Datacenter Systems

    Full text link
    Datacenter systems run myriad applications, which frequently communicate with each other and/or Input/Output (I/O) devices—including network adapters, storage devices, and accelerators. Due to the growing speed of I/O devices and the emergence of microservice-based programming models, the I/O software stacks have become a critical factor in end-to-end communication performance. As such, I/O software stacks have been evolving rapidly in recent years. Datacenters rely on fast, efficient “Software Data Planes”, which orchestrate data transfer between applications and I/O devices. The goal of this dissertation is to enhance the performance, efficiency, and scalability of software data planes by diagnosing their existing issues and addressing them through hardware-software solutions. In the first step, I characterize challenges of modern software data planes, which bypass the operating system kernel to avoid associated overheads. Since traditional interrupts and system calls cannot be delivered to user code without kernel assistance, kernel-bypass data planes use spinning cores on I/O queues to identify work/data arrival. Spin-polling obviously wastes CPU cycles on checking empty queues; however, I show that it entails even more drawbacks: (1) Full-tilt spinning cores perform more (useless) polling work when there is less work pending in the queues. (2) Spin-polling scales poorly with the number of polled queues due to processor cache capacity constraints, especially when traffic is unbalanced. (3) Spin-polling also scales poorly with the number of cores due to the overhead of polling and operation rate limits. (4) Whereas shared queues can mitigate load imbalance and head-of-line blocking, synchronization overheads of spinning on them limit their potential benefits. Next, I propose a notification accelerator, dubbed HyperPlane, which replaces spin-polling in software data planes. 
The design principles of HyperPlane are: (1) not iterating over empty I/O queues to find work/data in ready ones, (2) blocking/halting when all queues are empty rather than spinning fruitlessly, and (3) allowing multiple cores to efficiently monitor a shared set of queues. These principles lead to queue scalability, work proportionality, and realization of the theoretical merits of shared queues. HyperPlane is realized with a programming-model front-end and a hardware microarchitecture back-end. Evaluation of HyperPlane shows its significant advantage in throughput, average/tail latency, and energy efficiency over a state-of-the-art spin-polling-based software data plane, with very small power and area overheads. Finally, I focus on the data transfer aspect of software data planes. Cache misses incurred by accessing I/O data are a major bottleneck in software data planes. Despite considerable efforts put into delivering I/O data directly to the last-level cache, some access latency is still exposed. Cores cannot prefetch such data into nearer caches in today's systems because of the complex access patterns of data buffers and the lack of an appropriate notification mechanism that can trigger the prefetch operations. As such, I propose HyperData, a data transfer accelerator based on targeted prefetching. HyperData prefetches exact (rather than predicted) data buffers (or a required subset, to avoid cache pollution) to the L1 cache of the consumer core at the right time. Prefetching can be done for both core-peripheral and core-core communications. HyperData's prefetcher is programmable and supports various queue formats—namely, direct (regular), indirect (Virtio), and multi-consumer queues. I show that with a minor overhead, HyperData effectively hides data access latency in software data planes, thereby improving both application- and system-level performance and efficiency.
PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies.
http://deepblue.lib.umich.edu/bitstream/2027.42/169826/1/hosseing_1.pd
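
    HyperPlane itself is a hardware mechanism, but its three principles map onto a familiar software pattern: sleep on a shared notification instead of spin-polling, and track which queues are ready so empty ones are never visited. The sketch below is a simplified software analogue using a condition variable; the class and its names are invented for illustration, and none of HyperPlane's microarchitecture is modeled.

```python
import collections
import threading

class NotifiedQueues:
    """Blocking, notification-driven alternative to spin-polling a set of
    queues: consumers sleep on a shared condition variable when everything
    is empty, and a ready-set steers them straight to non-empty queues."""

    def __init__(self, n):
        self.queues = [collections.deque() for _ in range(n)]
        self.ready = set()              # indices of non-empty queues
        self.cv = threading.Condition()

    def push(self, q, item):
        with self.cv:
            self.queues[q].append(item)
            self.ready.add(q)
            self.cv.notify()            # wake one sleeping consumer

    def pop(self):
        with self.cv:
            while not self.ready:       # block instead of spinning on
                self.cv.wait()          # empty queues
            q = next(iter(self.ready))  # visit only a ready queue
            item = self.queues[q].popleft()
            if not self.queues[q]:
                self.ready.discard(q)
            return item

nq = NotifiedQueues(4)
nq.push(2, "a"); nq.push(0, "b"); nq.push(2, "c")
```

    Any number of consumer threads can call `pop()` on the shared set, which is the software counterpart of principle (3); the hardware point of HyperPlane is to get these semantics without the µs-scale cost of lock handoffs and OS wakeups.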

    Workload Modeling for Computer Systems Performance Evaluation

    Full text link

    Parallel and Distributed Computing

    Get PDF
    The 14 chapters presented in this book cover a wide variety of representative works ranging from hardware design to application development. In particular, the topics addressed are programmable and reconfigurable devices and systems, dependability of GPUs (Graphics Processing Units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers who have particular interests in each of these topics in parallel and distributed computing.

    Investigation of the consumer electronics bus

    Get PDF
    The objectives of this dissertation are to investigate the performance of the Consumer Electronics Bus (CEBus) and to develop a theoretical formulation of the Carrier Sense Multiple Access with Contention Detection and Contention Resolution (CSMA/CDCR) protocol with three priority classes utilized by the CEBus. A new priority channel assigned multiple access with embedded priority resolution (PAMA/PR) theoretical model is formulated; it incorporates the main features of the CEBus with three priority classes. The analytical results for throughput and delay obtained from this formulation were compared against simulation experiments, and close agreement was found, validating both the theoretical and simulation models. Moreover, the performance of the CEBus implemented with two physical media, the power line (PL) and twisted pair (TP) communication lines, was investigated by measuring message and channel throughputs and mean packet and message delays. The router was modeled as a node that can handle three priority levels simultaneously, and satisfactory performance was obtained. Finally, a gateway joining the CEBus to ISDN was designed and its performance was evaluated. This gateway provides the CEBus with access to ISDN-based services. The ISDN and CEBus system network architecture, gateway wiring, and the data and signaling interfaces between the CEBus and ISDN were designed, analyzed, and discussed. Again, satisfactory performance was found.
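
    The priority resolution step at the heart of such protocols can be illustrated with an idealized bitwise arbitration round, in the style of dominant/recessive contention resolution on a shared medium. This is a generic sketch, not the dissertation's PAMA/PR model; the wired-AND convention and the 8-bit width are assumptions.

```python
def arbitrate(codes, width=8):
    """Idealized contention resolution: each contender drives its arbitration
    code one bit at a time, MSB first. 0 is the dominant line state (wired-
    AND), and a node drops out when it drives 1 but observes 0 on the line.
    The lowest code (highest priority) wins without a destructive collision."""
    active = set(codes)
    for pos in reversed(range(width)):
        line = min((c >> pos) & 1 for c in active)   # the bus sees the AND
        active = {c for c in active if (c >> pos) & 1 == line}
        if len(active) == 1:                         # one survivor: done early
            break
    return min(active)
```

    Encoding the priority class in the most significant bits makes higher-priority traffic win every contested round, which is how a small number of priority classes can be enforced without extra channel time.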

    AI meets CRNs : a prospective review on the application of deep architectures in spectrum management

    Get PDF
    The spectrum low-utilization and high-demand conundrum created a bottleneck towards fulfilling the requirements of next-generation networks. Cognitive radio (CR) technology was advocated as a de facto technology to alleviate the scarcity and under-utilization of spectrum resources by exploiting temporarily vacant spectrum holes in the licensed spectrum bands. As a result, CR technology became the first step towards the intelligentization of mobile and wireless networks, and in order to strengthen its intelligent operation, the cognitive engine needs to be enhanced through the exploitation of artificial intelligence (AI) strategies. Since comprehensive literature reviews covering the integration and application of deep architectures in cognitive radio networks (CRNs) are still lacking, this article aims at filling the gap by presenting a detailed review that addresses the integration of deep architectures into the intricacies of spectrum management. This is a prospective review whose primary objective is to provide an in-depth exploration of the recent trends in AI strategies employed in mobile and wireless communication networks. The existing reviews in this area have not considered the relevance of incorporating the mathematical fundamentals of each AI strategy and how to tailor them to specific mobile and wireless networking problems. This review addresses that problem by detailing how deep architectures can be integrated into spectrum management problems. Beyond reviewing the different ways in which deep architectures can be integrated into spectrum management, model selection strategies and how different deep architectures can be tailored to the CR space to achieve better performance in complex environments are reported in the context of future research directions.
The Sentech Chair in Broadband Wireless Multimedia Communications (BWMC) at the University of Pretoria.
http://ieeexplore.ieee.org/xpl/RecentIssue.jsp?punumber=6287639
am2022Electrical, Electronic and Computer Engineerin