
    Response-Time Analysis for Non-Preemptive Global Scheduling with FIFO Spin Locks

    Motivated by the lack of response-time analyses for non-preemptive global scheduling that consider shared resources, this paper provides such an analysis for global job-level fixed-priority (JLFP) scheduling policies and FIFO-ordered spin locks. The proposed analysis computes response-time bounds for a set of resource-sharing jobs subject to release jitter and execution-time uncertainties by implicitly exploring all possible execution scenarios using state-abstraction and state-pruning techniques. A large-scale empirical evaluation of the proposed analysis shows it to be substantially less pessimistic than simple execution-time inflation methods, thanks to the explicit modeling of contention for shared resources and scenario-aware blocking analysis.
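
    The analysis above targets FIFO-ordered spin locks. For readers unfamiliar with that locking discipline, a minimal C++ sketch of one common realization, a ticket lock, is shown below; the class and member names are invented for illustration and nothing here is taken from the paper's own tooling.

```cpp
#include <atomic>

// Minimal FIFO-ordered (ticket) spin lock: threads acquire the lock in the
// exact order in which they request it, which is the ordering property a
// FIFO-spin-lock blocking analysis relies on. Illustrative sketch only.
class TicketSpinLock {
    std::atomic<unsigned> next_ticket{0};   // ticket handed to the next arrival
    std::atomic<unsigned> now_serving{0};   // ticket currently allowed to enter

public:
    void lock() {
        // Take a ticket; arrivals are ordered by fetch_add, giving FIFO service.
        const unsigned my_ticket = next_ticket.fetch_add(1, std::memory_order_relaxed);
        // Spin until our ticket is served (non-preemptively, in the model above).
        while (now_serving.load(std::memory_order_acquire) != my_ticket) {
            /* busy-wait */
        }
    }

    void unlock() {
        // Hand the lock to the next waiter in ticket (FIFO) order.
        now_serving.fetch_add(1, std::memory_order_release);
    }
};
```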

    Support for Programming Models in Network-on-Chip-based Many-core Systems


    A Generic Software Modeling Framework for Building Heterogeneous Distributed and Parallel Software Systems

    Heterogeneous distributed and parallel computing environments are highly dependent on hardware and communication protocols. The result is significant difficulty in software reuse, portability across platforms, and interoperability, as well as an increased overall development effort. A new systems engineering approach is needed for parallel processing systems in heterogeneous environments. The generic modeling framework de-emphasizes platform-specific development while exploiting software reuse (and platform-specific capabilities) with a simple, well-defined, and easily integrated set of abstractions providing a high level of heterogeneous interoperability.

    Architecting Efficient Data Centers.

    Data center power consumption has become a key constraint in continuing to scale Internet services. As our society’s reliance on “the Cloud” continues to grow, companies require an ever-increasing amount of computational capacity to support their customers. Massive warehouse-scale data centers have emerged, requiring 30 MW or more of total power capacity. Over the lifetime of a typical high-scale data center, power-related costs make up 50% of the total cost of ownership (TCO). Furthermore, the aggregate effect of data center power consumption across the country cannot be ignored. In total, data center energy usage has reached approximately 2% of aggregate consumption in the United States and continues to grow. This thesis addresses the need to increase computational efficiency to address this growing problem. It proposes a new class of power management techniques: coordinated full-system idle low-power modes to increase the energy proportionality of modern servers. First, we introduce the PowerNap server architecture, a coordinated full-system idle low-power mode which transitions in and out of an ultra-low-power nap state to save power during brief idle periods. While effective for uniprocessor systems, PowerNap relies on full-system idleness, and we show that such idleness disappears as the number of cores per processor continues to increase. We expose this problem in a case study of Google Web search, in which we demonstrate that coordinated full-system active power modes are necessary to reach energy proportionality and that PowerNap is ineffective because of a lack of idleness. To recover full-system idleness, we introduce DreamWeaver, architectural support for deep sleep. DreamWeaver allows a server to exchange latency for full-system idleness, allowing PowerNap-enabled servers to be effective and providing a better latency-power savings tradeoff than existing approaches. Finally, this thesis investigates workloads which achieve efficiency through methodical cluster provisioning techniques. Using the popular memcached workload, this thesis provides examples of provisioning clusters for cost-efficiency given latency, throughput, and data set size targets.
    Ph.D. Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/91499/1/meisner_1.pd
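
    As a rough, hedged illustration of why coordinated idle low-power modes matter for energy proportionality, the sketch below computes average server power under a simple two-state (active/nap) model; the power figures and function names are assumptions chosen for the example, not measurements or code from the thesis.

```cpp
#include <cstdio>

// Hypothetical two-state estimate of average server power.
// busy_fraction: fraction of time the server is doing work (utilization)
// p_active: power while active (watts); p_idle_or_nap: power while not active (watts)
// A fuller model would also charge nap-transition energy; this sketch ignores it.
double average_power(double busy_fraction, double p_active, double p_idle_or_nap) {
    return busy_fraction * p_active + (1.0 - busy_fraction) * p_idle_or_nap;
}

int main() {
    // Illustrative numbers only: 300 W active, 200 W idle without napping,
    // 10 W in a full-system nap state, at 20% utilization.
    const double p_no_nap   = average_power(0.2, 300.0, 200.0);
    const double p_with_nap = average_power(0.2, 300.0, 10.0);
    std::printf("without nap: %.0f W, with nap: %.0f W\n", p_no_nap, p_with_nap);
    return 0;
}
```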

    C++ Design Patterns for Low-latency Applications Including High-frequency Trading

    This work aims to bridge the existing knowledge gap in the optimisation of latency-critical code, specifically focusing on high-frequency trading (HFT) systems. The research culminates in three main contributions: the creation of a Low-Latency Programming Repository, the optimisation of a market-neutral statistical arbitrage pairs trading strategy, and the implementation of the Disruptor pattern in C++. The repository serves as a practical guide and is enriched with rigorous statistical benchmarking, while the trading strategy optimisation led to substantial improvements in speed and profitability. The Disruptor pattern showcased significant performance enhancement over traditional queuing methods. Evaluation metrics include speed, cache utilisation, and statistical significance, among others. Techniques like Cache Warming and Constexpr showed the most significant gains in latency reduction. Future directions involve expanding the repository, testing the optimised trading algorithm in a live trading environment, and integrating the Disruptor pattern with the trading algorithm for comprehensive system benchmarking. The work is oriented towards academics and industry practitioners seeking to improve performance in latency-sensitive applications.
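
    As a hedged illustration of the lock-free hand-off idea behind the Disruptor pattern discussed above, the sketch below shows a minimal single-producer/single-consumer ring buffer in C++; it is not the thesis's implementation, it supports only one producer and one consumer (the full Disruptor is more general), and all names are invented for the example.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Minimal single-producer/single-consumer ring buffer. Like the Disruptor,
// it avoids locks by combining a pre-allocated ring with monotonic sequence
// counters. Capacity must be a power of two so that masking replaces modulo.
template <typename T, std::size_t Capacity>
class SpscRing {
    static_assert((Capacity & (Capacity - 1)) == 0, "Capacity must be a power of two");
    std::array<T, Capacity> slots_{};
    std::atomic<std::size_t> head_{0};  // next sequence to write (producer side)
    std::atomic<std::size_t> tail_{0};  // next sequence to read (consumer side)

public:
    bool try_push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        if (head - tail_.load(std::memory_order_acquire) == Capacity)
            return false;  // ring full
        slots_[head & (Capacity - 1)] = value;
        head_.store(head + 1, std::memory_order_release);
        return true;
    }

    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire))
            return std::nullopt;  // ring empty
        T value = slots_[tail & (Capacity - 1)];
        tail_.store(tail + 1, std::memory_order_release);
        return value;
    }
};
```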

    "Virtual malleability" applied to MPI jobs to improve their execution in a multiprogrammed environment"

    This work focuses on the scheduling of MPI jobs executing in shared-memory multiprocessors (SMPs). The objective was to obtain the best response-time performance in multiprogrammed multiprocessor systems using batch systems, assuming all jobs have the same priority. To achieve that purpose, the benefits of supporting malleability in MPI jobs to reduce fragmentation, and consequently improve system performance, were studied. The contributions of this work can be summarized as follows:
    · Virtual malleability: a mechanism in which a job is assigned a dynamic processor partition where the number of processes is greater than the number of processors. The partition size is modified at runtime, according to external requirements such as the load of the system, by varying the multiprogramming level, making the job contend for resources with itself. In addition, a mechanism decides at runtime whether to apply local or global process queues to an application, depending on the load balance among its processes.
    · A job scheduling policy that takes decisions such as how many processes to start with and the maximum multiprogramming degree, based on the type and number of applications running and queued. Moreover, as soon as a job finishes execution and there are queued jobs, the algorithm analyzes whether it is better to start executing another job immediately or to wait until more resources become available.
    · A new alternative to backfilling strategies for the problem of the execution window expiring. Virtual malleability is applied to the backfilled job, reducing its partition size without aborting or suspending it as in traditional backfilling.
    The evaluation of this thesis was done using a practical approach. All the proposals were implemented, modifying the three scheduling levels: queuing system, processor scheduler, and runtime library. The impact of the contributions was studied under several types of workloads, varying machine utilization, communication and balance degree of the applications, multiprogramming level, and job size. Results showed that it is possible to offer malleability over MPI jobs. An application obtained better performance when contending for resources with itself than with other applications, especially in workloads with high machine utilization. Load imbalance was taken into account, obtaining better performance when the right queue type was applied to each application independently. The proposed job scheduling policy exploited virtual malleability by choosing, at the beginning of execution, parameters such as the number of processes and the maximum multiprogramming level. It performed well under bursty workloads with low to medium machine utilization. However, as the load increases, virtual malleability was not enough: when the machine is heavily loaded, jobs, once shrunk, are unable to expand, so they must be executed all the time with a partition smaller than the job size, degrading performance. At this point the job scheduling policy concentrated on moldability alone. Fragmentation was also alleviated by applying backfilling techniques to the job scheduling algorithm. Virtual malleability proved to be an interesting improvement for the window-expiring problem: backfilled jobs, even on a smaller partition, can continue execution, reducing the memory swapping generated by aborts and suspensions, and the queueing system is prevented from reinserting the backfilled job in the queue and re-executing it in the future.
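
    For context, the sketch below shows the classic backfilling admission check that the proposal above relaxes through virtual malleability; the structures and names are illustrative assumptions, not the thesis's scheduler code.

```cpp
#include <cstddef>

// Illustrative job description for a conservative backfilling check.
struct Job {
    std::size_t requested_procs;   // processors the job asks for
    double      estimated_runtime; // runtime estimate in seconds
};

// Classic backfilling test: a queued job may jump ahead only if it fits in
// the currently free processors AND its estimated completion does not push
// past the reservation made for the job at the head of the queue. Virtual
// malleability, as described above, instead shrinks the backfilled job's
// partition rather than rejecting it or aborting it when the window expires.
bool can_backfill(const Job& candidate,
                  std::size_t free_procs,
                  double now,
                  double head_job_reservation_time) {
    const bool fits_in_space = candidate.requested_procs <= free_procs;
    const bool fits_in_time  =
        now + candidate.estimated_runtime <= head_job_reservation_time;
    return fits_in_space && fits_in_time;
}
```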

    Design of a High Capacity, Scalable, and Green Wireless Communication System Leveraging the Unlicensed Spectrum

    The stunning demand for mobile wireless data, which has recently been growing at an exponential rate, requires a several-fold increase in spectrum. The use of unlicensed spectrum is thus critically needed to aid the existing licensed spectrum in meeting such huge growth in mobile wireless data traffic in a cost-effective manner. The deployment of Long Term Evolution (LTE) in the unlicensed spectrum (LTE-U) has recently been gaining significant industry momentum. The lower transmit-power regulation of the unlicensed spectrum makes LTE deployment in the unlicensed spectrum suitable only for a small cell. A small cell utilizing LTE-L (LTE in licensed spectrum) and LTE-U (LTE in unlicensed spectrum) will therefore significantly reduce the total cost of ownership (TCO) of a small cell, while providing additional mobile wireless data offload capacity from the macro cell to the small cell in LTE Heterogeneous Networks (HetNets), to meet such an increase in wireless data demand. The U.S. 5 GHz Unlicensed National Information Infrastructure (U-NII) bands that are currently under consideration for LTE deployment in the unlicensed spectrum contain only a limited number of 20 MHz channels. Thus, in a dense multi-operator deployment scenario, one or more LTE-U small cells have to co-exist and share the same 20 MHz unlicensed channel with each other and with the incumbent Wi-Fi. This dissertation presents a proactive small cell interference mitigation strategy for improving the spectral efficiency of LTE networks in the unlicensed spectrum. It describes the scenario and demonstrates, via simulation results, that in the absence of an explicit interference mitigation mechanism there will be a significant degradation in the overall LTE-U system performance for LTE-U co-channel co-existence in countries such as the U.S. that do not mandate Listen-Before-Talk (LBT) regulations. An unlicensed spectrum Inter Cell Interference Coordination (usICIC) mechanism is then presented as a time-domain multiplexing technique for interference mitigation when an unlicensed channel is shared by multi-operator LTE-U small cells. Through extensive simulation results, it is demonstrated that the proposed usICIC mechanism yields a 40% or greater improvement in overall LTE-U system performance (throughput), leading to increased wireless communication system capacity. The ever-increasing demand for mobile wireless data is also resulting in a dramatic expansion of wireless network infrastructure by all service providers, resulting in a significant escalation in energy consumption by wireless networks. This not only impacts the recurring operational expense (OPEX) for the service providers; importantly, the resulting increase in greenhouse gas emissions is also harmful to the environment. Energy efficiency has thus become one of the critical tenets in the design and deployment of green wireless communication systems. Consequently, the market trend for next-generation communication systems has been towards miniaturization to meet this stunning, ever-increasing demand for mobile wireless data, leading to the need for a scalable distributed and parallel processing system architecture that is energy efficient and high capacity. Reducing cost and size while increasing capacity, ensuring scalability, and achieving energy efficiency requires several design paradigm shifts. This dissertation presents the design of a next-generation wireless communication system that employs a new energy-efficient distributed and parallel processing system architecture to achieve these goals while leveraging the unlicensed spectrum to significantly increase (by a factor of two) the capacity of the wireless communication system. This design significantly reduces not only the upfront CAPEX but also the recurring OPEX for service providers to maintain their next-generation wireless communication networks.
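
    As a simplified, assumed illustration of the time-domain multiplexing idea behind usICIC, the sketch below splits a shared unlicensed channel's airtime into equal, non-overlapping per-cell windows within a fixed coordination period; the real mechanism coordinates shares adaptively among operators, and all names and parameters here are hypothetical.

```cpp
#include <cstddef>
#include <cstdio>
#include <vector>

// Simplified time-domain sharing of one unlicensed 20 MHz channel among
// several co-channel LTE-U small cells: each cell is granted an equal,
// non-overlapping slice of a fixed coordination period. This equal-split
// baseline only illustrates the multiplexing concept, not usICIC itself.
struct TxWindow {
    double start_ms;
    double end_ms;
};

std::vector<TxWindow> equal_time_shares(std::size_t num_cells, double period_ms) {
    std::vector<TxWindow> windows;
    if (num_cells == 0) return windows;
    const double slice = period_ms / static_cast<double>(num_cells);
    for (std::size_t i = 0; i < num_cells; ++i)
        windows.push_back({i * slice, (i + 1) * slice});
    return windows;
}

int main() {
    // Example: 4 co-channel small cells sharing a 40 ms coordination period.
    for (const TxWindow& w : equal_time_shares(4, 40.0))
        std::printf("cell window: %.1f ms - %.1f ms\n", w.start_ms, w.end_ms);
    return 0;
}
```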