642 research outputs found
An Analysis of Linux Scalability to Many Cores
URL to paper from conference siteThis paper analyzes the scalability of seven system applications
(Exim, memcached, Apache, PostgreSQL, gmake,
Psearchy, and MapReduce) running on Linux on a 48-
core computer. Except for gmake, all applications trigger
scalability bottlenecks inside a recent Linux kernel. Using
mostly standard parallel programming techniques—
this paper introduces one new technique, sloppy counters—
these bottlenecks can be removed from the kernel
or avoided by changing the applications slightly. Modifying
the kernel required in total 3002 lines of code changes.
A speculative conclusion from this analysis is that there
is no scalability reason to give up on traditional operating
system organizations just yet.Quanta Computer (Firm)National Science Foundation (U.S.) (0834415)National Science Foundation (U.S.) (0915164)Microsoft Research (Fellowship)Irwin Mark Jacobs and Joan Klein Jacobs Presidential Fellowshi
Improving network connection locality on multicore systems
Incoming and outgoing processing for a given TCP connection often execute on different cores: an incoming packet is typically processed on the core that receives the interrupt, while outgoing data processing occurs on the core running the relevant user code. As a result, accesses to read/write connection state (such as TCP control blocks) often involve cache invalidations and data movement between cores' caches. These can take hundreds of processor cycles, enough to significantly reduce performance.
We present a new design, called Affinity-Accept, that causes all processing for a given TCP connection to occur on the same core. Affinity-Accept arranges for the network interface to determine the core on which application processing for each new connection occurs, in a lightweight way; it adjusts the card's choices only in response to imbalances in CPU scheduling. Measurements show that for the Apache web server serving static files on a 48-core AMD system, Affinity-Accept reduces time spent in the TCP stack by 30% and improves overall throughput by 24%.National Science Foundation (U.S.). (Grant number CNS-0834415)National Science Foundation (U.S.). (Grant number CNS-0915164)Quanta Computer (Firm
Self-tuning of disk input–output in operating systems
The final publication is available via http://dx.doi.org/10.1016/j.jss.2011.07.030One of the most difficult and hard to learn tasks in computer system management is tuning the kernel parameters in order to get the maximum performance. Traditionally, this tuning has been set using either fixed configurations or the subjective administrator's criteria. The main bottleneck among the subsystems managed by the operating systems is disk input/output (I/O). An evolutionary module has been developed to perform the tuning of this subsystem automatically, using an adaptive and dynamic approach. Any computer change, both at the hardware level, and due to the nature of the workload itself, will make our module adapt automatically and in a transparent way. Thus, system administrators are released from this kind of task and able to achieve some optimal performances adapted to the framework of each of their systems. The experiment made shows a productivity increase in 88.2% of cases and an average improvement of 29.63% with regard to the default configuration of the Linux operating system. A decrease of the average latency was achieved in 77.5% of cases and the mean decrease in the request processing time of I/O was 12.79%
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation MPUs have highly limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that while limiting the associativity of the
virtual-to-physical mapping incurs no penalty, it can break the
translate-then-fetch serialization if combined with careful data placement in
the MPU's memory, allowing for translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure
Load Balancing in Heterogeneous Cloud Environments by Using PROMETHEE Method
Abstract: Efficient Scheduling of tasks in a cloud environment improves resources utilization thereby meeting users' requirements. One of the most important objectives of a scheduling algorithm in cloud environment is a balanced load distribution over various resources for enhancing the overall performance of the cloud. Such a scheduling is complex in nature due to the dynamicity of resources and incoming application specifications. In this paper, we employ PROMETHEE decision making model to design a scheduling algorithm, called PROMETHEE Load Balancing (PLB).This paper formulates the load balancing issue as a multi-criteria decision making problem and aims to achieve well-balanced load across virtual machines for maximizing the overall throughput of the cloud. Extensive simulation results in CloudSim environment show that the proposed algorithm outperforms existing algorithms in terms of load balancing index (LBI), VM load variation, makespan, average execution time and waiting time
- …