Improving I/O performance through an in-kernel disk simulator
This paper presents two mechanisms that can significantly improve the read I/O performance of both hard and solid-state drives: KDSim and REDCAP. KDSim is an in-kernel disk simulator that provides a framework for simultaneously simulating the performance obtained by different I/O system mechanisms and algorithms, and for dynamically turning them on and off, or selecting between different options or policies, to improve overall system performance. REDCAP is a RAM-based disk cache that effectively enlarges the built-in cache present in disk drives. Using KDSim, this cache is dynamically activated or deactivated according to the throughput achieved. Results show that, by using KDSim and REDCAP together, a system can improve its I/O performance by up to 88% for workloads with some spatial locality on both hard and solid-state drives, while achieving the same performance as a "regular" system for workloads with random or sequential access patterns.
Leveraging Program Analysis to Reduce User-Perceived Latency in Mobile Applications
Reducing network latency in mobile applications is an effective way of
improving the mobile user experience and has tangible economic benefits. This
paper presents PALOMA, a novel client-centric technique for reducing the
network latency by prefetching HTTP requests in Android apps. Our work
leverages string analysis and callback control-flow analysis to automatically
instrument apps using PALOMA's rigorous formulation of scenarios that address
"what" and "when" to prefetch. PALOMA has been shown to incur significant
runtime savings (several hundred milliseconds per prefetchable HTTP request),
both when applied on a reusable evaluation benchmark we have developed and on
real applications. Comment: ICSE 201
Three-dimensional memory vectorization for high bandwidth media memory systems
Vector processors have good performance, cost and adaptability when targeting multimedia applications. However, for a significant number of media programs, conventional memory configurations fail to deliver enough memory references per cycle to feed the SIMD functional units. This paper addresses the memory bandwidth problem. We propose a novel mechanism suitable for 2-dimensional vector architectures and targeted at providing high effective bandwidth for SIMD memory instructions. The basis of this mechanism is the extension of the scope of vectorization at the memory level, so that 3-dimensional memory patterns can be fetched into a second-level register file. By fetching long blocks of data and by reusing 2-dimensional memory streams at this second-level register file, we obtain a significant increase in the effective memory bandwidth. As side benefits, the new 3-dimensional load instructions provide high robustness to memory latency and a significant reduction in cache activity, thus reducing power and energy requirements. At the cost of 50% more area than a regular SIMD register file, we have measured an average speed-up of 13% and the potential for power savings of 30% in the L2 cache.
CloudTree: A Library to Extend Cloud Services for Trees
In this work, we propose a library that enables the creation and
management of tree data structures on a cloud from a cloud client. As a proof of concept,
we implement a new cloud service CloudTree. With CloudTree, users are able to
organize big data into tree data structures of their choice that are physically
stored in a cloud. We use caching, prefetching, and aggregation techniques in
the design and implementation of CloudTree to enhance performance. We have
implemented the services of Binary Search Trees (BST) and Prefix Trees as
current members in CloudTree and have benchmarked their performance using the
Amazon Cloud. The ideas and techniques in the design and implementation of the BST
and prefix tree are generic, and can thus also be used for other types of trees,
such as B-trees, and for other link-based data structures such as linked lists and
graphs. Preliminary experimental results show that CloudTree is useful and
efficient for various big data applications.
Survey of Branch Prediction, Pipelining, Memory Systems as Related to Computer Architecture
This paper is a survey of topics introduced in Computer Engineering Course CEC470: Computer Architecture (CEC470). The topics covered in this paper provide much more depth than what was provided in CEC470, in addition to exploring new concepts not touched on in the course. Topics presented include branch prediction, pipelining, registers, memory, and the operating system, as well as some general design considerations for computer architecture as a whole.
The design considerations explored include a discussion of the two instruction encodings of the ARM Instruction Set Architecture, known as ARM and Thumb, as well as an exploration of the differences between heterogeneous and homogeneous multi-processors.
Further sections explain the interoperability of various portions of the computer architecture with a focus on performance optimizations. Branch prediction is introduced, and the quality improvement it provides is detailed. An explanation of pipelining is given, followed by a discussion of why pipelining may be difficult on different types of processors. Registers, one of the fundamental parts of a computer, are explained in detail, as is their importance to computer systems as a whole.
The memory and operating systems sections tie this paper together by delving deeper into the architecture of computers, then resurfacing with how the software and hardware interact through the operating system.
This paper concludes by tying the discussed sections together and presenting the importance of computer architecture.
Performance analysis and improvement of PostgreSQL
PostgreSQL is a database management system used in many different applications throughout industry. As databases are often the bottleneck in application performance, their performance becomes crucial. Better performance can be achieved either by using more and faster hardware, or by making the software more efficient. In this master's thesis we perform a performance analysis of the PostgreSQL database server from the perspective of compiler optimizations, file systems, and software prefetching. We also show how a data structure used in PostgreSQL can benefit from manually introduced software prefetching, since it is hard for the compiler to predict cache misses and insert prefetch instructions in a profitable way. PostgreSQL is a popular database server used in large parts of industry. Database servers are used to store, process, and retrieve data for companies, organizations, and universities, which rely on them in their work. Often, however, these databases become the bottleneck for how quickly work can be done, which is why we have analyzed and improved a popular database server.