Search CORE

120 research outputs found

Competitive Parallel Disk Prefetching and Buffer Management

Author: Barve Rakesh
Kallahalla Mahesh
Varman Peter J.
Vitter Jeffrey Scott
Publication venue: Elsevier BV
Publication date: 21/03/2011

We provide a competitive analysis framework for online prefetching and buffer management algorithms in parallel I/O systems, using a read-once model of block references. This has widespread applicability to key I/O-bound applications such as external merging and concurrent playback of multiple video streams. Two realistic lookahead models, global lookahead and local lookahead, are defined. Algorithms NOM and GREED based on these two forms of lookahead are analyzed for shared buffer and distributed buffer configurations, both of which occur frequently in existing systems. An important aspect of our work is that we show how to implement both models of lookahead in practice using the simple techniques of forecasting and flushing. Given a $D$-disk parallel I/O system and a globally shared I/O buffer that can hold up to $M$ disk blocks, we derive a lower bound of $\Omega(\sqrt{D})$ on the competitive ratio of any deterministic online prefetching algorithm with $O(M)$ lookahead. NOM is shown to match the lower bound using global $M$-block lookahead. In contrast, using only local lookahead results in an $\Omega(D)$ competitive ratio. When the buffer is distributed into $D$ portions of $M/D$ blocks each, the algorithm GREED based on local lookahead is shown to be optimal, and NOM is within a constant factor of optimal. Thus we provide a theoretical basis for the intuition that global lookahead is more valuable for prefetching in the case of a shared buffer configuration, whereas local lookahead suffices in the case of the distributed configuration. Finally, we analyze the performance of these algorithms for reference strings generated by a uniformly random stochastic process and show that they achieve the minimal expected number of I/Os. These results also give bounds on the worst-case expected performance of algorithms that employ randomization in the data layout.
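
To make the read-once parallel-disk model concrete, here is a small Python simulator, offered only as a sketch under stated assumptions. Each referenced block lives on one disk and is read exactly once; in one I/O step every disk can deliver at most one block into a shared buffer of `buffer_blocks` slots, and the consumer reads blocks strictly in reference order. The function name, its parameters, and the earliest-reference-first policy with full global lookahead are hypothetical illustrations; this is not the paper's NOM or GREED and carries none of their competitive guarantees.

```python
from typing import List


def earliest_first_prefetch_steps(disk_of: List[int], num_disks: int,
                                  buffer_blocks: int) -> int:
    """Count parallel I/O steps needed to consume a read-once reference string.

    disk_of[k] is the disk holding the k-th referenced block. The policy is a
    hypothetical earliest-reference-first rule with global lookahead over the
    whole string; it is NOT the paper's NOM or GREED algorithm.
    """
    if num_disks < 1 or buffer_blocks < 1:
        raise ValueError("need at least one disk and one buffer slot")
    queues: List[List[int]] = [[] for _ in range(num_disks)]
    for pos, disk in enumerate(disk_of):
        queues[disk].append(pos)          # per-disk blocks in reference order
    heads = [0] * num_disks               # index of next unfetched block per disk
    buffered = set()                      # fetched but not yet consumed
    next_needed = 0                       # reference position the consumer awaits
    steps = 0
    while next_needed < len(disk_of):
        free = buffer_blocks - len(buffered)
        # One candidate per disk: that disk's earliest unfetched block.
        candidates = sorted(queues[d][heads[d]] for d in range(num_disks)
                            if heads[d] < len(queues[d]))
        for pos in candidates[:free]:     # fill free slots, earliest first
            buffered.add(pos)
            heads[disk_of[pos]] += 1
        steps += 1
        # Consume every block that is now available in reference order.
        while next_needed in buffered:
            buffered.remove(next_needed)
            next_needed += 1
    return steps
```

Because the block the consumer is waiting for is always the earliest candidate, and each step consumes at least one block (so at least one slot is free at the start of the next step), the loop cannot deadlock. The step count is never below the largest number of blocks stored on any single disk, which is why data placement across disks matters as much as the prefetching rule.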

KU ScholarWorks

Prefetching techniques for client server object-oriented database systems

Author: Knafla Nils
Publication venue: The University of Edinburgh
Publication date: 01/01/1999

The performance of many object-oriented database applications suffers from page fetch latency, which is determined by the expense of disk access. In this work we suggest several prefetching techniques to avoid, or at least reduce, page fetch latency. In practice no prediction technique is perfect, and no prefetching technique can entirely eliminate delay due to page fetch latency. We are therefore interested in the trade-off between the level of accuracy required to obtain good results in terms of elapsed-time reduction and the processing overhead needed to achieve this level of accuracy. If prefetching accuracy is high, the total elapsed time of an application can be reduced significantly; if it is low, many incorrect pages are prefetched, and the extra load on the client, network, server, and disks degrades overall system performance. Access patterns of object-oriented databases are often complex and usually hard to predict accurately. The ..

CiteSeerX

Edinburgh Research Archive

Parallel Out-of-Core Sorting: The Third Way

Author: Chaudhry Geeta
Publication venue: Dartmouth Digital Commons
Publication date: 12/03/2004

Sorting very large datasets is a key subroutine in almost any application that is built on top of a large database. Two ways to sort out-of-core data dominate the literature: merging-based algorithms and partitioning-based algorithms. Within these two paradigms, all the programs that sort out-of-core data on a cluster rely on assumptions about the input distribution. We propose a third way of out-of-core sorting: oblivious algorithms. In all, we have developed six programs that sort out-of-core data on a cluster. The first three programs, based completely on Leighton's columnsort algorithm, have a restriction on the maximum problem size that they can sort. The other three programs relax this restriction; two are based on our original algorithmic extensions to columnsort. We present experimental results to show that our algorithms perform well. To the best of our knowledge, the programs presented in this thesis are the first to sort out-of-core data on a cluster without making any simplifying assumptions about the distribution of the data to be sorted.
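
Leighton's columnsort sorts an r × s matrix (r rows, s columns, read in column-major order) using only four column sorts interleaved with fixed, data-independent moves, which is what makes it oblivious. Its correctness condition, s dividing r and r ≥ 2(s − 1)², ties the number of columns to the column height, and that is what caps the problem size for a given r. The sketch below is an in-memory Python illustration of the classic eight steps, not the thesis's out-of-core cluster programs; the helper names and the extra requirement that r be even (so the shift is exactly r/2) are assumptions of this sketch.

```python
import random
from typing import List, Sequence


def _sort_columns(cols: List[List[float]]) -> List[List[float]]:
    # Sort every column independently (the only data-dependent work).
    return [sorted(col) for col in cols]


def _transpose(cols: List[List[float]], r: int, s: int) -> List[List[float]]:
    # Pick up entries in column-major order, set them down in row-major order.
    flat = [x for col in cols for x in col]
    return [[flat[i * s + j] for i in range(r)] for j in range(s)]


def _untranspose(cols: List[List[float]], r: int, s: int) -> List[List[float]]:
    # Inverse move: pick up in row-major order, set down in column-major order.
    flat = [cols[j][i] for i in range(r) for j in range(s)]
    return [flat[j * r:(j + 1) * r] for j in range(s)]


def columnsort(data: Sequence[float], s: int) -> List[float]:
    """Sort `data` viewed as an r x s column-major matrix (sketch)."""
    n = len(data)
    if s < 1 or n % s != 0:
        raise ValueError("len(data) must be a multiple of s")
    r = n // s
    # Leighton's size restriction, plus even r (an assumption of this sketch).
    if r % s != 0 or r % 2 != 0 or r < 2 * (s - 1) ** 2:
        raise ValueError("need s | r, r even, and r >= 2(s-1)^2")
    cols = [list(data[j * r:(j + 1) * r]) for j in range(s)]
    cols = _sort_columns(cols)         # step 1
    cols = _transpose(cols, r, s)      # step 2
    cols = _sort_columns(cols)         # step 3
    cols = _untranspose(cols, r, s)    # step 4
    cols = _sort_columns(cols)         # step 5
    half = r // 2
    # Step 6: shift down by r/2, padding with -inf on top and +inf at the end,
    # which yields s + 1 columns of height r.
    flat = ([float("-inf")] * half + [x for col in cols for x in col]
            + [float("inf")] * half)
    shifted = [flat[j * r:(j + 1) * r] for j in range(s + 1)]
    shifted = _sort_columns(shifted)   # step 7
    flat = [x for col in shifted for x in col]
    return flat[half:len(flat) - half]  # step 8: unshift, dropping the padding


if __name__ == "__main__":
    random.seed(0)
    keys = [random.randint(0, 99) for _ in range(20 * 4)]  # r = 20, s = 4
    assert columnsort(keys, 4) == sorted(keys)
    print("columnsort agrees with sorted() on", len(keys), "keys")
```

Every move between the column sorts is a fixed permutation, so the same sequence of operations runs regardless of the input distribution; in an out-of-core setting each column sort fits in memory and each move becomes a structured data exchange.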

Dartmouth Digital Commons (Dartmouth College)


Effectiveness of Cloud Services for Scientific and VoD Applications

Author: Krishnappa Dilip Kumar
Publication venue: ScholarWorks@UMass Amherst
Publication date: 17/03/2015

Cloud platforms have emerged as the primary data warehouse for a variety of applications, such as Dropbox, iCloud, and Google Music. These applications allow users to store data in the cloud and access it from anywhere in the world. Commercial clouds are also well suited to renting out high-end servers for applications that need computation resources only sporadically. Cloud users pay only for the time they actually use the hardware and for the amount of data transmitted to and from the cloud, which has the potential to be more cost-effective than purchasing, hosting, and maintaining dedicated hardware. In this dissertation, we examine the efficiency of the cloud Infrastructure-as-a-Service (IaaS) model for two real-time, high-bandwidth applications: short-term weather forecasting, a scientific application, and video-on-demand (VoD) services. We show that cloud services are efficient in both network and computation for the real-time scientific application of weather forecasting. We present a related-list reordering approach, which reduces the network traffic of serving videos from VoD services and improves the efficiency of the caches deployed to serve them. We also present transcoding policies to reduce the transcoding workload, and prediction models to maintain the performance of adaptive bitrate (ABR) streaming of VoD services at the client with online transcoding in the cloud.

ScholarWorks@UMass Amherst

An enhanced active caching strategy for data-intensive computations in distributed GIS

Author: B Yu
C Yang
D-J Park
I Foster
J Eidsvik
J Fernandez
Jizhe Xia
K Matthias
K Sashi
L Shi
L Wang
L Xiong
Li Rui
Liu Jianliang
LJ Zhao
MF Goodchild
MG D’Urso
MN Boulos
S Pan
SA Krashakov
Serdar Yeşilmurat
SH Fuller
Shaoming Pan
Sorn Jarukasemratana
T Wang
Wang Gai Ge
Wang Hao
Wang Tao
Xicheng Tan
Xuan Shi
Y Yao
Yanwen Chong
YC Zhao
Zhengquan Xu
Publication venue: Springer Science and Business Media LLC

Crossref

Effective Grouping for Energy and Performance: Construction of Adaptive, Sustainable, and Maintainable Data Storage

Author: Essary David
Publication date: 08/06/2011

The performance gap between processors and storage systems has become increasingly critical over the years. Yet the performance disparity remains, and, further, storage energy consumption is rapidly becoming a new critical problem. While smarter caching and predictive techniques do much to alleviate this disparity, the problem persists, and data storage remains a growing contributor to latency and energy consumption. Attempts have been made at data layout maintenance, or intelligent physical placement of data, yet in practice basic heuristics remain predominant. Problems that early studies sought to solve via layout strategies were proven to be NP-hard, and data layout maintenance today remains more art than science. With unknown potential and a domain inherently full of uncertainty, layout maintenance persists as an area largely untapped by modern systems. But uncertainty in workloads does not imply randomness; access patterns have exhibited repeatable, stable behavior. Predictive information can be gathered, analyzed, and exploited to improve data layouts. Our goal is a dynamic, robust, sustainable predictive engine, aimed at improving existing layouts by replicating data at the storage device level. We present a comprehensive discussion of the design and construction of such a predictive engine, including workload evaluation, where we present and evaluate classical workloads as well as our own highly detailed traces collected over an extended period. We demonstrate significant gains through an initial static grouping mechanism, compare against an optimal grouping method of our own construction, and further show significant improvement over competing techniques. We also explore and illustrate the challenges faced when moving from static to dynamic (i.e., online) grouping, and provide motivation and solutions for addressing these challenges. These challenges include metadata storage, appropriate predictive collocation, online performance, and physical placement.
We reduced the metadata needed by several orders of magnitude, reducing the required volume from more than 14% of total storage down to less than 12%. We also demonstrate how our collocation strategies outperform competing techniques. Finally, we present our complete model and evaluate a prototype implementation against real hardware. This model was demonstrated to be capable of reducing device-level accesses by up to 65%.

D-Scholarship@Pitt

Topics in access, storage, and sensor networks

Author: Bhatia Swapnil
Publication venue: University of New Hampshire Scholars' Repository
Publication date: 01/01/2010

In the first part of this dissertation, two access networking standards, Data Over Cable Service Interface Specification (DOCSIS) and IEEE 802.3ah Ethernet Passive Optical Network (EPON), are studied. We study the impact of two parameters of the DOCSIS protocol and derive the probability of message collision in the 802.3ah device discovery scheme. We survey existing bandwidth allocation schemes for EPONs, derive the average grant size in one such scheme, and study the performance of the shortest-job-first heuristic. In the second part of this dissertation, we study networks of mobile sensors. We make progress towards an architecture for disconnected collections of mobile sensors. We propose a new design abstraction called tours, which combines mobility and communication into a single design primitive and enables the system of sensors to reorganize into desirable topologies after failures. We also initiate a study of computation in mobile sensor networks. We study the relationship between two distributed computational models of mobile sensor networks: population protocols and self-similar functions. We define the notion of a self-similar predicate and show when it is computable by a population protocol. Transition graphs of population protocols lead us to the consideration of graph powers. We consider the direct product of graphs and its new variant, which we call the lexicographic direct product (or the clique product). We show that invariants concerning transposable walks in direct graph powers and transposable independent sets in graph families generated by the lexicographic direct product are uncomputable. The last part of this dissertation makes contributions to the area of storage systems. We propose a sequential access detection and prefetching scheme and a dynamic cache sizing scheme for large storage systems. We evaluate the cache sizing scheme theoretically and through simulations.
We compute the expected hit ratio of our scheme and of competing schemes, and bound the expected size of our dynamic cache needed to obtain an optimal hit ratio. We also develop a stand-alone simulator for studying our proposed scheme and integrate it with an empirically validated disk simulator.
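
The abstract does not describe the sequential access detection and prefetching scheme in enough detail to reproduce it, so the following is only a generic Python sketch of the idea under stated assumptions. A hypothetical `sequential_prefetch_hit_ratio` function replays a list of block requests against an LRU cache, declares a stream sequential after `trigger` consecutive block numbers, and then prefetches the next `depth` blocks. The LRU policy, the function name, and both thresholds are illustrative choices, not the author's design.

```python
from collections import OrderedDict
from typing import List


def sequential_prefetch_hit_ratio(requests: List[int], cache_blocks: int,
                                  trigger: int = 2, depth: int = 4) -> float:
    """Hit ratio of an LRU block cache with a naive sequential prefetcher.

    Hypothetical scheme: once `trigger` consecutive block numbers have been
    requested, the next `depth` block numbers are prefetched into the cache.
    """
    if cache_blocks < 1:
        raise ValueError("cache must hold at least one block")
    cache = OrderedDict()  # block -> None, ordered least to most recently used

    def insert(block: int) -> None:
        cache[block] = None
        cache.move_to_end(block)
        while len(cache) > cache_blocks:
            cache.popitem(last=False)  # evict the least recently used block

    hits = 0
    run = 0      # length of the current run of consecutive block numbers
    prev = None
    for block in requests:
        if block in cache:
            hits += 1
            cache.move_to_end(block)   # refresh recency on a hit
        else:
            insert(block)              # demand fetch on a miss
        run = run + 1 if prev is not None and block == prev + 1 else 1
        prev = block
        if run >= trigger:             # sequential stream detected
            for ahead in range(block + 1, block + 1 + depth):
                if ahead not in cache:
                    insert(ahead)      # prefetched blocks do not count as hits
    return hits / len(requests) if requests else 0.0
```

On a purely sequential scan of 100 blocks with a 16-block cache, only the first two requests miss before detection kicks in, so the function returns 0.98; with `depth=0` the same scan gets no hits at all. A small cache combined with a large `depth` can evict useful blocks, which is the kind of interaction a dynamic cache sizing scheme would need to account for.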

UNH Scholars' Repository

09491 Abstracts Collection -- Graph Search Engineering

Author: Brim Lubos
Edelkamp Stefan
Hansen Eric A.
Sanders Peter
Publication venue: Dagstuhl Seminar Proceedings. 09491 - Graph Search Engineering
Publication date: 01/01/2010

From 29 November to 4 December 2009, the Dagstuhl Seminar 09491 "Graph Search Engineering" was held at Schloss Dagstuhl – Leibniz Center for Informatics. During the seminar, several participants presented their current research, and ongoing work and open problems were discussed. Abstracts of the presentations given during the seminar, as well as abstracts of seminar results and ideas, are collected in this paper. The first section describes the seminar topics and goals in general. Links to extended abstracts or full papers are provided, if available.

Dagstuhl Research Online Publication Server

HSP-Wrap: The Design and Evaluation of Reusable Parallelism for a Subclass of Data-Intensive Applications

Author: Giblock Paul R.
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2015

There is an increasing gap between the rate at which data is generated by scientific and non-scientific fields and the rate at which data can be processed by available computing resources. In this paper, we introduce the fields of bioinformatics and cheminformatics, two fields where big data has become a problem due to continuing advances in the technologies that drive them, such as gene sequencing and small-ligand exploration. We introduce high-performance computing as a means to process this growing body of data in order to facilitate knowledge discovery. We enumerate the goals of the project, including reusability, efficiency, reliability, and scalability. We then describe the implementation of a software scheduler that aims to improve the input and output performance of a targeted collection of informatics tools, as well as the profiling and optimization needed to tune the software. We evaluate the performance of the software with a scalability study of the bioinformatics tools BLAST, HMMER, and MUSCLE, as well as the cheminformatics tool DOCK6.

University of Tennessee, Knoxville: Trace