
    D^2PS: A Dependable Data Provisioning Service in Multi-tenant Cloud Environment

    Software as a Service (SaaS) is a software delivery and business model widely used in Cloud computing. Instead of purchasing and maintaining a software suite permanently, customers lease the software on demand. The domain of high-assurance distributed systems has focused greatly on fault tolerance and dependability. In a multi-tenant context, it is particularly important to store, manage, and provision data services to customers in a highly efficient and dependable manner, given the large number of file operations involved in running such services. It is also desirable to allow a user group to share and cooperate on (e.g., co-edit) specific data. In this paper we present a dependable data provisioning service for a multi-tenant Cloud environment. We describe a metadata management approach that leverages multiple replicated metadata caches to shorten file access time and improve the efficiency of data sharing. To reduce frequent data transmission and data access latency, we introduce a distributed cooperative disk cache mechanism that supports effective cache placement and pull-push cache synchronization. In addition, we use efficient component failover to enhance service dependability whilst avoiding the negative impact of system failures. Our experimental results show that our system can significantly reduce both unnecessary data transmission and response latency: over 50% of network transmission and operational latency can be saved for random reads, while 28.24% of network traffic and 25% of response latency can be saved for random write operations. We believe these findings are a positive step toward resolving storage-related challenges in a multi-tenant Cloud environment.
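
    The pull-push synchronization described above can be illustrated with a minimal sketch (Python; the class and method names are ours, and in-memory dicts stand in for the disk caches): a write pushes the new version to peers that already hold a replica, and a read miss pulls from a peer cache before falling back to the origin store.

        # Illustrative pull-push cooperative cache; all names are hypothetical.
        class CacheNode:
            def __init__(self):
                self.store = {}   # key -> (version, data)
                self.peers = []   # other CacheNode instances

            def write(self, key, data):
                version = self.store.get(key, (0, None))[0] + 1
                self.store[key] = (version, data)
                for peer in self.peers:            # push: refresh replicas
                    if key in peer.store:
                        peer.store[key] = (version, data)

            def read(self, key, origin):
                if key in self.store:              # local hit
                    return self.store[key][1]
                for peer in self.peers:            # pull from a peer cache
                    if key in peer.store:
                        self.store[key] = peer.store[key]
                        return self.store[key][1]
                data = origin[key]                 # fall back to origin storage
                self.store[key] = (1, data)
                return data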

    On Improving the Robustness of Partitionable Internet-Based Mobile Ad Hoc Networks

    Recent technological advances in portability, mobility support, and high-speed wireless communications, together with users' insatiable interest in accessing the Internet, have fueled the development of mobile wireless networks. The Internet-based mobile ad hoc network (IMANET) is emerging as a ubiquitous communication infrastructure that combines a mobile ad hoc network (MANET) and the Internet to provide universal information accessibility. However, communication performance may be seriously degraded by network partitions resulting from frequent changes in the network topology. In this paper, we propose an enhanced least-recently-used replacement policy as part of the aggregate cache mechanism to improve information accessibility and reduce access latency in the presence of network partitioning. The enhanced aggregate cache is analyzed and also evaluated by simulation. Extensive simulation experiments are conducted under various network topologies using three different mobility models: random waypoint, Manhattan grid, and modified random waypoint. The simulation results indicate that the proposed policy significantly improves communication performance across varying network topologies and relieves the network partition problem to a great extent.
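
    The flavor of such an enhanced replacement policy can be sketched as follows (Python; the paper's exact policy differs, and is_reachable is a hypothetical connectivity oracle): the cache prefers to evict items whose origin server is still reachable, retaining data that would otherwise be lost behind a partition.

        # Partition-aware LRU sketch; an illustrative variant, not the
        # paper's exact algorithm.
        from collections import OrderedDict

        class PartitionAwareLRU:
            def __init__(self, capacity, is_reachable):
                self.capacity = capacity
                self.cache = OrderedDict()        # key -> value, LRU order
                self.is_reachable = is_reachable  # callable: key -> bool

            def get(self, key):
                if key not in self.cache:
                    return None
                self.cache.move_to_end(key)       # mark as recently used
                return self.cache[key]

            def put(self, key, value):
                self.cache[key] = value
                self.cache.move_to_end(key)
                while len(self.cache) > self.capacity:
                    self._evict()

            def _evict(self):
                # Prefer the oldest item whose origin is still reachable.
                for key in self.cache:
                    if self.is_reachable(key):
                        del self.cache[key]
                        return
                # All origins partitioned: fall back to plain LRU.
                self.cache.popitem(last=False)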

    Improving Response Time and Throughput of Search Engine with Web Caching

    Large web search engines need to process thousands of queries per second over collections of billions of web pages. As a result, query processing is a major performance bottleneck and cost factor in current search engines, and a number of techniques are employed to increase query throughput, including massively parallel processing, index compression, early termination, and caching. Caching is a useful technique for Web systems that are accessed by a large number of users: it shortens the average response time, reduces the workload on back-end servers, and reduces the overall amount of bandwidth used. Our contribution in this paper has two parts. In the first part, we propose a Cached Search Algorithm (CSA) on top of multiple search engines (Google, Yahoo, and Bing) and achieve better response time when accessing the resulting web pages. In the second part, we design and implement a Cached Search Engine and evaluate its performance on training data (the WEPS dataset [1]) and test data (a Mobile dataset). The Cached Search Engine performs better, reducing the search engine's response time and increasing the throughput of the returned results.
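
    The core idea of such a caching layer can be sketched as follows (Python; fetch_backend is a placeholder rather than a real engine API, and the TTL is an assumed parameter): results are keyed by engine and normalized query, and served from the cache until they expire.

        # Query-result cache sketch over multiple search back-ends.
        import time

        TTL_SECONDS = 300
        cache = {}   # (engine, normalized query) -> (timestamp, results)

        def fetch_backend(engine, query):
            # Stand-in for a real call to Google, Yahoo, or Bing.
            return [f"{engine} result for {query!r}"]

        def cached_search(engine, query):
            key = (engine, query.strip().lower())   # normalize the query
            hit = cache.get(key)
            if hit and time.time() - hit[0] < TTL_SECONDS:
                return hit[1]                       # cache hit: skip back-end
            results = fetch_backend(engine, query)
            cache[key] = (time.time(), results)
            return results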

    Hardware accelerated redundancy elimination in network system

    With the tremendous growth in the amount of information stored at remote locations and in cloud systems, many service providers are seeking ways to reduce the amount of redundant information sent across networks by using data de-duplication techniques. Data de-duplication can reduce network traffic without loss of information and consequently increase available network bandwidth by eliminating redundant traffic. However, due to the heavy computation required to detect and reduce redundant data transmission, de-duplication itself can become a bottleneck on high-capacity links. This research study comprises two parts: Hardware Accelerated Redundancy Elimination in Network Systems (HARENS) and the Distributed Redundancy Elimination System Simulation (DRESS). HARENS can significantly improve the performance of the redundancy elimination algorithm in a network system by leveraging General Purpose Graphics Processing Unit (GPGPU) techniques as well as other big-data optimizations such as a hierarchical multi-threaded pipeline, single-machine Map-Reduce, and memory-efficiency techniques. Our results indicate that throughput can be increased by a factor of nine compared to a naive implementation of the data de-duplication algorithm, providing a net transmission increase of up to 3.0 Gigabits per second (Gbps). DRESS further accelerates redundancy elimination in the network system by deploying HARENS as the server-side redundancy elimination module and four cooperative distributed byte caches on the client side. A client-side distributed byte cache broadcasts the hash values of its cached chunks to the other byte caches, so that each cache keeps a record of all the chunks in the cooperative distributed cache system. When duplicates are detected, a client-side byte cache can fetch a chunk directly from its own cache or from peer byte caches rather than from the server-side redundancy elimination module. Our results indicate that the bandwidth savings of the redundancy elimination system with the cooperative distributed byte cache are 12% higher than without it when transferring about 48 Gigabits of data.
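
    The basic redundancy elimination exchange can be sketched as follows (Python; fixed-size chunking and a single shared index are simplifications, since HARENS itself uses GPU-accelerated chunking and fingerprinting): duplicate chunks cross the wire as 20-byte digests instead of payload bytes.

        # Hash-based redundancy elimination sketch; the shared index models
        # the synchronized sender/receiver chunk stores.
        import hashlib

        CHUNK = 4096

        def send(data, index):
            """Yield ('ref', digest) for known chunks, ('raw', chunk) otherwise."""
            for i in range(0, len(data), CHUNK):
                chunk = data[i:i + CHUNK]
                digest = hashlib.sha1(chunk).digest()
                if digest in index:
                    yield ("ref", digest)      # duplicate: 20-byte hash only
                else:
                    index[digest] = chunk
                    yield ("raw", chunk)       # first sight: full payload

        def receive(stream, index):
            # Rebuild the byte stream, resolving references via the index.
            return b"".join(index[p] if kind == "ref" else p
                            for kind, p in stream)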

    Distributed Caching for Processing Raw Arrays

    As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, the framework determines an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer. We design cache eviction and placement heuristics that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority of the proposed framework over existing techniques, by as much as two orders of magnitude, in terms of cache overhead and workload execution time.
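
    A toy rendering of the two-stage plan (Python; the real framework refines an R-tree index and solves an assignment problem, whereas this sketch uses plain rectangles and a greedy load balancer): stage one scores cells by historical query overlap, and stage two co-locates cells touched by the same query on one node.

        def overlaps(a, b):
            # a and b are rectangles: one (lo, hi) pair per dimension.
            return all(alo < bhi and blo < ahi
                       for (alo, ahi), (blo, bhi) in zip(a, b))

        def plan(cells, workload, nodes):
            # Stage 1: rank cells by how often historical queries hit them.
            score = {cid: sum(overlaps(box, q) for q in workload)
                     for cid, box in cells.items()}
            hot = sorted(score, key=score.get, reverse=True)
            # Stage 2: greedily place all cells a query touches on the
            # currently least-loaded node to limit data transfer.
            assignment = {}
            load = {n: 0 for n in nodes}
            for q in workload:
                target = min(load, key=load.get)
                for cid in hot:
                    if cid not in assignment and overlaps(cells[cid], q):
                        assignment[cid] = target
                        load[target] += 1
            return assignment

        # Usage: plan({"c0": ((0, 4), (0, 4))}, [((1, 2), (1, 2))], ["n0", "n1"])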

    Optimization in Web Caching: Cache Management, Capacity Planning, and Content Naming

    Caching is fundamental to performance in distributed information retrieval systems such as the World Wide Web. This thesis introduces novel techniques for optimizing performance and cost-effectiveness in Web cache hierarchies. When requests are served by nearby caches rather than distant servers, server loads and network traffic decrease and transactions are faster. Cache system design and management, however, face extraordinary challenges in loosely-organized environments like the Web, where the many components involved in content creation, transport, and consumption are owned and administered by different entities. Such environments call for decentralized algorithms in which stakeholders act on local information and private preferences. In this thesis I consider problems of optimally designing new Web cache hierarchies and optimizing existing ones. The methods I introduce span the Web from point of content creation to point of consumption: I quantify the impact of content-naming practices on cache performance; present techniques for variable-quality-of-service cache management; describe how a decentralized algorithm can compute economically-optimal cache sizes in a branching two-level cache hierarchy; and introduce a new protocol extension that eliminates redundant data transfers and allows “dynamic” content to be cached consistently. To evaluate several of my new methods, I conducted trace-driven simulations on an unprecedented scale. This in turn required novel workload measurement methods and efficient new characterization and simulation techniques. The performance benefits of my proposed protocol extension are evaluated using two extraordinarily large and detailed workload traces collected in a traditional corporate network environment and an unconventional thin-client system. My empirical research follows a simple but powerful paradigm: measure on a large scale an important production environment’s exogenous workload; identify performance bounds inherent in the workload, independent of the system currently serving it; identify gaps between actual and potential performance in the environment under study; and finally devise ways to close these gaps through component modifications or through improved inter-component integration. This approach may be applicable to a wide range of Web services as they mature.
    Ph.D. in Computer Science and Engineering, University of Michigan.
    http://deepblue.lib.umich.edu/bitstream/2027.42/90029/1/kelly-optimization_web_caching.pdf
    http://deepblue.lib.umich.edu/bitstream/2027.42/90029/2/kelly-optimization_web_caching.ps.bz
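
    A protocol extension that eliminates redundant transfers is typically digest-based: name a response body by its hash so that identical payloads, even under different URLs or from "dynamic" pages, cross the network once. A minimal sketch of that idea (Python; the function names and the digest-first handshake shape are our assumptions, not the thesis's wire format):

        # Digest-first transfer sketch; hypothetical names throughout.
        import hashlib

        body_cache = {}   # digest -> body, shared across all URLs

        def server_digest(body):
            # The server advertises the body's digest before the body itself.
            return hashlib.sha256(body).hexdigest()

        def client_get(digest, download):
            # The client downloads only unknown digests, so an identical
            # payload served under many URLs crosses the wire once.
            if digest not in body_cache:
                body_cache[digest] = download()
            return body_cache[digest]

        # Two responses carrying identical bytes cause a single download.
        payload = b"<html>rotating banner, same markup</html>"
        d = server_digest(payload)
        first = client_get(d, lambda: payload)
        second = client_get(d, lambda: payload)   # served from digest cache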

    A peer distributed web caching system with incremental update scheme

    Ph.D. (Doctor of Philosophy)

    Distributed Caching for Complex Querying of Raw Arrays

    As applications continue to generate multi-dimensional data at exponentially increasing rates, fast analytics to extract meaningful results is becoming extremely important. The database community has developed array databases that alleviate this problem through a series of techniques. In-situ mechanisms provide direct access to raw data in the original format, without loading and partitioning. Parallel processing scales to the largest datasets. In-memory caching reduces latency when the same data are accessed across a workload of queries. However, we are not aware of any work on distributed caching of multi-dimensional raw arrays. In this paper, we introduce a distributed framework for cost-based caching of multi-dimensional arrays in native format. Given a set of files that contain portions of an array and an online query workload, the framework computes an effective caching plan in two stages. First, the plan identifies the cells to be cached locally from each of the input files by continuously refining an evolving R-tree index. In the second stage, the framework determines an optimal assignment of cells to nodes that collocates dependent cells in order to minimize the overall data transfer. We design cache eviction and placement heuristics that consider the historical query workload. A thorough experimental evaluation over two real datasets in three file formats confirms the superiority of the proposed framework over existing techniques, by as much as two orders of magnitude, in terms of cache overhead and workload execution time.
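
    To complement the placement sketch given for the companion paper above, the eviction side can be illustrated as follows (Python; the scoring rule is ours, not the paper's): evict cached cells with the lowest recent-hits-per-byte ratio until enough space is freed.

        # Cost-based eviction sketch: drop cells with the lowest
        # hits-per-byte score until the requested space is freed.
        def evict(cached, hits, size, bytes_needed):
            freed, victims = 0, []
            for cell in sorted(cached, key=lambda c: hits.get(c, 0) / size[c]):
                if freed >= bytes_needed:
                    break
                victims.append(cell)
                freed += size[cell]
            return victims

        # Usage: free 128 MB; the rarely hit large cell goes first.
        evict(cached={"c1", "c2", "c3"},
              hits={"c1": 40, "c2": 2, "c3": 15},
              size={"c1": 64e6, "c2": 96e6, "c3": 64e6},
              bytes_needed=128e6)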