106 research outputs found

    Scalable Storage for Digital Libraries

    Get PDF
    I propose a storage system optimised for digital libraries. Its key features are its heterogeneous scalability; its integration and exploitation of rich semantic metadata associated with digital objects; its use of a name space; and its aggressive performance optimisation in the digital library domain

    Corporate influence and the academic computer science discipline. [4: CMU]

    Get PDF
    Prosopographical work on the four major centers for computer research in the United States has now been conducted, resulting in big questions about the independence of, so called, computer science

    ATP: a Datacenter Approximate Transmission Protocol

    Full text link
    Many datacenter applications such as machine learning and streaming systems do not need the complete set of data to perform their computation. Current approximate applications in datacenters run on a reliable network layer like TCP. To improve performance, they either let sender select a subset of data and transmit them to the receiver or transmit all the data and let receiver drop some of them. These approaches are network oblivious and unnecessarily transmit more data, affecting both application runtime and network bandwidth usage. On the other hand, running approximate application on a lossy network with UDP cannot guarantee the accuracy of application computation. We propose to run approximate applications on a lossy network and to allow packet loss in a controlled manner. Specifically, we designed a new network protocol called Approximate Transmission Protocol, or ATP, for datacenter approximate applications. ATP opportunistically exploits available network bandwidth as much as possible, while performing a loss-based rate control algorithm to avoid bandwidth waste and re-transmission. It also ensures bandwidth fair sharing across flows and improves accurate applications' performance by leaving more switch buffer space to accurate flows. We evaluated ATP with both simulation and real implementation using two macro-benchmarks and two real applications, Apache Kafka and Flink. Our evaluation results show that ATP reduces application runtime by 13.9% to 74.6% compared to a TCP-based solution that drops packets at sender, and it improves accuracy by up to 94.0% compared to UDP

    Slight-Delay Shaped Variable Bit Rate (SD-SVBR) Technique for Video Transmission

    Get PDF
    The aim of this thesis is to present a new shaped Variable Bit Rate (VBR) for video transmission, which plays a crucial role in delivering video traffic over the Internet. This is due to the surge of video media applications over the Internet and the video typically has the characteristic of a highly bursty traffic, which leads to the Internet bandwidth fluctuation. This new shaped algorithm, referred to as Slight Delay - Shaped Variable Bit Rate (SD-SVBR), is aimed at controlling the video rate for video application transmission. It is designed based on the Shaped VBR (SVBR) algorithm and was implemented in the Network Simulator 2 (ns-2). SVBR algorithm is devised for real-time video applications and it has several limitations and weaknesses due to its embedded estimation or prediction processes. SVBR faces several problems, such as the occurrence of unwanted sharp decrease in data rate, buffer overflow, the existence of a low data rate, and the generation of a cyclical negative fluctuation. The new algorithm is capable of producing a high data rate and at the same time a better quantization parameter (QP) stability video sequence. In addition, the data rate is shaped efficiently to prevent unwanted sharp increment or decrement, and to avoid buffer overflow. To achieve the aim, SD-SVBR has three strategies, which are processing the next Group of Picture (GoP) video sequence and obtaining the QP-to-data rate list, dimensioning the data rate to a higher utilization of the leaky-bucket, and implementing a QP smoothing method by carefully measuring the effects of following the previous QP value. However, this algorithm has to be combined with a network feedback algorithm to produce a better overall video rate control. A combination of several video clips, which consisted of a varied video rate, has been used for the purpose of evaluating SD-SVBR performance. The results showed that SD-SVBR gains an impressive overall Peak Signal-to-Noise Ratio (PSNR) value. In addition, in almost all cases, it gains a high video rate but without buffer overflow, utilizes the buffer well, and interestingly, it is still able to obtain smoother QP fluctuation

    On I/O Performance and Cost Efficiency of Cloud Storage: A Client\u27s Perspective

    Get PDF
    Cloud storage has gained increasing popularity in the past few years. In cloud storage, data are stored in the service provider’s data centers; users access data via the network and pay the fees based on the service usage. For such a new storage model, our prior wisdom and optimization schemes on conventional storage may not remain valid nor applicable to the emerging cloud storage. In this dissertation, we focus on understanding and optimizing the I/O performance and cost efficiency of cloud storage from a client’s perspective. We first conduct a comprehensive study to gain insight into the I/O performance behaviors of cloud storage from the client side. Through extensive experiments, we have obtained several critical findings and useful implications for system optimization. We then design a client cache framework, called Pacaca, to further improve end-to-end performance of cloud storage. Pacaca seamlessly integrates parallelized prefetching and cost-aware caching by utilizing the parallelism potential and object correlations of cloud storage. In addition to improving system performance, we have also made efforts to reduce the monetary cost of using cloud storage services by proposing a latency- and cost-aware client caching scheme, called GDS-LC, which can achieve two optimization goals for using cloud storage services: low access latency and low monetary cost. Our experimental results show that our proposed client-side solutions significantly outperform traditional methods. Our study contributes to inspiring the community to reconsider system optimization methods in the cloud environment, especially for the purpose of integrating cloud storage into the current storage stack as a primary storage layer

    Data Centric Cache Measurement Using Hardware and Software Instrumentation

    Get PDF
    The speed at which microprocessors can perform computations is increasing faster than the speed of access to main memory, making efficient use of memory caches ever more important. Because of this, information about the cache behavior of applications is valuable for performance tuning. To be most useful to a programmer, this information should be presented in a way that relates it to data structures at the source code level; we will refer to this as data centric cache information. This disser-tation examines the problem of how to collect such information. We describe tech-niques for accomplishing this using hardware performance monitors and software in-strumentation. We discuss both performance monitoring features that are present in existing processors and a proposed feature for future designs. The first technique we describe uses sampling of cache miss addresses, relat-ing them to data structures. We present the results of experiments using an imple-mentation of this technique inside a simulator, which show that it can collect the de-sired information accurately and with low overhead. We then discuss a tool called Cache Scope that implements this on actual hardware, the Intel Itanium 2 processor. Experiments with this tool validate that perturbation and overhead can be kept low in a real-world setting. We present examples of tuning the performance of two applica-tions based on data from this tool. By changing only the layout of data structures, we achieved approximately 24% and 19% reductions in running time. We also describe a technique that uses a proposed hardware feature that pro-vides information about cache evictions to sample eviction addresses. We present results from an implementation of this technique inside a simulator, showing that even though this requires storing considerably more data than sampling cache misses, we are still able to collect information accurate enough to be useful while keeping overhead low. We discuss an example of performance tuning in which we were able to reduce the running time of an application by 8% using information gained from this tool

    Optimal caching of large multi-dimensional datasets

    Get PDF
    We propose a novel organization for multi-dimensional data based on the conceptof macro-voxels. This organization improves computer performance by enhancingspatial and temporal locality. Caching of macro-voxels not only reduces therequired storage space but also leads to an efficient organization of the dataset resulting in faster data access. We have developed a macro-voxel caching theory that predicts the optimal macro-voxel sizes required for minimum cache size and access time. The model also identifies a region of trade-off between time and storage, which can be exploited in making an efficient choice of macro-voxel size for this scheme. Based on the macro-voxel caching model, we have implemented a macro-voxel I/O layer in C, intended to be used as an interface between applications and datasets. It is capable of both scattered access, typical in online applications, and row/column access, typical in batched applications. We integrated this I/O layer in the ALIGN program (online application) which aligns images based on 3D distance maps; this improved access time by a factor of 3 when accessing local disks and a factor of 20 for remote disks. We also applied the macro-voxel caching scheme on SPEC.s Seismic (batched application) benchmark datasets which improved the read process by a factor of 8.Ph.D., Electrical and Computer Engineering -- Drexel University, 200

    Towards Scalable Network Traffic Measurement With Sketches

    Get PDF
    Driven by the ever-increasing data volume through the Internet, the per-port speed of network devices reached 400 Gbps, and high-end switches are capable of processing 25.6 Tbps of network traffic. To improve the efficiency and security of the network, network traffic measurement becomes more important than ever. For fast and accurate traffic measurement, managing an accurate working set of active flows (WSAF) at line rates is a key challenge. WSAF is usually located in high-speed but expensive memories, such as TCAM or SRAM, and thus their capacity is quite limited. To scale up the per-flow measurement, we pursue three thrusts. In the first thrust, we propose to use In-DRAM WSAF and put a compact data structure (i.e., sketch) called FlowRegulator before WSAF to compensate for DRAM\u27s slow access time. Per our results, FlowRegulator can substantially reduce massive influxes to WSAF without compromising measurement accuracy. In the second thrust, we integrate our sketch into a network system and propose an SDN-based WLAN monitoring and management framework called RFlow+, which can overcome the limitations of existing traffic measurement solutions (e.g., OpenFlow and sFlow), such as a limited view, incomplete flow statistics, and poor trade-off between measurement accuracy and CPU/network overheads. In the third thrust, we introduce a novel sampling scheme to deal with the poor trade-off that is provided by the standard simple random sampling (SRS). Even though SRS has been widely used in practice because of its simplicity, it provides non-uniform sampling rates for different flows, because it samples packets over an aggregated data flow. Starting with a simple idea that independent per-flow packet sampling provides the most accurate estimation of each flow, we introduce a new concept of per-flow systematic sampling, aiming to provide the same sampling rate across all flows. In addition, we provide a concrete sampling method called SketchFlow, which approximates the idea of the per-flow systematic sampling using a sketch saturation event
    • …
    corecore