
    Stochastic Query Covering for Fast Approximate Document Retrieval

    We design algorithms that, given a collection of documents and a distribution over user queries, return a small subset of the document collection such that high-quality answers to user queries can be provided efficiently using only the selected subset. This approach is useful when space is constrained or when query-processing time grows significantly with the size of the collection. We study our algorithms through the lens of stochastic analysis and prove that, even though they use only a small fraction of the entire collection, they can answer most user queries, achieving performance close to optimal. To complement our theoretical findings, we experimentally demonstrate the versatility of our approach on two important cases in the context of Web search. In the first case, we favor the retrieval of documents that are relevant to the query; in the second, we aim for document diversification. Both the theoretical and the experimental analyses provide strong evidence of the potential value of query covering in diverse application scenarios.
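    The abstract does not state which selection algorithm the authors use. As a hedged illustration of the underlying idea only, the sketch below swaps in the classical greedy algorithm for weighted maximum coverage: it repeatedly adds the document that answers the largest not-yet-covered probability mass of queries. It assumes (a simplification not taken from the abstract) that a query counts as answered once at least one of its relevant documents is in the subset; the function names and inputs are hypothetical.

```python
from typing import Dict, List, Optional, Set


def greedy_query_cover(
    query_probs: Dict[str, float],
    relevant_docs: Dict[str, Set[str]],
    budget: int,
) -> List[str]:
    """Pick up to `budget` documents greedily by expected query coverage.

    Simplifying assumption: a query is "covered" once at least one of its
    relevant documents is selected; each query is weighted by its probability.
    """
    all_docs: Set[str] = set().union(*relevant_docs.values())
    selected: List[str] = []
    covered: Set[str] = set()
    for _ in range(budget):
        best_doc: Optional[str] = None
        best_gain = 0.0
        # Sorted iteration makes tie-breaking deterministic.
        for doc in sorted(all_docs - set(selected)):
            # Marginal gain: probability mass of still-uncovered queries this doc answers.
            gain = sum(
                p for q, p in query_probs.items()
                if q not in covered and doc in relevant_docs.get(q, set())
            )
            if gain > best_gain:
                best_doc, best_gain = doc, gain
        if best_doc is None:  # no remaining document adds coverage
            break
        selected.append(best_doc)
        covered |= {q for q in query_probs if best_doc in relevant_docs.get(q, set())}
    return selected


def expected_coverage(
    query_probs: Dict[str, float],
    relevant_docs: Dict[str, Set[str]],
    subset: List[str],
) -> float:
    """Probability that a random query has a relevant document in `subset`."""
    chosen = set(subset)
    return sum(p for q, p in query_probs.items() if relevant_docs.get(q, set()) & chosen)
```

    For example, with queries q1, q2, q3 of probability 0.5, 0.3, 0.2, where q1 is answered by d1 or d2, q2 by d2, and q3 by d3, a budget of one document selects d2, which covers 0.5 + 0.3 = 0.8 of the query mass. Greedy selection is a natural stand-in here because it is the textbook approximation for coverage-style objectives, not because the paper is known to use it.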

    Evaluation of the Management of the Department of Defense's Wholesale Ammunition Stockpile

    This research solicited expert opinions on how well the Single Manager for Conventional Ammunition (SMCA) manages the DoD wholesale ammunition stockpile. Members of the Army, Navy, Air Force, and Marine Corps, as well as members of the SMCA, were surveyed twice. The first survey contained four statements, each referring to a different area of SMCA responsibility: storage of ammunition, demilitarization of ammunition, the tiering program for depots, and customer support. Respondents were asked to identify positive and negative aspects of each area and to recommend improvements. The second survey sought to revalidate and summarize the expert opinions from the first survey regarding problem areas or areas for improvement. From the responses to each statement or question on the second survey, conclusions were drawn about what the experts considered the positive and negative aspects of the SMCA, as well as the areas they believed could be improved. The study concluded that the SMCA does well at storing ammunition and managing demilitarization, that the tiering plan is conceptually a good idea, and that SMCA customer satisfaction is an area that requires additional attention.

    Improving Storage with Stackable Extensions

    Storage is a central part of computing. Driven by an exponentially increasing rate of content generation and a widening performance gap between memory and secondary storage, researchers are on a perennial quest for further innovation. This has resulted in novel ways to “squeeze” more capacity and performance out of current and emerging storage technology. Adding intelligence and leveraging new types of storage devices have opened the door to a whole new class of optimizations that save cost, improve performance, and reduce energy consumption. In this dissertation, we first develop, analyze, and evaluate three storage extensions. Our first extension tracks application access patterns and writes data in the way individual applications most commonly access it, to benefit from the sequential throughput of disks. Our second extension uses a lower-power flash device as a cache to save energy, allowing the disk to be turned off during idle periods. Our third extension leverages the characteristics of both disks and solid-state devices by placing data on the most appropriate device to improve performance and save power. In developing these systems, we learned that extending the storage stack is a complex process. Implementing new ideas involves a prolonged and cumbersome development process and requires developers to have advanced knowledge of the entire system to ensure that extensions accomplish their goals without compromising data recoverability. Furthermore, storage administrators are often reluctant to deploy specific storage extensions without understanding how they interact with other extensions and whether an extension ultimately achieves its intended goal. We address these challenges with a combination of approaches. First, we simplify the storage extension development process with system-level infrastructure that implements core functionality commonly needed for storage extension development. Second, we develop a formal theory to help administrators deploy storage extensions while guaranteeing that the given high-level goals are satisfied. There are, however, some cases for which our theory is inconclusive. For such scenarios, we present an experimental methodology that allows administrators to pick the extension that performs best for a given workload. Our evaluation demonstrates the benefits of both the infrastructure and the formal theory.
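    The abstract does not describe how the third (hybrid disk/SSD) extension decides where data goes. As a hedged sketch only, the code below assumes a simple hypothetical policy: count accesses per block and place the most frequently accessed blocks on the SSD, up to its capacity, leaving the rest on disk. The function name, block-id representation, and policy are illustrative assumptions; the actual extension may also weigh power and device characteristics.

```python
from collections import Counter
from typing import Dict, Iterable


def place_blocks(accesses: Iterable[int], ssd_capacity: int) -> Dict[int, str]:
    """Map each accessed block id to "ssd" or "disk" by access frequency.

    Hypothetical policy: the `ssd_capacity` most frequently accessed blocks
    go to the SSD; all other blocks stay on disk. Ties are broken by block id.
    """
    if ssd_capacity < 0:
        raise ValueError("ssd_capacity must be non-negative")
    counts = Counter(accesses)
    # Hottest blocks first; lower block id wins a tie.
    ranked = sorted(counts, key=lambda blk: (-counts[blk], blk))
    hot = set(ranked[:ssd_capacity])
    return {blk: ("ssd" if blk in hot else "disk") for blk in counts}
```

    With the access trace [1, 2, 1, 3, 1, 2, 4] and room for two blocks on the SSD, blocks 1 and 2 (the two hottest) are placed on the SSD and blocks 3 and 4 stay on disk. A frequency ranking is used here only because it is the simplest hot/cold heuristic.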

    Big Data Security (Volume 3)

    After a short description of the key concepts of big data, the book explores the secrecy and security threats posed especially by cloud-based data storage. It delivers conceptual frameworks and models along with case studies of recent technology.

    Towards Autonomous and Efficient Machine Learning Systems

    Computation-intensive machine learning (ML) applications are becoming some of the most popular workloads running atop cloud infrastructure. When training and serving ML applications, practitioners face the challenge of tuning various system-level parameters, such as the number of training nodes, the communication topology during training, the instance type, and the number of serving nodes, to meet the SLO requirements of bursty inference workloads. Efficient resource utilization is another key challenge in cloud computing. This dissertation proposes high-performing and efficient ML systems that speed up training and inference tasks while enabling automated and robust system management. To train an ML model in a distributed fashion, we focus on strategies that mitigate resource provisioning overhead and improve training speed without impacting model accuracy. More specifically, we build a system for autonomic and adaptive scheduling atop serverless computing that dynamically optimizes deployment and resource scaling of ML training tasks for cost-effectiveness and fast training. Similarly, we develop a dynamic client selection framework that addresses the straggler problem caused by resource heterogeneity, data quality, and data quantity in a privacy-preserving Federated Learning (FL) environment, without impacting model accuracy. For serving bursty ML workloads, we focus on developing highly scalable and adaptive strategies that serve dynamically changing workloads cost-effectively and autonomically. We develop a framework that optimizes batching parameters on the fly using a lightweight profiler and an analytical model. We also devise strategies for cost-effectively serving ML workloads of varying sizes, which lead to non-deterministic service times. More specifically, we develop an SLO-aware framework that first analyzes variations in request size and workload to estimate the number of serving functions and then intelligently routes requests to multiple serving functions. Finally, we optimize the resource utilization of burstable instances to benefit both the cloud provider and the end user through careful orchestration of resources (i.e., CPU, network, and I/O) using an analytical model and lightweight profiling, while complying with a user-defined SLO.
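    The abstract does not give the analytical model behind the on-the-fly batching framework. As a hedged illustration only, the sketch below assumes a hypothetical linear cost model in which serving a batch of b requests takes alpha + beta * b seconds (coefficients a lightweight profiler might fit) and the first request in a batch waits (b - 1) / arrival_rate seconds for the batch to fill. It then picks the largest batch size whose estimated worst-case latency stays within the SLO. The function name, parameters, and model are illustrative, not the dissertation's.

```python
def choose_batch_size(
    arrival_rate: float,
    alpha: float,
    beta: float,
    slo: float,
    max_batch: int = 64,
) -> int:
    """Largest batch size b whose estimated latency fits within `slo` seconds.

    Hypothetical model: a batch of b requests is served in alpha + beta * b
    seconds, and the first request waits (b - 1) / arrival_rate seconds for
    the batch to fill. Returns 0 if even b = 1 violates the SLO.
    """
    if arrival_rate <= 0 or beta < 0:
        raise ValueError("arrival_rate must be positive and beta non-negative")
    best = 0
    for b in range(1, max_batch + 1):
        fill_wait = (b - 1) / arrival_rate
        latency = fill_wait + alpha + beta * b
        if latency <= slo:
            best = b
        else:
            break  # latency only grows with b, so larger batches also miss the SLO
    return best
```

    For instance, at 100 requests/s with alpha = 0.01 s and beta = 0.002 s, the estimated latency is 0.012 * b seconds, so a 100 ms SLO allows a batch of 8. Re-running the choice whenever the profiler updates alpha, beta, or the observed arrival rate is one plausible way to adapt batching on the fly.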