
    DiSTRICT: Dialogue State Tracking with Retriever Driven In-Context Tuning

    Dialogue State Tracking (DST), a key component of task-oriented conversation systems, represents user intentions by determining the values of pre-defined slots in an ongoing dialogue. Existing approaches use hand-crafted templates and additional slot information to fine-tune and prompt large pre-trained language models and elicit slot values from the dialogue context. Significant manual effort and domain knowledge are required to design effective prompts, limiting the generalizability of these approaches to new domains and tasks. In this work, we propose DiSTRICT, a generalizable in-context tuning approach for DST that retrieves highly relevant training examples for a given dialogue to fine-tune the model without any hand-crafted templates. Experiments with the MultiWOZ benchmark datasets show that DiSTRICT outperforms existing approaches in various zero-shot and few-shot settings using a much smaller model, thereby providing an important advantage for real-world deployments that often have limited resource availability.
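    The core idea, retrieving the training examples most relevant to the current dialogue, can be sketched with a toy retriever. DiSTRICT uses a learned retriever; the sketch below substitutes a simple bag-of-words cosine similarity, and the example dialogues are hypothetical.

    ```python
    from collections import Counter
    from math import sqrt

    def cosine(a: Counter, b: Counter) -> float:
        """Cosine similarity between two bag-of-words vectors."""
        dot = sum(a[t] * b[t] for t in a)
        na = sqrt(sum(v * v for v in a.values()))
        nb = sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def retrieve(dialogue: str, pool: list, k: int = 2) -> list:
        """Return the k training examples most similar to the dialogue."""
        q = Counter(dialogue.lower().split())
        ranked = sorted(pool,
                        key=lambda ex: cosine(q, Counter(ex.lower().split())),
                        reverse=True)
        return ranked[:k]

    # Hypothetical training pool in the MultiWOZ style.
    pool = [
        "i need a cheap hotel in the north",
        "book a table for two at an italian restaurant",
        "find me a train to cambridge on tuesday",
    ]
    print(retrieve("looking for a hotel in the north part of town", pool, k=1))
    # → ['i need a cheap hotel in the north']
    ```

    The retrieved examples would then be placed in the model's context before fine-tuning, replacing hand-crafted templates with data-driven demonstrations.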

    One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance Tiers

    Today's cloud service architectures follow a "one size fits all" deployment strategy where the same service version instantiation is provided to all end users. However, the consumer base is broad, and different applications have different accuracy and responsiveness requirements, which, as we demonstrate, renders the "one size fits all" approach inefficient in practice. We use a production-grade speech recognition engine, which serves several thousand users, and an open-source computer-vision-based system to illustrate this point. To overcome the limitations of the "one size fits all" approach, we recommend Tolerance Tiers, where each MLaaS tier exposes an accuracy/responsiveness characteristic and consumers can programmatically select a tier. We evaluate our proposal on the CPU-based automatic speech recognition (ASR) engine and cutting-edge neural networks for image classification deployed on both CPUs and GPUs. The results show that our proposed approach provides an MLaaS cloud service architecture that can be tuned by the end API user or consumer to outperform the conventional "one size fits all" approach.
    Comment: 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
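    Programmatic tier selection can be sketched as a small client-side policy: given the application's accuracy floor and latency ceiling, pick the most accurate tier that satisfies both. The tier names and numbers below are illustrative assumptions, not values from the paper.

    ```python
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Tier:
        name: str
        accuracy: float    # expected accuracy of this service tier
        latency_ms: float  # expected response latency

    # Hypothetical tier catalogue; a real one would be published by the MLaaS provider.
    TIERS = [
        Tier("fast",     accuracy=0.88, latency_ms=20),
        Tier("balanced", accuracy=0.93, latency_ms=60),
        Tier("accurate", accuracy=0.97, latency_ms=180),
    ]

    def select_tier(min_accuracy: float, max_latency_ms: float) -> Optional[Tier]:
        """Pick the most accurate tier meeting both constraints, or None."""
        ok = [t for t in TIERS
              if t.accuracy >= min_accuracy and t.latency_ms <= max_latency_ms]
        return max(ok, key=lambda t: t.accuracy) if ok else None

    print(select_tier(min_accuracy=0.90, max_latency_ms=100).name)  # → balanced
    ```

    An interactive application might request the "fast" tier while a batch pipeline requests "accurate", letting one service architecture serve both consumer profiles.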

    Exploring Optimal Compilation Unit Shapes for an Embedded Just-In-Time Compiler

    This paper investigates strategies for finding optimal compilation unit shapes for a Java Just-in-Time (JIT) compiler. The standard compilation unit for a JIT compiler is a whole method. However, compiling all executing methods is an ill-suited solution in the embedded domain, where space constraints limit the storage available to hold the compiled code. A more effective strategy for an embedded JIT compiler is to compile only the frequently executed core of the application while less important portions continue to be interpreted. We explore in this paper various code selection strategies that are targeted at minimizing compiled code size while maximizing the time spent in compiled code versus interpretation. Our evaluation shows that using methods as the only permissible compilation unit, even when using profiling support to compile only hot methods, cannot yield optimal results. We develop in this paper hybrid compilation unit strategies that select methods and either dynamic code traces or loops for compilation. The combination of methods with traces or loops significantly outperforms the solely method-based strategy by allowing for reductions in compiled code size without sacrificing execution coverage of the compiled code.
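    The size-versus-coverage trade-off can be illustrated with a toy greedy selector over mixed compilation units (methods, traces, loops) under a code-size budget. This is a knapsack-style heuristic of my own for illustration, not the paper's selection algorithm, and the unit names and profile numbers are hypothetical.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Unit:
        name: str
        kind: str             # "method", "trace", or "loop"
        size: int             # estimated compiled-code size in bytes
        exec_fraction: float  # fraction of execution time spent in this unit

    def select_units(units, budget):
        """Greedily pick units with the best coverage-per-byte until the budget is spent."""
        chosen, used = [], 0
        for u in sorted(units, key=lambda u: u.exec_fraction / u.size, reverse=True):
            if used + u.size <= budget:
                chosen.append(u)
                used += u.size
        return chosen

    # Hypothetical profile: a large hot method vs. small hot sub-method units.
    units = [
        Unit("Main.run",           "method", size=4000, exec_fraction=0.30),
        Unit("inner loop of sort", "loop",   size=300,  exec_fraction=0.25),
        Unit("hot trace in parse", "trace",  size=500,  exec_fraction=0.20),
    ]
    picked = select_units(units, budget=1000)
    print([u.name for u in picked])
    # → ['inner loop of sort', 'hot trace in parse']
    ```

    Under a tight budget the small loop and trace units are chosen over the whole method, capturing 45% of execution time in a fifth of the method's footprint, which mirrors why hybrid units beat the method-only strategy.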

    Concurrency Analysis in the Presence of Procedures Using a Data-Flow Framework

    Although the data-flow framework is a powerful tool to statically analyze a program, current data-flow analysis techniques have not addressed the effect of procedures on concurrency analysis. This work develops a data race detection technique using a data-flow framework that analyzes concurrent events in a program in which tasks and procedures interact. There are no restrictions placed on the interactions between procedures and tasks, and thus recursion is permitted. Solving a system of data-flow equations, the technique computes a partial execution order for regions in the program by considering the control flow within a program unit, communication between tasks, and the calling/creation context of procedures and tasks. From the computed execution order, concurrent events are determined as unordered events. We show how the information about concurrent events can be used in debugging to automatically detect data races.
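    The final step, treating unordered events as concurrent and flagging conflicting accesses among them as potential races, can be sketched directly. The paper derives the partial order by solving data-flow equations; this toy instead starts from an explicit happens-before graph over a hypothetical program's regions and checks each pair of same-variable accesses for ordering.

    ```python
    from itertools import combinations

    # Hypothetical partial execution order: init spawns task_a and task_b, which join.
    edges = {("init", "task_a"), ("init", "task_b"),
             ("task_a", "join"), ("task_b", "join")}

    def happens_before(x, y, edges):
        """Transitive reachability over the partial-order edges."""
        frontier, seen = {x}, set()
        while frontier:
            n = frontier.pop()
            if n == y:
                return True
            seen.add(n)
            frontier |= {b for (a, b) in edges if a == n and b not in seen}
        return False

    # region -> (variable accessed, is_write); a hypothetical example program.
    accesses = {
        "task_a": ("shared", True),
        "task_b": ("shared", True),
        "init":   ("shared", True),
    }

    races = []
    for r1, r2 in combinations(accesses, 2):
        (v1, w1), (v2, w2) = accesses[r1], accesses[r2]
        if v1 == v2 and (w1 or w2):  # conflicting accesses to the same variable
            if not happens_before(r1, r2, edges) and not happens_before(r2, r1, edges):
                races.append((r1, r2))  # unordered, hence potentially concurrent
    print(races)
    # → [('task_a', 'task_b')]
    ```

    The two writes in `task_a` and `task_b` are unordered by the computed execution order, so they are reported as a potential data race, while the ordered write in `init` is not.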

    Multiple page size modeling and optimization

    With the growing awareness that individual hardware cores will not continue to produce the same level of performance improvement, there is a need to develop an integrated approach to performance optimization. In this paper we present a paradigm for Continuous Program Optimization (CPO), whereby automatic agents monitor and optimize application and system performance. The monitoring data is used to analyze and create models of application and system behavior. Using this analysis, we describe how CPO agents can improve the performance of both the application and the underlying system. Using the CPO paradigm, we implemented cooperating page size optimization agents that automatically optimize large page usage. An offline agent uses vertically integrated performance data to produce a page size benefit analysis for different categories of data structures within an application. We show how an online CPO agent can use the results of the predictive analysis to automatically improve application performance. We validate that the predictions made by the CPO agent reflect actual performance gains of up to 60% across a range of scientific applications, including the SPECcpu2000 floating point benchmarks and two large high performance computing (HPC) applications.
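    The intuition behind a page size benefit analysis can be sketched with a toy cost model: a data structure benefits from large pages when its working set stops fitting in the TLB with base pages but fits with large ones. The page sizes, TLB capacity, and thresholds below are illustrative assumptions, not the paper's model.

    ```python
    # Illustrative page sizes; actual sizes depend on the architecture.
    BASE_PAGE = 4 * 1024            # 4 KiB
    LARGE_PAGE = 16 * 1024 * 1024   # 16 MiB

    def pages_needed(size_bytes: int, page_size: int) -> int:
        """Ceiling division: number of pages covering the data structure."""
        return -(-size_bytes // page_size)

    def tlb_benefit(size_bytes: int, tlb_entries: int = 64) -> dict:
        """Rough benefit signal: does the working set fit in the TLB at each page size?"""
        small = pages_needed(size_bytes, BASE_PAGE)
        large = pages_needed(size_bytes, LARGE_PAGE)
        return {
            "small_pages": small,
            "large_pages": large,
            "fits_with_small": small <= tlb_entries,
            "fits_with_large": large <= tlb_entries,
        }

    # A hypothetical 512 MiB array: 131072 base pages, but only 32 large pages.
    print(tlb_benefit(512 * 1024 * 1024))
    ```

    An offline agent would compute a signal like this per data-structure category from measured access behavior, and the online agent would then back allocations that benefit with large pages.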