DiSTRICT: Dialogue State Tracking with Retriever Driven In-Context Tuning
Dialogue State Tracking (DST), a key component of task-oriented conversation
systems, represents user intentions by determining the values of pre-defined
slots in an ongoing dialogue. Existing approaches use hand-crafted templates
and additional slot information to fine-tune and prompt large pre-trained
language models and elicit slot values from the dialogue context. Significant
manual effort and domain knowledge are required to design effective prompts,
limiting the generalizability of these approaches to new domains and tasks. In
this work, we propose DiSTRICT, a generalizable in-context tuning approach for
DST that retrieves highly relevant training examples for a given dialogue to
fine-tune the model without any hand-crafted templates. Experiments with the
MultiWOZ benchmark datasets show that DiSTRICT outperforms existing approaches
in various zero-shot and few-shot settings using a much smaller model, thereby
providing an important advantage for real-world deployments that often have
limited resource availability.
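The retrieve-then-tune idea can be made concrete with a small sketch. It assumes that training data are (dialogue, slot-state) pairs and uses bag-of-words cosine similarity as a stand-in retriever. The helper names (`retrieve`, `build_input`), the `slot = value` serialization, and the toy examples are all hypothetical; the abstract does not specify DiSTRICT's actual retriever or input format.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    # Lowercased word counts; a stand-in for whatever dialogue encoder the authors use.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, train, k=2):
    # Rank (dialogue, state) training pairs by similarity to the query dialogue.
    q = bag_of_words(query)
    ranked = sorted(train, key=lambda ex: cosine(q, bag_of_words(ex[0])), reverse=True)
    return ranked[:k]

def build_input(query, slot, train, k=2):
    # Prepend retrieved examples (with their gold values) to the target dialogue.
    # The "slot = value" serialization is a hypothetical minimal format for this sketch.
    parts = [f"{d} {slot} = {state.get(slot, 'none')}" for d, state in retrieve(query, train, k)]
    parts.append(f"{query} {slot} =")
    return " | ".join(parts)
```

For example, with a toy training set containing "i need a cheap hotel in the north" labeled `hotel-pricerange = cheap`, the query "looking for a cheap hotel please" retrieves that dialogue first. The model would then be tuned to complete the trailing `hotel-pricerange =`.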
One Size Does Not Fit All: Quantifying and Exposing the Accuracy-Latency Trade-off in Machine Learning Cloud Service APIs via Tolerance Tiers
Today's cloud service architectures follow a "one size fits all" deployment
strategy where the same service version instantiation is provided to the end
users. However, consumers are diverse, and different applications have different
accuracy and responsiveness requirements, which, as we demonstrate, renders the
"one size fits all" approach inefficient in practice. We use a production-grade
speech recognition engine, which serves several thousand users, and an open
source computer vision system to illustrate this point. To overcome the
limitations of the "one size fits all" approach, we recommend Tolerance Tiers
where each MLaaS tier exposes an accuracy/responsiveness characteristic, and
consumers can programmatically select a tier. We evaluate our proposal on the
CPU-based automatic speech recognition (ASR) engine and cutting-edge neural
networks for image classification deployed on both CPUs and GPUs. The results
show that our proposed approach provides an MLaaS cloud service architecture
that can be tuned by the end API user or consumer to outperform the
conventional "one size fits all" approach.
Comment: 2019 IEEE International Symposium on Performance Analysis of Systems
and Software (ISPASS)
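The sketch below shows what programmatic tier selection could look like from the consumer's side. The `Tier` type, the `select_tier` function, and the tier names and numbers are hypothetical placeholders, not measurements or an API from the paper. The policy shown returns the fastest tier that still meets both an accuracy floor and a latency ceiling.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    accuracy: float    # expected accuracy in [0, 1]
    latency_ms: float  # expected response time in milliseconds

def select_tier(tiers, min_accuracy, max_latency_ms):
    """Return the fastest tier meeting both bounds, or None if no tier qualifies."""
    ok = [t for t in tiers
          if t.accuracy >= min_accuracy and t.latency_ms <= max_latency_ms]
    return min(ok, key=lambda t: t.latency_ms, default=None)

# Placeholder tiers; the numbers are illustrative, not results from the paper.
TIERS = [
    Tier("fast", 0.85, 40.0),
    Tier("balanced", 0.90, 120.0),
    Tier("accurate", 0.95, 400.0),
]
```

A consumer asking for at least 0.88 accuracy within 200 ms would get `"balanced"`. A request that no tier can satisfy returns `None`, so the caller must decide whether to relax one of the bounds.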
Exploring Optimal Compilation Unit Shapes for an Embedded Just-In-Time Compiler
This paper investigates strategies for finding optimal compilation unit shapes for a Java Just-in-Time (JIT) compiler. The standard compilation unit for a JIT compiler is a whole method. However, compiling every executed method is ill-suited to the embedded domain, where space constraints limit the storage available for compiled code. A more effective strategy for an embedded JIT compiler is to compile only the frequently executed core of the application while less important portions continue to be interpreted. In this paper we explore various code selection strategies that aim to minimize compiled code size while maximizing the time spent in compiled code rather than in the interpreter. Our evaluation shows that using methods as the only permissible compilation unit, even with profiling support to compile only hot methods, cannot yield optimal results. We develop hybrid compilation unit strategies that select methods together with either dynamic code traces or loops for compilation. Combining methods with traces or loops significantly outperforms the purely method-based strategy, reducing compiled code size without sacrificing execution coverage of the compiled code.
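To illustrate why finer-grained units help under a size budget, here is a greedy knapsack-style sketch. It ranks candidate units by profiled execution share per compiled byte. This heuristic is swapped in for illustration and is not the paper's selection strategy. The `Unit` type, the candidate names, and all sizes and shares are hypothetical. The sketch also assumes that candidates do not overlap (a loop is never listed alongside its enclosing method), so coverage can simply be summed.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    name: str
    kind: str          # "method", "trace", or "loop"
    size_bytes: int    # estimated compiled code size (> 0)
    exec_share: float  # fraction of profiled execution time spent in this unit

def select_units(candidates, budget_bytes):
    """Greedily pick units by coverage per compiled byte until the budget is exhausted."""
    chosen, used, coverage = [], 0, 0.0
    for u in sorted(candidates, key=lambda u: u.exec_share / u.size_bytes, reverse=True):
        if used + u.size_bytes <= budget_bytes:
            chosen.append(u)
            used += u.size_bytes
            coverage += u.exec_share
    return chosen, used, coverage

# Hypothetical, non-overlapping candidates from a profile.
CANDIDATES = [
    Unit("Parser.parse", "method", 4000, 0.30),
    Unit("Lexer.next loop", "loop", 300, 0.25),
    Unit("Hash.get", "method", 500, 0.20),
    Unit("Render trace", "trace", 800, 0.15),
]
```

With a 2000-byte budget, the hybrid candidate set selects the loop, `Hash.get`, and the trace, covering 0.60 of execution in 1600 bytes. Restricting the candidates to `kind == "method"` covers only 0.20, because the large `Parser.parse` method does not fit.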
Concurrency Analysis in the Presence of Procedures Using a Data-Flow Framework
Although the data-flow framework is a powerful tool for statically analyzing programs, current data-flow analysis techniques have not addressed the effect of procedures on concurrency analysis. This work develops a data race detection technique, built on a data-flow framework, that analyzes concurrent events in a program in which tasks and procedures interact. No restrictions are placed on the interactions between procedures and tasks, so recursion is permitted. By solving a system of data-flow equations, the technique computes a partial execution order for regions in the program, taking into account the control flow within a program unit, communication between tasks, and the calling/creation context of procedures and tasks. From the computed execution order, concurrent events are identified as unordered events. We show how the information about concurrent events can be used in debugging to detect data races automatically.
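The idea of "concurrent events are unordered events" can be sketched with a small fixpoint computation. Each event's set of ordered-before events is solved iteratively over an explicit ordering graph. Any two accesses to the same variable that are mutually unordered, at least one of which is a write, are reported as potential races. This is a simplified stand-in: it works on a flat event graph and does not model the procedure calling/creation context that the paper handles. The event names and the example graph are hypothetical.

```python
from itertools import combinations

def ordered_before(nodes, edges):
    """Solve BEFORE(n) = union over direct predecessors p of ({p} | BEFORE(p)) to a fixpoint."""
    preds = {n: set() for n in nodes}
    for a, b in edges:
        preds[b].add(a)
    before = {n: set() for n in nodes}
    changed = True
    while changed:  # sets only grow, so the iteration terminates
        changed = False
        for n in nodes:
            new = set()
            for p in preds[n]:
                new |= {p} | before[p]
            if new != before[n]:
                before[n] = new
                changed = True
    return before

def races(accesses, edges):
    """accesses maps event -> (variable, is_write); report unordered conflicting pairs."""
    nodes = sorted(set(accesses) | {x for edge in edges for x in edge})
    before = ordered_before(nodes, edges)
    found = []
    for a, b in combinations(sorted(accesses), 2):
        (var_a, write_a), (var_b, write_b) = accesses[a], accesses[b]
        unordered = a not in before[b] and b not in before[a]
        if unordered and var_a == var_b and (write_a or write_b):
            found.append((a, b))
    return found

# Hypothetical program: task T1 writes x, sends a message, then writes y;
# task T2 receives the message, then reads x and y.
EDGES = [
    ("t1_write_x", "t1_send"),
    ("t1_send", "t1_write_y"),
    ("t1_send", "t2_recv"),  # message delivery orders the send before the receive
    ("t2_recv", "t2_read_x"),
    ("t2_recv", "t2_read_y"),
]
ACCESSES = {
    "t1_write_x": ("x", True),
    "t2_read_x": ("x", False),
    "t1_write_y": ("y", True),
    "t2_read_y": ("y", False),
}
```

Here the read of `x` is ordered after its write through the message, so it is not reported. The write and read of `y` are mutually unordered, so `races(ACCESSES, EDGES)` reports `("t1_write_y", "t2_read_y")`.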
Multiple page size modeling and optimization
With the growing awareness that individual hardware cores will not continue to deliver the same level of performance improvement, there is a need for an integrated approach to performance optimization. In this paper we present a paradigm for Continuous Program Optimization (CPO), in which automatic agents monitor and optimize application and system performance. The monitoring data is used to analyze and model application and system behavior. Using this analysis, we describe how CPO agents can improve the performance of both the application and the underlying system. Using the CPO paradigm, we implemented cooperating page size optimization agents that automatically optimize large page usage. An offline agent uses vertically integrated performance data to produce a page size benefit analysis for different categories of data structures within an application. We show how an online CPO agent can use the results of this predictive analysis to improve application performance automatically. We validate that the CPO agent's predictions reflect the actual performance gains, which reach up to 60% across a range of scientific applications, including the SPECcpu2000 floating-point benchmarks and two large high-performance computing (HPC) applications.
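As a rough illustration of what a per-data-structure page size benefit analysis might compute, the sketch below applies a crude TLB-reach heuristic. It flags structures whose small-page footprint exceeds the number of TLB entries and ranks them by how many pages large pages would save. This heuristic is an assumption for illustration, not the CPO agents' model. The TLB size, page sizes, and structure names are all hypothetical parameters.

```python
import math

def pages_needed(size_bytes, page_bytes):
    # Number of pages required to map a structure of the given size.
    return math.ceil(size_bytes / page_bytes)

def large_page_candidates(structures, tlb_entries, small_page, large_page):
    """Return (name, small_pages, large_pages) for structures that overflow the
    small-page TLB reach, sorted by pages saved when switching to large pages."""
    out = []
    for name, size in structures.items():
        s, l = pages_needed(size, small_page), pages_needed(size, large_page)
        if s > tlb_entries:
            out.append((name, s, l))
    return sorted(out, key=lambda t: t[1] - t[2], reverse=True)
```

With a hypothetical 64-entry TLB, 4 KiB small pages, and 16 MiB large pages, a 64 MiB `mesh` (16384 small pages versus 4 large) ranks ahead of an 8 MiB `grid` (2048 versus 1). A 128 KiB `index` needs only 32 small pages and is not flagged. A real analysis would weigh access patterns as well as footprint.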