17 research outputs found

    Automatic Online Evaluation of Intelligent Assistants

    Get PDF
    ABSTRACT Voice-activated intelligent assistants, such as Siri, Google Now, and Cortana, are prevalent on mobile devices. However, it is challenging to evaluate them due to the varied and evolving number of tasks supported, e.g., voice command, web search, and chat. Since each task may have its own procedure and a unique form of correct answers, it is expensive to evaluate each task individually. This paper is the first attempt to solve this challenge. We develop consistent and automatic approaches that can evaluate different tasks in voice-activated intelligent assistants. We use implicit feedback from users to predict whether users are satisfied with the intelligent assistant as well as its components, i.e., speech recognition and intent classification. Using this approach, we can potentially evaluate and compare different tasks within and across intelligent assistants according to the predicted user satisfaction rates. Our approach is characterized by an automatic scheme of categorizing user-system interaction into task-independent dialog actions, e.g., the user is commanding, selecting, or confirming an action. We use the action sequence in a session to predict user satisfaction and the quality of speech recognition and intent classification. We also incorporate other features to further improve our approach, including features derived from previous work on web search satisfaction prediction, and those utilizing acoustic characteristics of voice requests. We evaluate our approach using data collected from a user study. Results show our approach can accurately identify satisfactory and unsatisfactory sessions
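
    The abstract's central idea, representing a session as a sequence of task-independent dialog actions and predicting satisfaction from that sequence, can be sketched as follows. The action labels, the toy sessions, and the use of action n-grams with a logistic regression classifier are illustrative assumptions; the paper's full feature set (acoustic features, web-search satisfaction features) and its actual model are not reproduced here.

```python
# Hedged sketch: classify session satisfaction from task-independent dialog
# action sequences, using action n-grams as features. Labels and sessions are
# toy examples, not data from the paper's user study.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Each session is a space-separated sequence of dialog actions.
sessions = [
    "command confirm complete",           # command executed directly
    "command select confirm complete",
    "command repeat repeat abandon",      # repeated requests suggest failure
    "search select abandon",
]
satisfied = [1, 1, 0, 0]                  # satisfaction labels

model = make_pipeline(
    CountVectorizer(ngram_range=(1, 2)),  # unigram and bigram action features
    LogisticRegression(),
)
model.fit(sessions, satisfied)
print(model.predict(["command repeat abandon"]))  # likely predicted unsatisfied
```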

    SUPERVISED NEURAL NETWORK TRAINING USING THE MINIMUM ERROR ENTROPY CRITERION WITH VARIABLE-SIZE AND FINITE-SUPPORT KERNEL ESTIMATES

    No full text
    Abstract. The insufficiency of mere second-order statistics in many application areas has been recognized, and more advanced concepts, including higher-order statistics and especially those stemming from information theory such as error entropy minimization, are now being studied and applied in many contexts by researchers in machine learning and signal processing. The main drawback of using minimization of output error entropy for adaptive system training is the computational load when fixed-size kernel estimates are employed. Entropy estimators based on sample spacings, on the other hand, have lower computational cost; however, they are not differentiable, which makes them unsuitable for adaptive learning. In this paper, we propose a nonparametric entropy estimator that blends the desirable properties of both techniques in a variable-size, finite-support kernel estimation methodology. This yields an estimator that is suitable for adaptation, yet has computational complexity similar to that of sample-spacing techniques. The estimator is illustrated in supervised adaptive system training using the minimum error entropy criterion.
    I. INTRODUCTION. Since the earlier work of Wiener on adaptive filtering, mean square error (MSE) has been a widely accepted criterion for adaptive system training. Although the Gaussianity assumption has provided successful solutions for many practical problems, it is evident that this approach needs to be refined when dealing with non-linear systems. Moreover, the insufficiency of mere second-order statistics in many application areas has been recognized, and more advanced concepts, including higher-order statistics and especially those stemming from information theory, are now being studied and applied in many contexts in machine learning and signal processing. Entropy was introduced by Shannon as a measure of the average information in a given probability distribution. Since analytical data distributions are not available in many practical situations, plug-in approaches to nonparametric entropy estimation substitute a density estimate into the entropy expression. In this paper we propose a continuously differentiable entropy estimation technique based on a variable-size, finite-support kernel entropy estimator that blends the desirable properties of fixed-size kernel and sample-spacing estimators.
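
    As background for the computational-cost comparison above (a standard formulation from the minimum-error-entropy literature, not quoted from this paper), the fixed-size kernel approach typically plugs a Parzen window with a fixed-width Gaussian kernel G_sigma into Renyi's quadratic entropy of the training errors e_1, ..., e_N:

        \hat{H}_2(e) = -\log\!\left( \frac{1}{N^2} \sum_{i=1}^{N} \sum_{j=1}^{N} G_{\sigma\sqrt{2}}(e_i - e_j) \right)

    The adaptive system's weights are adjusted to minimize \hat{H}_2(e); the double sum is the O(N^2) per-update cost that a variable-size, finite-support kernel or a sample-spacing estimator is intended to reduce.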

    MEAN SHIFT SPECTRAL CLUSTERING FOR PERCEPTUAL IMAGE SEGMENTATION

    No full text
    ABSTRACT Segmentation is a fundamental problem in image processing with a wide range of applications. Image segmentation algorithms in the literature range from cost-criterion-based optimization techniques to various heuristic methods. In this paper, we propose utilizing mean shift spectral clustering for perceptually better image segmentation results.
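
    The abstract names mean shift spectral clustering without detailing it; as general background (not taken from this paper), the standard mean-shift mode-seeking step with bandwidth h and a Gaussian kernel moves a point x toward the kernel-weighted mean of the samples x_i:

        m(x) = \frac{\sum_i x_i \exp\!\left(-\lVert x - x_i \rVert^2 / 2h^2\right)}{\sum_i \exp\!\left(-\lVert x - x_i \rVert^2 / 2h^2\right)}, \qquad x \leftarrow m(x)

    The update is iterated until convergence, and pixels whose feature vectors converge to the same mode are grouped into one segment. How the paper combines this with the spectral (affinity-eigenvector) embedding is not specified in the snippet above.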

    Spectral Feature Projections That Maximize Shannon Mutual Information with Class Labels

    No full text
    Abstract Determining optimal subspace projections that can maintain task-relevant information in the data is an important problem in machine learning and pattern recognition. In this paper, we propose a nonparametric nonlinear subspace projection technique that maximally maintains class separability under the Shannon mutual information (MI) criterion. Employing kernel density estimates for nonparametric estimation of MI makes possible an interesting marriage of kernel-density-estimation-based information-theoretic methods and kernel machines, which have the ability to determine nonparametric nonlinear solutions for difficult problems in machine learning. Significant computational savings are achieved by translating the definition of the desired projection into the kernel-induced feature space, which leads to an analytical solution.
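
    For reference, the criterion named in the abstract can be written out (standard definitions consistent with the abstract, not quoted from the paper): with a one-dimensional projection y = w^T x, or y = w^T \phi(x) in the kernel-induced feature space, and class label C, the Shannon mutual information to be maximized over the projection w is

        I(Y; C) = \sum_{c} P(c) \int p_{Y \mid C}(y \mid c) \, \log \frac{p_{Y \mid C}(y \mid c)}{p_Y(y)} \, dy

    with the class-conditional and marginal densities replaced by kernel density estimates computed from the projected training samples.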

    Evaluating New Search Engine Configurations with Pre-existing Judgments and Clicks

    No full text
    We provide a novel method of evaluating search results, which allows us to combine existing editorial judgments with the relevance estimates generated by click-based user browsing models. There are evaluation methods in the literature that use clicks and editorial judgments together, but our approach is novel in the sense that it allows us to predict the impact of unseen search models without online tests to collect clicks and without requesting new editorial data, since we are only re-using existing editorial data and clicks observed for previous result set configurations. Since the user browsing model and the pre-existing editorial data cannot provide relevance estimates for all documents for the selected set of queries, one important challenge is to obtain this performance estimation when many ranked documents have missing relevance values. We introduce query- and rank-based smoothing to overcome this problem. We show that a hybrid of these smoothing techniques performs better than both query-based and position-based smoothing, and despite the high percentage of missing judgments, the resulting method is significantly correlated (0.74) with DCG values evaluated using fully judged datasets, and approaches inter-annotator agreement. We show that previously published techniques, applicable to frequent queries, degrade when applied to a random sample of queries, with a correlation of only 0.29. While our experiments focus on evaluation using DCG, our method is also applicable to other commonly used metrics.
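
    A minimal sketch of the back-off idea described above: when a (query, document) pair has no editorial judgment or click-derived relevance estimate, fill it with a mix of a query-level mean and a rank-level mean before computing DCG. The function names, the linear mixing weight alpha, and the use of plain graded relevance in place of click-model estimates are assumptions for illustration, not the paper's estimator.

```python
from collections import defaultdict
from math import log2

def rank_level_means(judged_rankings, depth=10):
    """Rank-based smoother: mean judged relevance observed at each position
    across previously judged result lists."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ranking in judged_rankings:                # each ranking: list of relevance grades
        for pos, rel in enumerate(ranking[:depth]):
            sums[pos] += rel
            counts[pos] += 1
    return {pos: sums[pos] / counts[pos] for pos in sums}

def smoothed_dcg(query, ranked_docs, judgments, pos_mean, alpha=0.5, depth=10):
    """DCG of a new ranking for `query`, filling relevance of unjudged
    documents with a mix of the query-level mean and the rank-level mean."""
    judged_for_query = [rel for (q, _), rel in judgments.items() if q == query]
    query_mean = (sum(judged_for_query) / len(judged_for_query)
                  if judged_for_query else 0.0)
    total = 0.0
    for pos, doc in enumerate(ranked_docs[:depth]):
        rel = judgments.get((query, doc))
        if rel is None:                            # missing judgment: back off
            rel = alpha * query_mean + (1 - alpha) * pos_mean.get(pos, query_mean)
        total += rel / log2(pos + 2)               # standard DCG discount
    return total

# Toy usage: one previously judged ranking, one new configuration to score.
judgments = {("jaguar", "d1"): 1.0, ("jaguar", "d3"): 0.0}
pos_mean = rank_level_means([[1.0, 0.5, 0.0]])
print(smoothed_dcg("jaguar", ["d1", "d2", "d3"], judgments, pos_mean))
```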