
    Learning Stochastic Shortest Path with Linear Function Approximation

    We study the stochastic shortest path (SSP) problem in reinforcement learning with linear function approximation, where the transition kernel is represented as a linear mixture of unknown models. We call this class of SSP problems linear mixture SSPs. We propose a novel algorithm with Hoeffding-type confidence sets for learning the linear mixture SSP, which attains an $\tilde{\mathcal{O}}(d B_{\star}^{1.5}\sqrt{K/c_{\min}})$ regret. Here $K$ is the number of episodes, $d$ is the dimension of the feature mapping in the mixture model, $B_{\star}$ bounds the expected cumulative cost of the optimal policy, and $c_{\min}>0$ is the lower bound of the cost function. Our algorithm also applies to the case when $c_{\min}=0$, where an $\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our knowledge, this is the first algorithm with a sublinear regret guarantee for learning linear mixture SSPs. Moreover, we design a refined Bernstein-type confidence set and propose an improved algorithm, which provably achieves an $\tilde{\mathcal{O}}(d B_{\star}\sqrt{K/c_{\min}})$ regret. Complementing the regret upper bounds, we also prove a lower bound of $\Omega(d B_{\star}\sqrt{K})$. Hence, our improved algorithm matches the lower bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors, achieving a near-optimal regret guarantee.
    Comment: 46 pages, 1 figure. In ICML 202
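
    As a hedged illustration of the confidence-set machinery described above, the sketch below builds a ridge-regression estimate of the mixture parameter and a Hoeffding-style confidence radius. The function names, the regularizer lam, and the constants inside the radius are expository assumptions, not the paper's exact algorithm.

        import numpy as np

        def ridge_estimate(features, targets, lam=1.0):
            """Ridge-regression estimate of the mixture parameter theta.
            features: (n, d) array of aggregated feature vectors; targets: (n,)."""
            d = features.shape[1]
            gram = lam * np.eye(d) + features.T @ features   # Gram matrix Lambda_t
            theta_hat = np.linalg.solve(gram, features.T @ targets)
            return theta_hat, gram

        def hoeffding_radius(d, t, lam=1.0, delta=0.05, scale=1.0):
            """Hoeffding-style radius beta_t so that, w.h.p., the true theta lies in
            {theta : ||theta - theta_hat||_{Lambda_t} <= beta_t}."""
            return scale * np.sqrt(d * np.log((1.0 + t / lam) / delta))

    An optimism-based planner would then select, at each episode, the model inside this confidence ellipsoid with the smallest estimated cost-to-go.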

    Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition

    In recent years, speech emotion recognition has become significant in industrial applications such as call centers, social robots, and health care. Combining speech recognition with speech emotion recognition can improve feedback efficiency and quality of service; thus, speech emotion recognition has attracted much attention in both industry and academia. Since the emotions present in an utterance may occur with varied probabilities, speech emotion is often ambiguous, which poses great challenges to recognition tasks. However, previous studies commonly assigned a single label or multiple labels to each utterance with certainty, so their algorithms suffer low accuracy because of this inappropriate representation. Inspired by the optimally interacting theory, we address ambiguous speech emotions by proposing a novel multi-classifier interactive learning (MCIL) method. In MCIL, multiple different classifiers first mimic several individuals who have inconsistent cognitions of ambiguous emotions and construct new ambiguous labels (an emotion probability distribution). They are then retrained with the new labels so that their cognitions interact. This procedure enables each classifier to learn better representations of ambiguous data from the others and further improves recognition ability. Experiments on three benchmark corpora (MAS, IEMOCAP, and FAU-AIBO) demonstrate that MCIL not only improves each classifier's performance but also raises their recognition consistency from moderate to substantial.
    Comment: 10 pages, 4 figures
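
    The interactive step can be sketched as follows: each classifier outputs an emotion probability distribution, the outputs are pooled into a soft "ambiguous" label, and each classifier is retrained on the pooled labels. The uniform averaging used here is an assumption; the paper constructs the distribution from the classifiers' (possibly inconsistent) cognitions.

        import numpy as np

        def build_ambiguous_labels(prob_outputs):
            """prob_outputs: list of (n_samples, n_emotions) softmax matrices,
            one per classifier. Returns pooled soft labels for retraining."""
            stacked = np.stack(prob_outputs, axis=0)  # (n_classifiers, n, k)
            soft = stacked.mean(axis=0)               # pooled emotion distribution
            return soft / soft.sum(axis=1, keepdims=True)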

    Delving into Out-of-Distribution Detection with Vision-Language Representations

    Recognizing out-of-distribution (OOD) samples is critical for machine learning systems deployed in the open world. The vast majority of OOD detection methods are driven by a single modality (e.g., either vision or language), leaving the rich information in multi-modal representations untapped. Inspired by the recent success of vision-language pre-training, this paper enriches the landscape of OOD detection from a single-modal to a multi-modal regime. In particular, we propose Maximum Concept Matching (MCM), a simple yet effective zero-shot OOD detection method based on aligning visual features with textual concepts. We contribute in-depth analysis and theoretical insights to understand the effectiveness of MCM. Extensive experiments demonstrate that MCM achieves superior performance on a wide variety of real-world tasks. MCM with vision-language features outperforms a common baseline with pure visual features on a hard OOD task with semantically similar classes by 13.1% (AUROC). Code is available at https://github.com/deeplearning-wisc/MCM.
    Comment: 36th Conference on Neural Information Processing Systems (NeurIPS 2022)
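
    The MCM score is simple to state: a softmax over the cosine similarities between an image embedding and the class-concept text embeddings, taking the maximum as the confidence. The sketch below assumes L2-normalized features and an illustrative temperature; see the repository linked above for the authors' implementation.

        import numpy as np

        def mcm_score(image_feat, text_feats, temperature=1.0):
            """image_feat: (d,) normalized visual embedding;
            text_feats: (k, d) normalized concept embeddings.
            Larger scores suggest in-distribution; thresholding flags OOD."""
            sims = text_feats @ image_feat   # cosine similarities to each concept
            probs = np.exp(sims / temperature)
            probs /= probs.sum()             # softmax over concepts
            return probs.max()               # MCM confidence score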

    Four-dimensional Cone Beam CT Reconstruction and Enhancement using a Temporal Non-Local Means Method

    Four-dimensional cone beam computed tomography (4D-CBCT) has been developed to provide respiratory-phase-resolved volumetric imaging in image guided radiation therapy (IGRT). An inadequate number of projections in each phase bin results in low-quality 4D-CBCT images with obvious streaking artifacts. In this work, we propose two novel 4D-CBCT algorithms, an iterative reconstruction algorithm and an enhancement algorithm, both utilizing a temporal non-local means (TNLM) method. We define a TNLM energy term for a given set of 4D-CBCT images. Minimizing this term favors those 4D-CBCT images in which any anatomical feature at one spatial point at one phase can be found at a nearby spatial point at neighboring phases. 4D-CBCT reconstruction is achieved by minimizing a total energy consisting of a data fidelity term and the TNLM energy term. For image enhancement, 4D-CBCT images generated by the FDK algorithm are enhanced by minimizing the TNLM function while keeping the enhanced images close to the FDK results. A forward-backward splitting algorithm and a Gauss-Jacobi iteration method are employed to solve the problems. The algorithms are implemented on a GPU to achieve high computational efficiency. The reconstruction algorithm and the enhancement algorithm generate visually similar 4D-CBCT images, both better than the FDK results. Quantitative evaluations indicate that, compared with the FDK results, our reconstruction method improves the contrast-to-noise ratio (CNR) by a factor of 2.56~3.13 and our enhancement method increases the CNR by 2.75~3.33 times. The enhancement method also removes over 80% of the streak artifacts from the FDK results. The total computation time is ~460 sec for the reconstruction algorithm and ~610 sec for the enhancement algorithm on an NVIDIA Tesla C1060 GPU card.
    Comment: 20 pages, 3 figures, 2 tables
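
    The core of the TNLM term is a patch-similarity weight between neighboring phases; a minimal sketch follows. The Gaussian weighting and the smoothing parameter h are standard non-local means choices and are assumptions here, not the paper's exact discretization.

        import numpy as np

        def tnlm_weight(patch_a, patch_b, h=0.1):
            """Similarity weight between two same-sized patches drawn from
            neighboring respiratory phases; larger when the patches match."""
            d2 = np.mean((patch_a - patch_b) ** 2)  # mean squared difference
            return np.exp(-d2 / (h * h))            # Gaussian falloff

    Summing such weighted patch differences over all phases gives the TNLM energy; a Gauss-Jacobi iteration then pulls each voxel toward a weighted average of its temporal neighbors.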

    GPU-based Fast Low-dose Cone Beam CT Reconstruction via Total Variation

    Cone-beam CT (CBCT) has been widely used in image guided radiation therapy (IGRT) to acquire updated volumetric anatomical information before treatment fractions for accurate patient alignment. However, the excessive x-ray imaging dose from serial CBCT scans raises a clinical concern in most IGRT procedures. This imaging dose can be effectively reduced by reducing the number of x-ray projections and/or lowering the mAs level in a CBCT scan. The goal of this work is to develop a fast GPU-based algorithm to reconstruct high-quality CBCT images from undersampled and noisy projection data so as to lower the imaging dose. The CBCT is reconstructed by minimizing an energy functional consisting of a data fidelity term and a total variation regularization term. We developed a GPU-friendly version of the forward-backward splitting algorithm to solve this model, and a multi-grid technique is also employed. We test our CBCT reconstruction algorithm on a digital NCAT phantom and a head-and-neck patient case. The performance under low mAs is also validated using a physical Catphan phantom and a head-and-neck Rando phantom. We find that 40 x-ray projections are sufficient to reconstruct CBCT images with satisfactory quality for IGRT patient alignment. Phantom experiments indicate that CBCT images can be successfully reconstructed with our algorithm at levels as low as 0.1 mAs/projection. Compared with the currently widely used full-fan head-and-neck scanning protocol of about 360 projections at 0.4 mAs/projection, an overall 36-fold dose reduction is estimated to be achieved with our algorithm. Moreover, the reconstruction time is about 130 sec on an NVIDIA Tesla C1060 GPU card, estimated to be ~100 times faster than similar iterative reconstruction approaches.
    Comment: 20 pages, 10 figures. Paper was revised and more testing cases were added
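
    A hedged sketch of one forward-backward splitting iteration for the TV model is given below: a gradient step on the data fidelity ||Ax - y||^2 followed by a proximal step on the TV term. The callables fproj/bproj (forward projector A and its adjoint) and prox_tv are placeholders standing in for the GPU kernels used in the paper.

        def fbs_step(x, fproj, bproj, y, step, prox_tv, tv_weight):
            """One forward-backward splitting iteration.
            x: current image; y: measured projections;
            fproj/bproj: forward projection A and its adjoint A^T."""
            grad = bproj(fproj(x) - y)           # gradient of 0.5*||Ax - y||^2
            z = x - step * grad                  # forward (gradient) step
            return prox_tv(z, step * tv_weight)  # backward (TV proximal) step

    Iterating fbs_step with a suitable step size decreases the total energy, and the multi-grid technique mentioned above accelerates convergence by solving coarser versions of the same problem first.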