Learning Stochastic Shortest Path with Linear Function Approximation
We study the stochastic shortest path (SSP) problem in reinforcement learning
with linear function approximation, where the transition kernel is represented
as a linear mixture of unknown models. We call this class of SSP problems
linear mixture SSPs. We propose a novel algorithm with Hoeffding-type
confidence sets for learning the linear mixture SSP, which can attain an
$\tilde{\mathcal{O}}(d B_*^{1.5} \sqrt{K/c_{\min}})$ regret. Here $K$ is
the number of episodes, $d$ is the dimension of the feature mapping in the
mixture model, $B_*$ bounds the expected cumulative cost of the optimal
policy, and $c_{\min}$ is the lower bound of the cost function. Our algorithm
also applies to the case when $c_{\min} = 0$, where an
$\tilde{\mathcal{O}}(K^{2/3})$ regret is guaranteed. To the best of our
knowledge, this is the first algorithm with a sublinear regret guarantee for
learning linear mixture SSPs. Moreover, we design a refined Bernstein-type
confidence set and propose an improved algorithm, which provably achieves an
$\tilde{\mathcal{O}}(d B_* \sqrt{K/c_{\min}})$ regret. In complement to
the regret upper bounds, we also prove a lower bound of
$\Omega(d B_* \sqrt{K})$. Hence, our improved algorithm matches the lower
bound up to a $1/\sqrt{c_{\min}}$ factor and poly-logarithmic factors,
achieving a near-optimal regret guarantee.

Comment: 46 pages, 1 figure. In ICML 2022
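For readers unfamiliar with the setting, the linear mixture assumption says the unknown transition kernel lies in the span of $d$ known basis models. Written in the common linear-mixture-MDP form (the paper's exact parameterisation may differ in details):

```latex
% Linear mixture SSP: the transition probability is a linear
% combination of known features with an unknown weight vector
\mathbb{P}(s' \mid s, a) \;=\; \big\langle \phi(s, a, s'),\, \theta^* \big\rangle,
\qquad \phi \ \text{known feature map}, \quad \theta^* \in \mathbb{R}^d \ \text{unknown}.
```

Learning the SSP then reduces to estimating the $d$-dimensional vector $\theta^*$, which is why the regret bounds above scale with $d$ rather than with the size of the state space.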
Multi-Classifier Interactive Learning for Ambiguous Speech Emotion Recognition
In recent years, speech emotion recognition technology has become of great
significance in industrial applications such as call centers, social robots, and
health care. The combination of speech recognition and speech emotion
recognition can improve the feedback efficiency and the quality of service.
Thus, speech emotion recognition has attracted much attention in both
industry and academia. Since the emotions present in an entire utterance may have
varied probabilities, speech emotion is likely to be ambiguous, which poses
great challenges to recognition tasks. However, previous studies commonly
assigned a single label or multiple labels to each utterance with certainty.
Therefore, their algorithms result in low accuracy because of this inappropriate
representation. Inspired by the optimally interacting theory, we address the
ambiguous speech emotions by proposing a novel multi-classifier interactive
learning (MCIL) method. In MCIL, multiple different classifiers first mimic
several individuals, who have inconsistent cognitions of ambiguous emotions,
and construct new ambiguous labels (the emotion probability distribution).
Then, they are retrained with the new labels so that their cognitions interact.
This procedure enables each classifier to learn better representations of
ambiguous data from others, and further improves the recognition ability. The
experiments on three benchmark corpora (MAS, IEMOCAP, and FAU-AIBO) demonstrate
that MCIL not only improves each classifier's performance, but also raises
their recognition consistency from moderate to substantial.

Comment: 10 pages, 4 figures
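The label-construction step described above can be sketched in a few lines. This is an illustrative toy, not the authors' code: it assumes each classifier outputs a per-utterance emotion probability distribution, and simple averaging stands in for the paper's interaction scheme.

```python
import numpy as np

def mcil_soft_labels(prob_predictions):
    """Construct ambiguous labels from several classifiers' predictions.

    prob_predictions: shape (n_classifiers, n_utterances, n_emotions),
    each row a probability distribution over emotion categories.
    Averaging across classifiers mimics the 'individuals' pooling their
    inconsistent cognitions into one emotion probability distribution,
    which is then used as the soft label for retraining.
    """
    preds = np.asarray(prob_predictions, dtype=float)
    soft = preds.mean(axis=0)                       # pool the classifiers
    return soft / soft.sum(axis=1, keepdims=True)   # renormalise per utterance
```

Retraining each classifier against these soft labels (e.g. with a cross-entropy loss against the distribution rather than a one-hot target) is what lets the ensemble exchange information about ambiguous utterances.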
Delving into Out-of-Distribution Detection with Vision-Language Representations
Recognizing out-of-distribution (OOD) samples is critical for machine
learning systems deployed in the open world. The vast majority of OOD detection
methods are driven by a single modality (e.g., either vision or language),
leaving the rich information in multi-modal representations untapped. Inspired
by the recent success of vision-language pre-training, this paper enriches the
landscape of OOD detection from a single-modal to a multi-modal regime.
Particularly, we propose Maximum Concept Matching (MCM), a simple yet effective
zero-shot OOD detection method based on aligning visual features with textual
concepts. We contribute in-depth analysis and theoretical insights to
understand the effectiveness of MCM. Extensive experiments demonstrate that MCM
achieves superior performance on a wide variety of real-world tasks. MCM with
vision-language features outperforms a common baseline with pure visual
features on a hard OOD task with semantically similar classes by 13.1% (AUROC).
Code is available at https://github.com/deeplearning-wisc/MCM.

Comment: 36th Conference on Neural Information Processing Systems (NeurIPS
2022)
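The scoring rule behind MCM is compact enough to sketch. The toy below assumes pre-computed image and class-name ("concept") embeddings such as CLIP features; the array names and the unit temperature are illustrative, not the paper's exact settings.

```python
import numpy as np

def mcm_score(image_feat, concept_feats, temperature=1.0):
    """Maximum Concept Matching score for zero-shot OOD detection.

    Cosine similarities between an image embedding and each textual
    concept embedding are passed through a temperature-scaled softmax;
    the maximum softmax probability is the score. In-distribution images
    align strongly with one concept (high score), while OOD images match
    no concept well (low score), so thresholding the score flags OOD.
    """
    img = image_feat / np.linalg.norm(image_feat)
    txt = concept_feats / np.linalg.norm(concept_feats, axis=1, keepdims=True)
    sims = txt @ img / temperature            # cosine similarity per concept
    probs = np.exp(sims - sims.max())
    probs /= probs.sum()                      # softmax over concepts
    return probs.max()
```

An image feature aligned with one concept scores higher than a feature equally similar to all concepts, which is exactly the separation the detector thresholds on.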
Four-dimensional Cone Beam CT Reconstruction and Enhancement using a Temporal Non-Local Means Method
Four-dimensional Cone Beam Computed Tomography (4D-CBCT) has been developed
to provide respiratory phase resolved volumetric imaging in image guided
radiation therapy (IGRT). An inadequate number of projections in each phase bin
results in low quality 4D-CBCT images with obvious streaking artifacts. In this
work, we propose two novel 4D-CBCT algorithms: an iterative reconstruction
algorithm and an enhancement algorithm, utilizing a temporal nonlocal means
(TNLM) method. We define a TNLM energy term for a given set of 4D-CBCT images.
Minimization of this term favors those 4D-CBCT images such that any anatomical
features at one spatial point at one phase can be found in a nearby spatial
point at neighboring phases. 4D-CBCT reconstruction is achieved by minimizing a
total energy containing a data fidelity term and the TNLM energy term. As for
the image enhancement, 4D-CBCT images generated by the FDK algorithm are
enhanced by minimizing the TNLM function while keeping the enhanced images
close to the FDK results. A forward-backward splitting algorithm and a
Gauss-Jacobi iteration method are employed to solve the problems. The
algorithms are implemented on GPU to achieve a high computational efficiency.
The reconstruction algorithm and the enhancement algorithm generate visually
similar 4D-CBCT images, both better than the FDK results. Quantitative
evaluations indicate that, compared with the FDK results, our reconstruction
method improves contrast-to-noise-ratio (CNR) by a factor of 2.56~3.13 and our
enhancement method increases the CNR by 2.75~3.33 times. The enhancement method
also removes over 80% of the streak artifacts from the FDK results. The total
computation time is ~460 sec for the reconstruction algorithm and ~610 sec for
the enhancement algorithm on an NVIDIA Tesla C1060 GPU card.

Comment: 20 pages, 3 figures, 2 tables
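The TNLM energy term is easiest to see in one dimension. The sketch below is an illustrative toy (1-D signals standing in for 4D-CBCT phase images, a Gaussian patch-similarity weight with bandwidth `h`, and a small search window), not the authors' GPU implementation.

```python
import numpy as np

def tnlm_energy(f1, f2, patch=1, h=1.0, search=1):
    """Temporal nonlocal means energy between two neighbouring-phase images.

    Each point x in phase image f1 is compared against nearby points y in
    the next phase f2. The penalty |f1(x) - f2(y)|^2 is weighted by the
    similarity of the patches around x and y, so anatomy that merely
    shifts to a nearby location between phases incurs little cost.
    Minimising this energy therefore favours images whose features at one
    phase reappear at a nearby point in neighbouring phases.
    """
    n = len(f1)
    energy = 0.0
    for x in range(patch, n - patch):
        for y in range(max(patch, x - search), min(n - patch, x + search + 1)):
            p1 = f1[x - patch:x + patch + 1]
            p2 = f2[y - patch:y + patch + 1]
            w = np.exp(-np.sum((p1 - p2) ** 2) / h ** 2)  # patch similarity
            energy += w * (f1[x] - f2[y]) ** 2
    return energy
```

In the paper this term is added to a data fidelity term for reconstruction, or minimised near the FDK result for enhancement; the double loop above is what the GPU implementation parallelises.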
GPU-based Fast Low-dose Cone Beam CT Reconstruction via Total Variation
Cone-beam CT (CBCT) has been widely used in image guided radiation therapy
(IGRT) to acquire updated volumetric anatomical information before treatment
fractions for accurate patient alignment. However, the excessive x-ray
imaging dose from serial CBCT scans raises a clinical concern in most IGRT
procedures. The excessive imaging dose can be effectively reduced by reducing
the number of x-ray projections and/or lowering mAs levels in a CBCT scan. The
goal of this work is to develop a fast GPU-based algorithm to reconstruct high
quality CBCT images from undersampled and noisy projection data so as to lower
the imaging dose. The CBCT is reconstructed by minimizing an energy functional
consisting of a data fidelity term and a total variation regularization term.
We developed a GPU-friendly version of the forward-backward splitting algorithm
to solve this model. A multi-grid technique is also employed. We test our CBCT
reconstruction algorithm on a digital NCAT phantom and a head-and-neck patient
case. The performance under low mAs is also validated using a physical Catphan
phantom and a head-and-neck Rando phantom. It is found that 40 x-ray
projections are sufficient to reconstruct CBCT images with satisfactory quality
for IGRT patient alignment purpose. Phantom experiments indicated that CBCT
images can be successfully reconstructed with our algorithm under as low as 0.1
mAs/projection level. Compared with the currently widely used full-fan
head-and-neck scanning protocol of about 360 projections with 0.4
mAs/projection, it is estimated that an overall 36-fold dose reduction has
been achieved with our algorithm. Moreover, the reconstruction time is about
130 sec on an NVIDIA Tesla C1060 GPU card, estimated to be ~100 times faster
than similar iterative reconstruction approaches.

Comment: 20 pages, 10 figures. Paper was revised and more testing cases were
added
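The energy functional in this abstract, a data fidelity term plus a total variation regulariser, can be illustrated on a toy 1-D problem. The sketch below minimises a smoothed version of the energy by plain gradient descent; this is a stand-in for, not a reproduction of, the paper's GPU forward-backward splitting solver, and the parameter values are illustrative.

```python
import numpy as np

def reconstruct_tv(A, b, lam=0.1, eps=1e-3, step=0.01, iters=2000):
    """Toy TV-regularised reconstruction in 1-D.

    Minimises ||A x - b||^2 + lam * sum_i sqrt((x[i+1]-x[i])^2 + eps)
    by gradient descent, where A models the projection operator and the
    smoothed absolute value (eps > 0) makes the TV term differentiable.
    TV favours piecewise-constant solutions, which is why it suppresses
    noise and streaks from undersampled projection data.
    """
    n = A.shape[1]
    x = np.zeros(n)
    for _ in range(iters):
        grad_fid = 2 * A.T @ (A @ x - b)      # data-fidelity gradient
        d = np.diff(x)
        w = d / np.sqrt(d ** 2 + eps)         # gradient of smoothed |.|
        grad_tv = np.zeros(n)
        grad_tv[:-1] -= w                     # adjoint of the difference
        grad_tv[1:] += w                      # operator applied to w
        x -= step * (grad_fid + lam * grad_tv)
    return x
```

With `A` set to the identity and `b` a noiseless step signal, the minimiser stays close to the step while the TV term keeps it piecewise flat; in the CBCT setting `A` would be the (undersampled) cone-beam projection operator.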