Analytical description of high-aperture STED resolution with 0-2π vortex phase modulation
Stimulated emission depletion (STED) microscopy achieves optical super-resolution by breaking the diffraction limit through suppression of fluorescence at the periphery of the focal spot. An inverse square root relationship between the STED power and the resolution has generally been accepted on experimental grounds, but without a rigorous analytical description. In this paper, we analytically verify the relationship between the STED power and the achievable resolution from vector optical theory for the widely used 0-2π vortex phase modulation. The electromagnetic fields in the focal region of a high numerical aperture objective are calculated and approximated by polynomials, and an analytical expression for the resolution as a function of the STED intensity is derived. As a result, the resolution can be estimated directly from the measured saturation power of the dye and the applied STED power.
Comment: 19 pages
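The inverse-square-root law described in the abstract can be sketched numerically. The helper below uses the commonly quoted form d ≈ λ / (2 NA √(1 + P_STED/P_sat)); the function name and the exact prefactor are illustrative assumptions, not the analytical expression derived in the paper.

```python
import math

def sted_resolution(wavelength_nm, na, p_sted, p_sat):
    """Estimate STED resolution (nm) from the widely quoted
    inverse-square-root law d ~ lambda / (2 NA sqrt(1 + P/P_sat)).
    Illustrative only: the paper derives a rigorous analytical
    form of this relation for 0-2pi vortex phase modulation."""
    return wavelength_nm / (2.0 * na * math.sqrt(1.0 + p_sted / p_sat))

# With no depletion beam the estimate reduces to the
# diffraction-limited value lambda / (2 NA); at P = 99 * P_sat
# the resolution improves by a factor of sqrt(100) = 10.
```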
On Computation and Application of Optimal Transport
The Optimal Transport (OT) problem naturally arises in various machine learning problems where one needs to align data from multiple sources. For example, the training data and the application scenario oftentimes have a domain gap, e.g., the training data consists of annotated photos collected in the daytime, yet the application scenario is in dark hours. In this case, we need to align the two datasets so that the annotation information can be shared across them. During my Ph.D. study, I propose scalable algorithms for efficient OT computation and explore its novel applications in end-to-end learning.
For OT computation, I consider both discrete and continuous cases. For the discrete cases, I develop an Inexact Proximal point method for exact Optimal Transport (IPOT), with the proximal operator approximately evaluated at each iteration using projections onto the probability simplex. The algorithm (a) converges to the exact Wasserstein distance with theoretical guarantees and robust regularization-parameter selection, (b) alleviates numerical stability issues, (c) has computational complexity similar to Sinkhorn, and (d) avoids the shrinking problem when applied to generative models. Furthermore, a new algorithm based on IPOT is proposed to obtain sharper Wasserstein barycenters.
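The IPOT iteration described above can be sketched with NumPy. This is a minimal reading of the published algorithm — the proximal operator is evaluated approximately with a few Sinkhorn-style scalings per outer step (one in the original paper); variable names and defaults are illustrative.

```python
import numpy as np

def ipot(C, mu, nu, beta=1.0, n_iter=200, inner=1):
    """Sketch of the Inexact Proximal point method for exact
    Optimal Transport (IPOT). Each outer iteration solves the
    proximal subproblem approximately with `inner` Sinkhorn-style
    scalings (inner=1 in the original paper)."""
    m, n = C.shape
    G = np.exp(-C / beta)          # Gibbs kernel of the proximal term
    T = np.ones((m, n)) / (m * n)  # current transport plan
    b = np.ones(n)
    for _ in range(n_iter):
        Q = G * T                  # elementwise: proximal-weighted kernel
        for _ in range(inner):     # approximate Bregman projection
            a = mu / (Q @ b)
            b = nu / (Q.T @ a)
        T = a[:, None] * Q * b[None, :]
    return T
```

Unlike plain Sinkhorn, whose plan stays blurred by the fixed entropic regularization, iterating the proximal step drives the plan toward the exact (unregularized) OT solution.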
For the continuous cases, I propose an implicit generative learning-based framework called SPOT (Scalable Push-forward of Optimal Transport). Specifically, we approximate the optimal transport plan by a pushforward of a reference distribution, and cast the optimal transport problem into a minimax problem. We can then solve OT problems efficiently using primal-dual stochastic gradient-type algorithms.
To explore the connections between OT and end-to-end learning, I develop a differentiable top-k operator and a differentiable permutation step.
The top-k operation, i.e., finding the k largest or smallest elements from a collection of scores, is an important model component used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using bubble sort, the resulting model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether an element belongs to the top-k set is essentially discontinuous. To address this issue, we propose a smoothed approximation, namely the SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator. Specifically, our SOFT top-k operator approximates the output of the top-k operation as the solution of an Entropic Optimal Transport (EOT) problem. The gradient of the SOFT operator can then be efficiently approximated based on the optimality conditions of the EOT problem. We apply the proposed operator to the k-nearest neighbors and beam search algorithms, and demonstrate improved performance.
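The SOFT top-k idea above can be sketched as an entropic OT problem between the n scores and just two target anchors. The anchor choice (the min and max of the scores) and all hyperparameters here are illustrative assumptions, not the paper's exact parameterization; the point is that the resulting transport plan yields a smooth, differentiable top-k indicator.

```python
import numpy as np

def soft_topk(scores, k, eps=0.1, n_iter=500):
    """Sketch of a SOFT-style differentiable top-k: solve an
    entropic OT between the n scores (uniform mass) and two
    anchors with masses (n-k)/n and k/n. The mass each score
    sends to the 'large' anchor, rescaled by n, is a smooth
    indicator of top-k membership."""
    x = np.asarray(scores, dtype=float)
    n = len(x)
    y = np.array([x.min(), x.max()])      # two target anchors (assumed)
    C = (x[:, None] - y[None, :]) ** 2    # squared-distance cost
    mu = np.full(n, 1.0 / n)
    nu = np.array([(n - k) / n, k / n])
    K = np.exp(-C / eps)
    b = np.ones(2)
    for _ in range(n_iter):               # Sinkhorn scalings
        a = mu / (K @ b)
        b = nu / (K.T @ a)
    P = a[:, None] * K * b[None, :]
    return n * P[:, 1]                    # smooth top-k indicator
```

As eps shrinks, the indicator approaches the hard 0/1 top-k vector while remaining differentiable for any eps > 0.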
For the differentiable permutation step, I connect optimal transport to a variant of the regression problem in which the correspondence between input and output data is not available. Such shuffled data is commonly observed in many real-world problems. Taking flow cytometry as an example, the measuring instruments may not be able to maintain the correspondence between the samples and the measurements. Due to the combinatorial nature of the problem, most existing methods are only applicable when the sample size is small, and are limited to linear regression models. To overcome such bottlenecks, we propose a new computational framework -- ROBOT -- for the shuffled regression problem, which is applicable to large data and complex nonlinear models. Specifically, we reformulate regression without correspondence as a continuous optimization problem. Then, by exploiting the interaction between the regression model and the data correspondence, we develop a hypergradient approach based on differentiable programming techniques. This hypergradient approach essentially views the data correspondence as an operator of the regression, and therefore allows us to find a better descent direction for the model parameters by differentiating through the data correspondence. ROBOT can be further extended to the inexact correspondence setting, where there may not be an exact alignment between the input and output data. Thorough numerical experiments show that ROBOT achieves better performance than existing methods in both linear and nonlinear regression tasks, including real-world applications such as flow cytometry and multi-object tracking.
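The shuffled-regression setting above can be illustrated with a deliberately simplified 1-D alternating sketch: given the current model, re-pair predictions with targets (in 1-D the optimal squared-cost coupling is the monotone, sorted one), then take a gradient step. This is only in the spirit of ROBOT — the actual framework replaces the hard re-pairing with a differentiable OT layer and computes hypergradients through it; everything here is an illustrative assumption.

```python
import numpy as np

def shuffled_linreg(x, y, n_iter=100, lr=0.1):
    """Toy alternating scheme for 1-D regression without
    correspondence: (1) given weight w, match predictions to
    targets with the monotone (sorted) coupling, which is the
    optimal squared-cost assignment in 1-D; (2) gradient step
    on w. Simplified stand-in for ROBOT's OT-based approach."""
    w = 0.0
    for _ in range(n_iter):
        pred = w * x
        order_p = np.argsort(pred)          # sort predictions...
        order_y = np.argsort(y)             # ...and targets
        y_matched = np.empty_like(y)
        y_matched[order_p] = y[order_y]     # monotone re-pairing
        grad = 2.0 * np.mean((w * x - y_matched) * x)
        w -= lr * grad
    return w
```

Even though the input/output pairing is destroyed by shuffling, the alternation recovers the true slope because the matching step restores a consistent correspondence at each iteration.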
Personalized Abstractive Summarization by Tri-agent Generation Pipeline
Tailoring outputs from large language models, like ChatGPT, to implicit user
preferences remains a challenge despite their impressive generative
capabilities. In this paper, we propose a tri-agent generation pipeline
comprising a generator, an instructor, and an editor to enhance output
personalization. The generator produces an initial output, the instructor
automatically generates editing instructions based on user preferences, and the
editor refines the output to align with those preferences. The inference-only
large language model (ChatGPT) serves as both the generator and editor, with a
smaller model acting as the instructor to guide output generation. We train the
instructor using editor-steered reinforcement learning, leveraging feedback
from a large-scale editor model to optimize instruction generation.
Experimental results on two abstractive summarization datasets demonstrate the
effectiveness of our approach in generating outputs that better meet user
expectations. Code is available at
\url{https://github.com/Wendy-Xiao/chatgpt_editing_summ}
Comment: Accepted at Findings of EACL 2024
QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning
This paper proposes a novel semi-supervised TTS framework, QS-TTS, which improves TTS quality with lower supervised-data requirements by exploiting additional unlabeled speech audio through Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL). The framework comprises two VQ-S3R learners: first, the principal learner provides a generative Multi-Stage Multi-Codebook (MSMC) VQ-S3R via an MSMC-VQ-GAN combined with contrastive S3RL, while decoding it back to high-quality audio; then, the associate learner further abstracts the MSMC representation into a highly compact VQ representation through a VQ-VAE. These two generative VQ-S3R learners provide beneficial speech representations and pre-trained models for TTS, significantly improving synthesis quality while requiring less supervised data. QS-TTS is evaluated comprehensively under various scenarios via subjective and objective tests. The results demonstrate the superior performance of QS-TTS, which achieves the highest MOS among supervised and semi-supervised baseline TTS approaches, especially in low-resource scenarios. Moreover, comparing various speech representations and transfer learning methods in TTS further validates the notable improvement that the proposed VQ-S3RL brings to TTS, yielding the best audio quality and intelligibility metrics. The slower decay of QS-TTS synthesis quality with decreasing supervised data further highlights its lower supervised-data requirements, indicating its great potential in low-resource scenarios.