METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
In clinical scenarios, multi-specialist consultation could significantly
benefit the diagnosis, especially for intricate cases. This inspires us to
explore a "multi-expert joint diagnosis" mechanism to upgrade the existing
"single expert" framework commonly seen in the current literature. To this end,
we propose METransformer, a method to realize this idea with a
transformer-based backbone. The key design of our method is the introduction of
multiple learnable "expert" tokens into both the transformer encoder and
decoder. In the encoder, each expert token interacts with both vision tokens
and other expert tokens to learn to attend to different image regions for image
representation. These expert tokens are encouraged to capture complementary
information by an orthogonal loss that minimizes their overlap. In the decoder,
each attended expert token guides the cross-attention between input words and
visual tokens, thus influencing the generated report. A metrics-based expert
voting strategy is further developed to generate the final report. Through the
multi-expert concept, our model enjoys the merits of an ensemble-based approach
in a manner that is computationally more efficient and supports more
sophisticated interactions among experts. Experimental results
demonstrate the promising performance of our proposed model on two widely used
benchmarks. Last but not least, the framework-level innovation makes our work
ready to incorporate advances in existing "single-expert" models to further
improve its performance. Comment: Accepted by CVPR 2023
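The complementary-expert idea can be sketched minimally: the loss below penalizes pairwise overlap between expert token vectors via squared cosine similarity, vanishing exactly when the experts are mutually orthogonal. The function name and the squared-cosine form are illustrative assumptions, not the paper's exact formulation.

```python
import math

def orthogonal_loss(expert_tokens):
    # Sum of squared pairwise cosine similarities between expert token
    # vectors; zero exactly when all experts are mutually orthogonal.
    def cos(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    n = len(expert_tokens)
    return sum(
        cos(expert_tokens[i], expert_tokens[j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )

# Orthogonal experts incur zero penalty; identical experts are penalized.
print(orthogonal_loss([[1.0, 0.0], [0.0, 1.0]]))  # 0.0
print(orthogonal_loss([[1.0, 0.0], [1.0, 0.0]]))  # 1.0
```

Minimizing this term alongside the task loss pushes the experts toward capturing complementary image information.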
Differentially Private Bootstrap: New Privacy Analysis and Inference Strategies
Differentially private (DP) mechanisms protect individual-level information
by introducing randomness into the statistical analysis procedure. Despite the
availability of numerous DP tools, there remains a lack of general techniques
for conducting statistical inference under DP. We examine a DP bootstrap
procedure that releases multiple private bootstrap estimates to infer the
sampling distribution and construct confidence intervals (CIs). Our privacy
analysis presents new results on the privacy cost of a single DP bootstrap
estimate, applicable to any DP mechanisms, and identifies some misapplications
of the bootstrap in the existing literature. Using the Gaussian-DP (GDP)
framework (Dong et al., 2022), we show that the release of $B$ DP bootstrap
estimates from mechanisms satisfying $(\mu/\sqrt{(2-2/e)B})$-GDP
asymptotically satisfies $\mu$-GDP as $B$ goes to infinity. Moreover, we use
deconvolution with the DP bootstrap estimates to accurately infer the sampling
distribution, which is novel in DP. We derive CIs from our density estimate for
tasks such as population mean estimation, logistic regression, and quantile
regression, and we compare them to existing methods using simulations and
real-world experiments on 2016 Canada Census data. Our private CIs achieve the
nominal coverage level and offer the first approach to private inference for
quantile regression.
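To make the recipe concrete for the population-mean task, here is a minimal sketch of a DP bootstrap release: each bootstrap mean is privatized with Gaussian noise (the scale `sigma` stands in for a proper GDP calibration), and a simple percentile interval is formed from the released estimates. The paper's actual inference uses deconvolution rather than raw percentiles; the function name and parameters are illustrative assumptions.

```python
import random
import statistics

def dp_bootstrap_ci(data, n_boot=200, sigma=0.05, seed=0):
    # Release n_boot privatized bootstrap means, then form a 95%
    # percentile interval from the noisy estimates. In the paper the
    # noise scale is calibrated to a mu-GDP guarantee and the sampling
    # distribution is recovered by deconvolution; both are simplified here.
    rng = random.Random(seed)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistics.fmean(resample) + rng.gauss(0.0, sigma))
    estimates.sort()
    return estimates[int(0.025 * n_boot)], estimates[int(0.975 * n_boot) - 1]

low, high = dp_bootstrap_ci([0.5, 1.5] * 50)
```

Because only the noisy estimates are released, all downstream interval construction is post-processing and incurs no extra privacy cost.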
Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder
Medical Visual Question Answering (VQA) systems play a supporting role in
understanding the clinically relevant information carried by medical images.
Questions about a medical image fall into two categories: closed-end (such as
Yes/No questions) and open-end. To obtain answers, the majority of existing
medical VQA methods rely on classification approaches, while a few works
attempt generation approaches or a mixture of the two. The classification
approaches are relatively simple but perform poorly on long open-end questions.
To bridge this gap, in this paper we propose a new Transformer-based framework
for medical VQA (named Q2ATransformer), which integrates the advantages of both
the classification and the generation approaches and provides a unified
treatment for closed-end and open-end questions. Specifically, we introduce
an additional Transformer decoder with a set of learnable candidate answer
embeddings to query the existence of each answer class to a given
image-question pair. Through the Transformer attention, the candidate answer
embeddings interact with the fused features of the image-question pair to make
the decision. In this way, despite being a classification-based approach, our
method provides a mechanism to interact with answer information during
prediction, as generation-based approaches do. On the other hand,
classification mitigates the task difficulty by reducing the search space of
answers. Our method achieves new state-of-the-art performance on two medical
VQA benchmarks. Especially, for the open-end questions, we achieve 79.19% on
VQA-RAD and 54.85% on PathVQA, with 16.09% and 41.45% absolute improvements,
respectively.
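The answer-querying mechanism can be sketched in miniature: each candidate answer embedding is scored against the fused image-question feature, and a sigmoid turns the score into an existence probability for that answer class. A plain dot product stands in for the decoder's cross-attention, and all names here are illustrative, not the paper's API.

```python
import math

def query_answers(fused_feature, answer_embeddings):
    # Score each learnable candidate answer embedding against the fused
    # image-question feature; sigmoid gives a per-class existence
    # probability, so the model classifies over a fixed answer set while
    # still interacting with answer representations.
    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    return [
        sigmoid(sum(f * e for f, e in zip(fused_feature, emb)))
        for emb in answer_embeddings
    ]

probs = query_answers([1.0, -1.0], [[2.0, 0.0], [0.0, 2.0]])
# probs[0] is high (embedding aligned with the feature); probs[1] is low.
```

Scoring a fixed set of answer embeddings is what keeps the search space small relative to free-form generation.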
Variance Reduction on Adaptive Stochastic Mirror Descent
We study the idea of variance reduction applied to adaptive stochastic mirror
descent algorithms in nonsmooth nonconvex finite-sum optimization problems. We
propose a simple yet generalized adaptive mirror descent algorithm with
variance reduction named SVRAMD and provide its convergence analysis in
different settings. We prove that variance reduction reduces the gradient
complexity of most adaptive mirror descent algorithms and boosts their
convergence. In particular, our general theory implies that variance reduction can
be applied to algorithms using time-varying step sizes and self-adaptive
algorithms such as AdaGrad and RMSProp. Moreover, our convergence rates recover
the best existing rates of non-adaptive algorithms. We check the validity of
our claims using experiments in deep learning. Comment: NeurIPS 2020 OPT workshop
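The variance-reduction idea can be sketched with the Euclidean mirror map, the simplest special case of the adaptive mirror maps covered by the theory: each epoch stores a snapshot and its full gradient, and inner steps use the SVRG-style corrected gradient. Function names and hyperparameters below are illustrative assumptions.

```python
import random

def svr_mirror_descent(grad_i, x0, n, step=0.1, epochs=5, inner=20, seed=0):
    # SVRG-style variance reduction inside mirror descent with the
    # Euclidean mirror map. grad_i(i, x) is the gradient of the i-th
    # summand at x. Each epoch stores a snapshot x_ref and its full
    # gradient; inner steps use grad_i(i, x) - grad_i(i, x_ref) + full
    # as a low-variance gradient estimate.
    rng = random.Random(seed)
    x = x0
    for _ in range(epochs):
        x_ref = x
        full = sum(grad_i(i, x_ref) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            g = grad_i(i, x) - grad_i(i, x_ref) + full
            x = x - step * g  # Euclidean mirror step
    return x

# f(x) = (1/n) * sum_i (x - a_i)^2 / 2 is minimized at the mean of a.
a = [1.0, 2.0, 3.0, 4.0]
x_star = svr_mirror_descent(lambda i, x: x - a[i], 0.0, len(a))
```

Swapping the Euclidean update for a Bregman proximal step recovers the general mirror descent setting, including adaptive maps such as those of AdaGrad and RMSProp.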
Online Regularization for High-Dimensional Dynamic Pricing Algorithms
We propose a novel \textit{online regularization} scheme for
revenue-maximization in high-dimensional dynamic pricing algorithms. The online
regularization scheme equips the proposed optimistic online regularized maximum
likelihood pricing (\texttt{OORMLP}) algorithm with three major advantages:
encode market noise knowledge into pricing process optimism; empower online
statistical learning with always-validity over all decision points; envelop
prediction error process with time-uniform non-asymptotic oracle inequalities.
This type of non-asymptotic inference result allows us to design safer and
more robust dynamic pricing algorithms in practice. In theory, the proposed
\texttt{OORMLP} algorithm exploits the sparsity structure of high-dimensional
models and obtains a logarithmic regret in a decision horizon. These
theoretical advances are made possible by proposing an optimistic online LASSO
procedure that resolves dynamic pricing problems at the \textit{process} level,
based on a novel use of non-asymptotic martingale concentration. In
experiments, we evaluate \texttt{OORMLP} in different synthetic pricing problem
settings and observe that \texttt{OORMLP} performs better than \texttt{RMLP}
proposed in \cite{javanmard2019dynamic}.
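The core online-LASSO ingredient can be sketched as a proximal gradient step: a gradient step on the squared residual of a linear pricing model, followed by soft-thresholding to keep the parameter estimate sparse. This is an illustrative simplification of the optimistic online regularization scheme, not the paper's exact procedure, and all names are assumptions.

```python
def soft_threshold(z, t):
    # Proximal operator of the l1 norm: shrink z toward zero by t.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def online_lasso_step(theta, x, y, step, lam):
    # One online LASSO update: gradient step on the squared residual of
    # the linear pricing model, then coordinate-wise soft-thresholding,
    # which preserves the sparsity the regret analysis relies on.
    pred = sum(t * xi for t, xi in zip(theta, x))
    grad = [(pred - y) * xi for xi in x]
    return [soft_threshold(t - step * g, step * lam) for t, g in zip(theta, grad)]

theta = online_lasso_step([0.0, 0.0], [1.0, 0.0], 1.0, 0.5, 0.1)
# Only the active coordinate moves; the inactive one stays exactly zero.
```

Running this update at every decision point, with the regularization level tuned over time, is the high-dimensional analogue of the regularized maximum-likelihood pricing updates discussed above.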