Poisson approximation for stochastic processes summed over amenable groups
We generalize the Poisson limit theorem to binary functions of random objects
whose law is invariant under the action of an amenable group. Examples include
stationary random fields, exchangeable sequences, and exchangeable graphs. A
celebrated result of E. Lindenstrauss shows that normalized sums over certain
increasing subsets of such groups approximate expectations. Our results clarify
that the corresponding unnormalized sums of binary statistics are
asymptotically Poisson, provided suitable mixing conditions hold. They extend
further to randomly subsampled sums and also show that strict invariance of the
distribution is not needed if the requisite mixing condition defined by the
group holds. We illustrate the results with applications to random fields,
Cayley graphs, and Poisson processes on groups.
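To make the shape of such a result concrete, here is a schematic statement in our own notation; the Følner sequence F_n, random object X, and binary statistic f are illustrative labels, not the paper's:

```latex
% Requires amsmath. Schematic Poisson limit over an amenable group G
% (notation is ours). X: random object whose law is G-invariant;
% f: a binary (0/1) statistic; (F_n): increasing subsets of G as in
% Lindenstrauss's pointwise ergodic theorem.
\[
  S_n \;=\; \sum_{g \in F_n} f(g \cdot X), \qquad
  |F_n|\,\mathbb{E}\!\left[f(X)\right] \;\longrightarrow\; \lambda \in (0,\infty)
\]
\[
  \Longrightarrow \quad S_n \;\xrightarrow{\,d\,}\; \mathrm{Poisson}(\lambda)
  \quad \text{under suitable mixing conditions on the law of } X.
\]
```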
Detecting Backdoors During the Inference Stage Based on Corruption Robustness Consistency
Deep neural networks have been shown to be vulnerable to backdoor attacks.
Detecting the trigger samples during the inference stage, i.e., the test-time
trigger sample detection, can prevent the backdoor from being triggered.
However, existing detection methods often require the defenders to have high
accessibility to victim models, extra clean data, or knowledge about the
appearance of backdoor triggers, limiting their practicality. In this paper, we
propose the test-time corruption robustness consistency evaluation (TeCo), a
novel test-time trigger sample detection method that only needs the hard-label
outputs of the victim models without any extra information. Our journey begins
with the intriguing observation that the backdoor-infected models have similar
performance across different image corruptions for the clean images, but
perform discrepantly for the trigger samples. Based on this phenomenon, we
design TeCo to evaluate test-time robustness consistency by calculating, across
different corruptions, the deviation of the severity at which a sample's
prediction transitions. Extensive experiments demonstrate that TeCo outperforms
state-of-the-art defenses, even those that require information about the
trigger types or access to clean data, across different backdoor attacks,
datasets, and model architectures, achieving a 10% higher AUROC and five times
greater stability.
Comment: Accepted by CVPR 2023. Code is available at
https://github.com/CGCL-codes/TeC
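To illustrate the mechanism, the following is a minimal sketch of the consistency score described above, assuming hard-label access only; the function names, the corruption interface, and the use of a standard deviation as the deviation measure are our assumptions, not the released implementation:

```python
import numpy as np

def transition_severity(model_hard_label, image, corrupt, max_severity=5):
    """Smallest corruption severity at which the hard-label prediction
    changes from the clean prediction (max_severity + 1 if it never does)."""
    clean_pred = model_hard_label(image)
    for s in range(1, max_severity + 1):
        if model_hard_label(corrupt(image, s)) != clean_pred:
            return s
    return max_severity + 1

def teco_score(model_hard_label, image, corruptions, max_severity=5):
    """Spread of transition severities across corruption types. Per the
    observation above, trigger samples transition at widely varying
    severities, while clean samples behave consistently."""
    sev = [transition_severity(model_hard_label, image, c, max_severity)
           for c in corruptions]
    return float(np.std(sev))
```

A defender would then flag a sample as a trigger when its score exceeds a threshold chosen on their own data; the thresholding details here are our assumption.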
Enhancing Medical Task Performance in GPT-4V: A Comprehensive Study on Prompt Engineering Strategies
OpenAI's latest large vision-language model (LVLM), GPT-4V(ision), has piqued
considerable interest for its potential in medical applications. Despite its
promise, recent studies and internal reviews highlight its underperformance in
specialized medical tasks. This paper explores the boundary of GPT-4V's
capabilities in medicine, particularly in processing complex imaging data from
endoscopy, CT, and MRI, among other modalities. Leveraging open-source datasets, we
assessed its foundational competencies, identifying substantial areas for
enhancement. Our research emphasizes prompt engineering, an often-underutilized
strategy for improving AI responsiveness. Through iterative testing, we refined
the model's prompts, significantly improving its interpretative accuracy and
relevance in medical imaging. From our comprehensive evaluations, we distilled
10 effective prompt engineering techniques, each fortifying GPT-4V's medical
acumen. These methodical enhancements facilitate more reliable, precise, and
clinically valuable insights from GPT-4V, advancing its operability in critical
healthcare environments. Our findings are pivotal for those employing AI in
medicine, providing clear, actionable guidance on harnessing GPT-4V's full
diagnostic potential.
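As a purely illustrative example of the kind of prompt structuring the paper advocates (role assignment, task decomposition, constrained output), here is a hypothetical prompt builder; the wording is ours, not one of the paper's ten techniques verbatim:

```python
# Hypothetical structured prompt for a medical-imaging query; the fields
# and wording are illustrative, not taken from the paper.
def build_prompt(modality: str, question: str) -> str:
    return (
        f"You are an experienced radiologist reviewing a {modality} image.\n"
        "Step 1: Describe the visible anatomy.\n"
        "Step 2: List any abnormal findings with their locations.\n"
        "Step 3: Answer the question below, citing the findings you used.\n"
        "If the image is insufficient to answer, say so explicitly.\n\n"
        f"Question: {question}"
    )

print(build_prompt("CT", "Is there evidence of a pleural effusion?"))
```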
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training
Large-scale models pre-trained on large-scale datasets have profoundly
advanced the development of deep learning. However, the state-of-the-art models
for medical image segmentation are still small-scale, with their parameters
only in the tens of millions. Further scaling them up to higher orders of
magnitude is rarely explored. An overarching goal of exploring large-scale
models is to train them on large-scale medical segmentation datasets for better
transfer capacities. In this work, we design a series of Scalable and
Transferable U-Net (STU-Net) models, with parameter sizes ranging from 14
million to 1.4 billion. Notably, the 1.4B STU-Net is the largest medical image
segmentation model to date. Our STU-Net is based on the nnU-Net framework due to
its popularity and impressive performance. We first refine the default
convolutional blocks in nnU-Net to make them scalable. Then, we empirically
evaluate different scaling combinations of network depth and width, discovering
that it is optimal to scale model depth and width together. We train our
scalable STU-Net models on a large-scale TotalSegmentator dataset and find that
increasing model size brings a stronger performance gain. This observation
suggests that large models are promising for medical image segmentation.
Furthermore, we evaluate the transferability of our model on 14 downstream
datasets for direct inference and 3 datasets for further fine-tuning, covering
various modalities and segmentation targets. We observe good performance of our
pre-trained model in both direct inference and fine-tuning. The code and
pre-trained models are available at https://github.com/Ziyan-Huang/STU-Net
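A minimal PyTorch sketch of the joint depth-and-width scaling described above, applied to a plain residual convolutional block; the block design, the 1x1 width adapter, and the scaling rule are our illustrative assumptions, not the released nnU-Net-based code:

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """3D conv -> instance norm -> LeakyReLU, twice, with a residual skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.InstanceNorm3d(channels),
            nn.LeakyReLU(inplace=True),
            nn.Conv3d(channels, channels, 3, padding=1),
            nn.InstanceNorm3d(channels),
        )
        self.act = nn.LeakyReLU(inplace=True)

    def forward(self, x):
        return self.act(self.body(x) + x)

def make_stage(base_channels: int, base_depth: int, scale: float) -> nn.Sequential:
    """Scale width and depth together, the combination the abstract reports
    to be optimal."""
    channels = int(base_channels * scale)
    depth = max(1, round(base_depth * scale))
    layers = [nn.Conv3d(base_channels, channels, 1)]  # width adapter (illustrative)
    layers += [ConvBlock(channels) for _ in range(depth)]
    return nn.Sequential(*layers)
```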
Excited-state spectroscopy of spin defects in hexagonal boron nitride
We used the optically detected magnetic resonance (ODMR) technique to directly
probe electron-spin resonance transitions in the excited state of
negatively-charged boron vacancy (VB-) defects in hexagonal boron nitride (hBN)
at room temperature. The data showed that the excited state has a zero-field
splitting of ~2.1 GHz, a g factor similar to that of the ground state, and two
types of hyperfine splitting of ~90 MHz and ~18.8 MHz, respectively. Pulsed
ODMR experiments were conducted to further verify that the observed resonant
peaks correspond to spin transitions in the excited state. In addition,
negative peaks in the photoluminescence and ODMR contrast as a function of
magnetic field magnitude and angle at the level anti-crossing were observed and
explained by coherent spin precession and anisotropic relaxation. This work
provides significant insight into the structure of the VB- excited state, which
might be used for quantum information processing and nanoscale quantum sensing.
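For context, such measurements are conventionally interpreted with the standard S = 1 spin Hamiltonian below; the form is textbook rather than taken from this paper, with the excited-state parameters from the abstract noted in comments:

```latex
% Standard S = 1 spin Hamiltonian for a V_B^- state (illustrative form):
\[
  H \;=\; D\,S_z^2 \;+\; g\,\mu_B\,\mathbf{B}\cdot\mathbf{S}
        \;+\; \sum_k \mathbf{S}\cdot\mathsf{A}_k\cdot\mathbf{I}_k ,
\]
% where D is the zero-field splitting (here ~2.1 GHz times h for the excited
% state), g the electron g factor, and A_k the hyperfine tensors coupling the
% electron spin S to nearby nuclear spins I_k (~90 MHz and ~18.8 MHz above).
```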
XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques
Reinforcement Learning (RL) has demonstrated substantial potential across
diverse fields, yet understanding its decision-making process, especially in
real-world scenarios where rationality and safety are paramount, is an ongoing
challenge. This paper delves into Explainable RL (XRL), a subfield of
Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our
focus rests on state-explaining techniques, a crucial subset within XRL
methods, as they reveal the underlying factors influencing an agent's actions
at any given time. Despite their significant role, the lack of a unified
evaluation framework hinders assessment of their accuracy and effectiveness. To
address this, we introduce XRL-Bench, a unified standardized benchmark tailored
for the evaluation and comparison of XRL methods, encompassing three main
modules: standard RL environments, explainers based on state importance, and
standard evaluators. XRL-Bench supports both tabular and image data for state
explanation. We also propose TabularSHAP, an innovative and competitive XRL
method. We demonstrate the practical utility of TabularSHAP in real-world
online gaming services and offer an open-source benchmark platform for the
straightforward implementation and evaluation of XRL methods. Our contributions
facilitate the continued progression of XRL technology.
Comment: 10 pages, 5 figures
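As a minimal sketch of a state-explaining method of the kind benchmarked here, one can fit a tree surrogate to logged (state, action) pairs and attribute the policy's actions to state features with TreeSHAP; this is our illustration of the general idea, not TabularSHAP's exact algorithm:

```python
import numpy as np
import shap
from sklearn.ensemble import GradientBoostingClassifier

def explain_states(states: np.ndarray, actions: np.ndarray):
    """Fit a tree surrogate of the policy on logged (state, action) pairs,
    then compute per-feature state importances with TreeSHAP."""
    surrogate = GradientBoostingClassifier().fit(states, actions)
    explainer = shap.TreeExplainer(surrogate)
    return explainer.shap_values(states)
```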
Evaluation and Analysis of Hallucination in Large Vision-Language Models
Large Vision-Language Models (LVLMs) have recently achieved remarkable
success. However, LVLMs are still plagued by the hallucination problem, which
limits their practicality in many scenarios. Hallucination refers to content in
an LVLM's response that does not exist in the visual input, and it poses
potential risks of substantial consequences. There has been limited work
studying hallucination evaluation in LVLMs. In this paper, we propose
Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based
hallucination evaluation framework. HaELM achieves approximately 95% of
ChatGPT's performance and has additional advantages including low cost,
reproducibility, privacy preservation, and local deployment. Leveraging HaELM,
we evaluate the hallucination in current LVLMs. Furthermore, we analyze the
factors contributing to hallucination in LVLMs and offer helpful suggestions to
mitigate the hallucination problem. Our training data and human-annotated
hallucination data will be made public soon.
Comment: 11 pages, 5 figures
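A minimal sketch of an LLM-as-judge hallucination check in the spirit described, where `generate` stands in for any locally deployed LLM callable; the prompt wording and interface are placeholders, not HaELM's actual implementation:

```python
# Hypothetical judge prompt; the wording is illustrative.
JUDGE_PROMPT = (
    "Reference description of the image:\n{reference}\n\n"
    "Model response:\n{response}\n\n"
    "Does the response mention any object or attribute that is absent from "
    "the reference? Answer 'yes' or 'no' only."
)

def is_hallucinated(generate, reference: str, response: str) -> bool:
    """`generate` is any LLM callable mapping a prompt string to text."""
    verdict = generate(JUDGE_PROMPT.format(reference=reference,
                                           response=response))
    return verdict.strip().lower().startswith("yes")
```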
A-Eval: A Benchmark for Cross-Dataset Evaluation of Abdominal Multi-Organ Segmentation
Although deep learning has revolutionized abdominal multi-organ
segmentation, models often struggle with generalization due to training on
small, specific datasets. With the recent emergence of large-scale datasets,
some important questions arise: can models trained on these datasets
generalize well to different ones, and how can their generalizability be
further improved? To address these questions, we introduce A-Eval, a benchmark
for the cross-dataset Evaluation ('Eval') of Abdominal ('A') multi-organ
segmentation. We employ training sets from four large-scale public datasets:
FLARE22, AMOS, WORD, and TotalSegmentator, each providing extensive labels for
abdominal multi-organ segmentation. For evaluation, we incorporate the
validation sets from these datasets along with the training set from the BTCV
dataset, forming a robust benchmark comprising five distinct datasets. We
evaluate the generalizability of various models using the A-Eval benchmark,
with a focus on diverse data usage scenarios: training on individual datasets
independently, utilizing unlabeled data via pseudo-labeling, mixing different
modalities, and joint training across all available datasets. Additionally, we
explore the impact of model sizes on cross-dataset generalizability. Through
these analyses, we underline the importance of effective data usage in
enhancing models' generalization capabilities, offering valuable insights for
assembling large-scale datasets and improving training strategies. The code and
pre-trained models are available at
https://github.com/uni-medical/A-Eval
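A minimal sketch of the cross-dataset evaluation loop such a benchmark implies, scoring one segmentation model on several datasets with per-organ Dice; the `predict` callable and the dataset iterables are placeholders:

```python
import numpy as np

def dice(pred: np.ndarray, gt: np.ndarray, label: int) -> float:
    """Dice coefficient for one organ label between two label maps."""
    p, g = pred == label, gt == label
    denom = p.sum() + g.sum()
    return 2.0 * np.logical_and(p, g).sum() / denom if denom else 1.0

def cross_dataset_eval(predict, datasets, organ_labels):
    """datasets: {name: iterable of (image, gt_mask)}; predict: image -> mask.
    Returns mean Dice per organ on each dataset."""
    scores = {}
    for name, samples in datasets.items():
        per_organ = [[dice(predict(img), gt, l) for l in organ_labels]
                     for img, gt in samples]
        scores[name] = np.mean(per_organ, axis=0)
    return scores
```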
CODA: A Real-World Road Corner Case Dataset for Object Detection in Autonomous Driving
Contemporary deep-learning object detection methods for autonomous driving
usually assume a predefined set of categories of common traffic participants, such as
pedestrians and cars. Most existing detectors are unable to detect uncommon
objects and corner cases (e.g., a dog crossing a street), which may lead to
severe accidents in some situations, making the timeline for the real-world
application of reliable autonomous driving uncertain. One main reason that
impedes the development of truly reliable self-driving systems is the lack of
public datasets for evaluating the performance of object detectors on corner
cases. Hence, we introduce a challenging dataset named CODA that exposes this
critical problem of vision-based detectors. The dataset consists of 1500
carefully selected real-world driving scenes, each containing four object-level
corner cases (on average), spanning more than 30 object categories. On CODA,
the performance of standard object detectors trained on large-scale autonomous
driving datasets drops significantly, to no more than 12.8% in mAR. Moreover, we
experiment with the state-of-the-art open-world object detector and find that
it also fails to reliably identify the novel objects in CODA, suggesting that a
robust perception system for autonomous driving is probably still far from
reach. We expect our CODA dataset to facilitate further research in reliable
detection for real-world autonomous driving. Our dataset will be released at
https://coda-dataset.github.io.
Comment: ECCV 202
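For intuition about the quoted mAR figure, here is a minimal sketch of recall over corner-case objects at a single IoU threshold; the real metric averages recall over multiple IoU thresholds and caps detections per image, so this is a simplification:

```python
def iou(a, b):
    """Intersection over union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def recall_at(gt_boxes, pred_boxes, thr=0.5):
    """Fraction of ground-truth corner-case boxes matched by any prediction."""
    matched = sum(any(iou(g, p) >= thr for p in pred_boxes) for g in gt_boxes)
    return matched / len(gt_boxes) if gt_boxes else 1.0
```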