Partitioned Sampling of Public Opinions Based on Their Social Dynamics
Public opinion polling is usually done by random sampling from the entire
population, treating individual opinions as independent. In the real world,
individuals' opinions are often correlated, e.g., among friends in a social
network. In this paper, we explore the idea of partitioned sampling, which
partitions individuals with high opinion similarities into groups and then
samples every group separately to obtain an accurate estimate of the population
opinion. We rigorously formulate the above idea as an optimization problem. We
then show that simple partitions, which contain only one sample in each group, are always better than partitions that place multiple samples in a group, and we reduce finding the optimal simple partition to the well-studied Min-r-Partition problem. We adapt an approximation algorithm and a
heuristic algorithm to solve the optimization problem. Moreover, to obtain
opinion similarity efficiently, we adapt a well-known opinion evolution model
to characterize social interactions, and provide an exact computation of
opinion similarities based on the model. We use both synthetic and real-world datasets to demonstrate that partitioned sampling significantly improves sampling quality and remains robust when some opinion similarities are inaccurate or even missing.
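To make the estimator concrete, here is a minimal sketch of a simple-partition estimator in Python: each group contributes one sampled member, weighted by the group's share of the population. The `opinions` array and the fixed partition are toy assumptions for illustration, not the output of the paper's optimization.

```python
# Minimal sketch of a simple-partition (one sample per group) estimator.
# The opinions array and partition below are toy assumptions; the paper
# chooses the partition by optimizing over opinion similarities.
import numpy as np

rng = np.random.default_rng(0)

def partitioned_estimate(opinions, partition):
    """Estimate the population mean opinion: draw one member from each
    group and weight its opinion by the group's share of the population."""
    n = len(opinions)
    estimate = 0.0
    for group in partition:
        member = rng.choice(group)            # one sample per group
        estimate += (len(group) / n) * opinions[member]
    return estimate

# Two friend groups with internally similar opinions.
opinions = np.array([0.9, 0.8, 0.85, 0.1, 0.2, 0.15])
partition = [[0, 1, 2], [3, 4, 5]]            # grouped by opinion similarity
print(partitioned_estimate(opinions, partition))
```

Because members within a group hold similar opinions, the within-group variance is small, so this estimator has lower variance than drawing the same number of samples uniformly at random.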
OTMatch: Improving Semi-Supervised Learning with Optimal Transport
Semi-supervised learning has made remarkable strides by effectively utilizing
a limited amount of labeled data while capitalizing on the abundant information
present in unlabeled data. However, current algorithms often prioritize
aligning image predictions with specific classes generated through
self-training techniques, thereby neglecting the inherent relationships that
exist within these classes. In this paper, we present a new approach called
OTMatch, which leverages semantic relationships among classes by employing an
optimal transport loss function. By utilizing optimal transport, our proposed
method consistently outperforms established state-of-the-art methods. Notably, OTMatch achieves 3.18%, 3.46%, and 1.28% error rate reductions over the current state-of-the-art method, FreeMatch, on CIFAR-10 with 1 label per class, STL-10 with 4 labels per class, and ImageNet with 100 labels per class, respectively. This demonstrates the effectiveness of our approach in harnessing semantic relationships to enhance learning performance in the semi-supervised setting.
DiffKendall: A Novel Approach for Few-Shot Learning with Differentiable Kendall's Rank Correlation
Few-shot learning aims to adapt models trained on the base dataset to novel
tasks where the categories are not seen by the model before. This often leads
to a relatively uniform distribution of feature values across channels on novel
classes, posing challenges in determining channel importance for novel tasks.
Standard few-shot learning methods employ geometric similarity metrics such as
cosine similarity and negative Euclidean distance to gauge the semantic
relatedness between two features. However, features with high geometric
similarities may carry distinct semantics, especially in the context of
few-shot learning. In this paper, we demonstrate that the importance ranking of
feature channels is a more reliable indicator for few-shot learning than
geometric similarity metrics. We observe that replacing the geometric
similarity metric with Kendall's rank correlation only during inference is able
to improve the performance of few-shot learning across a wide range of datasets
with different domains. Furthermore, we propose a carefully designed
differentiable loss for meta-training to address the non-differentiability
issue of Kendall's rank correlation. Extensive experiments demonstrate that the
proposed rank-correlation-based approach substantially enhances few-shot
learning performance.
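To illustrate the non-differentiability issue and one standard way around it, the sketch below relaxes the sign comparisons in Kendall's tau with tanh, making the statistic usable as a training loss. This is a generic soft relaxation under our own assumptions, not necessarily the paper's exact meta-training loss.

```python
# Sketch of a soft, differentiable surrogate of Kendall's rank
# correlation: the hard sign(.) over channel-pair differences is
# relaxed to tanh(alpha * .). A generic relaxation, not necessarily
# DiffKendall's exact formulation.
import torch

def soft_kendall(x, y, alpha=10.0):
    """Soft Kendall's tau between feature vectors x and y of shape [d]."""
    dx = x[:, None] - x[None, :]              # all pairwise channel diffs
    dy = y[:, None] - y[None, :]
    concordance = torch.tanh(alpha * dx) * torch.tanh(alpha * dy)
    d = x.shape[0]
    # Diagonal terms are zero, so summing everything and dividing by the
    # number of ordered pairs d*(d-1) averages over distinct channel pairs.
    return concordance.sum() / (d * (d - 1))

x = torch.randn(64, requires_grad=True)
y = torch.randn(64)
tau = soft_kendall(x, y)
tau.backward()                                # gradients flow, unlike hard tau
print(tau.item(), x.grad.abs().sum().item())
```

As alpha grows, tanh approaches the sign function and the surrogate recovers the hard, non-differentiable Kendall's tau used at inference.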
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
Few-shot learning has been studied to adapt models to tasks with very few
samples. It holds profound significance, particularly in clinical tasks, due to
the high annotation cost of medical images. Several works have explored
few-shot learning on medical images, yet they still require a large number of
medical images for pre-training models to gain domain-specific priors. Vision
foundation models recently have achieved remarkable success in natural images.
Hence, adapting rapidly advancing vision foundation models from natural images
to few-shot clinical tasks holds great promise. MedFMC recently organized a challenge at NeurIPS 2023 to shed more light on this topic. In this work, we
present our challenge solution. We observe that a simple variant of fine-tuning
with partial freezing shows remarkable performance. Empirical evidence
demonstrates that this approach could outperform various common fine-tuning
methods under limited sample sizes. Additionally, we explore enhanced
utilization of semantic supervision to boost performance. We propose a novel
approach that contextualizes labels via large language models (LLMs). Our
findings reveal that the context generated by LLMs significantly enhances the
discrimination of semantic embeddings for similar categories, resulting in a
notable performance improvement of 3%-5% in 1-shot settings compared to
commonly employed one-hot labels and other semantic supervision methods. Our
solution secured first place in the MedFMC challenge.
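As a sketch of what fine-tuning with partial freezing can look like in practice, the snippet below freezes a pretrained ViT backbone except for its last transformer block and classification head. The `timm` model name and the choice of which layers to unfreeze are illustrative assumptions, not necessarily the challenge solution's recipe.

```python
# Sketch of fine-tuning with partial freezing on a pretrained ViT,
# assuming the timm library. The unfreezing choice (last block + head)
# is an illustrative assumption, not the exact challenge recipe.
import timm
import torch

model = timm.create_model("vit_base_patch16_224", pretrained=True,
                          num_classes=5)      # small clinical label set

for name, param in model.named_parameters():
    # Train only the final transformer block and the classifier head;
    # everything else keeps its pretrained weights.
    param.requires_grad = name.startswith(("blocks.11", "head"))

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
```

Restricting updates to a small parameter subset limits how far the model can drift from its pretrained priors, which is why this kind of recipe tends to hold up under very small sample sizes.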
Learning with Noisily-labeled Class-imbalanced Data
Real-world large-scale datasets are often both noisily labeled and class-imbalanced, which seriously hurts the generalization of trained models. It is therefore important to address incorrect labeling and class imbalance simultaneously, i.e., the problem of learning with noisy labels on long-tailed data. Previous works have developed several methods for this problem, but they rely on strong assumptions that are invalid or hard to check in practice. In this paper, to handle the problem and address the limitations of prior works, we propose a representation calibration method,
RCAL. Specifically, RCAL works with the representations extracted by
unsupervised contrastive learning. We assume that without incorrect labeling
and class imbalance, the representations of instances in each class conform to
a multivariate Gaussian distribution, a much milder assumption that is easier to check in practice. Based on this assumption, we recover the underlying representation
distributions from polluted ones resulting from mislabeled and class-imbalanced
data. Additional data points are then sampled from the recovered distributions
to help generalization. Moreover, during classifier training, representation
learning takes advantage of representation robustness brought by contrastive
learning, which further improves the classifier performance. Experiments on
multiple benchmarks justify our claims and confirm the superiority of the
proposed method.
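The distribution-recovery and resampling step can be pictured with the following sketch, which fits a multivariate Gaussian to one class's contrastive features and draws extra points from it. RCAL's actual recovery from polluted statistics is more involved; this only illustrates the sampling idea.

```python
# Sketch of the resampling idea: fit a per-class multivariate Gaussian
# to (contrastively learned) features and draw synthetic points. RCAL's
# actual recovery from polluted statistics is more involved than this.
import numpy as np

rng = np.random.default_rng(0)

def resample_class(features, n_extra, reg=1e-3):
    """Fit N(mu, cov) to one class's features of shape [n, d] and
    sample n_extra additional representation points."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    cov += reg * np.eye(features.shape[1])    # regularize a rank-deficient cov
    return rng.multivariate_normal(mu, cov, size=n_extra)

# A tail class with only 5 real examples in a 16-d feature space.
tail_features = rng.normal(size=(5, 16))
extra_points = resample_class(tail_features, n_extra=50)
print(extra_points.shape)                     # (50, 16)
```

The synthetic points give tail classes enough representation-space coverage to train a classifier, which is how the recovered distributions "help generalization" in the abstract's terms.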
AutoEval-Video: An Automatic Benchmark for Assessing Large Vision Language Models in Open-Ended Video Question Answering
We propose a novel and challenging benchmark, AutoEval-Video, to
comprehensively evaluate large vision-language models in open-ended video
question answering. The comprehensiveness of AutoEval-Video is demonstrated in
two aspects: 1) AutoEval-Video constructs open-ended video-questions across 9
skill dimensions, addressing capabilities of perception, comprehension, and
generation. 2) AutoEval-Video contains newly collected videos that cover over
40 distinct themes. To efficiently evaluate responses to the open-ended
questions, we employ an LLM-based evaluation approach, but instead of merely
providing a reference answer, we annotate unique evaluation rules for every
single instance (video-question pair). To maximize the robustness of these
rules, we develop a novel adversarial annotation mechanism. By using
instance-specific rules as the prompt, GPT-4, as an automatic evaluator, can achieve a stable evaluation accuracy of around 97.0%, comparable to the 94.9%-97.5% accuracy of a human evaluator. Furthermore, we assess the performance of eight large vision-language models on AutoEval-Video. Among them, GPT-4V(ision) significantly outperforms the other models, achieving an accuracy of 32.2%. However, there is still substantial room for improvement compared to the human accuracy of 72.8%. Through an extensive case study, we uncover several drawbacks of GPT-4V, such as limited temporal and dynamic comprehension and overly general responses. Code is available at https://github.com/Xiuyuan-Chen/AutoEval-Video.
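The rule-based evaluation loop can be sketched as follows, assuming the `openai` Python client; the prompt wording here is hypothetical, and the benchmark's actual templates and adversarially refined rules live in the repository linked above.

```python
# Sketch of LLM-as-evaluator grading with instance-specific rules,
# assuming the openai Python client. The prompt template is hypothetical;
# the benchmark's real templates are in the linked repository.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def judge(question, rules, response):
    """Ask GPT-4 whether a model response satisfies this instance's rules."""
    prompt = (
        f"Question: {question}\n"
        f"Evaluation rules: {rules}\n"
        f"Model response: {response}\n"
        "Judge the response strictly by the rules above. "
        "Answer 'correct' or 'incorrect' with a one-sentence reason."
    )
    result = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return result.choices[0].message.content

print(judge("What does the cat knock off the table?",
            "Accept any answer identifying the glass or cup.",
            "It knocks over a glass of water."))
```

Annotating rules per video-question pair, rather than a single reference answer, is what lets the judge handle the many valid phrasings an open-ended response can take.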