UNITS: Unsupervised Intermediate Training Stage for Scene Text Detection
Most recent scene text detection methods are based on deep learning and are
data-driven. Synthetic data is commonly adopted for pre-training because of the
expensive annotation cost. However, there are obvious domain discrepancies
between synthetic data and real-world data, and directly adopting a model
initialized on synthetic data in the fine-tuning stage may lead to sub-optimal
performance. In this paper, we propose a new training paradigm for scene
text detection, which introduces an \textbf{UN}supervised \textbf{I}ntermediate
\textbf{T}raining \textbf{S}tage (UNITS) that builds a buffer path to
real-world data and can alleviate the gap between the pre-training stage and
fine-tuning stage. Three training strategies are further explored to perceive
information from real-world data in an unsupervised way. With UNITS, scene text
detectors are improved without introducing any parameters and computations
during inference. Extensive experimental results show consistent performance
improvements on three public datasets.
Comment: Accepted by ICME 202
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
Despite the stunning ability of recent text-to-image models to generate
high-quality images, current approaches often struggle to effectively compose
objects with different attributes and relationships into a complex and coherent
scene. We propose T2I-CompBench, a comprehensive benchmark for open-world
compositional text-to-image generation, consisting of 6,000 compositional text
prompts from 3 categories (attribute binding, object relationships, and complex
compositions) and 6 sub-categories (color binding, shape binding, texture
binding, spatial relationships, non-spatial relationships, and complex
compositions). We further propose several evaluation metrics specifically
designed to evaluate compositional text-to-image generation. We introduce a new
approach, Generative mOdel fine-tuning with Reward-driven Sample selection
(GORS), to boost the compositional text-to-image generation abilities of
pretrained text-to-image models. Extensive experiments and evaluations are
conducted to benchmark previous methods on T2I-CompBench, and to validate the
effectiveness of our proposed evaluation metrics and GORS approach. Project
page is available at https://karine-h.github.io/T2I-CompBench/.
Comment: Project page: https://karine-h.github.io/T2I-CompBench
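The benchmark's taxonomy from the abstract (3 categories, 6 sub-categories, 6,000 prompts in total) can be sketched as a simple Python mapping. The category and sub-category names are taken verbatim from the abstract; the grouping of sub-categories under categories is an assumption based on their wording:

```python
# Sketch of the T2I-CompBench taxonomy as described in the abstract:
# 3 categories and 6 sub-categories covering 6,000 compositional prompts.
T2I_COMPBENCH_TAXONOMY = {
    "attribute binding": [
        "color binding",
        "shape binding",
        "texture binding",
    ],
    "object relationships": [
        "spatial relationships",
        "non-spatial relationships",
    ],
    "complex compositions": [
        "complex compositions",
    ],
}

# Sanity checks against the counts stated in the abstract.
assert len(T2I_COMPBENCH_TAXONOMY) == 3
assert sum(len(subs) for subs in T2I_COMPBENCH_TAXONOMY.values()) == 6
```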
Progressive-Hint Prompting Improves Reasoning in Large Language Models
The performance of Large Language Models (LLMs) in reasoning tasks depends
heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency
being critical methods that enhance this ability. However, these methods do not
fully exploit the answers generated by the LLM to guide subsequent responses.
This paper proposes a new prompting method, named Progressive-Hint Prompting
(PHP), that enables automatic multiple interactions between users and LLMs by
using previously generated answers as hints to progressively guide toward the
correct answers. PHP is orthogonal to CoT and self-consistency, making it easy
to combine with state-of-the-art techniques to further improve performance. We
conducted an extensive and comprehensive evaluation to demonstrate the
effectiveness of the proposed method. Our experimental results on six
benchmarks show that combining CoT and self-consistency with PHP significantly
improves accuracy while remaining highly efficient. For instance, with
text-davinci-003, we observed a 4.2% improvement on GSM8K with greedy decoding
compared to Complex CoT, and a 46.17% reduction in sample paths with
self-consistency. With GPT-4 and PHP, we achieve state-of-the-art performance
on SVAMP (91.9%), GSM8K (95.5%), and AQuA (79.9%).
Comment: Tech Report
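The interaction loop described above can be sketched in Python. This is a minimal, simplified reading of PHP, not the authors' implementation: `ask_llm` is a hypothetical callable standing in for any LLM query function, the hint phrasing is illustrative, and the stopping rule (two consecutive identical answers) is an assumption consistent with "progressively guide toward the correct answers":

```python
def progressive_hint_prompting(ask_llm, question, max_rounds=5):
    """Repeatedly re-ask a question, appending previously generated
    answers as hints, until the answer stabilizes.

    ask_llm: callable taking a prompt string and returning an answer
             string (hypothetical stand-in for an LLM call).
    """
    hints = []          # answers from earlier rounds, reused as hints
    prev_answer = None
    for _ in range(max_rounds):
        if hints:
            # Feed all previous answers back as a progressive hint.
            prompt = f"{question} (Hint: the answer is near {', '.join(hints)})."
        else:
            prompt = question  # first round: plain question, no hint
        answer = ask_llm(prompt)
        if answer == prev_answer:
            break           # two identical consecutive answers: converged
        prev_answer = answer
        hints.append(answer)
    return prev_answer
```

Because each round only rewrites the prompt, this loop composes directly with CoT prompts or self-consistency sampling inside `ask_llm`, matching the abstract's claim that PHP is orthogonal to both.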