3,331 research outputs found
L-CAD: Language-based Colorization with Any-level Descriptions using Diffusion Priors
Language-based colorization produces plausible and visually pleasing colors
under the guidance of user-friendly natural language descriptions. Previous
methods implicitly assume that users provide comprehensive color descriptions
for most of the objects in the image, which leads to suboptimal performance. In
this paper, we propose a unified model to perform language-based colorization
with any-level descriptions. We leverage a pretrained cross-modality
generative model for its robust language understanding and rich color priors to
handle the inherent ambiguity of any-level descriptions. We further design
modules that align with the input conditions to preserve local spatial structures and
prevent the ghosting effect. With the proposed novel sampling strategy, our
model achieves instance-aware colorization in diverse and complex scenarios.
Extensive experimental results demonstrate that our method effectively handles
any-level descriptions and outperforms both language-based and
automatic colorization methods. The code and pretrained models are available
at: https://github.com/changzheng123/L-CAD
Towards Robust Neural Image Compression: Adversarial Attack and Model Finetuning
Deep neural network based image compression has been extensively studied.
Model robustness, however, is largely overlooked, even though it is crucial for
enabling services. We perform the adversarial attack by injecting a small amount of
noise perturbation into the original source images, and then encode these adversarial
examples using prevailing learnt image compression models. Experiments report
severe distortion in the reconstruction of adversarial examples, revealing the
general vulnerability of existing methods, regardless of the settings used in the
underlying compression model (e.g., network architecture, loss function,
quality scale) and the optimization strategy used for injecting perturbation (e.g.,
noise threshold, signal distance measurement). We then apply iterative
adversarial finetuning to refine pretrained models. In each iteration, random
source images and adversarial examples are mixed to update the underlying model.
Results show that the proposed finetuning strategy is effective, substantially
improving the robustness of the compression models. Overall, our
methodology is simple, effective, and generalizable, making it attractive for
developing robust learnt image compression solutions. All materials have been
made publicly accessible at https://njuvision.github.io/RobustNIC for
reproducible research. Comment: This paper has been completely rewritten
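The abstract does not spell out the attack's loss, optimizer, or distance measure, so the sketch below swaps in a generic projected sign-gradient (PGD-style) attack under an L-infinity noise threshold `eps`, plus a helper that mixes clean source images and adversarial examples into one finetuning batch. `grad_fn` is a hypothetical callback standing in for the gradient of a reconstruction-distortion loss through a learnt codec, and images are flattened to lists of pixel values in [0, 1]. All names and the default 50/50 mixing ratio are illustrative assumptions, not the authors' settings.

```python
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pgd_perturb(x, grad_fn, eps, step, iters):
    """Projected sign-gradient ascent inside an L-infinity ball of radius eps.

    x       -- flat list of pixel values in [0, 1] (the clean source image)
    grad_fn -- hypothetical callback returning d(distortion)/d(input) at the current input
    eps     -- noise threshold: no pixel may move more than eps away from x
    step    -- size of each signed update
    iters   -- number of ascent iterations
    """
    x_adv = list(x)
    for _ in range(iters):
        g = grad_fn(x_adv)
        for i, (xi, gi) in enumerate(zip(x, g)):
            v = x_adv[i] + step * sign(gi)
            # Project back into [xi - eps, xi + eps] (the noise threshold) ...
            v = min(max(v, xi - eps), xi + eps)
            # ... and clip to the valid pixel range.
            x_adv[i] = min(max(v, 0.0), 1.0)
    return x_adv


def mixed_batch(sources, adversarials, batch_size, adv_fraction=0.5, rng=None):
    """Sample one finetuning batch mixing random clean sources with adversarial examples."""
    if rng is None:
        rng = random.Random()
    n_adv = round(batch_size * adv_fraction)
    batch = rng.sample(adversarials, n_adv) + rng.sample(sources, batch_size - n_adv)
    rng.shuffle(batch)  # avoid any ordering bias between clean and adversarial samples
    return batch
```

In one plausible finetuning loop, each iteration would regenerate adversarial examples against the current model with `pgd_perturb`, draw a batch with `mixed_batch`, and take an optimizer step on that batch; keeping clean images in the mix is what lets the model stay accurate on ordinary inputs while it becomes robust to perturbed ones.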
Digital Image Access & Retrieval
The 33rd Annual Clinic on Library Applications of Data Processing, held at the University of Illinois at Urbana-Champaign in March 1996, addressed the theme of "Digital Image Access & Retrieval." The papers from this conference cover a wide range of topics concerning digital imaging technology for visual resource collections. Papers covered three general areas: (1) systems, planning, and implementation; (2) automatic and semi-automatic indexing; and (3) preservation, with the bulk of the conference focusing on indexing and retrieval. Published or submitted for publication.
Curved Gabor Filters for Fingerprint Image Enhancement
Gabor filters play an important role in many application areas for the
enhancement of various types of images and the extraction of Gabor features.
For the purpose of enhancing curved structures in noisy images, we introduce
curved Gabor filters which locally adapt their shape to the direction of flow.
These curved Gabor filters enable the choice of filter parameters which
increase the smoothing power without creating artifacts in the enhanced image.
In this paper, curved Gabor filters are applied to the curved ridge and valley
structure of low-quality fingerprint images. First, we combine two orientation
field estimation methods in order to obtain a more robust estimation for very
noisy images. Next, curved regions are constructed by following the respective
local orientation and they are used for estimating the local ridge frequency.
Lastly, curved Gabor filters are defined based on curved regions and they are
applied for the enhancement of low-quality fingerprint images. Experimental
results on the FVC2004 databases show that this approach improves on
state-of-the-art enhancement methods.
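The enhancement step above builds on the ordinary Gabor filter. The paper's curved filters bend the filter support along the local ridge flow, which this sketch does not attempt: it shows only a conventional straight, even-symmetric Gabor kernel tuned to a local ridge orientation and frequency, applied at a single pixel. The function names, the parameterization (`theta`, `freq`, `sigma_x`, `sigma_y`), and the synthetic ridge pattern are illustrative assumptions, not the authors' code.

```python
import math


def gabor_kernel(size, theta, freq, sigma_x, sigma_y):
    """Even-symmetric (cosine) Gabor kernel as a list of rows.

    theta   -- direction (radians) in which the cosine oscillates, i.e. normal to the ridges
    freq    -- local ridge frequency in cycles per pixel (1 / ridge period)
    sigma_x -- Gaussian spread across the ridges
    sigma_y -- Gaussian spread along the ridges (larger values smooth more along the flow)
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the kernel has a centre pixel")
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate coordinates: xr runs across the ridges, yr runs along them.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-0.5 * (xr ** 2 / sigma_x ** 2 + yr ** 2 / sigma_y ** 2))
            row.append(envelope * math.cos(2 * math.pi * freq * xr))
        kernel.append(row)
    return kernel


def filter_pixel(image, r, c, kernel):
    """Correlate the kernel with the patch centred at (r, c); pixels outside the image count as 0."""
    half = len(kernel) // 2
    total = 0.0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            rr, cc = r + dy, c + dx
            if 0 <= rr < len(image) and 0 <= cc < len(image[0]):
                total += kernel[dy + half][dx + half] * image[rr][cc]
    return total


# Synthetic ridge pattern: vertical ridges with a period of 8 pixels,
# so the intensity oscillates along x (theta = 0).
image = [[math.cos(2 * math.pi * c / 8) for c in range(33)] for _ in range(33)]

aligned = filter_pixel(image, 16, 16, gabor_kernel(17, 0.0, 1 / 8, 4.0, 4.0))
misaligned = filter_pixel(image, 16, 16, gabor_kernel(17, math.pi / 2, 1 / 8, 4.0, 4.0))
print(f"aligned={aligned:.2f}, misaligned={misaligned:.2f}")
```

With the kernel oriented across the ridges and tuned to their frequency, the response is large and positive; rotating it by 90 degrees makes the contributions nearly cancel. This selectivity is what lets Gabor filtering smooth along ridges while suppressing noise, and it is why the orientation field and ridge frequency must be estimated first.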
Recent Developments in Video Surveillance
With surveillance cameras installed everywhere and continuously streaming thousands of hours of video, how can that huge amount of data be analyzed or even be useful? Is it possible to search those countless hours of videos for subjects or events of interest? Shouldn’t the presence of a car stopped at a railroad crossing trigger an alarm system to prevent a potential accident? In the chapters selected for this book, experts in video surveillance provide answers to these questions and other interesting problems, skillfully blending research experience with practical real-life applications. Academic researchers will find a reliable compilation of relevant literature in addition to pointers to current advances in the field. Industry practitioners will find useful hints about state-of-the-art applications. The book also provides directions for open problems where further advances can be pursued.
ImageNet Large Scale Visual Recognition Challenge
The ImageNet Large Scale Visual Recognition Challenge is a benchmark in
object category classification and detection on hundreds of object categories
and millions of images. The challenge has been run annually from 2010 to the
present, attracting participation from more than fifty institutions.
This paper describes the creation of this benchmark dataset and the advances
in object recognition that have been possible as a result. We discuss the
challenges of collecting large-scale ground truth annotation, highlight key
breakthroughs in categorical object recognition, provide a detailed analysis of
the current state of the field of large-scale image classification and object
detection, and compare the state-of-the-art computer vision accuracy with human
accuracy. We conclude with lessons learned in the five years of the challenge,
and propose future directions and improvements. Comment: 43 pages, 16 figures. v3 includes additional comparisons with PASCAL
VOC (per-category comparisons in Table 3, distribution of localization
difficulty in Fig 16), a list of queries used for obtaining object detection
images (Appendix C), and some additional references
- …