Open Set Logo Detection and Retrieval
Current logo retrieval research focuses on closed set scenarios. We argue
that the logo domain is too large for this strategy and requires an open set
approach. To foster research in this direction, a large-scale logo dataset,
called Logos in the Wild, is collected and released to the public. A typical
open set logo retrieval application is, for example, assessing the
effectiveness of advertisement in sports event broadcasts. Given a query sample
in the shape of a logo image, the task is to find all further occurrences of this
logo in a set of images or videos. Currently, common logo retrieval approaches
are unsuitable for this task because of their closed-world assumption. Thus, an
open set logo retrieval method is proposed in this work which allows searching
for previously unseen logos from a single query sample. A two-stage concept with
separate logo detection and comparison is proposed, where both modules are based
on task-specific CNNs. When trained with the Logos in the Wild data, significant
performance improvements are observed, especially compared with
state-of-the-art closed set approaches.
Comment: accepted at VISAPP 201
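The two-stage concept above (detect candidate logo regions, then compare each detection against the query) can be sketched with cosine similarity over descriptor vectors standing in for the comparison CNN. The function names, toy embeddings, and threshold below are illustrative assumptions, not details from the paper:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two descriptor vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_emb, candidate_embs, threshold=0.8):
    # Stage 2 of a two-stage open-set retrieval: keep every detected
    # region whose descriptor is close enough to the query descriptor.
    # Open set: no fixed class list, just similarity to one example.
    return [i for i, emb in enumerate(candidate_embs)
            if cosine_sim(query_emb, emb) >= threshold]

# Toy descriptors standing in for CNN features of detected regions.
query = np.array([1.0, 0.0, 0.0])
candidates = [np.array([0.9, 0.1, 0.0]),   # near-duplicate of the query logo
              np.array([0.0, 1.0, 0.0])]   # unrelated logo
matches = retrieve(query, candidates)
```

Because matching is by similarity to a single query rather than by a classifier over a fixed label set, previously unseen logos require no retraining.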
On Designing Tattoo Registration and Matching Approaches in the Visible and SWIR Bands
Face, iris and fingerprint based biometric systems are well explored areas of research. However, there are law enforcement and military applications where none of the aforementioned modalities may be available to be exploited for human identification. In such applications, soft biometrics may be the only clue available for identification or verification purposes. Tattoos are an example of such a soft biometric trait. Unlike face-based biometric systems, which are used in both same-spectral and cross-spectral matching scenarios, tattoo-based human identification is still not a fully explored area of research. At present there are no pre-processing, feature extraction and matching algorithms using tattoo images captured at multiple bands. This thesis is focused on exploring solutions to two main challenging problems. The first is cross-spectral tattoo matching. The proposed algorithmic approach takes raw Short-Wave Infrared (SWIR) band tattoo images as input and matches them successfully against their visible band counterparts. The SWIR tattoo images are captured at 1100 nm, 1200 nm, 1300 nm, 1400 nm and 1500 nm. After an empirical study in which multiple photometric normalization techniques were used to pre-process the original multi-band tattoo images, only one was determined to significantly improve cross-spectral tattoo matching performance. The second challenging problem was to develop a fully automatic visible-band tattoo image registration system based on SIFT descriptors and the RANSAC algorithm with a homography model. The proposed automated registration approach significantly reduces the operational cost of a tattoo image identification system (using large-scale tattoo image datasets), where the alignment of each pair of tattoo images would otherwise need to be performed manually by system operators. At the same time, tattoo matching accuracy is also improved (before vs. after automated alignment) by 45.87% for the NIST-Tatt-C database and 12.65% for the WVU-Tatt database.
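The registration step pairs SIFT keypoints and then robustly fits a homography. Descriptor matching is usually delegated to a library such as OpenCV; the RANSAC-plus-homography core alone can be sketched in NumPy as below. This is a minimal illustration on synthetic correspondences, not the thesis's implementation:

```python
import numpy as np

def fit_homography(src, dst):
    # Direct Linear Transform: solve A h = 0 for the 9 entries of H
    # (defined up to scale) from four or more point correspondences.
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)

def project(H, pts):
    # Apply a homography to an (N, 2) array of points.
    p = np.column_stack([pts, np.ones(len(pts))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=300, thresh=2.0, seed=0):
    # Minimal RANSAC: repeatedly fit H from 4 random correspondences
    # and keep the model with the most inliers by reprojection error.
    rng = np.random.default_rng(seed)
    best_H, best_inliers = None, np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), size=4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        with np.errstate(divide="ignore", invalid="ignore"):
            err = np.linalg.norm(project(H, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_H, best_inliers = H, inliers
    return best_H, best_inliers

# Synthetic correspondences under a known transform, plus one outlier
# (standing in for a bad SIFT match that RANSAC should reject).
src = np.array([[0, 0], [10, 0], [0, 10], [10, 10], [5, 3], [2, 8]], float)
H_true = np.array([[1.2, 0.0, 4.0], [0.0, 1.2, -3.0], [0.0, 0.0, 1.0]])
dst = project(H_true, src)
dst[5] += 50.0  # corrupt the last correspondence
H_est, inliers = ransac_homography(src, dst)
```

The recovered model fits the five clean correspondences and rejects the corrupted one, which is exactly the behavior that makes manual alignment unnecessary.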
TTS: Hilbert Transform-based Generative Adversarial Network for Tattoo and Scene Text Spotting
Text spotting in natural scenes is of increasing interest and significance due to its critical role in several applications, such as visual question answering, named entity recognition and event rumor detection on social media. One newly emerging and challenging problem is Tattoo Text Spotting (TTS) in images, for assisting forensic teams and for person identification. Unlike the generally simpler scene text addressed by current state-of-the-art methods, tattoo text is typically characterized by decorative backgrounds, calligraphic handwriting and several distortions due to the deformable nature of the skin. This paper describes the first approach to address TTS in a real-world application context, designing an end-to-end text spotting method that employs a Hilbert transform-based Generative Adversarial Network (GAN). To reduce the complexity of the TTS task, the proposed approach first detects fine details in the image using the Hilbert transform and Optimum Phase Congruency (OPC). To overcome the challenge of having only a relatively small number of training samples, a GAN is then used to generate suitable text samples and descriptors for text spotting (i.e., both detection and recognition). The superior performance of the proposed TTS approach, for both tattoo and general scene text, over the state-of-the-art methods is demonstrated on a new TTS-specific dataset (publicly available) as well as on the existing benchmark natural scene text datasets Total-Text, CTW1500 and ICDAR 2015.
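The Hilbert transform underlying the fine-detail step produces the analytic signal, whose envelope and phase feed phase-congruency measures. The paper's OPC pipeline is considerably more involved; a 1-D FFT-based Hilbert transform, the basic building block, might look like this:

```python
import numpy as np

def analytic_signal(x):
    # Hilbert transform via the FFT: keep DC, double positive
    # frequencies, zero the negative ones (even length keeps Nyquist).
    n = len(x)
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * gain)

# Sanity check on a pure cosine: the analytic signal is a complex
# exponential, so the real part recovers the input and the envelope
# is constant.
t = np.arange(64) / 64.0
x = np.cos(2 * np.pi * 5 * t)
a = analytic_signal(x)
envelope = np.abs(a)
```

On image rows or filter responses, peaks of this envelope mark locations of rapid intensity change, which is the kind of fine detail the detection stage looks for.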
COCO-O: A Benchmark for Object Detectors under Natural Distribution Shifts
Practical object detection applications can lose their effectiveness on image
inputs with natural distribution shifts. This problem has led the research
community to pay more attention to the robustness of detectors under
Out-Of-Distribution (OOD) inputs. Existing works construct datasets to
benchmark the detector's OOD robustness for a specific application scenario,
e.g., Autonomous Driving. However, these datasets lack universality and make it
hard to benchmark general detectors built on common tasks such as COCO. To give
a more comprehensive robustness assessment, we introduce
COCO-O(ut-of-distribution), a test dataset based on COCO with 6 types of
natural distribution shifts. COCO-O has a large distribution gap with training
data and results in a significant 55.7% relative performance drop on a Faster
R-CNN detector. We leverage COCO-O to conduct experiments on more than 100
modern object detectors to investigate if their improvements are credible or
just over-fitting to the COCO test set. Unfortunately, most classic detectors
from earlier years do not exhibit strong OOD generalization. We further study the
robustness effect on recent breakthroughs of detector's architecture design,
augmentation and pre-training techniques. Some empirical findings are revealed:
1) Compared with detection head or neck, backbone is the most important part
for robustness; 2) An end-to-end detection transformer design brings no
enhancement, and may even reduce robustness; 3) Large-scale foundation models
have made a great leap in robust object detection. We hope COCO-O can
provide a rich testbed for robustness studies of object detection. The dataset
will be available at
\url{https://github.com/alibaba/easyrobust/tree/main/benchmarks/coco_o}.
Comment: To appear in ICCV 2023
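The headline metric, a relative performance drop between in-distribution and shifted test sets, is a one-liner; the numbers below are illustrative only, not taken from the paper:

```python
def relative_drop(map_in_dist, map_ood):
    # Relative performance drop (%) from in-distribution mAP
    # to out-of-distribution mAP.
    return 100.0 * (map_in_dist - map_ood) / map_in_dist

# Hypothetical example: a detector at 40.0 mAP on its training
# distribution falling to 17.7 mAP on a shifted test set.
drop = relative_drop(40.0, 17.7)
```

Reporting the drop relative to the clean score makes detectors with different absolute mAP levels comparable.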
A Mobile App for Wound Localization using Deep Learning
We present an automated wound localizer for 2D wound and ulcer images, based on
a deep neural network, as a first step towards building an automated and
complete wound diagnostic system. The wound localizer has been developed
using the YOLOv3 model and then turned into an iOS mobile application. The
developed localizer can detect the wound and its surrounding tissues and
isolate the localized wounded region from images, which would be very helpful
for future processing such as wound segmentation and classification due to the
removal of unnecessary regions from wound images. For mobile app development
with video processing, a lighter version of YOLOv3, named tiny-YOLOv3, has been
used. The model is trained and tested on our own image dataset in collaboration
with AZH Wound and Vascular Center, Milwaukee, Wisconsin. The YOLOv3 model is
compared with SSD model, showing that YOLOv3 gives a mAP value of 93.9%, which
is much better than the SSD model (86.4%). The robustness and reliability of
these models are also tested on a publicly available dataset named Medetec,
where they show very good performance as well.
Comment: 8 pages, 5 figures, 1 tabl
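Isolating the localized region from the image, given a detector's bounding box, amounts to a crop clamped to the image bounds. A sketch, where the box format `(x1, y1, x2, y2)` and the context margin are assumptions rather than details from the paper:

```python
import numpy as np

def crop_detection(image, box, pad=10):
    # Crop the region a detector localized, with a small context
    # margin, clamped to the image bounds. `box` is (x1, y1, x2, y2)
    # in pixel coordinates.
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    x1, y1 = max(0, x1 - pad), max(0, y1 - pad)
    x2, y2 = min(w, x2 + pad), min(h, y2 + pad)
    return image[y1:y2, x1:x2]

# A blank 640x480 RGB image standing in for a wound photograph.
img = np.zeros((480, 640, 3), dtype=np.uint8)
patch = crop_detection(img, (100, 50, 300, 200))
```

Cropping before segmentation or classification discards background tissue, which is the "removal of unnecessary regions" the abstract refers to.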
MOON: A Mixed Objective Optimization Network for the Recognition of Facial Attributes
Attribute recognition, particularly facial, extracts many labels for each
image. While some multi-task vision problems can be decomposed into separate
tasks and stages, e.g., training independent models for each task, for a
growing set of problems joint optimization across all tasks has been shown to
improve performance. We show that for deep convolutional neural network (DCNN)
facial attribute extraction, multi-task optimization is better. Unfortunately,
it can be difficult to apply joint optimization to DCNNs when training data is
imbalanced, and re-balancing multi-label data directly is structurally
infeasible, since adding/removing data to balance one label will change the
sampling of the other labels. This paper addresses the multi-label imbalance
problem by introducing a novel mixed objective optimization network (MOON) with
a loss function that mixes multiple task objectives with domain adaptive
re-weighting of propagated loss. Experiments demonstrate that not only does
MOON advance the state of the art in facial attribute recognition, but it also
outperforms independently trained DCNNs using the same data. When using facial
attributes for the LFW face recognition task, we show that our balanced (domain
adapted) network outperforms the unbalanced trained network.Comment: Post-print of manuscript accepted to the European Conference on
Computer Vision (ECCV) 2016
http://link.springer.com/chapter/10.1007%2F978-3-319-46454-1_
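The core idea of mixing per-attribute objectives with domain-adaptive re-weighting can be illustrated with a multi-label binary cross-entropy whose positive and negative terms are re-weighted per label toward a target rate. This is an illustrative stand-in, not MOON's exact loss formulation:

```python
import numpy as np

def reweighted_multilabel_bce(probs, targets, target_rates):
    # Binary cross-entropy over many attributes, with positives and
    # negatives re-weighted per label so the effective distribution
    # matches a desired (e.g. deployment-domain) positive rate.
    probs = np.clip(probs, 1e-7, 1.0 - 1e-7)
    source_rates = targets.mean(axis=0)  # per-label positive frequency
    w_pos = target_rates / np.maximum(source_rates, 1e-7)
    w_neg = (1.0 - target_rates) / np.maximum(1.0 - source_rates, 1e-7)
    loss = -(w_pos * targets * np.log(probs)
             + w_neg * (1.0 - targets) * np.log(1.0 - probs))
    return float(loss.mean())

# Four samples, two attributes; re-weight toward a balanced 50/50 rate.
targets = np.array([[1, 0], [1, 0], [1, 1], [0, 0]], dtype=float)
target_rates = np.array([0.5, 0.5])
loss_good = reweighted_multilabel_bce(targets.copy(), targets, target_rates)
loss_bad = reweighted_multilabel_bce(1.0 - targets, targets, target_rates)
```

Re-weighting the loss rather than resampling the data sidesteps the structural problem the abstract describes: with multi-label data, adding or removing images to balance one attribute unbalances the others.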