Methods for data-related problems in person re-ID
In recent years, the ever-increasing need for public security has drawn wide attention to person re-ID. State-of-the-art techniques achieve impressive results on academic datasets, which are nearly saturated. However, deploying a re-ID system in a practical surveillance scenario raises several challenges. 1) Full person views are often unavailable, and missing body parts make comparison difficult due to significant misalignment between views. 2) Low diversity in training data introduces bias into re-ID systems. 3) The available data may come from different modalities, e.g., text and images. This thesis proposes the Partial Matching Net (PMN), which detects body joints, aligns partial views, and hallucinates the missing parts based on the information present in the frame and a learned person model. The aligned and reconstructed views are then combined into a joint representation used for matching images. The thesis also investigates different types of bias that typically occur in re-ID scenarios, where the similarity between two persons is due to a shared pose, body part, or camera view rather than to ID-related cues. It proposes a general approach to mitigate these effects, the Bias-Control (BC) framework, with two training streams that leverage adversarial and multitask learning to suppress bias-related features. Finally, the thesis investigates a novel mechanism for matching data across the visual and text modalities. It proposes the TAVD framework with two complementary modules: Text attribute feature aggregation (TA), which aggregates multiple semantic attributes in a bimodal space to globally match text descriptions with images, and Visual feature decomposition (VD), which performs feature embedding to locally match image regions with text attributes. Results and comparisons to the state of the art on several benchmarks show that the proposed solutions are effective strategies for person re-ID.
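The adversarial stream of the Bias-Control idea can be sketched with a toy objective. Everything below is illustrative (function names, losses, and the weight `lam` are assumptions, not the thesis' actual architecture): the encoder is trained so that ID features stay discriminative while an auxiliary bias head (e.g. a pose or camera classifier) fails, pushing bias cues out of the embedding.

```python
# Hedged sketch of an adversarial bias-control objective (illustrative only).
# In practice this is usually implemented with a gradient-reversal layer: the
# bias head minimises bias_loss, but the encoder sees its gradient with a
# flipped sign, i.e. the encoder effectively maximises the bias head's loss.

def bc_objective(id_loss, bias_loss, lam=0.5):
    """Encoder objective: keep ID loss low while making the bias head fail."""
    return id_loss - lam * bias_loss

# Toy check: for equal ID loss, an embedding that confuses the bias head
# (high bias_loss) scores better under the combined objective.
biased   = bc_objective(id_loss=0.3, bias_loss=0.2)  # bias head succeeds
debiased = bc_objective(id_loss=0.3, bias_loss=1.5)  # bias head fails
```

The design point is that bias suppression is traded off against ID accuracy through a single weight, so the same template covers pose, body-part, or camera-view bias by swapping the bias head's target.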
Learning to Predict Image-based Rendering Artifacts with Respect to a Hidden Reference Image
Image metrics predict the perceived per-pixel difference between a reference image and its degraded (e.g., re-rendered) version. In several important applications the reference image is not available, so image metrics cannot be applied. We devise a neural network architecture and training procedure that predicts the MSE, SSIM, or VGG16 image difference from the distorted image alone, without observing the reference. This is enabled by two insights. The first is to inject sufficiently many undistorted natural image patches, which are available in arbitrary amounts and are known to have no perceivable difference to themselves; this avoids false positives. The second is to balance the learning so that all image errors are equally likely, avoiding false negatives. Surprisingly, we observe that the resulting no-reference metric can, subjectively, even outperform the reference-based one, as it had to become robust against misalignments. We evaluate the effectiveness of our approach in an image-based rendering context, both quantitatively and qualitatively. Finally, we demonstrate two applications that reduce light field capture time and provide guidance for interactive depth adjustment.
Comment: 13 pages, 11 figures
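The two training-set insights above can be sketched as data construction. The code below is an illustrative approximation, not the paper's exact pipeline; the binning scheme, sample counts, and function names are assumptions: (1) distorted patches are resampled so every error level occurs equally often, and (2) undistorted patches are injected with a known target error of exactly zero.

```python
import numpy as np

rng = np.random.default_rng(0)

def build_training_targets(distorted_errors, n_clean, n_bins=10):
    """Return (is_clean, target_error) pairs with balanced error levels.

    distorted_errors are assumed normalised to [0, 1]. Each of n_bins error
    bins contributes the same number of samples (balancing, insight 2), and
    n_clean undistorted patches are appended with target 0.0 (insight 1).
    """
    errors = np.asarray(distorted_errors, dtype=float)
    bins = np.minimum((errors * n_bins).astype(int), n_bins - 1)
    per_bin = max(1, len(errors) // n_bins)
    balanced = []
    for b in range(n_bins):
        idx = np.flatnonzero(bins == b)
        if len(idx) == 0:
            continue  # no sample at this error level; skip the bin
        chosen = rng.choice(idx, size=per_bin, replace=True)
        balanced.extend((False, errors[i]) for i in chosen)
    # Clean patches anchor the zero point of the metric (avoids false positives).
    balanced.extend((True, 0.0) for _ in range(n_clean))
    return balanced

targets = build_training_targets(rng.uniform(0, 1, 1000), n_clean=200)
```

A regressor trained on such targets never sees a skewed error distribution, so no error magnitude is systematically under-predicted, which is the false-negative failure the balancing avoids.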
Dual-Neighborhood Deep Fusion Network for Point Cloud Analysis
Recently, deep neural networks have made remarkable achievements in 3D point cloud classification. However, existing classification methods are mainly developed on idealized point clouds and suffer heavy performance degradation in non-idealized scenarios. To handle this problem, a feature representation learning method named Dual-Neighborhood Deep Fusion Network (DNDFN) is proposed to serve as an improved point cloud encoder for non-idealized point cloud classification. DNDFN uses a trainable neighborhood learning method, TN-Learning, to capture the global key neighborhood. The global neighborhood is then fused with the local neighborhood to give the network stronger reasoning ability. In addition, an Information Transfer Convolution (IT-Conv) is proposed for DNDFN to learn the edge information between point pairs, benefiting the feature transfer procedure. Information transmission in IT-Conv resembles information propagation in a graph, which brings DNDFN closer to the human reasoning mode. Extensive experiments on existing benchmarks, especially non-idealized datasets, verify the effectiveness of DNDFN, which achieves state-of-the-art performance.
Comment: ICMEW202
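The dual-neighborhood idea can be sketched in a few lines of NumPy. This is a minimal stand-in, not DNDFN itself: the feature-similarity score below substitutes for the learned TN-Learning module, and all names are assumptions. Each point gets a local neighborhood from spatial kNN and a global neighborhood from similarity over the whole cloud; the two are fused (here by union) before feature aggregation.

```python
import numpy as np

def local_knn(points, i, k):
    """Indices of the k spatially nearest neighbors of point i (self excluded)."""
    d = np.linalg.norm(points - points[i], axis=1)
    return np.argsort(d)[1:k + 1]

def global_topk(features, i, k):
    """Indices of the k most feature-similar points anywhere in the cloud.

    A dot product stands in for the trainable scoring of TN-Learning.
    """
    sim = features @ features[i]
    sim[i] = -np.inf  # never select the point itself
    return np.argsort(sim)[-k:]

def dual_neighborhood(points, features, i, k):
    """Fuse local (spatial) and global (feature) neighborhoods by union."""
    return set(local_knn(points, i, k)) | set(global_topk(features, i, k))

pts = np.random.default_rng(1).normal(size=(50, 3))
feats = np.random.default_rng(2).normal(size=(50, 8))
neigh = dual_neighborhood(pts, feats, 0, k=5)
```

The point of the fusion is robustness on non-idealized clouds: when occlusion or noise corrupts the spatial neighborhood, the global, feature-driven neighborhood still supplies informative neighbors.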