21 research outputs found

Weakly supervised segment annotation via expectation kernel density estimation

Author: Li Qingwu
Lu Jianfeng
Wang Liantao
Publication date: 14/12/2018

Since the labelling for the positive images/videos is ambiguous in weakly supervised segment annotation, negative mining based methods that only use the intra-class information emerge. In these methods, negative instances are utilized to penalize unknown instances to rank their likelihood of being an object, which can be considered as voting in terms of similarity. However, these methods 1) ignore the information contained in positive bags, and 2) only rank the likelihood but cannot generate an explicit decision function. In this paper, we propose a voting scheme involving not only the definite negative instances but also the ambiguous positive instances to make use of the extra useful information in the weakly labelled positive bags. In the scheme, each instance votes for its label with a magnitude arising from the similarity, and the ambiguous positive instances are assigned soft labels that are iteratively updated during the voting. It overcomes the limitations of voting using only the negative bags. We also propose an expectation kernel density estimation (eKDE) algorithm to gain further insight into the voting mechanism. Experimental results demonstrate the superiority of our scheme over the baselines. Comment: 9 pages, 2 figures
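The voting idea in the abstract above can be illustrated with a minimal sketch. This is not the paper's eKDE algorithm: the Gaussian kernel, the bandwidth, the initialisation of soft labels to 1, and the function names `gaussian_kernel` and `soft_label_voting` are all assumptions made here for illustration. Definite negatives always vote "background", while instances from positive bags split their vote according to their current soft label, which is re-estimated on every pass.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth):
    """Pairwise Gaussian similarity between the rows of X and the rows of Y."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * bandwidth ** 2))

def soft_label_voting(pos, neg, bandwidth=1.0, n_iter=20):
    """Hypothetical similarity-weighted voting with soft labels.

    pos: (n_pos, d) ambiguous instances taken from positive bags
    neg: (n_neg, d) definite negative instances
    Returns a soft label in [0, 1] for every row of pos.
    """
    K_pp = gaussian_kernel(pos, pos, bandwidth)
    np.fill_diagonal(K_pp, 0.0)              # an instance does not vote for itself
    neg_votes = gaussian_kernel(pos, neg, bandwidth).sum(axis=1)

    w = np.ones(len(pos))                    # assumption: start by trusting the bag label
    for _ in range(n_iter):
        obj_votes = K_pp @ w                 # votes for "object", weighted by soft labels
        bg_votes = K_pp @ (1.0 - w) + neg_votes  # votes for "background"
        w = obj_votes / (obj_votes + bg_votes + 1e-12)
    return w
```

Thresholding the returned soft labels (for example at 0.5) gives an explicit decision for each ambiguous instance rather than only a ranking; the threshold is likewise an assumption of this sketch.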

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Coupled Depth Learning

Author: Baig Mohammad Haris
Torresani Lorenzo
Publication date: 09/02/2016

In this paper we propose a method for estimating depth from a single image using a coarse-to-fine approach. We argue that modeling the fine depth details is easier after a coarse depth map has been computed. We express a global (coarse) depth map of an image as a linear combination of a depth basis learned from training examples. The depth basis captures spatial and statistical regularities and reduces the problem of global depth estimation to the task of predicting the input-specific coefficients in the linear combination. This is formulated as a regression problem from a holistic representation of the image. Crucially, the depth basis and the regression function are coupled and jointly optimized by our learning scheme. We demonstrate that this results in a significant improvement in accuracy compared to direct regression of depth pixel values or approaches learning the depth basis disjointly from the regression function. The global depth estimate is then used as guidance by a local refinement method that introduces depth details that were not captured at the global level. Experiments on the NYUv2 and KITTI datasets show that our method outperforms the existing state-of-the-art at a considerably lower computational cost for both training and testing. Comment: 10 pages, 3 Figures, 4 Tables with quantitative evaluation
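As a rough illustration of the coupled formulation in the abstract above, the sketch below fits a depth basis B and a linear regressor W against one shared objective, ||B W X - D||^2, so that each is updated in light of the other. It is a toy stand-in, not the authors' learning scheme: the linear regressor, the alternating least-squares updates, and the names `fit_coupled` and `predict_depth` are assumptions introduced here.

```python
import numpy as np

def fit_coupled(X, D, k, n_iter=10, seed=0):
    """Hypothetical joint fit of a depth basis and a coefficient regressor.

    X: (f, n) holistic image features, one column per training image
    D: (p, n) flattened coarse depth maps, one column per training image
    k: number of basis depth maps
    Returns B (p, k), W (k, f) and the loss after each iteration.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((k, X.shape[0]))   # random initial regressor
    X_pinv = np.linalg.pinv(X)
    losses = []
    for _ in range(n_iter):
        C = W @ X                               # coefficients predicted for each image
        B = D @ np.linalg.pinv(C)               # least-squares basis given the coefficients
        W = np.linalg.pinv(B) @ D @ X_pinv      # least-squares regressor given the basis
        losses.append(float(np.linalg.norm(B @ W @ X - D) ** 2))
    return B, W, losses

def predict_depth(B, W, x):
    """Coarse depth map for one feature vector x: a linear combination of the basis."""
    return B @ (W @ x)
```

Because each update solves a least-squares problem for one factor while the other is held fixed, the shared reconstruction loss should not increase from one iteration to the next. A disjoint alternative would instead fix B first (for example from a decomposition of D alone) and only then fit W.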

arXiv.org e-Print Archive

Crossref

HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods

Author: Li Yongyuan
Liang Chao
Qin Xiuyuan
Wei Mingqiang
Publication date: 14/09/2023

Talking Face Generation (TFG) aims to reconstruct facial movements, producing highly natural lip movements from audio and facial features that are potentially connected. Existing TFG methods have made significant advances in producing natural and realistic images. However, most work rarely takes visual quality into consideration. It is challenging to ensure lip synchronization while avoiding visual quality degradation in cross-modal generation methods. To address this issue, we propose a universal High-Definition Teeth Restoration Network, dubbed HDTR-Net, for arbitrary TFG methods. HDTR-Net can enhance teeth regions at an extremely fast speed while maintaining synchronization and temporal consistency. In particular, we propose a Fine-Grained Feature Fusion (FGFF) module to effectively capture fine texture feature information around teeth and surrounding regions, and use these features to refine the feature map and enhance the clarity of teeth. Extensive experiments show that our method can be adapted to arbitrary TFG methods without degrading lip synchronization or frame coherence. Another advantage of HDTR-Net is its real-time generation ability. Even under high-definition restoration for talking face video synthesis, its inference is 300% faster than the current state-of-the-art face restoration based on super-resolution. Comment: 15 pages, 6 figures, PRCV202

arXiv.org e-Print Archive

(Hyper)-graphical models in biomedical image analysis

Author: Alchatzidis
Arora
Baudin
Ben Glocker
Besbes
Boykov
Chittajallu
Enzo Ferrante
Evangelia I. Zacharaki
Fecamp
Ferrante
Fix
Geman
Glocker
Grady
Greig
Ishikawa
Kadoury
Kappes
Khandelwal
Kohli
Koller
Kolmogorov
Komodakis
Nikos Komodakis
Nikos Paragios
Osokin
Ou
Parisot
Pearl
Potetz
Rother
Sarah Parisot
Shekhovtsov
Sotiras
Uzunbas
Wang
Xiang
Yedidia
Zikic
Publication venue: Elsevier BV

Crossref