Component-based Attention for Large-scale Trademark Retrieval

Author: Denman Simon
Fookes Clinton
Mau Sandra
Sivapalan Sabesan
Sridharan Sridha
Tursun Osman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 22/10/2019

The demand for large-scale trademark retrieval (TR) systems has significantly increased to combat the rise in international trademark infringement. Unfortunately, the ranking accuracy of current approaches using either hand-crafted or pre-trained deep convolutional neural network (DCNN) features is inadequate for large-scale deployments. We show in this paper that the ranking accuracy of TR systems can be significantly improved by incorporating hard and soft attention mechanisms, which direct attention to critical information such as figurative elements and reduce attention given to distracting and uninformative elements such as text and background. Our proposed approach achieves state-of-the-art results on a challenging large-scale trademark dataset.

Comment: Fix typos related to authors' information

arXiv.org e-Print Archive

Queensland University of Technology ePrints Archive
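
The abstract of the trademark retrieval paper above describes attention that emphasises figurative elements and suppresses text and background before ranking. The sketch below illustrates one common way such an idea can be realised: attention-weighted pooling of per-location features followed by cosine-similarity ranking. It is not the authors' method; the function names (`attention_pool`, `cosine_similarity`, `rank_gallery`) and the toy data are assumptions made for illustration. Setting a weight to zero acts like hard attention, while fractional weights act like soft attention.

```python
import math


def attention_pool(features, attention):
    """Pool per-location feature vectors into one descriptor.

    features:  list of equal-length feature vectors, one per image location
    attention: list of non-negative weights, one per location
               (0 = ignore the location, larger = attend more)
    """
    total = sum(attention)
    if total == 0:
        raise ValueError("attention weights must not all be zero")
    dim = len(features[0])
    pooled = [0.0] * dim
    for vec, weight in zip(features, attention):
        for i in range(dim):
            pooled[i] += (weight / total) * vec[i]
    return pooled


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery names sorted from most to least similar to the query.

    gallery: dict mapping a (hypothetical) trademark name to its descriptor
    """
    return sorted(
        gallery,
        key=lambda name: cosine_similarity(query, gallery[name]),
        reverse=True,
    )


# Toy example: two locations, the first (a figurative element) weighted 3x
# more than the second (e.g. a text region).
descriptor = attention_pool([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
print(descriptor)  # [0.75, 0.25]
```

In a real system the features would come from a DCNN and the attention weights from a learned module; both are replaced here by hand-written lists.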

Deep speech inpainting of time-frequency masks

Author: Beckmann Pierre
Cernak Milos
Kegler Mikolaj
Publication venue: 'International Speech Communication Association'
Publication date: 29/08/2020

Transient loud intrusions, often occurring in noisy environments, can completely overpower the speech signal and lead to an inevitable loss of information. While existing algorithms for noise suppression can yield impressive results, their efficacy remains limited for very low signal-to-noise ratios or when parts of the signal are missing. To address these limitations, here we propose an end-to-end framework for speech inpainting, the context-based retrieval of missing or severely distorted parts of the time-frequency representation of speech. The framework is based on a convolutional U-Net trained via deep feature losses, obtained using speechVGG, a deep speech feature extractor pre-trained on an auxiliary word classification task. Our evaluation results demonstrate that the proposed framework can recover large portions of missing or distorted time-frequency representation of speech, up to 400 ms and 3.2 kHz in bandwidth. In particular, our approach provided a substantial increase in the STOI and PESQ objective metrics of the initially corrupted speech samples. Notably, using deep feature losses to train the framework led to the best results, as compared to conventional approaches.

Comment: Accepted to InterSpeech2020

arXiv.org e-Print Archive

Crossref
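
The speech inpainting abstract above mentions training with deep feature losses computed by a pre-trained feature extractor. The sketch below shows the general shape of such a loss: the prediction and the target are both passed through a fixed extractor, and the loss is the distance between their activations rather than between the raw signals. The abstract does not specify the distance or layer weighting, so the mean absolute difference averaged over layers is an assumption, and `toy_extractor` is a hypothetical stand-in for a real network such as speechVGG.

```python
def deep_feature_loss(predicted, target, extractor):
    """Average, over extractor layers, of the mean absolute activation gap.

    extractor(x) must return a list of layers, each a flat list of floats.
    The choice of L1 distance and equal layer weights is an assumption.
    """
    pred_layers = extractor(predicted)
    target_layers = extractor(target)
    layer_losses = []
    for pred_act, target_act in zip(pred_layers, target_layers):
        gap = sum(abs(p - t) for p, t in zip(pred_act, target_act))
        layer_losses.append(gap / len(pred_act))
    return sum(layer_losses) / len(layer_losses)


def toy_extractor(signal):
    """Hypothetical fixed 'network' with two layers of hand-made features."""
    # Layer 1: local differences between neighbouring samples.
    layer1 = [signal[i + 1] - signal[i] for i in range(len(signal) - 1)]
    # Layer 2: a single global feature, the mean of the signal.
    layer2 = [sum(signal) / len(signal)]
    return [layer1, layer2]


# Identical inputs give zero loss; a perturbed prediction gives a positive one.
print(deep_feature_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], toy_extractor))  # 0.0
print(deep_feature_loss([1.0, 2.0, 4.0], [1.0, 2.0, 3.0], toy_extractor) > 0)  # True
```

During training, the extractor's parameters would stay frozen and only the inpainting network producing `predicted` would be updated.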

Where and Who? Automatic Semantic-Aware Person Composition

Author: Barnes Connelly
Bernier Crispin
Cohen Benjamin
Ordonez Vicente
Tan Fuwen
Publication date: 02/12/2017

Image compositing is a method used to generate realistic yet fake imagery by inserting contents from one image into another. Previous work in compositing has focused on improving appearance compatibility of a user-selected foreground segment and a background image (i.e. color and illumination consistency). In this work, we instead develop a fully automated compositing model that additionally learns to select and transform compatible foreground segments from a large collection given only an input image background. To simplify the task, we restrict our problem by focusing on human instance composition, because human segments exhibit strong correlations with their background and because of the availability of large annotated data. We develop a novel branching Convolutional Neural Network (CNN) that jointly predicts candidate person locations given a background image. We then use pre-trained deep feature representations to retrieve person instances from a large segment database. Experimental results show that our model can generate composite images that look visually convincing. We also develop a user interface to demonstrate the potential application of our method.

Comment: 10 pages, 9 figures

arXiv.org e-Print Archive

Crossref

MTRNet: A Generic Scene Text Eraser

Author: Denman Simon
Fookes Clinton
Sivapalan Sabesan
Sridharan Sridha
Tursun Osman
Zeng Rui
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2019

Text removal algorithms have been proposed for uni-lingual scripts with regular shapes and layouts. However, to the best of our knowledge, a generic text removal method which is able to remove all or user-specified text regions regardless of font, script, language or shape is not available. Developing such a generic text eraser for real scenes is a challenging task, since it inherits all the challenges of multi-lingual and curved text detection and inpainting. To fill this gap, we propose a mask-based text removal network (MTRNet). MTRNet is a conditional adversarial generative network (cGAN) with an auxiliary mask. The introduced auxiliary mask not only makes the cGAN a generic text eraser, but also enables stable training and early convergence on a challenging large-scale synthetic dataset, initially proposed for text detection in real scenes. What's more, MTRNet achieves state-of-the-art results on several real-world datasets including ICDAR 2013, ICDAR 2017 MLT, and CTW1500, without being explicitly trained on this data, outperforming previous state-of-the-art methods trained directly on these datasets.

Comment: Presented at ICDAR2019 Conference

arXiv.org e-Print Archive

Crossref

Queensland University of Technology ePrints Archive

Virtual restoration of the Ghent altarpiece using crack detection and inpainting

Author: Cornelis Bruno
Daubechies Ingrid
De Mey Marc
Dooms Ann
Gezels Emile
Martens Maximiliaan
Pizurica Aleksandra
Platisa Ljiljana
Ruzic Tijana
Schelkens Peter
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

In this paper, we present a new method for virtual restoration of digitized paintings, with a special focus on the Ghent Altarpiece (1432), one of Belgium's greatest masterpieces. The goal of the work is to remove cracks from the digitized painting, thereby approximating what the painting looked like before ageing for nearly 600 years and aiding art historical and palaeographical analysis. For crack detection, we employ a multiscale morphological approach, which can cope with greatly varying thickness of the cracks as well as with their varying intensities (from dark to light). Due to the content of the painting (with a great many fine details) and the complex type of cracks (including inconsistent whitish clouds around them), the available inpainting methods do not provide satisfactory results on many parts of the painting. We show that patch-based methods outperform pixel-based ones, but still leave much room for improvement in this application. We propose a new method for candidate patch selection, which can be combined with different patch-based inpainting methods to improve their performance in crack removal. The results demonstrate improved performance, with fewer artefacts and better preserved fine details.

Crossref

Ghent University Academic Bibliography
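
The Ghent Altarpiece abstract above refers to a multiscale morphological approach to crack detection without giving details. One standard morphological tool for thin dark structures is the black top-hat transform (a grayscale closing minus the original image), applied with several structuring-element sizes so that cracks of different widths respond. The sketch below uses that technique as an assumption; it is not the authors' algorithm, it handles only dark cracks, and the names `black_top_hat` and `detect_dark_cracks`, the square windows, and the threshold are all illustrative choices.

```python
def _window_filter(img, radius, op):
    """Apply op (min or max) over a square window clipped at the border."""
    height, width = len(img), len(img[0])
    out = []
    for r in range(height):
        row = []
        for c in range(width):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - radius), min(height, r + radius + 1))
                for cc in range(max(0, c - radius), min(width, c + radius + 1))
            ]
            row.append(op(window))
        out.append(row)
    return out


def black_top_hat(img, radius):
    """Grayscale closing (dilation, then erosion) minus the original image."""
    dilated = _window_filter(img, radius, max)
    closed = _window_filter(dilated, radius, min)
    return [
        [closed[r][c] - img[r][c] for c in range(len(img[0]))]
        for r in range(len(img))
    ]


def detect_dark_cracks(img, radii, threshold):
    """Mark pixels whose strongest top-hat response over all radii
    reaches the threshold."""
    height, width = len(img), len(img[0])
    response = [[0] * width for _ in range(height)]
    for radius in radii:
        hat = black_top_hat(img, radius)
        for r in range(height):
            for c in range(width):
                response[r][c] = max(response[r][c], hat[r][c])
    return [[response[r][c] >= threshold for c in range(width)]
            for r in range(height)]


# Toy 5x5 image: bright background (200) with a dark vertical line (50).
image = [[50 if c == 2 else 200 for c in range(5)] for _ in range(5)]
mask = detect_dark_cracks(image, radii=[1, 2], threshold=100)
print([row.index(True) for row in mask])  # [2, 2, 2, 2, 2]
```

Light cracks could be handled analogously with a white top-hat (the original minus a grayscale opening); a practical implementation would use an image-processing library rather than nested Python loops.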