WMFormer++: Nested Transformer for Visible Watermark Removal via Implict Joint Learning
Watermarking serves as a widely adopted approach to safeguard media
copyright. In parallel, the research focus has extended to watermark removal
techniques, offering an adversarial means to enhance watermark robustness and
foster advancements in the watermarking field. Existing watermark removal
methods mainly rely on UNet with task-specific decoder branches--one for
watermark localization and the other for background image restoration. However,
watermark localization and background restoration are not isolated tasks;
precise watermark localization inherently implies regions necessitating
restoration, and the background restoration process contributes to more
accurate watermark localization. To holistically integrate information from
both branches, we introduce an implicit joint learning paradigm. This empowers
the network to autonomously navigate the flow of information between implicit
branches through a gate mechanism. Furthermore, we employ cross-channel
attention to facilitate local detail restoration and holistic structural
comprehension, while harnessing nested structures to integrate multi-scale
information. Extensive experiments are conducted on various challenging
benchmarks to validate the effectiveness of our proposed method. The results
demonstrate our approach's remarkable superiority, surpassing existing
state-of-the-art methods by a large margin.
Networks are Slacking Off: Understanding Generalization Problem in Image Deraining
Deep deraining networks, while successful in laboratory benchmarks,
consistently encounter substantial generalization issues when deployed in
real-world applications. A prevailing perspective in deep learning encourages
the use of highly complex training data, with the expectation that a richer
image content knowledge will facilitate overcoming the generalization problem.
However, through comprehensive and systematic experimentation, we discovered
that this strategy does not enhance the generalization capability of these
networks. On the contrary, it exacerbates the tendency of networks to overfit
to specific degradations. Our experiments reveal that better generalization in
a deraining network can be achieved by simplifying the complexity of the
training data. This is because the networks are slacking off during training,
that is, learning the least complex elements in the image content and
degradation to minimize training loss. When the complexity of the background
image is less than that of the rain streaks, the network will prioritize the
reconstruction of the background, thereby avoiding overfitting to the rain
patterns and resulting in improved generalization performance. Our research not
only offers a valuable perspective and methodology for better understanding the
generalization problem in low-level vision tasks, but also displays promising
practical potential.
An aesthetics of touch: investigating the language of design relating to form
How well can designers communicate qualities of touch?
This paper presents evidence that they have some capability to do so, much of which appears to have been learned, but at present make limited use of such language. Interviews with graduate designer-makers suggest that they are aware of and value the importance of touch and materiality in their work, but lack a vocabulary to fully relate to their detailed explanations of other aspects such as their intent or selection of materials. We believe that more attention should be paid to the verbal dialogue that happens in the design process, particularly as other researchers show that even making-based learning has a strong verbal element to it. However, verbal language alone does not appear to be adequate for a comprehensive language of touch. Graduate designer-makers’ descriptive practices combined non-verbal manipulation within verbal accounts. We thus argue that haptic vocabularies do not simply describe material qualities, but rather are situated competences that physically demonstrate the presence of haptic qualities. Such competences are more important than groups of verbal vocabularies in isolation. Design support for developing and extending haptic competences must take this wide range of considerations into account to comprehensively improve designers’ capabilities.
Between images and built form: Automating the recognition of standardised building components using deep learning
Building on the richness of recent contributions in the field, this paper presents a state-of-the-art CNN analysis method for automating the recognition of standardised building components in modern heritage buildings. At the turn of the twentieth century manufactured building components became widely advertised for specification by architects. Consequently, a form of standardisation across various typologies began to take place. During this era of rapid economic and industrialised growth, many forms of public building were erected. This paper seeks to demonstrate a method for informing the recognition of such elements using deep learning to recognise 'families' of elements across a range of buildings in order to retrieve and recognise their technical specifications from the contemporary trade literature. The method is illustrated through the case of Carnegie Public Libraries in the UK, which provides a unique but ubiquitous platform from which to explore the potential for the automated recognition of manufactured standard architectural components. The aim of enhancing this knowledge base is to use the degree to which these were standardised originally as a means to inform and so support their ongoing care but also that of many other contemporary buildings. Although these libraries are numerous, they are maintained at a local level and as such, their shared challenges for maintenance remain unknown to one another. Additionally, this paper presents a methodology to indirectly retrieve useful indicators and semantics, relating to emerging HBIM families, by applying deep learning to a varied range of architectural imagery.
Explaining Autonomous Driving Actions with Visual Question Answering
The end-to-end learning ability of self-driving vehicles has achieved
significant milestones over the last decade owing to rapid advances in deep
learning and computer vision algorithms. However, as autonomous driving
technology is a safety-critical application of artificial intelligence (AI),
road accidents and established regulatory principles necessitate
explainability of intelligent action choices for self-driving vehicles. To
facilitate interpretability of decision-making in autonomous driving, we
present a Visual Question Answering (VQA) framework, which explains driving
actions with question-answering-based causal reasoning. To do so, we first
collect driving videos in a simulation environment using reinforcement learning
(RL) and extract consecutive frames from this log data uniformly for five
selected action categories. Further, we manually annotate the extracted frames
using question-answer pairs as justifications for the actions chosen in each
scenario. Finally, we evaluate the correctness of the VQA-predicted answers for
actions on unseen driving scenes. The empirical results suggest that the VQA
mechanism can provide support to interpret real-time decisions of autonomous
vehicles and help enhance overall driving safety. Comment: Accepted to the 2023 IEEE International Conference on Intelligent
Transportation Systems (IEEE ITSC-2023).