
    WMFormer++: Nested Transformer for Visible Watermark Removal via Implicit Joint Learning

    Watermarking serves as a widely adopted approach to safeguarding media copyright. In parallel, the research focus has extended to watermark removal techniques, which offer an adversarial means to stress-test watermark robustness and foster advancements in the watermarking field. Existing watermark removal methods mainly rely on UNet architectures with task-specific decoder branches: one for watermark localization and the other for background image restoration. However, watermark localization and background restoration are not isolated tasks; precise watermark localization inherently identifies the regions that need restoration, and the background restoration process in turn contributes to more accurate watermark localization. To holistically integrate information from both branches, we introduce an implicit joint learning paradigm, which empowers the network to autonomously navigate the flow of information between implicit branches through a gate mechanism. Furthermore, we employ cross-channel attention to facilitate local detail restoration and holistic structural comprehension, and we harness nested structures to integrate multi-scale information. Extensive experiments on various challenging benchmarks validate the effectiveness of the proposed method: our approach surpasses existing state-of-the-art methods by a large margin.
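    As a rough illustration of the gated information flow between the two implicit branches described in this abstract, the following is a minimal PyTorch sketch. The class name GatedBranchFusion and the tensor names loc_feat and bg_feat are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of a gate mechanism that fuses watermark-localization
# and background-restoration features, followed by a simple cross-channel
# attention step. All names here are assumptions for illustration.
import torch
import torch.nn as nn

class GatedBranchFusion(nn.Module):
    """Fuses localization and restoration features with a learned gate,
    so each implicit branch can draw on information from the other."""

    def __init__(self, channels: int):
        super().__init__()
        # 1x1 conv produces a per-pixel, per-channel gate in [0, 1]
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )
        # Cross-channel attention: reweight channels using global context
        self.channel_attn = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, loc_feat: torch.Tensor, bg_feat: torch.Tensor) -> torch.Tensor:
        g = self.gate(torch.cat([loc_feat, bg_feat], dim=1))  # gate per location
        fused = g * loc_feat + (1.0 - g) * bg_feat            # gated mixture
        return fused * self.channel_attn(fused)               # channel reweighting

# Usage: fuse two 64-channel feature maps at one scale of a nested decoder.
fusion = GatedBranchFusion(64)
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```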

    Networks are Slacking Off: Understanding Generalization Problem in Image Deraining

    Deep deraining networks, while successful on laboratory benchmarks, consistently encounter substantial generalization issues when deployed in real-world applications. A prevailing perspective in deep learning encourages the use of highly complex training data, with the expectation that richer image content knowledge will help overcome the generalization problem. However, through comprehensive and systematic experimentation, we discovered that this strategy does not enhance the generalization capability of these networks; on the contrary, it exacerbates their tendency to overfit to specific degradations. Our experiments reveal that better generalization in a deraining network can be achieved by simplifying the complexity of the training data. This is because networks slack off during training: they learn only the least complex elements of the image content and the degradation needed to minimize the training loss. When the complexity of the background image is lower than that of the rain streaks, the network prioritizes reconstruction of the background, thereby avoiding overfitting to the rain patterns and achieving improved generalization performance. Our research not only offers a valuable perspective and methodology for better understanding the generalization problem in low-level vision tasks, but also shows promising practical potential.
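    A minimal sketch of the data-simplification idea in this abstract: pair low-complexity backgrounds with synthetic rain streaks, so the background is easier to learn than the degradation. The functions and parameters below are illustrative assumptions, not the authors' actual pipeline.

```python
# Hypothetical construction of a simplified deraining training pair:
# a smooth background overlaid with a sparse synthetic rain layer.
import numpy as np

def synthetic_rain(h: int, w: int, streaks: int = 400, length: int = 15) -> np.ndarray:
    """Render a rain-streak layer as near-vertical bright line segments."""
    layer = np.zeros((h, w), dtype=np.float32)
    rng = np.random.default_rng(0)
    for _ in range(streaks):
        y, x = rng.integers(0, h), rng.integers(0, w)
        for t in range(length):  # draw one slightly slanted streak
            yy, xx = y + t, x + t // 5
            if yy < h and xx < w:
                layer[yy, xx] = rng.uniform(0.4, 0.9)
    return layer

def simple_background(h: int, w: int) -> np.ndarray:
    """A low-complexity background: a smooth horizontal gradient."""
    return np.tile(np.linspace(0.2, 0.8, w, dtype=np.float32), (h, 1))

# Training pair where the background is simpler than the rain, so the
# network can "slack off" onto the background rather than the degradation.
bg = simple_background(128, 128)
rainy = np.clip(bg + synthetic_rain(128, 128), 0.0, 1.0)
```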

    An aesthetics of touch: investigating the language of design relating to form

    How well can designers communicate qualities of touch? This paper presents evidence that they have some capability to do so, much of which appears to have been learned, but that at present they make limited use of such language. Interviews with graduate designer-makers suggest that they are aware of and value the importance of touch and materiality in their work, but lack a vocabulary that matches the detail of their explanations of other aspects, such as their intent or selection of materials. We believe that more attention should be paid to the verbal dialogue that happens in the design process, particularly as other researchers have shown that even making-based learning has a strong verbal element. However, verbal language alone does not appear adequate for a comprehensive language of touch. Graduate designer-makers' descriptive practices combined non-verbal manipulation with verbal accounts. We thus argue that haptic vocabularies do not simply describe material qualities, but are situated competences that physically demonstrate the presence of haptic qualities. Such competences are more important than verbal vocabularies in isolation. Design support for developing and extending haptic competences must take this wide range of considerations into account if it is to comprehensively improve designers' capabilities.

    Between images and built form: Automating the recognition of standardised building components using deep learning

    Building on the richness of recent contributions in the field, this paper presents a state-of-the-art CNN analysis method for automating the recognition of standardised building components in modern heritage buildings. At the turn of the twentieth century, manufactured building components became widely advertised for specification by architects. Consequently, a form of standardisation across various typologies began to take place. During this era of rapid economic and industrialised growth, many forms of public building were erected. This paper seeks to demonstrate a method for informing the recognition of such elements, using deep learning to recognise 'families' of elements across a range of buildings in order to retrieve and recognise their technical specifications from the contemporary trade literature. The method is illustrated through the case of Carnegie Public Libraries in the UK, which provide a unique but ubiquitous platform from which to explore the potential for the automated recognition of manufactured standard architectural components. The aim of enhancing this knowledge base is to use the degree to which these components were originally standardised as a means to inform and support their ongoing care, as well as that of many other contemporary buildings. Although these libraries are numerous, they are maintained at a local level and, as such, their shared maintenance challenges remain unknown to one another. Additionally, this paper presents a methodology to indirectly retrieve useful indicators and semantics relating to emerging HBIM families by applying deep learning to a varied range of architectural imagery.
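    A hedged sketch of the kind of pipeline this abstract describes: fine-tuning an ImageNet-pretrained CNN to recognise 'families' of standardised components from photographs. The directory layout ("library_components/<family_name>/<image>.jpg") and the choice of ResNet-18 are assumptions for illustration, not the authors' dataset or model.

```python
# Minimal transfer-learning sketch, assuming an ImageFolder-style dataset
# with one directory per component family. Not the authors' actual setup.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Assumed layout: library_components/<family_name>/<image>.jpg
dataset = datasets.ImageFolder("library_components", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=16, shuffle=True)

# Replace the classifier head of a pretrained ResNet with one output per
# component family (e.g. radiators, rooflights, door furniture).
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(dataset.classes))

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one illustrative training epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```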

    Explaining Autonomous Driving Actions with Visual Question Answering

    The end-to-end learning ability of self-driving vehicles has achieved significant milestones over the last decade, owing to rapid advances in deep learning and computer vision algorithms. However, because autonomous driving is a safety-critical application of artificial intelligence (AI), road accidents and established regulatory principles necessitate explainability of the action choices made by self-driving vehicles. To facilitate interpretability of decision-making in autonomous driving, we present a Visual Question Answering (VQA) framework that explains driving actions with question-answering-based causal reasoning. To do so, we first collect driving videos in a simulation environment using reinforcement learning (RL) and uniformly extract consecutive frames from this log data for five selected action categories. We then manually annotate the extracted frames with question-answer pairs as justifications for the action chosen in each scenario. Finally, we evaluate the correctness of the VQA-predicted answers for actions on unseen driving scenes. The empirical results suggest that the VQA mechanism can support interpretation of the real-time decisions of autonomous vehicles and help enhance overall driving safety.
    Comment: Accepted to the 2023 IEEE International Conference on Intelligent Transportation Systems (IEEE ITSC-2023).
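    A minimal sketch of two steps from the pipeline this abstract outlines: uniform frame extraction from logged driving video, and exact-match scoring of VQA-predicted answers against annotated ones. The file names, the five action categories, and the scoring rule are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical helpers: evenly spaced frame sampling with OpenCV, and a
# simple answer-accuracy metric. Assumed, not the authors' code.
import cv2  # OpenCV for video decoding

# Assumed action categories for illustration only
ACTIONS = ["go straight", "turn left", "turn right", "slow down", "stop"]

def extract_uniform_frames(video_path: str, n_frames: int) -> list:
    """Sample n_frames evenly spaced frames from a driving log video."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    frames = []
    for i in range(n_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // n_frames)
        ok, frame = cap.read()
        if ok:
            frames.append(frame)
    cap.release()
    return frames

def answer_accuracy(predicted: list, annotated: list) -> float:
    """Fraction of questions where the VQA answer matches the annotation."""
    correct = sum(p.strip().lower() == a.strip().lower()
                  for p, a in zip(predicted, annotated))
    return correct / max(len(annotated), 1)
```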