
    FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing

    Text-conditional image editing is a recently emerged task with great practical potential. Most current real-image editing methods first reconstruct the image and then apply one of various editing techniques on top of the reconstruction. Reconstruction is typically done with DDIM Inversion; however, DDIM Inversion often fails to guarantee reconstruction quality, i.e., it does not produce results that preserve the original image content. To address this reconstruction failure, we propose FEC, which consists of three sampling methods, each designed for different editing types and settings. The three methods of FEC achieve two important goals in image editing: 1) ensuring successful reconstruction, i.e., sampling a result that preserves the texture and features of the original real image; and 2) pairing with many existing editing methods and greatly improving their performance on various editing tasks. In addition, none of our sampling methods require fine-tuning of the diffusion model or time-consuming training on large-scale datasets, so the cost in time, memory, and computation is significantly reduced.
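
    For context on the reconstruction step the abstract criticizes, the standard DDIM inversion update (background material, not FEC's own sampling rule) pushes a latent one step up the noise schedule using the model's noise prediction:

    ```latex
    % Standard DDIM inversion step: \bar{\alpha}_t are the cumulative signal rates
    % and \epsilon_\theta(x_t, t) is the predicted noise at step t.
    \[
    x_{t+1} = \sqrt{\tfrac{\bar{\alpha}_{t+1}}{\bar{\alpha}_t}}\, x_t
            + \left( \sqrt{1-\bar{\alpha}_{t+1}}
                   - \sqrt{\tfrac{\bar{\alpha}_{t+1}\,(1-\bar{\alpha}_t)}{\bar{\alpha}_t}} \right)
              \epsilon_\theta(x_t, t)
    \]
    ```

    The update relies on a local linearization (the same noise prediction is reused across neighbouring timesteps), so small errors can accumulate over many steps, which is one reason reconstructions drift from the original content.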

    Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition

    Incorporating large-scale pre-trained models with prototypical neural networks is a de facto paradigm in few-shot named entity recognition. Existing methods, unfortunately, are not aware that embeddings from pre-trained models contain a prominently large amount of information about word frequencies, biasing prototypical neural networks against learning word entities. This discrepancy constrains the synergy of the two models. Thus, we propose a one-line-code normalization method to reconcile this mismatch on both empirical and theoretical grounds. Our experiments on nine benchmark datasets show the superiority of our method over counterpart models, with results comparable to the state-of-the-art methods. Beyond the model enhancement, our work also provides an analytical viewpoint for addressing general problems in few-shot named entity recognition and other tasks that rely on pre-trained models or prototypical neural networks. Comment: Findings of EMNLP 202
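
    As a rough illustration of how a single normalization line can sit inside a prototypical pipeline, the Python sketch below L2-normalizes pre-trained embeddings before prototypes are formed; the specific normalization is an assumption for illustration and may differ from the operation proposed in the paper.

    ```python
    # Illustrative sketch (not the paper's exact normalization): a prototypical
    # classifier in which one normalization line is applied to pre-trained
    # embeddings before prototypes are computed, so vector magnitude
    # (frequency-related information) no longer dominates the distances.
    import torch
    import torch.nn.functional as F

    def prototypical_logits(support_emb, support_labels, query_emb, num_classes):
        """support_emb: [S, D] token embeddings, support_labels: [S] class ids,
        query_emb: [Q, D]. Assumes every class has at least one support token.
        Returns [Q, num_classes] negative-distance logits."""
        # The "one line": L2-normalize embeddings (assumed normalization).
        support_emb = F.normalize(support_emb, dim=-1)
        query_emb = F.normalize(query_emb, dim=-1)

        prototypes = torch.stack([
            support_emb[support_labels == c].mean(dim=0) for c in range(num_classes)
        ])                                           # [num_classes, D]
        return -torch.cdist(query_emb, prototypes)   # nearest prototype wins
    ```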

    KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

    Text-conditioned image editing is a recently emerged and highly practical task with immeasurable potential. However, most concurrent methods are unable to perform action editing, i.e., they cannot produce results that conform to the action semantics of the editing prompt while preserving the content of the original image. To solve the problem of action editing, we propose KV Inversion, a method that achieves satisfactory reconstruction performance and action editing, solving two major problems: 1) the edited result matches the corresponding action, and 2) the edited object retains the texture and identity of the original real image. In addition, our method requires neither training the Stable Diffusion model itself nor scanning a large-scale dataset for time-consuming training.
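
    The Python sketch below is only a conceptual illustration (the names, shapes, and injection point are assumptions, not the paper's code): it shows an attention call in which queries come from the current editing pass while keys and values are supplied by embeddings learned to reconstruct the source image, so texture and identity are read from the source while the query side is free to follow the editing prompt.

    ```python
    # Conceptual sketch only: attention with externally supplied (learned) K/V.
    import torch

    def attention_with_learned_kv(q_edit, learned_k, learned_v):
        """q_edit: [B, N, D] queries from the editing pass;
        learned_k, learned_v: [B, M, D] embeddings optimized beforehand to
        reconstruct the source image (hypothetical tensors for illustration)."""
        scale = q_edit.shape[-1] ** -0.5
        attn = torch.softmax(q_edit @ learned_k.transpose(-2, -1) * scale, dim=-1)
        return attn @ learned_v  # [B, N, D] content read from the source side
    ```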

    Wetland Mapping in the Balqash Lake Basin Using Multi-source Remote Sensing Data and Topographic Features Synergic Retrieval

    Wetlands play a major role in the hydrological cycle, carbon sequestration, nitrogen absorption, the geochemical cycle, water conservation, and biological diversity. Traditional field surveys for mapping wetland distribution over large areas are very difficult to undertake, and remote sensing techniques offer promising solutions to this problem. However, because of spectral confusion with other land-cover classes and among different wetland types, it is difficult to extract wetland information automatically. The overarching goal of this study was to develop a hybrid method for automated delineation of lake wetlands by integrating multi-source remote sensing data with DEM data. First, radiometric correction is applied to convert image DN values to reflectance or radiance. Second, spectral and topographic indices are derived, such as NDVI, NDWI, TVDI, slope, and other topographic features. Third, water bodies are extracted through iterative computation of the NDWI. Finally, marsh land is retrieved from the imagery by combining soil-moisture characteristics, topographic factors, and spatial analysis, yielding the final wetland distribution map. The methodology was evaluated on wetland extraction in the Balqash Lake Basin in Kazakhstan. Experimental results show that the hybrid method performs well in lake-wetland delineation: the overall accuracies of the wetland classes exceed 85%, which meets application requirements.
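
    The water-extraction step lends itself to a compact sketch. The Python code below, which assumes green and NIR reflectance bands as NumPy arrays, computes the NDWI and refines a water threshold iteratively; the band names and the mean-of-means update are illustrative, not the paper's exact procedure.

    ```python
    # Illustrative NDWI water extraction with an iterative threshold refinement.
    import numpy as np

    def ndwi(green, nir, eps=1e-6):
        """Normalized Difference Water Index: (Green - NIR) / (Green + NIR)."""
        return (green - nir) / (green + nir + eps)

    def iterative_water_mask(green, nir, threshold=0.0, tol=1e-3, max_iter=50):
        """Refine the NDWI threshold (mean-of-means update, an assumption here)
        until it converges, then return a boolean water mask."""
        index = ndwi(green, nir)
        for _ in range(max_iter):
            water, land = index[index > threshold], index[index <= threshold]
            if water.size == 0 or land.size == 0:
                break
            new_threshold = (water.mean() + land.mean()) / 2.0
            converged = abs(new_threshold - threshold) < tol
            threshold = new_threshold
            if converged:
                break
        return index > threshold
    ```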

    Understanding Daily Travel Patterns of Subway Users – An Example from the Beijing Subway

    The daily travel patterns (DTPs) reveal short-term and timely characteristics of users' travel behaviour, and they help subway planners better understand the travel choices and regularity of subway users (SUs) in detail. While several well-known subway travel patterns have been detected, such as commuting and shopping patterns, the specific features of many patterns remain confused or overlooked. Based on the automatic fare collection (AFC) system, a data-mining procedure to recognize the DTPs of all SUs has now become possible and effective. In this study, DTPs are identified by station sequences (SSs), which are modelled from smart card transaction data of the AFC system. The data-mining procedure is applied to a large weekly sample from the Beijing Subway to understand DTPs. The results show that more than 93% of the SUs of the Beijing Subway travel in 7 DTPs, which are remarkably stable in share and distribution. Different DTPs have their own unique characteristics in terms of time distribution, activity duration, and repeatability, which provide a wealth of information to classify different types of users and characterize their travel patterns.
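
    A minimal Python sketch of turning AFC smart card transactions into daily station sequences is shown below; the DataFrame column names and the pattern counting are assumptions for illustration, not the study's actual pipeline.

    ```python
    # Build daily station sequences (SSs) per card from smart card transactions.
    import pandas as pd

    def daily_station_sequences(trips: pd.DataFrame) -> pd.Series:
        """trips columns (assumed): card_id, entry_time (datetime),
        entry_station, exit_station.
        Returns a Series mapping (card_id, date) -> station sequence string."""
        trips = trips.sort_values(["card_id", "entry_time"]).copy()
        trips["date"] = trips["entry_time"].dt.date
        trips["leg"] = trips["entry_station"] + "-" + trips["exit_station"]
        return trips.groupby(["card_id", "date"])["leg"].agg("->".join)

    # Counting how often each sequence shape occurs gives the share of each DTP:
    # daily_station_sequences(trips).value_counts(normalize=True)
    ```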

    WaveDM: Wavelet-Based Diffusion Models for Image Restoration

    The latest diffusion-based methods outperform traditional models on many image restoration tasks, but they suffer from long inference times. To tackle this, this paper proposes a Wavelet-Based Diffusion Model (WaveDM) with an Efficient Conditional Sampling (ECS) strategy. WaveDM learns the distribution of clean images in the wavelet domain, conditioned on the wavelet spectrum of degraded images after the wavelet transform, which makes each sampling step faster than modeling in the spatial domain. In addition, ECS follows the same procedure as deterministic implicit sampling in the initial sampling period and then stops to predict clean images directly, reducing the total number of sampling steps to around 5. Evaluations on four benchmark datasets covering image raindrop removal, defocus deblurring, demoiréing, and denoising demonstrate that WaveDM achieves state-of-the-art performance with efficiency comparable to traditional one-pass methods, and it is over 100 times faster than existing image restoration methods using vanilla diffusion models.
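
    To make the wavelet-domain conditioning concrete, the Python sketch below (assuming the PyWavelets package) shows how a degraded image could be decomposed into a wavelet spectrum and how predicted clean sub-bands would be transformed back; it illustrates the data representation only, not WaveDM's network or its ECS sampler.

    ```python
    # Single-level 2D wavelet decomposition / reconstruction of a grayscale image.
    import numpy as np
    import pywt

    def wavelet_spectrum(image: np.ndarray, wavelet: str = "haar") -> np.ndarray:
        """H x W image -> (4, H/2, W/2) stack of sub-bands used as conditioning."""
        cA, (cH, cV, cD) = pywt.dwt2(image, wavelet)
        return np.stack([cA, cH, cV, cD], axis=0)

    def inverse_wavelet(spectrum: np.ndarray, wavelet: str = "haar") -> np.ndarray:
        """Reconstruct the spatial image from (predicted) clean sub-bands."""
        cA, cH, cV, cD = spectrum
        return pywt.idwt2((cA, (cH, cV, cD)), wavelet)
    ```

    Working on the four half-resolution sub-bands instead of the full-resolution image is what makes each denoising step cheaper than its spatial-domain counterpart.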

    Relational Learning between Multiple Pulmonary Nodules via Deep Set Attention Transformers

    Diagnosis and treatment of multiple pulmonary nodules are clinically important but challenging. Prior studies on nodule characterization apply solitary-nodule approaches to patients with multiple nodules, which ignores the relations between nodules. In this study, we propose a multiple instance learning (MIL) approach and empirically demonstrate the benefit of learning the relations between multiple nodules. By treating the multiple nodules from the same patient as a whole, critical relational information between solitary-nodule voxels is extracted. To our knowledge, this is the first study to learn the relations between multiple pulmonary nodules. Inspired by recent advances in the natural language processing (NLP) domain, we introduce a self-attention transformer equipped with a 3D CNN, named NoduleSAT, to replace the typical pooling-based aggregation in multiple instance learning. Extensive experiments on lung nodule false positive reduction on the LUNA16 database and malignancy classification on the LIDC-IDRI database validate the effectiveness of the proposed method. Comment: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI 2020)
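
    The aggregation idea can be sketched compactly. The PyTorch code below replaces pooling over a patient's nodules with self-attention over the set of nodule features; the feature dimension, head count, and scoring head are assumptions for illustration, not the NoduleSAT architecture itself.

    ```python
    # Self-attention over the set of nodules of one patient, instead of pooling.
    import torch
    import torch.nn as nn

    class NoduleSetAttention(nn.Module):
        def __init__(self, feat_dim: int = 128, num_heads: int = 4):
            super().__init__()
            self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
            self.head = nn.Linear(feat_dim, 1)  # per-nodule score (e.g. malignancy)

        def forward(self, nodule_feats: torch.Tensor) -> torch.Tensor:
            """nodule_feats: [B, N, D] pre-extracted 3D-CNN features of the N
            nodules of each patient. Every nodule attends to the other nodules
            of the same patient before being scored."""
            ctx, _ = self.attn(nodule_feats, nodule_feats, nodule_feats)
            return self.head(ctx).squeeze(-1)  # [B, N] relation-aware scores
    ```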