FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
Text-conditional image editing is a highly practical task that has recently
emerged. Most current real-image editing methods must first reconstruct the
image, after which the edit is carried out on top of that reconstruction.
Reconstruction is typically done with DDIM Inversion, which, however, often
fails to guarantee reconstruction quality, i.e., it fails to produce results
that preserve the original image content. To address this reconstruction
failure, we propose FEC, which consists of three sampling methods, each
designed for a different editing type and setting. The three FEC methods
achieve two important goals in image editing: 1) successful reconstruction,
i.e., the sampled result preserves the texture and features of the original
real image; and 2) compatibility with many editing methods, greatly improving
their performance across a variety of editing tasks. In addition, none of our
sampling methods require fine-tuning the diffusion model or time-consuming
training on large-scale datasets, so the cost in time, memory, and computation
is significantly reduced.
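The reconstruct-then-edit pipeline above rests on DDIM Inversion being (approximately) invertible. A minimal numpy sketch of that round trip, using a toy per-step noise table in place of a real U-Net noise predictor (the table, schedule, and dimensions are illustrative assumptions, not FEC's method):

```python
import numpy as np

rng = np.random.default_rng(0)
T = 10
alpha_bar = np.linspace(0.99, 0.1, T + 1)  # toy noise schedule, abar_0..abar_T

# Toy "noise predictor": a fixed per-step noise table, independent of x.
# A real model would be a U-Net eps_theta(x_t, t); this stand-in makes the
# inversion exactly invertible, so reconstruction succeeds perfectly.
eps_table = rng.normal(size=(T, 4))
def eps_pred(x, t):
    return eps_table[t]

def ddim_step(x, t_from, t_to):
    # Deterministic DDIM update from timestep t_from to t_to.
    eps = eps_pred(x, min(t_from, t_to))
    x0_hat = (x - np.sqrt(1 - alpha_bar[t_from]) * eps) / np.sqrt(alpha_bar[t_from])
    return np.sqrt(alpha_bar[t_to]) * x0_hat + np.sqrt(1 - alpha_bar[t_to]) * eps

x0 = rng.normal(size=4)        # the "real image" (a toy latent)
x = x0
for t in range(T):             # inversion: walk noise levels up, 0 -> T
    x = ddim_step(x, t, t + 1)
for t in range(T, 0, -1):      # sampling: walk back down, T -> 0
    x = ddim_step(x, t, t - 1)
print(np.allclose(x, x0))      # True: reconstruction succeeds here
```

With a real, x-dependent noise predictor the two passes no longer cancel exactly, which is precisely the reconstruction failure the abstract describes.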
Reconciliation of Pre-trained Models and Prototypical Neural Networks in Few-shot Named Entity Recognition
Incorporating large-scale pre-trained models into prototypical neural
networks is the de-facto paradigm in few-shot named entity recognition.
Existing methods, unfortunately, overlook the fact that embeddings from
pre-trained models contain a prominently large amount of information about
word frequency, which biases prototypical neural networks against learning
word entities. This discrepancy constrains the two models' synergy. Thus, we
propose a one-line-code normalization method that reconciles this mismatch,
with empirical and theoretical grounding. Our experiments on nine benchmark
datasets show that our method outperforms its counterpart models and is
comparable to state-of-the-art methods. Beyond the model enhancement, our
work also provides an analytical viewpoint for addressing general problems in
few-shot named entity recognition and other tasks that rely on pre-trained
models or prototypical neural networks.
Comment: Findings of EMNLP 202
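The idea of normalizing away frequency-driven norm differences before building prototypes can be sketched as follows. The one-line L2 normalization is an assumption modeled on the abstract's description, not the paper's exact code; the data is a synthetic toy:

```python
import numpy as np

def prototypes(emb, labels):
    # The one-line fix: L2-normalize embeddings first, discarding the norm
    # component that correlates with word frequency.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return np.stack([emb[labels == c].mean(0) for c in np.unique(labels)])

def classify(query, protos):
    # Nearest prototype on normalized queries (Euclidean distance).
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    d = np.linalg.norm(q[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(1)

# Toy data: two classes with distinct directions but wildly varying norms,
# mimicking frequency-driven norm differences in pretrained embeddings.
rng = np.random.default_rng(1)
dirs = np.array([[1.0, 0.0], [0.0, 1.0]])
labels = np.repeat([0, 1], 20)
norms = rng.uniform(0.1, 10.0, size=40)[:, None]
emb = (dirs[labels] + rng.normal(scale=0.05, size=(40, 2))) * norms
pred = classify(emb, prototypes(emb, labels))
print((pred == labels).mean())  # 1.0 on this separable toy data
```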
KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing
Text-conditioned image editing is a recently emerged and highly practical
task with immeasurable potential. However, most concurrent methods cannot
perform action editing, i.e., they cannot produce results that conform to the
action semantics of the editing prompt while preserving the content of the
original image. To solve the problem of action editing, we propose KV
Inversion, a method that achieves satisfactory reconstruction performance and
action editing, solving two major problems: 1) the edited result matches the
corresponding action, and 2) the edited object retains the texture and
identity of the original real image. In addition, our method requires neither
training the Stable Diffusion model itself nor time-consuming training on a
large-scale dataset.
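The general mechanism behind learning or injecting K/V embeddings to preserve content can be sketched with plain dot-product attention. This is a hypothetical illustration of the family of techniques the title alludes to, not KV Inversion's actual algorithm:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention.
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

# Sketch of KV substitution during editing: queries come from the editing
# branch (driven by the new prompt), while keys/values learned or cached
# from the source image carry its texture and identity into the output.
rng = np.random.default_rng(2)
d = 8
q_edit = rng.normal(size=(4, d))                  # queries: edited branch
k_src = rng.normal(size=(4, d))                   # keys: source image
v_src = rng.normal(size=(4, d))                   # values: source image
out = attention(q_edit, k_src, v_src)             # edited layout, source content
print(out.shape)  # (4, 8)
```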
Wetland mapping in the Balqash Lake Basin Using Multi-source Remote Sensing Data and Topographic features Synergic Retrieval
Abstract: Wetlands play a major role in the hydrological cycle, carbon sequestration, nitrogen absorption, the geochemical cycle, water conservation, and biological diversity. Traditional field surveys for mapping wetland distribution over large areas are very difficult to undertake, and remote sensing techniques offer a promising solution. However, spectral confusion between wetlands and other land-cover classes, and among different wetland types, makes it difficult to extract wetland information automatically. The overarching goal of this study was to develop a hybrid method for automated delineation of lake wetlands by integrating multi-source remote sensing data with DEM data. First, radiance correction is performed and image DN values are converted to reflectance or radiance. Second, spectral and topographic indices are derived, such as NDVI, NDWI, TVDI, slope, and other topographic features. Third, water bodies are extracted through iterative NDWI computation. Finally, marshland is retrieved from the imagery by combining soil-moisture characteristics, topographic factors, and spatial analysis, yielding the final wetland distribution map. The methodology was evaluated on wetland extraction in the Balqash Lake Basin in Kazakhstan. Experimental results show that the hybrid method performs well in lake wetland delineation: the overall accuracies of the wetland classes exceed 85%, which meets application requirements.
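The water-extraction step can be illustrated with the NDWI formula on synthetic band values (the reflectance numbers and the fixed threshold are illustrative; the study uses an iterative threshold on real sensor bands):

```python
import numpy as np

# Synthetic green and near-infrared reflectance for a 2x2 scene:
# left column is water-like (high green, low NIR), right is vegetation-like.
green = np.array([[0.30, 0.05],
                  [0.28, 0.04]])
nir   = np.array([[0.05, 0.40],
                  [0.06, 0.35]])

# McFeeters NDWI = (Green - NIR) / (Green + NIR); water pixels come out > 0.
ndwi = (green - nir) / (green + nir + 1e-9)
water = ndwi > 0  # simple fixed threshold; the paper refines it iteratively
print(water)      # [[ True False], [ True False]]
```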
Understanding Daily Travel Patterns of Subway Users – An Example from the Beijing Subway
Daily travel patterns (DTPs) capture short-term, timely characteristics of users' travel behaviour, and they help subway planners better understand the travel choices and regularity of subway users (SUs) in detail. While several well-known subway travel patterns have been detected, such as commuting and shopping modes, the specific features of many patterns remain unclear or overlooked. With the automatic fare collection (AFC) system, a data-mining procedure that recognizes the DTPs of all SUs has become possible and effective. In this study, DTPs are identified by station sequences (SSs), which are modelled from the smart card transaction data of the AFC system. The data-mining procedure is applied to a large weekly sample from the Beijing Subway. The results show that more than 93% of SUs of the Beijing Subway travel in 7 DTPs, which are remarkably stable in share and distribution. Each DTP has its own unique characteristics in terms of time distribution, activity duration, and repeatability, providing a wealth of information for calibrating different types of users and characterizing their travel patterns.
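The station-sequence step can be sketched on toy AFC records. The record layout (user id, entry station, exit station) and the station names are illustrative assumptions, not the Beijing AFC schema:

```python
from collections import Counter

# One day's taps per user: (user_id, entry_station, exit_station).
taps = [
    ("u1", "A", "B"), ("u1", "B", "A"),  # commute out and back
    ("u2", "A", "B"), ("u2", "B", "A"),
    ("u3", "C", "D"),                    # single one-way trip
]

# Build each user's daily station sequence (SS) in tap order.
seqs = {}
for uid, origin, dest in taps:
    seqs.setdefault(uid, []).extend([origin, dest])

# Count identical sequences across users to surface shared patterns.
patterns = Counter(tuple(s) for s in seqs.values())
print(patterns.most_common(1))  # [(('A', 'B', 'B', 'A'), 2)] - commuting dominates
```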
WaveDM: Wavelet-Based Diffusion Models for Image Restoration
Recent diffusion-based methods outperform traditional models on many image
restoration tasks, but they suffer from long inference times. To tackle this,
this paper proposes a Wavelet-Based Diffusion Model (WaveDM) with an
Efficient Conditional Sampling (ECS) strategy. WaveDM learns the distribution
of clean images in the wavelet domain, conditioned on the wavelet spectrum of
degraded images after the wavelet transform, which makes each sampling step
cheaper than modeling in the spatial domain. In addition, ECS follows the
same procedure as deterministic implicit sampling in the initial sampling
period and then stops to predict clean images directly, which reduces the
total number of sampling steps to around 5. Evaluations on four benchmark
datasets covering image raindrop removal, defocus deblurring, demoiréing,
and denoising demonstrate that WaveDM achieves state-of-the-art performance
with efficiency comparable to traditional one-pass methods, over 100 times
faster than existing image restoration methods that use vanilla diffusion
models.
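The spatial-to-wavelet move can be illustrated with a one-level 2D Haar transform: each subband has a quarter of the pixels, which is why per-step modeling in the wavelet domain is cheaper. This is a generic Haar sketch, not WaveDM's exact wavelet or conditioning:

```python
import numpy as np

def haar2d(img):
    # One-level 2D Haar transform: LL, LH, HL, HH subbands, each at a
    # quarter of the original pixel count.
    a = (img[0::2] + img[1::2]) / 2        # row averages
    d = (img[0::2] - img[1::2]) / 2        # row differences
    LL = (a[:, 0::2] + a[:, 1::2]) / 2
    LH = (a[:, 0::2] - a[:, 1::2]) / 2
    HL = (d[:, 0::2] + d[:, 1::2]) / 2
    HH = (d[:, 0::2] - d[:, 1::2]) / 2
    return LL, LH, HL, HH

def ihaar2d(LL, LH, HL, HH):
    # Exact inverse of haar2d: undo the column step, then the row step.
    h, w = LL.shape
    a = np.empty((h, 2 * w)); d = np.empty((h, 2 * w))
    a[:, 0::2], a[:, 1::2] = LL + LH, LL - LH
    d[:, 0::2], d[:, 1::2] = HL + HH, HL - HH
    out = np.empty((2 * h, 2 * w))
    out[0::2], out[1::2] = a + d, a - d
    return out

img = np.random.default_rng(3).normal(size=(8, 8))
print(np.allclose(ihaar2d(*haar2d(img)), img))  # True: lossless round trip
```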
Relational Learning between Multiple Pulmonary Nodules via Deep Set Attention Transformers
Diagnosis and treatment of multiple pulmonary nodules are clinically
important but challenging. Prior studies on nodule characterization apply
solitary-nodule approaches to patients with multiple nodules, ignoring the
relations between nodules. In this study, we propose a multiple instance
learning (MIL) approach and empirically demonstrate the benefit of learning
the relations between multiple nodules. By treating the multiple nodules from
the same patient as a whole, critical relational information between
solitary-nodule voxels is extracted. To our knowledge, this is the first
study to learn the relations between multiple pulmonary nodules. Inspired by
recent advances in the natural language processing (NLP) domain, we introduce
a self-attention transformer equipped with a 3D CNN, named NoduleSAT, to
replace the typical pooling-based aggregation in multiple instance learning.
Extensive experiments on lung nodule false positive reduction on the LUNA16
database and malignancy classification on the LIDC-IDRI database validate the
effectiveness of the proposed method.
Comment: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI 2020)
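Replacing pooling-based MIL aggregation with self-attention over a patient's bag of nodules can be sketched as below. The single projection and mean readout are toy stand-ins; NoduleSAT stacks multi-head attention on 3D CNN features:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(bag, W):
    # Self-attention over the bag (the set of nodule feature vectors), then
    # mean readout. Unlike max-pooling, each nodule's output mixes in
    # information from every other nodule, capturing inter-nodule relations.
    q = k = v = bag @ W
    att = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    return (att @ v).mean(axis=0)  # one feature vector per patient

rng = np.random.default_rng(4)
bag = rng.normal(size=(5, 16))          # 5 nodules from one patient
W = rng.normal(size=(16, 16)) / 4       # toy shared projection
feat = attention_pool(bag, W)
print(feat.shape)  # (16,)
```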