Recurrent Attentional Networks for Saliency Detection
Convolutional-deconvolution networks can be adopted to perform end-to-end saliency detection, but they do not work well with objects of multiple scales.
To overcome such a limitation, in this work, we propose a recurrent attentional
convolutional-deconvolution network (RACDNN). Using spatial transformer and
recurrent network units, RACDNN is able to iteratively attend to selected image
sub-regions to perform saliency refinement progressively. Besides tackling the
scale problem, RACDNN can also learn context-aware features from past
iterations to enhance saliency refinement in future iterations. Experiments on
several challenging saliency detection datasets validate the effectiveness of
RACDNN, and show that RACDNN outperforms state-of-the-art saliency detection
methods. Comment: CVPR 201
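The refinement loop described above (use the recurrent state to choose an attention window, sample that window with a spatial transformer, encode it, update the state, and refine the saliency map) can be sketched roughly as follows. The toy encoder and decoder, the feature sizes, and the iteration count are illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of an iterative attend-and-refine loop in the spirit of RACDNN.
# The encoder/decoder stand-ins, feature sizes, and iteration count are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttendRefine(nn.Module):
    def __init__(self, feat_ch=32, hidden=64, iters=3):
        super().__init__()
        self.iters = iters
        self.encoder = nn.Conv2d(1, feat_ch, 3, padding=1)   # toy stand-in for the conv part
        self.decoder = nn.Conv2d(hidden, 1, 3, padding=1)    # toy stand-in for the deconv part
        self.rnn = nn.GRUCell(feat_ch * 8 * 8, hidden)       # carries context across iterations
        self.loc = nn.Linear(hidden, 6)                      # spatial transformer localization
        self.loc.weight.data.zero_()                         # start from the identity transform
        self.loc.bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, coarse_saliency):                      # (B, 1, H, W) initial saliency map
        b, _, h_img, w_img = coarse_saliency.shape
        state = coarse_saliency.new_zeros(b, self.rnn.hidden_size)
        saliency = coarse_saliency
        for _ in range(self.iters):
            # 1) Predict an attention window (affine params) from the recurrent state.
            theta = self.loc(state).view(b, 2, 3)
            grid = F.affine_grid(theta, [b, 1, 8, 8], align_corners=False)
            patch = F.grid_sample(saliency, grid, align_corners=False)   # attended sub-region
            # 2) Encode the attended patch and update the context-aware state.
            feat = F.relu(self.encoder(patch)).flatten(1)
            state = self.rnn(feat, state)
            # 3) Decode a correction and refine the saliency map (toy refinement step).
            correction = self.decoder(state.view(b, -1, 1, 1).expand(-1, -1, h_img, w_img))
            saliency = torch.sigmoid(saliency + correction)
        return saliency

refined = AttendRefine()(torch.rand(2, 1, 64, 64))   # (2, 1, 64, 64) refined saliency
```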
Stochastic Downsampling for Cost-Adjustable Inference and Improved Regularization in Convolutional Networks
It is desirable to train convolutional networks (CNNs) to run more
efficiently during inference. In many cases, however, the computational budget that the system has for inference cannot be known beforehand during training, or the inference budget depends on changing real-time resource availability. Thus, it is inadequate to train only inference-efficient CNNs,
whose inference costs are not adjustable and cannot adapt to varied inference
budgets. We propose a novel approach for cost-adjustable inference in CNNs -
Stochastic Downsampling Point (SDPoint). During training, SDPoint applies
feature map downsampling to a random point in the layer hierarchy, with a
random downsampling ratio. The different stochastic downsampling configurations, known as SDPoint instances (of the same model), have different computational costs from one another while being trained to minimize the same prediction loss. Sharing network parameters across instances provides a significant regularization boost. During inference, one can handpick the SDPoint instance that best fits the inference budget. The effectiveness of SDPoint, as both a cost-adjustable inference approach and a regularizer, is validated through extensive experiments on image classification.
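A rough sketch of this training scheme, assuming a simple sequential backbone, is given below. The candidate ratios, the (block index, ratio) instance encoding, and the convention that the last index means no downsampling are illustrative assumptions rather than the paper's exact formulation.

```python
# Sketch of stochastic downsampling during training, loosely following the SDPoint idea.
# The sequential backbone and candidate ratios are assumptions for illustration only.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class SDPointNet(nn.Module):
    def __init__(self, blocks, ratios=(0.5, 0.75)):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.ratios = ratios

    def forward(self, x, instance=None):
        # instance = (block_index, ratio); block_index == len(blocks) means no downsampling.
        if self.training and instance is None:
            instance = (random.randrange(len(self.blocks) + 1), random.choice(self.ratios))
        for i, block in enumerate(self.blocks):
            if instance is not None and i == instance[0]:
                # Downsample the feature map at the sampled point with the sampled ratio.
                x = F.interpolate(x, scale_factor=instance[1], mode="bilinear",
                                  align_corners=False)
            x = block(x)
        return x

# Training draws a fresh instance per mini-batch; at inference one hand-picks an instance
# that fits the available budget, e.g. model.eval(); model(images, instance=(2, 0.5)).
```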
A low-temperature external cavity diode laser for broad wavelength tuning
We report on the design and characterization of a low-temperature external cavity diode laser (ECDL) system for broad wavelength tuning. The performance achieved with multiple diode models addresses the scarcity of commercial red laser diodes below 633 nm, which is a wavelength range relevant to the spectroscopy of many molecules and ions. Using a combination of multiple-stage thermoelectric cooling and water cooling, the operating temperature of a laser diode is lowered to −64 °C, more than 85 °C below the ambient temperature. The laser system integrates temperature and diffraction grating feedback tunability for coarse and fine wavelength adjustments, respectively. For two different diode models, single-mode operation is achieved with 38 mW output power at 616.8 nm and 69 mW at 622.6 nm, more than 15 nm below their ambient temperature free-running wavelengths. The ECDL design can be used for diodes of any available wavelength, allowing individual diodes to be tuned continuously over tens of nanometers and extending the wavelength coverage of commercial laser diodes
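As a purely illustrative back-of-envelope check of these figures, cooling the diode by more than 85 °C while lowering the wavelength by more than 15 nm implies a free-running temperature tuning coefficient of very roughly 0.18 nm/°C; the short calculation below uses only the numbers quoted in the abstract, and the target shift is a hypothetical example.

```python
# Back-of-envelope check using only the numbers quoted in the abstract; "target_shift"
# below is a hypothetical example, not a figure from the paper.
delta_T = 85.0        # °C of cooling below ambient (from the abstract, "more than 85 °C")
delta_lambda = 15.0   # nm of wavelength reduction (from the abstract, "more than 15 nm")

coefficient = delta_lambda / delta_T   # implied tuning coefficient, nm per °C (rough lower bound)
print(f"implied coefficient ~ {coefficient:.2f} nm/°C")

target_shift = 10.0                    # nm, hypothetical desired shift
print(f"cooling needed for {target_shift} nm ~ {target_shift / coefficient:.0f} °C")
```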
SceneComposer: Any-Level Semantic Image Synthesis
We propose a new framework for conditional image synthesis from semantic
layouts at any level of precision, ranging from pure text to a 2D semantic canvas
with precise shapes. More specifically, the input layout consists of one or
more semantic regions with free-form text descriptions and adjustable precision
levels, which can be set based on the desired controllability. The framework
naturally reduces to text-to-image (T2I) at the lowest level with no shape
information, and it becomes segmentation-to-image (S2I) at the highest level.
By supporting the levels in-between, our framework is flexible in assisting
users of different drawing expertise and at different stages of their creative
workflow. We introduce several novel techniques to address the challenges
coming with this new setup, including a pipeline for collecting training data;
a precision-encoded mask pyramid and a text feature map representation to
jointly encode precision level, semantics, and composition information; and a
multi-scale guided diffusion model to synthesize images. To evaluate the
proposed method, we collect a test dataset containing user-drawn layouts with
diverse scenes and styles. Experimental results show that the proposed method
can generate high-quality images following the layout at given precision, and
compares favorably against existing methods. Project page: https://zengxianyu.github.io/scenec/
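One way to picture the conditioning input is a per-pixel text feature map paired with a per-pixel precision map rasterized from the region layout. The sketch below illustrates that idea under an assumed Region structure and single-scale maps; it is not the paper's precision-encoded mask pyramid implementation.

```python
# Illustrative rasterization of an any-level layout into spatial conditioning maps.
# The Region structure and the single-scale maps are simplifying assumptions; the paper
# uses a precision-encoded mask pyramid and learned text feature maps.
from dataclasses import dataclass
import torch

@dataclass
class Region:
    mask: torch.Tensor      # (H, W) binary mask: a coarse box at low precision, an exact shape at high
    text_emb: torch.Tensor  # (D,) embedding of the region's free-form text description
    precision: float        # 0.0 = pure text (no shape), 1.0 = precise segmentation-level shape

def rasterize_layout(regions, H, W, D):
    text_map = torch.zeros(D, H, W)       # per-pixel semantics and composition
    precision_map = torch.zeros(1, H, W)  # per-pixel precision level
    for r in regions:                     # later regions overwrite earlier ones where they overlap
        m = r.mask.bool()
        text_map[:, m] = r.text_emb.unsqueeze(1)
        precision_map[:, m] = r.precision
    return torch.cat([text_map, precision_map], dim=0)  # (D + 1, H, W) conditioning tensor
```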
AIMS: All-Inclusive Multi-Level Segmentation
Despite the progress of image segmentation toward accurate visual entity segmentation, meeting the diverse requirements of image editing applications for region-of-interest selection at different levels remains unsolved. In this
paper, we propose a new task, All-Inclusive Multi-Level Segmentation (AIMS),
which segments visual regions into three levels: part, entity, and relation
(two entities with some semantic relationships). We also build a unified AIMS
model through multi-dataset multi-task training to address the two major
challenges of annotation inconsistency and task correlation. Specifically, we
propose task complementarity, association, and a prompt mask encoder for the three-level predictions. Extensive experiments demonstrate the effectiveness
and generalization capacity of our method compared to other state-of-the-art
methods on a single dataset or the concurrent work on segmenting anything. We
will make our code and trained model publicly available. Comment: Technical Report
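A rough way to picture one model serving all three levels is a shared backbone and mask decoder with a learned prompt per output level. The sketch below makes this concrete under assumed module names and shapes; it is not the authors' design.

```python
# Sketch of level-prompted multi-level segmentation in the spirit of AIMS: one shared
# backbone and decoder, with a learned prompt selecting part / entity / relation output.
import torch
import torch.nn as nn

LEVELS = {"part": 0, "entity": 1, "relation": 2}

class MultiLevelSegmenter(nn.Module):
    def __init__(self, dim=256, num_queries=50):
        super().__init__()
        self.backbone = nn.Conv2d(3, dim, 3, padding=1)      # stand-in feature extractor
        self.level_prompt = nn.Embedding(len(LEVELS), dim)   # one learned prompt per level
        self.queries = nn.Embedding(num_queries, dim)        # shared mask queries
        self.decoder = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)

    def forward(self, image, level="entity"):
        b = image.size(0)
        feats = self.backbone(image)                          # (B, C, H, W)
        tokens = feats.flatten(2).transpose(1, 2)             # (B, HW, C)
        prompt = self.level_prompt.weight[LEVELS[level]]      # select the output granularity
        q = self.queries.weight.unsqueeze(0).expand(b, -1, -1) + prompt
        q = self.decoder(q, tokens)                           # (B, Q, C)
        masks = torch.einsum("bqc,bchw->bqhw", q, feats)      # per-query mask logits
        return masks

masks = MultiLevelSegmenter()(torch.rand(1, 3, 64, 64), level="part")   # (1, 50, 64, 64)
```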
High-Quality Entity Segmentation
Dense image segmentation tasks (e.g., semantic and panoptic segmentation) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting with unrestricted image domains, classes, and variations in image resolution and quality. Motivated by these observations, we
construct a new entity segmentation dataset, with a strong focus on
high-quality dense segmentation in the wild. The dataset contains images
spanning diverse image domains and entities, along with plentiful
high-resolution images and high-quality mask annotations for training and
testing. Given the high-quality and high-resolution nature of the dataset, we propose CropFormer, which is designed to tackle the intractability of instance-level segmentation on high-resolution images. It improves mask prediction by fusing the full image with high-resolution crops that provide finer-grained image details. CropFormer is the first query-based Transformer
architecture that can effectively fuse mask predictions from multiple image
views, by learning queries that effectively associate the same entities across
the full image and its crop. With CropFormer, we achieve a significant AP gain on the challenging entity segmentation task. Furthermore, CropFormer consistently improves accuracy on traditional segmentation tasks and datasets. The dataset and code will be released at
http://luqi.info/entityv2.github.io/. Comment: The project website: http://luqi.info/entityv2.github.io
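The crop-and-fuse idea (combine coarse predictions from the downsized full image with fine predictions from high-resolution crops for queries that refer to the same entity) can be sketched as below. The simple averaging here is a placeholder for the learned query association described in the abstract, and the box/shape conventions are assumptions.

```python
# Rough sketch of fusing a coarse full-image mask prediction with a fine high-resolution
# crop prediction. Queries are assumed to be already associated one-to-one; in CropFormer
# that association is learned end-to-end by the Transformer.
import torch
import torch.nn.functional as F

def fuse_full_and_crop(full_masks, crop_masks, crop_box, out_hw):
    """full_masks: (Q, h, w) logits from the downsized full image.
    crop_masks: (Q, ch, cw) logits from one high-res crop, query-aligned with full_masks.
    crop_box: (y0, y1, x0, x1) footprint of the crop at the output resolution.
    out_hw: (H, W) full output resolution."""
    H, W = out_hw
    y0, y1, x0, x1 = crop_box
    # Upsample the coarse full-image prediction to the output resolution.
    fused = F.interpolate(full_masks[None], size=(H, W), mode="bilinear",
                          align_corners=False)[0]
    # Resize the fine crop prediction to its footprint and average it into the crop region.
    crop_up = F.interpolate(crop_masks[None], size=(y1 - y0, x1 - x0), mode="bilinear",
                            align_corners=False)[0]
    fused[:, y0:y1, x0:x1] = 0.5 * fused[:, y0:y1, x0:x1] + 0.5 * crop_up
    return fused   # (Q, H, W) fused mask logits
```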