Fused Text Segmentation Networks for Multi-oriented Scene Text Detection
In this paper, we introduce a novel end-to-end framework for multi-oriented
scene text detection from an instance-aware semantic segmentation perspective.
We present Fused Text Segmentation Networks, which combine multi-level features
during feature extraction, as text instances may rely on finer feature
expression than general objects. The network detects and segments text
instances jointly and simultaneously, leveraging merits from both the semantic
segmentation task and region-proposal-based object detection. Without
involving any extra pipelines, our approach surpasses the current state of the
art on multi-oriented scene text detection benchmarks: ICDAR2015 Incidental
Scene Text and MSRA-TD500, reaching Hmean of 84.1% and 82.0% respectively.
Moreover, we report a baseline on Total-Text, which contains curved text,
suggesting the effectiveness of the proposed approach.
Comment: Accepted by ICPR201
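The multi-level feature fusion the abstract describes can be sketched roughly as follows. This is a minimal illustration under our own assumptions (nearest-neighbor upsampling and element-wise summation as the fusion rule), not the paper's exact architecture.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multilevel(features):
    """Fuse feature maps from several backbone stages.

    `features` is a list of (C, H_i, W_i) arrays ordered fine -> coarse,
    each stage at a lower resolution than the previous one. All maps are
    upsampled to the finest resolution and summed; element-wise summation
    is one simple fusion choice, channel concatenation is another.
    """
    target_h = features[0].shape[1]
    fused = np.zeros_like(features[0])
    for feat in features:
        factor = target_h // feat.shape[1]
        fused += upsample_nearest(feat, factor)
    return fused

# Toy 8-channel feature maps at strides 1, 2, 4
feats = [np.ones((8, 32, 32)), np.ones((8, 16, 16)), np.ones((8, 8, 8))]
print(fuse_multilevel(feats).shape)  # (8, 32, 32)
```

Finer stages contribute high-resolution detail useful for small text instances, while coarser stages contribute context; summing after upsampling keeps the channel count fixed.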
FSS-1000: A 1000-Class Dataset for Few-Shot Segmentation
Over the past few years, we have witnessed the success of deep learning in
image recognition thanks to the availability of large-scale human-annotated
datasets such as PASCAL VOC, ImageNet, and COCO. Although these datasets have
covered a wide range of object categories, there are still a significant number
of objects that are not included. Can we perform the same task without a lot of
human annotations? In this paper, we are interested in few-shot object
segmentation, where the number of annotated training examples is limited to
only 5. To evaluate and validate the performance of our approach, we have built
a few-shot segmentation dataset, FSS-1000, which consists of 1000 object
classes with pixelwise ground-truth segmentation annotations. Uniquely, our
dataset contains a significant number of objects that have never been seen or
annotated in previous datasets, such as tiny daily objects, merchandise,
cartoon characters, and logos. We build our baseline model using standard
backbone networks such as VGG-16, ResNet-101, and Inception. To our surprise,
we found that training our model from scratch on FSS-1000 achieves
comparable and even better results than training with weights pre-trained on
ImageNet, which is more than 100 times larger than FSS-1000. Both our approach
and dataset are simple, effective, and easily extensible to learning
segmentation of new object classes from very few annotated training examples.
The dataset is available at https://github.com/HKUSTCV/FSS-1000
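The 5-shot protocol described above can be illustrated with a small episode sampler. The class names, data layout, and the support/query split sizes below are placeholder assumptions, not FSS-1000's actual file structure.

```python
import random

def sample_episode(dataset, n_support=5, n_query=5, rng=None):
    """Sample one few-shot segmentation episode.

    `dataset` maps class name -> list of (image, mask) pairs. One class is
    chosen at random; n_support annotated examples form the support set the
    model conditions on, and n_query disjoint examples are the targets to
    segment.
    """
    rng = rng or random.Random()
    cls = rng.choice(sorted(dataset))
    pool = dataset[cls][:]
    rng.shuffle(pool)
    support = pool[:n_support]
    query = pool[n_support:n_support + n_query]
    return cls, support, query

# Toy dataset: 3 classes, 10 (image, mask) pairs each
toy = {f"class_{i}": [(f"img{j}", f"mask{j}") for j in range(10)]
       for i in range(3)}
cls, support, query = sample_episode(toy, rng=random.Random(0))
print(cls, len(support), len(query))  # prints the sampled class, 5, 5
```

Because every episode draws a fresh class, a model trained this way is evaluated on its ability to generalize from 5 annotated examples rather than on memorized class statistics.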
CDC: Convolutional-De-Convolutional Networks for Precise Temporal Action Localization in Untrimmed Videos
Temporal action localization is an important yet challenging problem. Given a
long, untrimmed video consisting of multiple action instances and complex
background contents, we need not only to recognize their action categories, but
also to localize the start time and end time of each instance. Many
state-of-the-art systems use segment-level classifiers to select and rank
proposal segments of pre-determined boundaries. However, a desirable model
should move beyond segment-level and make dense predictions at a fine
granularity in time to determine precise temporal boundaries. To this end, we
design a novel Convolutional-De-Convolutional (CDC) network that places CDC
filters on top of 3D ConvNets, which have been shown to be effective for
abstracting action semantics but reduce the temporal length of the input data.
The proposed CDC filter performs the required temporal upsampling and spatial
downsampling operations simultaneously to predict actions at the frame-level
granularity. It is unique in jointly modeling action semantics in space-time
and fine-grained temporal dynamics. We train the CDC network in an end-to-end
manner efficiently. Our model not only achieves superior performance in
detecting actions in every frame, but also significantly boosts the precision
of localizing temporal boundaries. Finally, the CDC network demonstrates a very
high efficiency with the ability to process 500 frames per second on a single
GPU server. We will update the camera-ready version and publish the source
code online soon.
Comment: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 201
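A rough sketch of the combined operation the CDC abstract describes: spatial downsampling by average pooling and temporal upsampling by a factor of 2, with two independent weight sets per time step standing in for the learned CDC kernel. This is a simplified illustration under our own assumptions, not the paper's exact filter.

```python
import numpy as np

def cdc_like(x, w_a, w_b):
    """Simplified CDC-style operation on a (C, T, H, W) feature volume.

    Each frame is spatially downsampled to 1x1 by average pooling, then the
    sequence is temporally upsampled 2x: every pooled time step emits two
    outputs through two independent linear maps (w_a, w_b), mimicking the
    joint up/down-sampling of a learned CDC filter.
    """
    c, t, h, w = x.shape
    pooled = x.mean(axis=(2, 3))           # (C, T): spatial downsampling
    out = np.empty((w_a.shape[0], 2 * t))  # (C_out, 2T): temporal upsampling
    out[:, 0::2] = w_a @ pooled            # first output per input time step
    out[:, 1::2] = w_b @ pooled            # second output per input time step
    return out

x = np.random.default_rng(0).normal(size=(16, 8, 7, 7))
w_a = np.random.default_rng(1).normal(size=(4, 16))
w_b = np.random.default_rng(2).normal(size=(4, 16))
print(cdc_like(x, w_a, w_b).shape)  # (4, 16)
```

Doubling the temporal resolution at each CDC layer is what lets the network recover frame-level predictions after the 3D ConvNet has shrunk the temporal axis.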
Millisecond single-molecule localization microscopy combined with convolution analysis and automated image segmentation to determine protein concentrations in complexly structured, functional cells, one cell at a time
We present a single-molecule tool called the CoPro (Concentration of
Proteins) method that uses millisecond imaging with convolution analysis,
automated image segmentation and super-resolution localization microscopy to
generate robust estimates for protein concentration in different compartments
of single living cells, validated using realistic simulations of complex
multiple compartment cell types. We demonstrate its utility experimentally on
model Escherichia coli bacteria and Saccharomyces cerevisiae budding yeast
cells, and use it to address the biological question of how signals are
transduced in cells. Cells in all domains of life dynamically sense their
environment through signal transduction mechanisms, many involving gene
regulation. The glucose sensing mechanism of S. cerevisiae is a model system
for studying gene regulatory signal transduction. It uses the multi-copy
expression inhibitor of the GAL gene family, Mig1, to repress unwanted genes in
the presence of elevated extracellular glucose concentrations. We fluorescently
labelled Mig1 molecules with green fluorescent protein (GFP) via chromosomal
integration at physiological expression levels in living S. cerevisiae cells,
in addition to the RNA polymerase protein Nrd1 with the fluorescent protein
reporter mCherry. Using CoPro we make quantitative estimates of Mig1 and Nrd1
protein concentrations in the cytoplasm and nucleus compartments on a
cell-by-cell basis under physiological conditions. These estimates indicate a
4-fold shift towards higher values in concentration of diffusive Mig1 in the
nucleus if the external glucose concentration is raised, whereas equivalent
levels in the cytoplasm shift to smaller values with a relative change an order
of magnitude smaller. By contrast, Nrd1, which is not directly involved in
glucose sensing, is almost exclusively localized in the nucleus under
high and..
ParseNet: Looking Wider to See Better
We present a technique for adding global context to deep convolutional
networks for semantic segmentation. The approach is simple, using the average
feature for a layer to augment the features at each location. In addition, we
study several idiosyncrasies of training, significantly increasing the
performance of baseline networks (e.g. from FCN). When we add our proposed
global feature, and a technique for learning normalization parameters, accuracy
increases consistently even over our improved versions of the baselines. Our
proposed approach, ParseNet, achieves state-of-the-art performance on SiftFlow
and PASCAL-Context with small additional computational cost over baselines, and
near current state-of-the-art performance on PASCAL VOC 2012 semantic
segmentation with a simple approach. Code is available at
https://github.com/weiliu89/caffe/tree/fcn
Comment: ICLR 2016 submissio
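The global-feature augmentation described above can be sketched as: average-pool the feature map to a single global vector, L2-normalize both global and local features, then "unpool" the global vector back to every location before concatenation. The normalization details here are illustrative assumptions, not ParseNet's learned per-channel scales.

```python
import numpy as np

def add_global_context(feat, eps=1e-12):
    """ParseNet-style global context for a (C, H, W) feature map.

    The per-channel spatial average is L2-normalized, broadcast back to
    every spatial location ("unpooling"), and concatenated with the
    per-location L2-normalized local features, doubling the channel count.
    """
    c, h, w = feat.shape
    glob = feat.mean(axis=(1, 2))                        # (C,) global pooled feature
    glob = glob / (np.linalg.norm(glob) + eps)           # L2-normalize global vector
    local = feat / (np.linalg.norm(feat, axis=0) + eps)  # L2-normalize each location
    tiled = np.broadcast_to(glob[:, None, None], (c, h, w))
    return np.concatenate([local, tiled], axis=0)        # (2C, H, W)

feat = np.random.default_rng(0).normal(size=(64, 16, 16))
print(add_global_context(feat).shape)  # (128, 16, 16)
```

Normalizing before concatenation matters because the pooled global feature and the local activations can differ in scale by orders of magnitude; without it, one branch dominates the classifier that follows.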