91 research outputs found
OVSNet : Towards One-Pass Real-Time Video Object Segmentation
Video object segmentation aims to accurately segment the target object
regions across consecutive frames. The task is technically challenging
because of complicating factors (e.g., shape deformation, occlusion, and
objects moving out of frame). Recent approaches have largely addressed these
factors with back-and-forth re-identification and bi-directional mask
propagation. However, these methods are extremely slow and only support
offline inference, so in principle they cannot be applied in real time.
Motivated by this observation, we propose an efficient detection-based
paradigm for video object segmentation: a unified One-Pass Video
Segmentation framework (OVS-Net) that models spatio-temporal representations
in a single pipeline, seamlessly integrating object detection, object
segmentation, and object re-identification. The proposed framework lends
itself to one-pass inference that performs video object segmentation both
effectively and efficiently. Moreover, we propose a mask-guided attention
module for modeling multi-scale object boundaries and multi-level feature
fusion. Experiments on the challenging DAVIS 2017 benchmark demonstrate the
effectiveness of the proposed framework, with performance comparable to the
state-of-the-art and high efficiency of about 11.5 FPS, more than 5 times
faster than other state-of-the-art methods and, to our knowledge, a
pioneering step toward real-time operation.
Comment: 10 pages, 6 figures
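The abstract does not give the exact formulation of the mask-guided attention module; a minimal sketch of the general idea (the function name and shapes are hypothetical), where a soft object mask gates the feature map, might look like:

```python
import numpy as np

def mask_guided_attention(features, mask_logits):
    """Re-weight a feature map by a soft object mask.

    features:    (C, H, W) feature map
    mask_logits: (H, W) raw mask logits for the target object
    Returns features emphasised near the predicted object region.
    """
    attn = 1.0 / (1.0 + np.exp(-mask_logits))  # sigmoid -> attention in [0, 1]
    return features * attn[None, :, :]          # broadcast over channels

feats = np.ones((4, 3, 3))
logits = np.full((3, 3), -10.0)  # background everywhere ...
logits[1, 1] = 10.0              # ... except one confident object pixel
out = mask_guided_attention(feats, logits)
```

Here background features are suppressed toward zero while features at the object pixel pass through nearly unchanged.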
Attention-based CNN-LSTM and XGBoost hybrid model for stock prediction
The stock market plays an important role in economic development. Because of
the stock market's complex volatility, research on and prediction of stock
price changes can help investors avoid risk. The traditional time series
model ARIMA cannot describe nonlinearity and cannot achieve satisfactory
results in stock prediction. Since neural networks have strong nonlinear
generalization ability, this paper proposes an attention-based CNN-LSTM and
XGBoost hybrid model to predict the stock price. The model constructed in
this paper integrates a time series model, Convolutional Neural Networks
with an attention mechanism, a Long Short-Term Memory network, and an
XGBoost regressor in a non-linear relationship, improving prediction
accuracy. The model can fully mine the historical information of the stock
market over multiple periods. The stock data is first preprocessed with
ARIMA. Then, a deep learning architecture organized as a
pretraining-finetuning framework is adopted. The pre-training model is an
attention-based CNN-LSTM model built on a sequence-to-sequence framework: it
first uses convolution to extract deep features of the original stock data,
and then uses Long Short-Term Memory networks to mine long-term time series
features. Finally, the XGBoost model is adopted for fine-tuning. The results
show that the hybrid model is more effective and its prediction accuracy is
relatively high, which can help investors or institutions make decisions,
expand returns, and avoid risk. Source code is available at
https://github.com/zshicode/Attention-CLX-stock-prediction.
Comment: arXiv admin note: text overlap with arXiv:2202.1380
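Before the sequence-to-sequence pretraining stage described above, the preprocessed price series must be sliced into (input, target) windows. A minimal sketch of that data preparation step (the function and parameter names are illustrative, not taken from the paper's code):

```python
import numpy as np

def make_windows(series, lookback, horizon):
    """Slice a 1-D price series into supervised (input, target) windows
    suitable for sequence-to-sequence training."""
    X, y = [], []
    for t in range(len(series) - lookback - horizon + 1):
        X.append(series[t : t + lookback])                       # past prices
        y.append(series[t + lookback : t + lookback + horizon])  # future prices
    return np.array(X), np.array(y)

# Stand-in for preprocessed (e.g. ARIMA-residual) data:
prices = np.arange(10.0)
X, y = make_windows(prices, lookback=5, horizon=1)
```

Each row of `X` holds five consecutive observations and the matching row of `y` holds the value to predict one step ahead.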
Graph-guided Architecture Search for Real-time Semantic Segmentation
Designing a lightweight semantic segmentation network often requires
researchers to find a trade-off between performance and speed, which is always
empirical due to the limited interpretability of neural networks. In order to
release researchers from these tedious mechanical trials, we propose a
Graph-guided Architecture Search (GAS) pipeline to automatically search
real-time semantic segmentation networks. Unlike previous works that use a
simplified search space and stack a repeatable cell to form a network, we
introduce a novel search mechanism with a new search space in which a
lightweight model can be effectively explored through cell-level diversity
and a latency-oriented constraint. Specifically, to produce cell-level
diversity, the cell-sharing constraint is eliminated so that each cell is
searched independently. Then a graph convolution network (GCN) is seamlessly
integrated as a communication mechanism between cells. Finally, a
latency-oriented constraint is incorporated into the search process to
balance speed and performance.
Extensive experiments on Cityscapes and CamVid datasets demonstrate that GAS
achieves the new state-of-the-art trade-off between accuracy and speed. In
particular, on Cityscapes dataset, GAS achieves the new best performance of
73.5% mIoU at a speed of 108.4 FPS on a Titan Xp.
Comment: CVPR202
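The abstract does not state how the latency-oriented constraint enters the search objective; one common and plausible formulation (names and the penalty form are assumptions, not the paper's) adds a penalty only when a sampled architecture exceeds the latency budget:

```python
def latency_aware_loss(task_loss, latency_ms, budget_ms, weight=0.1):
    """Combine the task loss with a hinge-style penalty on latency:
    architectures within budget are scored by task loss alone, slower
    ones pay a penalty proportional to the overshoot."""
    overshoot = max(0.0, latency_ms - budget_ms)
    return task_loss + weight * overshoot

fast = latency_aware_loss(0.5, latency_ms=8.0, budget_ms=10.0)   # within budget
slow = latency_aware_loss(0.5, latency_ms=15.0, budget_ms=10.0)  # 5 ms over
```

During the search, such a term steers the optimizer away from architectures that are accurate but too slow for real-time use.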
Search What You Want: Barrier Penalty NAS for Mixed Precision Quantization
Emerging hardware can support mixed precision CNN inference that assigns
different bitwidths to different layers. Learning to find an optimal
mixed precision model that can preserve accuracy and satisfy the specific
constraints on model size and computation is extremely challenging due to
the difficulty of training a mixed precision model and the huge space of all
possible bit quantizations. In this paper, we propose a novel soft Barrier
Penalty based NAS (BP-NAS) for mixed precision quantization, which ensures all
the searched models are inside the valid domain defined by the complexity
constraint, and thus can return an optimal model under the given constraint
by conducting the search only once. The proposed soft Barrier Penalty is
differentiable and can impose very large losses on models outside the valid
domain while imposing almost no penalty on models inside it, thus
constraining the search to the feasible domain. In addition, a
differentiable Prob-1 regularizer is proposed to ensure learning with NAS is
reasonable. A distribution reshaping training strategy is also used to make
training more stable. BP-NAS sets a new state of the art on both
classification (CIFAR-10, ImageNet) and detection (COCO), surpassing all
efficient mixed precision methods designed manually or automatically. In
particular, BP-NAS achieves higher mAP (up to a 2.7% mAP improvement)
together with lower bit computation cost compared with the best existing
mixed precision model on COCO detection.
Comment: ECCV202
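The paper's exact barrier formulation is not reproduced in the abstract; a generic differentiable soft barrier with the stated behavior (near-zero inside the budget, very large outside, shaped here as a scaled softplus, which is an assumption) can be sketched as:

```python
import math

def soft_barrier_penalty(cost, budget, sharpness=10.0, scale=100.0):
    """Differentiable penalty on a complexity cost: approximately zero
    well inside the budget, growing steeply once the budget is exceeded.
    Softplus-shaped; the paper's exact form may differ."""
    return scale * math.log1p(math.exp(sharpness * (cost - budget))) / sharpness

inside = soft_barrier_penalty(cost=3.0, budget=4.0)   # well within budget
outside = soft_barrier_penalty(cost=5.0, budget=4.0)  # violates budget
```

Because the penalty is smooth everywhere, it can be added to the NAS training loss and optimized by gradient descent, unlike a hard feasibility check.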
Uncertainty-Aware Consistency Regularization for Cross-Domain Semantic Segmentation
Unsupervised domain adaptation (UDA) aims to adapt existing models of the
source domain to a new target domain with only unlabeled data. Many
adversarial-based UDA methods involve highly unstable training and require
careful tuning of the optimization procedure. Some non-adversarial UDA methods
employ a consistency regularization on the target predictions of a student
model and a teacher model under different perturbations, where the teacher
shares the same architecture with the student and is updated by the exponential
moving average of the student. However, these methods suffer from noticeable
negative transfer resulting from either the error-prone discriminator network
or the unreasonable teacher model. In this paper, we propose an
uncertainty-aware consistency regularization method for cross-domain semantic
segmentation. By exploiting the latent uncertainty information of the target
samples, more meaningful and reliable knowledge from the teacher model can be
transferred to the student model. In addition, we further reveal the reason why
the current consistency regularization is often unstable in minimizing the
distribution discrepancy. We also show that our method can effectively ease
this issue by mining the most reliable and meaningful samples with a dynamic
weighting scheme of consistency loss. Experiments demonstrate that the proposed
method outperforms the state-of-the-art methods on two domain adaptation
benchmarks, GTAV → Cityscapes and SYNTHIA → Cityscapes.
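The abstract names two concrete mechanisms: an EMA-updated teacher and an uncertainty-aware weighting of the consistency loss. A toy numpy sketch of both (the entropy-based weighting here is illustrative; the paper's exact scheme may differ):

```python
import numpy as np

def ema_update(teacher_w, student_w, momentum=0.99):
    """Teacher parameters as an exponential moving average of the student's."""
    return momentum * teacher_w + (1.0 - momentum) * student_w

def uncertainty_weighted_consistency(student_probs, teacher_probs):
    """Consistency loss that down-weights samples where the teacher is
    uncertain (high predictive entropy)."""
    eps = 1e-8
    entropy = -np.sum(teacher_probs * np.log(teacher_probs + eps), axis=-1)
    confidence = 1.0 - entropy / np.log(teacher_probs.shape[-1])  # 1 = certain
    sq_err = np.sum((student_probs - teacher_probs) ** 2, axis=-1)
    return float(np.mean(confidence * sq_err))

# A maximally uncertain teacher prediction contributes (almost) nothing:
loss = uncertainty_weighted_consistency(
    np.array([[0.9, 0.1]]), np.array([[0.5, 0.5]])
)
```

Gating the consistency term by teacher confidence is what keeps unreliable teacher targets from dragging the student toward noise, which is the negative-transfer failure mode described above.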
DMT: Dynamic Mutual Training for Semi-Supervised Learning
Recent semi-supervised learning methods use pseudo supervision as their core
idea, especially self-training methods that generate pseudo labels. However,
pseudo labels are unreliable. Self-training methods usually rely on a single
model's prediction confidence to filter out low-confidence pseudo labels,
thus retaining high-confidence errors and wasting many low-confidence but
correct labels. In this paper, we point out that it is difficult for a model
to counter its own errors. Instead, leveraging inter-model disagreement
between different models is key to locating pseudo label errors. With this
new viewpoint, we propose mutual
training between two different models by a dynamically re-weighted loss
function, called Dynamic Mutual Training (DMT). We quantify inter-model
disagreement by comparing predictions from two different models to dynamically
re-weight loss in training, where a larger disagreement indicates a possible
error and corresponds to a lower loss value. Extensive experiments show that
DMT achieves state-of-the-art performance in both image classification and
semantic segmentation. Our codes are released at
https://github.com/voldemortX/DST-CBC.
Comment: Reformatte
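The core of DMT, as the abstract describes it, is a per-sample loss weight derived from inter-model disagreement. A simplified sketch (the exact weighting function is an assumption; here the weight is model B's confidence in model A's pseudo label, raised to a power):

```python
import numpy as np

def dmt_weight(probs_a, probs_b, gamma=2.0):
    """Per-sample loss weight from inter-model agreement: larger
    disagreement between the two models means a likely pseudo-label
    error, hence a smaller weight."""
    pseudo = np.argmax(probs_a, axis=-1)              # model A's pseudo labels
    agree = probs_b[np.arange(len(pseudo)), pseudo]   # B's confidence in them
    return agree ** gamma

pa = np.array([[0.9, 0.1], [0.8, 0.2]])
pb = np.array([[0.9, 0.1], [0.1, 0.9]])  # second sample: the models disagree
w = dmt_weight(pa, pb)
```

The agreeing sample keeps nearly its full loss weight, while the disputed one is almost ignored, which is how disagreement localizes pseudo-label errors.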
Context-Aware Mixup for Domain Adaptive Semantic Segmentation
Unsupervised domain adaptation (UDA) aims to adapt a model of the labeled
source domain to an unlabeled target domain. Existing UDA-based semantic
segmentation approaches always reduce the domain shifts in pixel level, feature
level, and output level. However, almost all of them largely neglect the
contextual dependency, which is generally shared across different domains,
leading to less-desired performance. In this paper, we propose a novel
Context-Aware Mixup (CAMix) framework for domain adaptive semantic
segmentation, which exploits this important clue of context-dependency as
explicit prior knowledge in a fully end-to-end trainable manner for enhancing
the adaptability toward the target domain. Firstly, we present a contextual
mask generation strategy by leveraging the accumulated spatial distributions
and prior contextual relationships. The generated contextual mask is critical
in this work and will guide the context-aware domain mixup on three different
levels. Besides, given the contextual knowledge, we introduce a
significance-reweighted consistency loss to penalize inconsistency between
the mixed student prediction and the mixed teacher prediction, which
alleviates negative transfer during adaptation, e.g., early performance
degradation.
Extensive experiments and analysis demonstrate the effectiveness of our method
against the state-of-the-art approaches on widely-used UDA benchmarks.
Comment: Accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT)
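Mechanically, a mask-guided domain mixup pastes mask-selected regions of one domain's image onto the other's. A toy sketch of that operation (CAMix derives the mask from accumulated spatial distributions and contextual priors, which this sketch does not model; the mask here is hand-made):

```python
import numpy as np

def masked_mixup(source_img, target_img, mask):
    """Compose a mixed image: mask-selected source regions pasted onto
    the target image (binary or soft mask, broadcast over the image)."""
    return mask * source_img + (1.0 - mask) * target_img

src = np.ones((4, 4))    # stand-in source-domain image
tgt = np.zeros((4, 4))   # stand-in target-domain image
m = np.zeros((4, 4))
m[:2, :] = 1.0           # hypothetical contextual mask: top half from source
mixed = masked_mixup(src, tgt, m)
```

The same mask is applied to the corresponding label maps, so the mixed prediction targets stay consistent with the mixed input.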
Enhanced Boundary Learning for Glass-like Object Segmentation
Glass-like objects such as windows, bottles, and mirrors exist widely in the
real world. Sensing these objects has many applications, including robot
navigation and grasping. However, this task is very challenging due to the
arbitrary scenes behind glass-like objects. This paper aims to solve the
glass-like object segmentation problem via enhanced boundary learning. In
particular, we first propose a novel refined differential module that outputs
finer boundary cues. We then introduce an edge-aware point-based graph
convolution network module to model the global shape along the boundary. We use
these two modules to design a decoder that generates accurate and clean
segmentation results, especially on the object contours. Both modules are
lightweight and effective: they can be embedded into various segmentation
models. In extensive experiments on three recent glass-like object segmentation
datasets, including Trans10k, MSD, and GDD, our approach establishes new
state-of-the-art results. We also illustrate the strong generalization
properties of our method on three generic segmentation datasets, including
Cityscapes, BDD, and COCO Stuff. Code and models are available at
\url{https://github.com/hehao13/EBLNet}.
Comment: ICCV-2021. Code is available at https://github.com/hehao13/EBLNet
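The paper's refined differential module is learned; as a very crude stand-in for the intuition (a pixel that differs from its local mean lies near a label transition), one could compute a hand-crafted boundary cue like this (entirely illustrative, not the paper's module):

```python
import numpy as np

def boundary_cue(seg_map):
    """Absolute difference between each pixel and its 3x3 local mean
    (edge-padded): large near label transitions, zero in flat regions."""
    h, w = seg_map.shape
    pad = np.pad(seg_map, 1, mode="edge")
    local_mean = sum(
        pad[i : i + h, j : j + w] for i in range(3) for j in range(3)
    ) / 9.0
    return np.abs(seg_map - local_mean)

seg = np.zeros((4, 4))
seg[:, 2:] = 1.0               # vertical edge between columns 1 and 2
cue = boundary_cue(seg)
```

The cue is zero inside uniform regions and peaks along the column where the labels change, which is the kind of signal the decoder sharpens into clean contours.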
PointFlow: Flowing Semantics Through Points for Aerial Image Segmentation
Aerial Image Segmentation is a particular semantic segmentation problem and
has several challenging characteristics that general semantic segmentation does
not have. There are two critical issues: one is an extremely imbalanced
foreground-background distribution, and the other is many small objects
set against a complex background. These problems make recent dense
affinity context modeling perform poorly, even compared with baselines, due
to over-introduced background context. To handle these problems, we propose a
point-wise affinity propagation module based on the Feature Pyramid Network
(FPN) framework, named PointFlow. Rather than dense affinity learning, a sparse
affinity map is generated upon selected points between the adjacent features,
which reduces the noise introduced by the background while keeping efficiency.
In particular, we design a dual point matcher to select points from the salient
area and object boundaries, respectively. Experimental results on three
different aerial segmentation datasets suggest that the proposed method is more
effective and efficient than state-of-the-art general semantic segmentation
methods. In particular, our method achieves the best speed-accuracy
trade-off on three aerial benchmarks. Further experiments on three general
semantic segmentation datasets prove the generality of our method. Code will
be provided at https://github.com/lxtGH/PFSegNets.
Comment: accepted by CVPR202
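The key contrast the abstract draws is sparse versus dense affinity: compute pairwise affinity only among a few selected salient points instead of all pixel pairs. A toy sketch of that selection-then-affinity step (names and the cosine-affinity choice are assumptions):

```python
import numpy as np

def sparse_point_affinity(features, saliency, k=3):
    """Select the k most salient spatial positions and compute cosine
    affinity only among them, instead of a dense all-pairs map.

    features: (C, H, W) feature map
    saliency: (H, W) per-pixel saliency score
    """
    idx = np.argsort(saliency.ravel())[-k:]                  # top-k positions
    pts = features.reshape(features.shape[0], -1)[:, idx]    # (C, k)
    pts = pts / (np.linalg.norm(pts, axis=0, keepdims=True) + 1e-8)
    return pts.T @ pts                                       # (k, k) affinity

feats = np.random.default_rng(0).normal(size=(8, 5, 5))
sal = np.random.default_rng(1).random((5, 5))
aff = sparse_point_affinity(feats, sal, k=3)
```

With k points instead of H×W, the affinity map shrinks from (H·W)² to k² entries, which is why background noise and compute both drop.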