Nonconvex Nonsmooth Low-Rank Minimization via Iteratively Reweighted Nuclear Norm
The nuclear norm is widely used as a convex surrogate of the rank function in
compressive sensing for low rank matrix recovery with its applications in image
recovery and signal processing. However, solving the nuclear norm based relaxed
convex problem usually leads to a suboptimal solution of the original rank
minimization problem. In this paper, we propose to apply a family of
nonconvex surrogates of the $L_0$-norm to the singular values of a matrix to
approximate the rank function. This leads to a nonconvex nonsmooth minimization
problem. Then we propose to solve the problem by Iteratively Reweighted Nuclear
Norm (IRNN) algorithm. IRNN iteratively solves a Weighted Singular Value
Thresholding (WSVT) problem, which has a closed form solution due to the
special properties of the nonconvex surrogate functions. We also extend IRNN to
solve the nonconvex problem with two or more blocks of variables. In theory, we
prove that IRNN decreases the objective function value monotonically, and any
limit point is a stationary point. Extensive experiments on both synthesized
data and real images demonstrate that IRNN enhances the low-rank matrix
recovery compared with state-of-the-art convex algorithms.
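The closed-form WSVT step mentioned above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `wsvt` is hypothetical, and the sketch assumes the weights are nonnegative and nondecreasing (paired with singular values sorted in descending order), which is the condition under which shrinking each singular value by its weight solves the weighted problem.

```python
import numpy as np

def wsvt(Y, weights):
    """Weighted singular value thresholding (illustrative sketch).

    Minimizes sum_i weights[i] * sigma_i(X) + 0.5 * ||X - Y||_F^2,
    assuming 0 <= weights[0] <= weights[1] <= ... (an assumption of this sketch).
    """
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or np.any(np.diff(weights) < 0):
        raise ValueError("weights must be nonnegative and nondecreasing")
    # NumPy returns singular values in descending order.
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    # Shrink each singular value by its own weight and clip at zero.
    shrunk = np.maximum(s - weights, 0.0)
    # Rebuild the matrix from the shrunken spectrum.
    return (U * shrunk) @ Vt
```

With all weights equal, this reduces to ordinary singular value thresholding, the proximal step of the (convex) nuclear norm.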
Generalized Nonconvex Nonsmooth Low-Rank Minimization
As surrogate functions of the $L_0$-norm, many nonconvex penalty functions have
been proposed to enhance the sparse vector recovery. It is easy to extend these
nonconvex penalty functions on singular values of a matrix to enhance low-rank
matrix recovery. However, different from convex optimization, solving the
nonconvex low-rank minimization problem is much more challenging than the
nonconvex sparse minimization problem. We observe that all the existing
nonconvex penalty functions are concave and monotonically increasing on
$[0, \infty)$. Thus their gradients are decreasing functions. Based on this
property, we propose an Iteratively Reweighted Nuclear Norm (IRNN) algorithm to
solve the nonconvex nonsmooth low-rank minimization problem. IRNN iteratively
solves a Weighted Singular Value Thresholding (WSVT) problem. By setting the
weight vector as the gradient of the concave penalty function, the WSVT problem
has a closed form solution. In theory, we prove that IRNN decreases the
objective function value monotonically, and any limit point is a stationary
point. Extensive experiments on both synthetic data and real images demonstrate
that IRNN enhances the low-rank matrix recovery compared with state-of-the-art
convex algorithms.
Comment: IEEE International Conference on Computer Vision and Pattern
Recognition, 201
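The reweighting idea described in this abstract can be sketched on a small matrix-completion problem. Everything specific here is an assumption made for illustration and is not taken from the paper: the log penalty `lam * log(1 + s / gamma)`, the squared loss on observed entries, the step parameter `mu` (chosen larger than the loss's Lipschitz constant, which is 1 for a 0/1 mask), and all function names.

```python
import numpy as np

def log_penalty(s, lam, gamma):
    # Concave, increasing penalty on [0, inf) (assumed example penalty).
    return lam * np.log1p(s / gamma)

def log_penalty_grad(s, lam, gamma):
    # Gradient of the penalty; it decreases as s grows.
    return lam / (gamma + s)

def objective(X, M, mask, lam, gamma):
    s = np.linalg.svd(X, compute_uv=False)
    return log_penalty(s, lam, gamma).sum() + 0.5 * np.sum((mask * (X - M)) ** 2)

def irnn_complete(M, mask, lam=1.0, gamma=1.0, mu=1.1, n_iter=50):
    """Iteratively reweighted nuclear norm sketch for matrix completion."""
    X = np.zeros_like(M, dtype=float)
    history = [objective(X, M, mask, lam, gamma)]
    for _ in range(n_iter):
        # Weights = penalty gradient at the current singular values.
        # Singular values are descending, so the weights are nondecreasing.
        s_old = np.linalg.svd(X, compute_uv=False)
        w = log_penalty_grad(s_old, lam, gamma)
        # Gradient step on the smooth loss, then weighted thresholding.
        G = X - (mask * (X - M)) / mu
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        X = (U * np.maximum(s - w / mu, 0.0)) @ Vt
        history.append(objective(X, M, mask, lam, gamma))
    return X, history
```

Under these assumptions, `history` should be nonincreasing, which mirrors the monotone-decrease property the abstract states. Large singular values receive small weights and are shrunk less than they would be under the plain nuclear norm.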
On the Sample Complexity of Multichannel Frequency Estimation via Convex Optimization
The use of multichannel data in line spectral estimation (or frequency
estimation) is common for improving the estimation accuracy in array
processing, structural health monitoring, wireless communications, and more.
Recently proposed atomic norm methods have attracted considerable attention due
to their provable superiority in accuracy, flexibility and robustness compared
with conventional approaches. In this paper, we analyze atomic norm
minimization for multichannel frequency estimation from noiseless compressive
data, showing that the sample size per channel that ensures exact estimation
decreases with the increase of the number of channels under mild conditions. In
particular, given channels, order samples per channel, selected randomly from
equispaced samples, suffice to ensure with high probability exact
estimation of frequencies that are normalized and mutually separated by at
least . Numerical results are provided corroborating our analysis.
Comment: 14 pages, double column, to appear in IEEE Trans. Information Theory
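To make the measurement setting concrete, the sketch below generates noiseless multichannel data under the standard line spectral model (each channel is a sum of complex sinusoids sharing the same frequencies) and keeps a random subset of the equispaced samples. The function names, argument names, and array shapes are assumptions of this illustration. It does not implement atomic norm minimization itself.

```python
import numpy as np

def multichannel_samples(freqs, amps, n_total, n_keep, rng):
    """Simulate compressive multichannel sinusoid data (illustrative sketch).

    freqs: (K,) normalized frequencies in [0, 1).
    amps:  (K, L) complex amplitudes, one column per channel.
    Returns the kept sample indices and the (n_keep, L) observations.
    """
    n = np.arange(n_total)
    # Each column of A is one complex sinusoid sampled at n = 0..n_total-1.
    A = np.exp(2j * np.pi * np.outer(n, freqs))
    Y = A @ amps
    # The same randomly chosen sample indices are used for every channel.
    rows = np.sort(rng.choice(n_total, size=n_keep, replace=False))
    return rows, Y[rows]

def min_wraparound_separation(freqs):
    """Smallest gap between normalized frequencies, measured on the unit circle."""
    f = np.sort(np.mod(freqs, 1.0))
    gaps = np.diff(np.concatenate([f, [f[0] + 1.0]]))
    return gaps.min()
```

The wrap-around distance matters because normalized frequencies 0.95 and 0.1 are only 0.15 apart, not 0.85.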
ADPS: Asymmetric Distillation Post-Segmentation for Image Anomaly Detection
Knowledge Distillation-based Anomaly Detection (KDAD) methods rely on the
teacher-student paradigm to detect and segment anomalous regions by contrasting
the unique features extracted by both networks. However, existing KDAD methods
suffer from two main limitations: 1) the student network can effortlessly
replicate the teacher network's representations, and 2) the features of the
teacher network serve solely as a "reference standard" and are not fully
leveraged. Toward this end, we depart from the established paradigm and instead
propose an innovative approach called Asymmetric Distillation Post-Segmentation
(ADPS). Our ADPS employs an asymmetric distillation paradigm that takes
distinct forms of the same image as the input of the teacher-student networks,
driving the student network to learn discriminating representations for
anomalous regions.
Meanwhile, a customized Weight Mask Block (WMB) is proposed to generate a
coarse anomaly localization mask that transfers the distilled knowledge
acquired from the asymmetric paradigm to the teacher network. Equipped with
WMB, the proposed Post-Segmentation Module (PSM) is able to effectively detect
and segment abnormal regions with fine structures and clear boundaries.
Experimental results demonstrate that the proposed ADPS outperforms the
state-of-the-art methods in detecting and segmenting anomalies. Surprisingly,
ADPS significantly improves the Average Precision (AP) metric by 9% and 20% on the
MVTec AD and KolektorSDD2 datasets, respectively.
Comment: 11 pages, 9 figures
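For readers unfamiliar with the teacher-student contrast that KDAD methods rely on, the sketch below shows one common generic way to score anomalies: the per-pixel cosine distance between teacher and student feature maps. This is an assumed baseline scoring rule for illustration only. It is not the ADPS method, and it omits the Weight Mask Block and Post-Segmentation Module described above. The function name and the `(C, H, W)` layout are hypothetical.

```python
import numpy as np

def cosine_anomaly_map(teacher_feat, student_feat, eps=1e-8):
    """Per-pixel cosine distance between two (C, H, W) feature maps.

    Values near 0 mean the student reproduces the teacher at that pixel;
    larger values (up to about 2) indicate disagreement, i.e. a likely anomaly.
    """
    # Dot product and norms are taken along the channel axis.
    dot = np.sum(teacher_feat * student_feat, axis=0)
    norms = np.linalg.norm(teacher_feat, axis=0) * np.linalg.norm(student_feat, axis=0)
    # eps guards against division by zero for all-zero feature vectors.
    return 1.0 - dot / (norms + eps)
```

The first limitation named in the abstract shows up directly in such a map: if the student replicates the teacher everywhere, including on anomalous regions, the distance stays near zero and the anomaly goes undetected.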
Constructing a Non-Negative Low Rank and Sparse Graph with Data-Adaptive Features
This paper aims at constructing a good graph for discovering intrinsic data
structures in a semi-supervised learning setting. Firstly, we propose to build
a non-negative low-rank and sparse (referred to as NNLRS) graph for the given
data representation. Specifically, the weights of edges in the graph are
obtained by seeking a nonnegative low-rank and sparse matrix that represents
each data sample as a linear combination of others. The so-obtained NNLRS-graph
can capture both the global mixture of subspaces structure (by the low
rankness) and the locally linear structure (by the sparseness) of the data,
hence is both generative and discriminative. Secondly, as good features are
extremely important for constructing a good graph, we propose to learn the data
embedding matrix and construct the graph jointly within one framework, which is
termed as NNLRS with embedded features (referred to as NNLRS-EF). Extensive
experiments on three publicly available datasets demonstrate that the proposed
method outperforms the state-of-the-art graph construction method by a large
margin for both semi-supervised classification and discriminative analysis,
which verifies the effectiveness of our proposed method.
Image-Specific Information Suppression and Implicit Local Alignment for Text-based Person Search
Text-based person search (TBPS) is a challenging task that aims to search
pedestrian images with the same identity from an image gallery given a query
text. In recent years, TBPS has made remarkable progress and state-of-the-art
methods achieve superior performance by learning local fine-grained
correspondence between images and texts. However, most existing methods rely on
explicitly generated local parts to model fine-grained correspondence between
modalities, which is unreliable due to the lack of contextual information or
the potential introduction of noise. Moreover, existing methods seldom consider
the information inequality problem between modalities caused by image-specific
information. To address these limitations, we propose an efficient joint
Multi-level Alignment Network (MANet) for TBPS, which can learn aligned
image/text feature representations between modalities at multiple levels, and
realize fast and effective person search. Specifically, we first design an
image-specific information suppression module, which suppresses image
background and environmental factors by relation-guided localization and
channel attention filtration respectively. This module effectively alleviates
the information inequality problem and realizes the alignment of information
volume between images and texts. Secondly, we propose an implicit local
alignment module to adaptively aggregate all pixel/word features of image/text
to a set of modality-shared semantic topic centers and implicitly learn the
local fine-grained correspondence between modalities without additional
supervision and cross-modal interactions. A global alignment is also introduced
as a supplement to the local perspective. The cooperation of global and local
alignment modules enables better semantic alignment between modalities.
Extensive experiments on multiple databases demonstrate the effectiveness and
superiority of our MANet.
Boosting Few-shot Fine-grained Recognition with Background Suppression and Foreground Alignment
Few-shot fine-grained recognition (FS-FGR) aims to recognize novel
fine-grained categories with the help of limited available samples.
Undoubtedly, this task inherits the main challenges from both few-shot learning
and fine-grained recognition. First, the lack of labeled samples makes the
learned model prone to overfitting. Second, it also suffers from high intra-class
variance and low inter-class difference in the datasets. To address this
challenging task, we propose a two-stage background suppression and foreground
alignment framework, which is composed of a background activation suppression
(BAS) module, a foreground object alignment (FOA) module, and a local to local
(L2L) similarity metric. Specifically, the BAS is introduced to generate a
foreground mask for localization to weaken background disturbance and enhance
dominative foreground objects. What's more, considering the lack of labeled
samples, we compute the pairwise similarity of feature maps using both the raw
image and the refined image. The FOA then reconstructs the feature map of each
support sample according to its correction to the query ones, which addresses
the problem of misalignment between support-query image pairs. To enable the
proposed method to capture subtle differences between easily confused
samples, we present a novel L2L similarity metric to further measure the local
similarity between a pair of aligned spatial features in the embedding space.
Extensive experiments conducted on multiple popular fine-grained benchmarks
demonstrate that our method outperforms the existing state-of-the-art by a
large margin.
Comment: Preprint under review in TCSVT Journal