Why KDAC? A general activation function for knowledge discovery
Deep learning oriented named entity recognition (DNER) has gradually become
the paradigm of knowledge discovery, greatly promoting domain intelligence.
However, the activation functions currently used in DNER fail to address
vanishing gradients, the absence of negative outputs, or the presence of
non-differentiable points, which may impede knowledge exploration through the
omission and incomplete representation of latent semantics. To break through
this dilemma, we present a novel activation function termed KDAC.
Specifically, KDAC is an aggregation function with multiple conversion modes.
The backbone of the activation region is the interaction between an
exponential term and a linear term, and both tails extend through adaptive
linear divergence, which overcomes vanishing gradients and the lack of
negative outputs. Crucially, non-differentiable points are detected and
eliminated by an approximate smoothing algorithm. KDAC has a series of
desirable properties, including nonlinearity, a stable near-linear
transformation and derivative, and a dynamic form. We perform experiments with
a BERT-BiLSTM-CNN-CRF model on six benchmark datasets covering different
domain knowledge: Weibo, Clinical, E-commerce, Resume, HAZOP, and People's
Daily. The evaluation results show that KDAC is effective and provides a more
generalized activation that stimulates the performance of DNER. We hope that
KDAC can be exploited as a promising activation function in the construction
of knowledge.
Comment: Accepted by Neurocomputing
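The abstract does not state KDAC's closed form, so the sketch below is only a
hypothetical stand-in for the ingredients it names: an exponent-linear core,
adaptive linear divergence at both tails, and smooth joins. The function
kdac_like and the parameters alpha, beta, lo, and hi are illustrative
assumptions, not the paper's definition.

import numpy as np

def kdac_like(x, alpha=1.0, beta=1.0, lo=-2.0, hi=2.0):
    # Hypothetical exponent-linear activation with linear tails; not the
    # paper's exact KDAC formula.
    x = np.asarray(x, dtype=float)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    core = lambda z: alpha * z * sig(beta * z)   # exponent-linear interaction
    dcore = lambda z: alpha * (sig(beta * z)
                               + beta * z * sig(beta * z) * (1.0 - sig(beta * z)))
    out = core(np.clip(x, lo, hi))
    # Adaptive linear divergence: extend both tails linearly, matching value
    # and slope at the joins, so gradients never vanish and negative outputs
    # are preserved.
    out = np.where(x > hi, core(hi) + dcore(hi) * (x - hi), out)
    out = np.where(x < lo, core(lo) + dcore(lo) * (x - lo), out)
    return out

print(kdac_like([-4.0, -1.0, 0.0, 1.0, 4.0]))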
CityDreamer: Compositional Generative Model of Unbounded 3D Cities
In recent years, extensive research has focused on 3D natural scene
generation, but the domain of 3D city generation has not received as much
exploration. This is due to the greater challenges posed by 3D city generation,
mainly because humans are more sensitive to structural distortions in urban
environments. Additionally, generating 3D cities is more complex than 3D
natural scenes since buildings, as objects of the same class, exhibit a wider
range of appearances compared to the relatively consistent appearance of
objects like trees in natural scenes. To address these challenges, we propose
CityDreamer, a compositional generative model designed specifically for
unbounded 3D cities, which separates the generation of building instances from
other background objects, such as roads, green lands, and water areas, into
distinct modules. Furthermore, we construct two datasets, OSM and GoogleEarth,
containing a vast amount of real-world city imagery to enhance the realism of
the generated 3D cities both in their layouts and appearances. Through
extensive experiments, CityDreamer has proven its superiority over
state-of-the-art methods in generating a wide range of lifelike 3D cities.
Comment: Project page: https://haozhexie.com/project/city-dreame
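The abstract's central architectural idea, generating building instances and
background objects in separate modules and compositing them, can be shown in
toy form. Everything below is a stand-in: the generators are random and the
layout is a 2D integer map, whereas CityDreamer's modules are learned
generators over 3D city representations.

import numpy as np

def generate_background(layout, rng):
    # Stand-in for the background module (roads, green lands, water areas).
    h, w = layout.shape
    return rng.uniform(size=(h, w, 3))

def generate_building(rng, shape):
    # Stand-in for the per-instance building module.
    return rng.uniform(size=shape + (3,))

def compose_city(layout, seed=0):
    # Composite the background with each building instance, mirroring the
    # separation of building instances from background objects.
    rng = np.random.default_rng(seed)
    image = generate_background(layout, rng)
    for inst_id in np.unique(layout):
        if inst_id == 0:              # 0 marks background in this toy layout
            continue
        mask = layout == inst_id
        image[mask] = generate_building(rng, layout.shape)[mask]
    return image

city = compose_city(np.random.default_rng(1).integers(0, 4, size=(64, 64)))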
Fingerprint Presentation Attack Detector Using Global-Local Model
The vulnerability of automated fingerprint recognition systems (AFRSs) to
presentation attacks (PAs) promotes the vigorous development of PA detection
(PAD) technology. However, PAD methods have been limited by information loss
and poor generalization ability, especially when confronted with new PA
materials and fingerprint sensors. This paper therefore proposes a
global-local model-based PAD (RTK-PAD) method to overcome those limitations to
some extent. The proposed method consists of three modules: 1) a global
module; 2) a local module; and 3) a rethinking module. The cut-out-based
global module predicts a global spoofness score from nonlocal features of the
entire fingerprint image, while the texture in-painting-based local module
predicts a local spoofness score from fingerprint patches. The two modules are
not independent but are connected through the proposed rethinking module,
which localizes two discriminative patches for the local module based on the
global spoofness score. Finally, a fused spoofness score, obtained by
averaging the global and local spoofness scores, is used for PAD. Our
experimental results on LivDet 2017 show that the proposed RTK-PAD achieves an
average classification error (ACE) of 2.28% and a true detection rate (TDR) of
91.19% when the false detection rate (FDR) equals 1.0%, significantly
outperforming the state-of-the-art methods by more than 10 percentage points
in terms of TDR (91.19% versus 80.74%).
Comment: This paper was accepted by IEEE Transactions on Cybernetics. The
current version is updated with minor revisions to the introduction and
related work.
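The final fusion step is concrete enough to sketch: the fused spoofness score
is the average of the global score and the local scores from the two patches
chosen by the rethinking module. The function name and decision threshold
below are illustrative, not from the paper.

import numpy as np

def fused_spoofness(global_score, local_scores):
    # Average the local patch scores, then average with the global score,
    # mirroring the fusion-by-averaging step described in the abstract.
    return 0.5 * (global_score + float(np.mean(local_scores)))

score = fused_spoofness(0.82, [0.74, 0.91])  # two patches from rethinking
is_attack = score >= 0.5                     # illustrative threshold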
Long-Range Feature Propagating for Natural Image Matting
Natural image matting estimates the alpha values of unknown regions in the
trimap. Recently, deep-learning-based methods propagate the alpha values from
the known regions to the unknown regions according to the similarity between
them. However, we find that more than 50% of the pixels in the unknown regions
cannot be correlated with pixels in the known regions, due to the small
effective receptive fields of common convolutional neural networks; this leads
to inaccurate estimation when pixels in the unknown regions cannot be inferred
from the pixels within those receptive fields alone. To solve this problem, we
propose the Long-Range Feature Propagating Network (LFPNet), which learns
long-range context features outside the receptive fields for alpha matte
estimation.
Specifically, we first design the propagating module which extracts the context
features from the downsampled image. Then, we present Center-Surround Pyramid
Pooling (CSPP) that explicitly propagates the context features from the
surrounding context image patch to the inner center image patch. Finally, we
use the matting module which takes the image, trimap and context features to
estimate the alpha matte. Experimental results demonstrate that the proposed
method performs favorably against the state-of-the-art methods on the
AlphaMatting and Adobe Image Matting datasets.
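A heavily simplified sketch of the center-surround idea: pool the surrounding
ring of a context feature map at several pyramid levels and append the pooled
context to the inner center crop. The crop sizes, pooling levels, and
nearest-neighbour upsampling below are assumptions for illustration, not the
paper's exact CSPP design.

import numpy as np

def center_surround_pool(feat, levels=(1, 2)):
    # feat: (H, W, C) context features from the downsampled image.
    H, W, C = feat.shape
    h4, w4 = H // 4, W // 4
    center = feat[h4:H - h4, w4:W - w4, :]     # inner center patch
    ring = feat.astype(float).copy()
    ring[h4:H - h4, w4:W - w4, :] = np.nan     # mask out the center

    pooled = [center]
    for L in levels:
        grid = np.zeros((L, L, C))
        for i in range(L):
            for j in range(L):
                cell = ring[i * H // L:(i + 1) * H // L,
                            j * W // L:(j + 1) * W // L, :].reshape(-1, C)
                cell = cell[~np.isnan(cell[:, 0])]   # surround pixels only
                if len(cell):
                    grid[i, j] = cell.mean(axis=0)
        # Nearest-neighbour upsample of the L x L grid to the center's size.
        up = np.repeat(np.repeat(grid, center.shape[0] // L + 1, axis=0),
                       center.shape[1] // L + 1, axis=1)
        pooled.append(up[:center.shape[0], :center.shape[1], :])
    return np.concatenate(pooled, axis=-1)

ctx = center_surround_pool(np.random.default_rng(0).normal(size=(32, 32, 8)))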
Taming Self-Supervised Learning for Presentation Attack Detection: De-Folding and De-Mixing
Biometric systems are vulnerable to Presentation Attacks (PA) performed using
various Presentation Attack Instruments (PAIs). Even though there are numerous
Presentation Attack Detection (PAD) techniques based on both deep learning and
hand-crafted features, the generalization of PAD for unknown PAI is still a
challenging problem. In this work, we empirically show that the initialization
of the PAD model is a crucial factor for generalization, which is rarely
discussed in the community. Based on this observation, we propose a
self-supervised learning-based method, denoted DF-DM. Specifically, DF-DM is
based on a global-local view coupled with De-Folding and De-Mixing to derive
the task-specific representation for PAD. During De-Folding, the proposed
technique learns region-specific features that represent samples in a local
pattern by explicitly minimizing a generative loss. De-Mixing, in turn, drives
detectors to obtain instance-specific features with global information for a
more comprehensive representation by minimizing an interpolation-based
consistency loss. Extensive experimental results show that the proposed method
achieves significant improvements in both face and fingerprint PAD on more
complicated and hybrid datasets when compared with state-of-the-art methods.
When trained on CASIA-FASD and Idiap Replay-Attack, the proposed method
achieves an 18.60% Equal Error Rate (EER) on OULU-NPU and MSU-MFSD, exceeding
the baseline performance by 9.54%. The source code of the proposed technique
is available at https://github.com/kongzhecn/dfdm.
Comment: Accepted by IEEE Transactions on Neural Networks and Learning Systems
(TNNLS).
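The interpolation-based consistency that drives De-Mixing can be sketched
generically: the features of a mixed input should equal the same mix of the
individual inputs' features. The extractor f, the mixing coefficient lam, and
the L2 penalty below are illustrative choices, not the paper's exact
formulation.

import numpy as np

def demixing_consistency(f, x1, x2, lam=0.4):
    # Penalize the gap between the features of the mixed input and the
    # mix of the two inputs' features.
    mixed_feat_target = lam * f(x1) + (1.0 - lam) * f(x2)
    diff = f(lam * x1 + (1.0 - lam) * x2) - mixed_feat_target
    return float(np.mean(diff ** 2))

# For a linear extractor the loss is exactly zero; a nonlinear detector is
# penalized whenever its features are not mix-consistent.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
x1, x2 = rng.normal(size=(2, 4, 16))
print(demixing_consistency(lambda x: x @ W.T, x1, x2))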
Multi-view 3D Face Reconstruction Based on Flame
At present, 3D face reconstruction has broad application prospects in various
fields, but research on it is still at an early stage of development. In this
paper, we aim to achieve better 3D face reconstruction quality by combining a
multi-view training framework with the parametric face model Flame, and
propose a multi-view training and testing model, MFNet (Multi-view Flame
Network). We build a self-supervised training framework with constraints such
as a multi-view optical flow loss and a face landmark loss, and finally obtain
the complete MFNet. We propose innovative implementations of the multi-view
optical flow loss and the covisible mask. We test our model on the AFLW and
FaceScape datasets and also take pictures of our own faces to reconstruct them
in 3D while simulating actual scenarios as closely as possible, achieving good
results. Our work mainly addresses the problem of combining parametric face
models with multi-view 3D face reconstruction, and explores the implementation
of a Flame-based multi-view training and testing framework, contributing to
the field of 3D face reconstruction.
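Of the constraints listed, the face landmark loss is the simplest to sketch:
project the predicted 3D landmarks into the image and penalize the distance to
detected 2D landmarks. The projection function and names below are
hypothetical; MFNet's actual losses, including the multi-view optical flow
term, are more involved.

import numpy as np

def landmark_loss(pred_lmk_3d, gt_lmk_2d, project):
    # Project the model's 3D landmarks into the image plane and penalize
    # the squared distance to the detected 2D landmarks. `project` stands
    # in for the camera model.
    proj = project(pred_lmk_3d)                      # (N, 2)
    return float(np.mean(np.sum((proj - gt_lmk_2d) ** 2, axis=-1)))

# Toy usage with an orthographic camera (drop the z coordinate).
rng = np.random.default_rng(0)
print(landmark_loss(rng.normal(size=(68, 3)),
                    rng.normal(size=(68, 2)),
                    lambda p: p[:, :2]))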