A Two-Stage Framework in Cross-Spectrum Domain for Real-Time Speech Enhancement
The two-stage pipeline is popular in speech enhancement tasks due to its
superiority over traditional single-stage methods. The current two-stage
approaches usually enhance the magnitude spectrum in the first stage, and
further modify the complex spectrum to suppress the residual noise and recover
the speech phase in the second stage. This whole process is performed in
the short-time Fourier transform (STFT) spectrum domain. In this paper, we
re-implement the above second sub-process in the short-time discrete cosine
transform (STDCT) spectrum domain. The reason is that we have found that the
STDCT offers greater noise suppression capability than the STFT. Additionally,
the implicit phase of the STDCT ensures simpler and more efficient phase recovery,
which is challenging and computationally expensive in STFT-based methods.
Therefore, we propose a novel two-stage framework called the STFT-STDCT
spectrum fusion network (FDFNet) for speech enhancement in cross-spectrum
domain. Experimental results demonstrate that the proposed FDFNet outperforms
the previous two-stage methods and also exhibits superior performance compared
to other advanced systems.
Comment: Accepted by ICASSP 202
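The "implicit phase" point in the abstract above can be illustrated with a minimal sketch. This is not the FDFNet implementation; it only shows that a short-time DCT yields real-valued coefficients (sign carries the phase information) and that a frame can be inverted directly. The function names `dct2`, `idct2`, and `stdct`, the frame length, and the hop size are hypothetical choices, and the analysis window and overlap-add that a practical system would use are omitted for brevity.

```python
import math

def dct2(frame):
    """Orthonormal DCT-II of one real-valued frame (real-valued output)."""
    n = len(frame)
    out = []
    for k in range(n):
        s = sum(x * math.cos(math.pi * (i + 0.5) * k / n)
                for i, x in enumerate(frame))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct2(coeffs):
    """Inverse of the orthonormal DCT-II above (an orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        s = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            s += scale * c * math.cos(math.pi * (i + 0.5) * k / n)
        out.append(s)
    return out

def stdct(signal, frame_len, hop):
    """Short-time DCT: slice the signal into frames and transform each one.

    Assumption: rectangular framing without a window, for illustration only.
    """
    starts = range(0, len(signal) - frame_len + 1, hop)
    return [dct2(signal[s:s + frame_len]) for s in starts]

# Toy signal: a sampled sinusoid.
signal = [math.sin(2 * math.pi * 3 * t / 32) for t in range(64)]
spec = stdct(signal, frame_len=16, hop=8)

# Every coefficient is a plain float; there is no separate phase to estimate.
recon = idct2(spec[0])
max_err = max(abs(a - b) for a, b in zip(recon, signal[:16]))
```

Because the transform is orthonormal and real, a network operating on `spec` predicts a single real value per time-frequency bin, whereas an STFT-based model must handle real and imaginary parts (or magnitude and phase).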
Hypergraph Convolutional Network based Weakly Supervised Point Cloud Semantic Segmentation with Scene-Level Annotations
Point cloud segmentation with scene-level annotations is a promising but
challenging task. Currently, the most popular approach is to employ the class
activation map (CAM) to locate discriminative regions and then generate
point-level pseudo labels from scene-level annotations. However, these methods
typically suffer from point imbalance among categories, as well as sparse
and incomplete supervision from CAM. In this paper, we propose a novel weighted
hypergraph convolutional network-based method, called WHCN, to confront the
challenges of learning point-wise labels from scene-level annotations. Firstly,
in order to simultaneously overcome the point imbalance among different
categories and reduce the model complexity, superpoints of a training point
cloud are generated by exploiting the geometrically homogeneous partition.
Then, a hypergraph is constructed based on the high-confidence superpoint-level
seeds which are converted from scene-level annotations. Secondly, the WHCN
takes the hypergraph as input and learns to predict high-precision point-level
pseudo labels by label propagation. Besides the backbone network consisting of
spectral hypergraph convolution blocks, a hyperedge attention module is learned
to adjust the weights of hyperedges in the WHCN. Finally, a segmentation
network is trained on these point-level pseudo labels. We conduct
comprehensive experiments on the ScanNet and S3DIS segmentation datasets.
Experimental results demonstrate that the proposed WHCN is effective at predicting
point labels from scene-level annotations and yields state-of-the-art
results. The source code is available at
http://zhiyongsu.github.io/Project/WHCN.html
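The idea of propagating a few superpoint-level seed labels over a weighted hypergraph can be sketched as follows. This is a simplified illustration and not the WHCN method: it uses a fixed, hand-set weight per hyperedge where the abstract describes a learned hyperedge attention module, and plain averaging where the abstract describes spectral hypergraph convolution blocks. The function name `propagate_labels` and the toy hypergraph are hypothetical.

```python
def propagate_labels(num_nodes, hyperedges, weights, seeds, num_classes, steps=10):
    """Propagate seed labels over a weighted hypergraph.

    hyperedges: list of node-index lists; weights: one float per hyperedge;
    seeds: dict mapping node index -> known class (assumed high-confidence).
    Each step averages node scores within every hyperedge, then gives each
    node the weight-normalized mean of its incident hyperedges.
    """
    scores = [[0.0] * num_classes for _ in range(num_nodes)]
    for node, cls in seeds.items():
        scores[node][cls] = 1.0

    for _ in range(steps):
        # Node -> hyperedge: mean score of the members of each hyperedge.
        edge_scores = [
            [sum(scores[n][c] for n in edge) / len(edge) for c in range(num_classes)]
            for edge in hyperedges
        ]
        # Hyperedge -> node: weighted mean over incident hyperedges.
        new_scores = [[0.0] * num_classes for _ in range(num_nodes)]
        degree = [0.0] * num_nodes
        for edge, w, es in zip(hyperedges, weights, edge_scores):
            for n in edge:
                degree[n] += w
                for c in range(num_classes):
                    new_scores[n][c] += w * es[c]
        for n in range(num_nodes):
            if degree[n] > 0:
                new_scores[n] = [v / degree[n] for v in new_scores[n]]
        # Keep the seed labels fixed throughout propagation.
        for node, cls in seeds.items():
            new_scores[node] = [1.0 if c == cls else 0.0 for c in range(num_classes)]
        scores = new_scores

    return [max(range(num_classes), key=lambda c: row[c]) for row in scores]

# Toy example: six superpoints, two tight groups joined by one weak hyperedge.
hyperedges = [[0, 1, 2], [2, 3], [3, 4, 5]]
weights = [1.0, 0.2, 1.0]   # hypothetical stand-ins for learned hyperedge weights
seeds = {0: 0, 5: 1}        # one labeled seed per class
pseudo_labels = propagate_labels(6, hyperedges, weights, seeds, num_classes=2)
```

In this toy case the weakly weighted hyperedge `[2, 3]` limits leakage between the two groups, so each unlabeled superpoint takes the label of the seed in its own group.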
Socio-Economic Management Theory Related to BPM: A Case Study of Dysfunctions in Digital Transformation Strategy
This research claims that dynamic strategies demanded by today’s digital environment exacerbate inconsistency between an organization’s digital transformation efforts and its enterprise architecture (EA) planning process. This phenomenon leads to redundant investments, delayed implementation, and frequent failures in digital transformation projects. To investigate this inconsistency, we apply the socioeconomic approach to management (SEAM) theory. Through critical analysis of four case studies in a large manufacturing organization, we clarify the relationship between digital transformation and EA and reveal the dysfunction in strategic implementation from a SEAM and business process management (BPM) perspective. In practice, this research integrates digital transformation and EA to provide a context-specific approach for planning and designing enterprise digital transformation strategies.
RXFOOD: Plug-in RGB-X Fusion for Object of Interest Detection
The emergence of different sensors (Near-Infrared, Depth, etc.) is a remedy
for the limited application scenarios of traditional RGB cameras. The RGB-X
tasks, which rely on RGB input and another type of data input to resolve
specific problems, have become a popular research topic in multimedia. A
crucial part in two-branch RGB-X deep neural networks is how to fuse
information across modalities. Given the tremendous information inside RGB-X
networks, previous works typically apply naive fusion (e.g., average or max
fusion) or only focus on the feature fusion at the same scale(s). In this
paper, we propose a novel method called RXFOOD for the fusion of features
across different scales within the same modality branch and from different
modality branches simultaneously in a unified attention mechanism. An Energy
Exchange Module is designed for the interaction of each feature map's energy
matrix, which reflects the inter-relationships among different positions and
different channels inside a feature map. The RXFOOD method can be easily
incorporated into any dual-branch encoder-decoder network as a plug-in module,
and helps the original backbone network better focus on important positions and
channels for object of interest detection. Experimental results on RGB-NIR
salient object detection, RGB-D salient object detection, and RGBFrequency
image manipulation detection demonstrate the clear effectiveness of the
proposed RXFOOD.
Comment: 10 page
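For context, the "naive fusion" baseline that the abstract above contrasts against can be sketched in a few lines. This shows only element-wise average and max fusion of two same-shape feature maps; it is not the RXFOOD attention mechanism or its Energy Exchange Module. The function name `fuse` and the toy values are hypothetical.

```python
def fuse(rgb_feat, x_feat, mode="average"):
    """Element-wise fusion of two equally shaped 2-D feature maps."""
    same_rows = len(rgb_feat) == len(x_feat)
    same_cols = all(len(a) == len(b) for a, b in zip(rgb_feat, x_feat))
    if not (same_rows and same_cols):
        raise ValueError("feature maps must have the same shape")

    if mode == "average":
        combine = lambda a, b: (a + b) / 2.0
    elif mode == "max":
        combine = max
    else:
        raise ValueError("mode must be 'average' or 'max'")

    return [[combine(a, b) for a, b in zip(row_rgb, row_x)]
            for row_rgb, row_x in zip(rgb_feat, x_feat)]

# Toy 2x2 feature maps from an RGB branch and a second-modality (X) branch.
rgb = [[1.0, 4.0], [0.0, 2.0]]
depth = [[3.0, 2.0], [0.5, 2.0]]

avg_fused = fuse(rgb, depth, mode="average")
max_fused = fuse(rgb, depth, mode="max")
```

Both variants treat every position and channel identically and combine features only at a single scale, which is the limitation the abstract says a unified attention mechanism is meant to address.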