Study on the Formalized Development of the Street Stall Economy Based on Domestic and International Experiences and Perspectives
The street stall economy (also called the ground-floor economy) has a long history as a significant part of the informal economy. Because it depends on its own social status and on its relationship to the government's political and economic objectives, it has developed precariously in recent years. In the face of post-epidemic problems, a shortcut is to learn from international experience. Using structural theory and drawing on secondary data, this paper demonstrates the background of the informal economy and explores rational ways to maintain and develop street vending. Spatialization, legalization, and network digitization are proven international approaches, with empirical and theoretical implications for urban practice and studies.
Further improvement of fluidized bed models by incorporating zone method with Aspen Plus interface
Classical zero-dimensional ideal reactor models built into process simulators such as Aspen Plus® provide a fast and accurate tool for simulating fluidized beds, but they suffer from a major limitation: it is difficult to capture the thermal reciprocity between the individual reactor models and to incorporate the heat absorbed by the water wall and superheaters, which is usually specified as a model input rather than predicted by the model itself. This aspect is particularly important for geometry design and for evaluating the operating conditions and flexibility of fluidized beds. This paper proposes a novel modelling approach that resolves this limitation by incorporating an external model that marries the advantages of the zone method and Aspen Plus in a robust manner. The improved model has a relatively modest computing demand and hence can feasibly be incorporated into dynamic simulations of a whole power plant.
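To make the coupling concrete, the following is a minimal sketch of how an external zone-method model might be iterated against an Aspen Plus flowsheet through its COM automation interface. The node paths, block names, wall temperature, and the simplified radiation balance are all illustrative assumptions, not the paper's actual model.

```python
# Hedged sketch: fixed-point coupling of an external zone-method model with
# an Aspen Plus flowsheet via COM automation (Windows only). Node paths and
# the zone-method routine below are illustrative placeholders.
import win32com.client as com

def zone_method_heat_duty(gas_temps_K, wall_emissivity=0.8):
    """Placeholder zone-method radiation balance: given zone gas
    temperatures, return the heat absorbed by the water wall per zone."""
    sigma = 5.670e-8  # Stefan-Boltzmann constant, W/m^2/K^4
    t_wall = 650.0    # assumed water-wall surface temperature, K
    area = 120.0      # assumed radiative exchange area per zone, m^2
    return [wall_emissivity * sigma * area * (t**4 - t_wall**4)
            for t in gas_temps_K]

aspen = com.Dispatch("Apwn.Document")          # Aspen Plus automation server
aspen.InitFromArchive2(r"C:\models\cfb.bkp")   # hypothetical flowsheet file

for it in range(50):                           # fixed-point iteration
    aspen.Engine.Run2()                        # solve the flowsheet
    temps_C = [aspen.Tree.FindNode(            # hypothetical output nodes
        rf"\Data\Blocks\ZONE{i}\Output\TEMP_OUT").Value for i in range(1, 4)]
    duties = zone_method_heat_duty([t + 273.15 for t in temps_C])
    for i, q in enumerate(duties, start=1):    # feed duties back as inputs
        aspen.Tree.FindNode(
            rf"\Data\Blocks\ZONE{i}\Input\DUTY").Value = -q
    # a convergence check on successive duty updates would terminate the loop
```

The loop alternates between the flowsheet solution (which supplies zone temperatures) and the external radiation model (which supplies the wall heat duties the zero-dimensional reactors cannot predict themselves), which is the reciprocity the abstract describes.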
EFMVFL: An Efficient and Flexible Multi-party Vertical Federated Learning without a Third Party
Federated learning allows multiple participants to conduct joint modeling without disclosing their local data. Vertical federated learning (VFL) handles the situation where participants share the same ID space but different feature spaces. In most VFL frameworks, to protect the security and privacy of the participants' local data, a third party is needed to generate homomorphic encryption key pairs and perform decryption operations. The third party is thus granted the right to decrypt information related to model parameters, yet it is not easy to find such a credible entity in the real world. Existing methods for solving this problem are either communication-intensive or unsuitable for multi-party scenarios. By combining secret sharing and homomorphic encryption, we propose EFMVFL, a novel VFL framework without a third party that supports flexible expansion to multiple participants with low communication overhead and is applicable to generalized linear models. We give instantiations of our framework for logistic regression and Poisson regression. Theoretical analysis and experiments show that our framework is secure, more efficient, and easy to extend to multiple participants.
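The core trick of combining the two primitives can be illustrated in a few lines. Below is a minimal two-party sketch using the `phe` (python-paillier) library: the key holder encrypts residuals, the other party computes its gradient homomorphically and masks it with a random additive share, so decryption reveals nothing. The message flow is an illustration, not the exact EFMVFL protocol.

```python
# Hedged sketch: secret sharing + Paillier encryption without a third party.
# Party B holds labels and the key pair; party A holds private features.
import numpy as np
from phe import paillier

pub, priv = paillier.generate_paillier_keypair(n_length=1024)  # B's keys

# B encrypts its residuals (prediction errors on the shared ID space).
residual = np.array([0.3, -0.1, 0.25])
enc_residual = [pub.encrypt(float(r)) for r in residual]

# A computes its encrypted partial gradient homomorphically:
# grad_j = sum_i residual_i * x_ij, evaluated under encryption.
X_a = np.array([[1.0, 0.5], [0.2, 1.1], [0.7, 0.3]])  # A's private features
enc_grad = [sum(enc_residual[i] * X_a[i, j] for i in range(3))
            for j in range(2)]

# Additive masking: A adds a random share, so B can decrypt only
# grad + mask and learns nothing about A's true gradient.
mask = np.random.uniform(-10, 10, size=2)
masked = [g + float(m) for g, m in zip(enc_grad, mask)]

# B decrypts the masked values and returns them; A removes its mask locally.
decrypted = np.array([priv.decrypt(c) for c in masked])
grad_a = decrypted - mask
print(grad_a)  # A's plaintext gradient, never revealed to B
```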
Crowd3D: Towards Hundreds of People Reconstruction from a Single Image
Image-based multi-person reconstruction in wide-field large scenes is critical for crowd analysis and security alerting. However, existing methods cannot deal with large scenes containing hundreds of people, which pose the challenges of a large number of people, large variations in human scale, and complex spatial distribution. In this paper, we propose Crowd3D, the first framework to reconstruct the 3D poses, shapes, and locations of hundreds of people with global consistency from a single large-scene image. The core of our approach is to convert the problem of complex crowd localization into pixel localization with the help of our newly defined concept, the Human-scene Virtual Interaction Point (HVIP). To reconstruct the crowd with global consistency, we propose a progressive reconstruction network based on HVIP that pre-estimates a scene-level camera and a ground plane. To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme. In addition, we contribute a benchmark dataset, LargeCrowd, for crowd reconstruction in large scenes. Experimental results demonstrate the effectiveness of the proposed method. The code and datasets will be made public.
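The geometric step that makes pixel localization sufficient is standard ray-plane intersection: once a scene-level camera and a ground plane are estimated, a 2D pixel determines a 3D point. The sketch below shows only that geometry; Crowd3D's HVIP adds learned components on top, and the intrinsics, normal, and camera height here are illustrative values.

```python
# Hedged sketch: lift a ground-contact pixel to 3D via ray-plane
# intersection, given camera intrinsics K and a ground plane n.X + d = 0
# in camera-centered coordinates.
import numpy as np

def lift_pixel_to_ground(pixel, K, plane_n, plane_d):
    u, v = pixel
    ray = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray direction
    t = -plane_d / (plane_n @ ray)                  # solve n.(t*ray) + d = 0
    return t * ray                                  # 3D point on the plane

K = np.array([[1000.0, 0, 960],                     # assumed intrinsics
              [0, 1000.0, 540],
              [0, 0, 1]])
n = np.array([0.0, -1.0, 0.05]); n /= np.linalg.norm(n)  # ground normal
d = 1.6                                             # assumed camera height (m)

print(lift_pixel_to_ground((980, 700), K, n, d))    # person's 3D location
```

Because every person's location reduces to one pixel plus shared scene parameters, all reconstructions land in one globally consistent coordinate frame.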
Learning to reconstruct and understand indoor scenes from sparse views
This paper proposes a new method for simultaneous 3D reconstruction and semantic segmentation of indoor scenes. Unlike existing methods that require recording a video with a color camera and/or a depth camera, our method needs only a small number (e.g., 3 to 5) of color images from uncalibrated sparse views, which significantly simplifies data acquisition and broadens the applicable scenarios. To achieve promising 3D reconstruction from sparse views with limited overlap, our method first recovers the depth map and semantic information for each view and then fuses the depth maps into a 3D scene. To this end, we design an iterative deep architecture, named IterNet, that estimates the depth map and semantic segmentation alternately. To obtain accurate alignment between views with limited overlap, we further propose a joint global and local registration method to reconstruct the 3D scene with semantic information. We also make available a new indoor synthetic dataset containing photorealistic high-resolution RGB images, accurate depth maps, and pixel-level semantic labels for thousands of complex layouts. Experimental results on public datasets and our dataset demonstrate that our method achieves more accurate depth estimation, smaller semantic segmentation errors, and better 3D reconstruction than state-of-the-art methods.
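The alternating estimation idea can be shown schematically: each task is re-predicted conditioned on the other's latest output. The module below is a deliberately tiny stand-in (single convolutions) for IterNet's actual sub-networks, just to make the control flow concrete.

```python
# Hedged sketch of an IterNet-style alternating refinement loop: depth and
# semantics are re-estimated in turns, each conditioned on the other.
import torch
import torch.nn as nn

class AlternatingDepthSeg(nn.Module):
    def __init__(self, n_classes=13, iters=3):
        super().__init__()
        self.iters = iters
        # depth net sees RGB (3) + current segmentation (n_classes)
        self.depth_net = nn.Conv2d(3 + n_classes, 1, 3, padding=1)
        # seg net sees RGB (3) + current depth (1)
        self.seg_net = nn.Conv2d(3 + 1, n_classes, 3, padding=1)

    def forward(self, rgb):
        b, _, h, w = rgb.shape
        depth = rgb.new_zeros(b, 1, h, w)
        seg = rgb.new_zeros(b, self.seg_net.out_channels, h, w)
        for _ in range(self.iters):  # alternate: seg -> depth -> seg ...
            seg = self.seg_net(torch.cat([rgb, depth], dim=1))
            depth = self.depth_net(torch.cat([rgb, seg.softmax(dim=1)], dim=1))
        return depth, seg

d, s = AlternatingDepthSeg()(torch.randn(1, 3, 120, 160))
print(d.shape, s.shape)  # per-view depth and semantic logits
```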
STAR-TM: STructure aware reconstruction of textured mesh from single image
We present a novel method for single-view 3D reconstruction of textured meshes, focusing on the primary challenge of texture inference and transfer. Our key observation is that learning textured reconstruction in a structure-aware and globally consistent manner is effective for handling the severe ill-posedness of the texturing problem and the significant variations in object pose and texture detail. Specifically, we perform structured mesh reconstruction via a retrieval-and-assembly approach to produce a set of genus-zero parts parameterized by deformable boxes and endowed with semantic information. For texturing, we first transfer the visible colors from the input image onto the unified UV texture space of the deformable boxes. We then combine a learned transformer model for per-part texture completion with a global consistency loss that optimizes inter-part texture consistency. Our texture completion model operates in a VQ-VAE embedding space and is trained end-to-end, with the transformer training enhanced by retrieved texture instances to improve completion performance amid significant occlusion. Extensive experiments demonstrate that our method obtains higher-quality textured mesh reconstruction than state-of-the-art alternatives, both quantitatively and qualitatively, as reflected in better recovery of texture coherence and detail.
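One way to picture a global inter-part consistency term is as a penalty on color disagreement where the UV charts of adjacent parts meet. The sketch below shows that simplified pixel-space version; STAR-TM's actual loss operates in its VQ-VAE embedding space, and the seam correspondences here are random placeholders.

```python
# Hedged sketch: an inter-part texture consistency loss that compares colors
# at matched seam samples of two adjacent parts' UV charts.
import torch

def seam_consistency_loss(tex_a, tex_b, seam_uv_a, seam_uv_b):
    """tex_*: (3, H, W) part texture maps; seam_uv_*: (N, 2) integer (x, y)
    pixel coordinates of matched seam samples in each chart."""
    cols_a = tex_a[:, seam_uv_a[:, 1], seam_uv_a[:, 0]]  # (3, N) colors
    cols_b = tex_b[:, seam_uv_b[:, 1], seam_uv_b[:, 0]]
    return (cols_a - cols_b).abs().mean()                # L1 seam mismatch

tex_a, tex_b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
seam = torch.randint(0, 64, (32, 2))     # placeholder seam correspondences
print(seam_consistency_loss(tex_a, tex_b, seam, seam))
```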
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation
Our previously proposed MossFormer has achieved promising performance in monaural speech separation. However, it predominantly adopts a self-attention-based MossFormer module, which tends to emphasize longer-range, coarser-scale dependencies and is deficient in modelling finer-scale recurrent patterns. In this paper, we introduce a novel hybrid model that can model both long-range, coarse-scale dependencies and fine-scale recurrent patterns by integrating a recurrent module into the MossFormer framework. Instead of applying recurrent neural networks (RNNs) with traditional recurrent connections, we present a recurrent module based on a feedforward sequential memory network (FSMN), which is considered an "RNN-free" recurrent network because it captures recurrent patterns without using recurrent connections. Our recurrent module mainly comprises an enhanced dilated FSMN block using gated convolutional units (GCU) and dense connections; a bottleneck layer and an output layer are also added to control information flow. The recurrent module relies on linear projections and convolutions for seamless, parallel processing of the entire sequence. The integrated MossFormer2 hybrid model demonstrates remarkable improvements over MossFormer and surpasses other state-of-the-art methods on the WSJ0-2mix/3mix, Libri2Mix, and WHAM!/WHAMR! benchmarks.
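The "RNN-free" recurrence is easiest to see in code: the memory over past frames is realized as a depthwise causal convolution, so the whole sequence is processed in parallel with no recurrent connections. This is a minimal sketch of the plain FSMN idea; MossFormer2's enhanced block additionally uses gating, dense connections, and bottleneck/output layers, which are omitted here.

```python
# Hedged sketch of an FSMN memory block: a learned convolution over a fixed
# window of past frames replaces recurrent connections.
import torch
import torch.nn as nn

class FSMNBlock(nn.Module):
    def __init__(self, dim, memory=8, dilation=1):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.pad = (memory - 1) * dilation    # left-pad for causality
        # depthwise conv acts as the per-channel memory over past frames
        self.memory = nn.Conv1d(dim, dim, kernel_size=memory,
                                dilation=dilation, groups=dim, bias=False)

    def forward(self, x):                     # x: (batch, time, dim)
        h = self.proj(x)
        m = nn.functional.pad(h.transpose(1, 2), (self.pad, 0))
        m = self.memory(m).transpose(1, 2)    # memory term, same length
        return x + h + m                      # residual + projection + memory

y = FSMNBlock(dim=64)(torch.randn(2, 100, 64))
print(y.shape)  # (2, 100, 64): full sequence computed in one parallel pass
```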
Robust pose transfer with dynamic details using neural video rendering
Pose transfer of human videos aims to generate a high-fidelity video of a target person imitating the actions of a source person. A few studies have made great progress, either through image translation with deep latent features or through neural rendering with explicit 3D features. However, both rely on large amounts of training data to generate realistic results, and their performance degrades on more accessible internet videos due to insufficient training frames. In this paper, we demonstrate that dynamic details can be preserved even when training from short monocular videos. Overall, we propose a neural video rendering framework coupled with an image-translation-based dynamic details generation network (D2G-Net), which fully utilizes both the stability of explicit 3D features and the capacity of learning components. Specifically, a novel hybrid texture representation is presented to encode both the static and the pose-varying appearance characteristics; this representation is then mapped to the image space and rendered as a detail-rich frame in the neural rendering stage. Through extensive comparisons, we demonstrate that our neural human video renderer achieves both clearer dynamic details and more robust performance, even on accessible short videos with only 2k-4k frames.
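A hybrid texture of this kind can be sketched as a static learnable texture plus a pose-conditioned component, both sampled into image space through a UV map. The module below is a toy stand-in under assumed dimensions (a linear pose branch, a 72-D pose vector); the real D2G-Net pipeline adds a full image-translation and neural-rendering stage on top.

```python
# Hedged sketch of a hybrid texture: static learnable appearance plus a
# pose-varying offset, rasterized into image space via UV grid sampling.
import torch
import torch.nn as nn

class HybridTexture(nn.Module):
    def __init__(self, tex_res=256, feat=16, pose_dim=72):
        super().__init__()
        # static texture: pose-independent appearance features
        self.static = nn.Parameter(torch.zeros(1, feat, tex_res, tex_res))
        # pose-varying component (toy: one linear layer on the pose vector)
        self.dynamic = nn.Linear(pose_dim, feat)

    def forward(self, uv, pose):
        # uv: (B, H, W, 2) sampling grid in [-1, 1]; pose: (B, pose_dim)
        tex = self.static.expand(uv.shape[0], -1, -1, -1)
        sampled = nn.functional.grid_sample(tex, uv, align_corners=False)
        offset = self.dynamic(pose)[:, :, None, None]  # broadcast per pixel
        return sampled + offset       # (B, feat, H, W) feature image

feats = HybridTexture()(torch.rand(2, 128, 128, 2) * 2 - 1,
                        torch.randn(2, 72))
print(feats.shape)  # feature image fed to the neural rendering stage
```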