210 research outputs found
Recommended from our members
Background Replacement in Video Conferencing
Data Availability Statement: Source code is available at Github and dataset is available at Kaggle. Link for GitHub repository: https://github.com/kiranshahi/Real-time-Background-replacement-in-Video-Conferencing; Link for a dataset: https://www.kaggle.com/datasets/nikhilroxtomar/person-segmentation.Copyright © Kiran Shahi, Yongmin Li 2023. Background replacement is one of the most used features in video conferencing applications by many people, perhaps mainly for privacy protection, but also for other purposes such as branding, marketing and promoting professionalism. However, the existing applications in video conference tools have serious limitations. Most applications tend to generate strong artefacts (while there is a slight change in the perspective of the background), or require green screens to avoid such artefacts, which results in an unnatural background or even exposes the original background to other users in the video conference. In this work, we aim to study the relationship between the foreground and background in real-time videos. Three different methods are presented and evaluated, including the baseline U-Net, the lightweight U-Net MobileNet, and the U-Net MobileNet&ConvLSTM models. The above models are trained on public datasets for image segmentation. Experimental results show that both the lightweight U-Net MobileNet and the U-Net MobileNet& ConvLSTM models achieve superior performance as compared to the baseline U-Net model.This research received no external funding
Data augmentation using background replacement for automated sorting of littered waste
The introduction of sophisticated waste treatment plants is making the process of trash sorting and recycling more and more effective and eco-friendly. Studies on Automated Waste Sorting (AWS) are greatly contributing to making the whole recycling process more efficient. However, a relevant issue, which remains unsolved, is how to deal with the large amount of waste that is littered in the environment instead of being collected properly. In this paper, we introduce BackRep: a method for building waste recognizers that can be used for identifying and sorting littered waste directly where it is found. BackRep consists of a data-augmentation procedure, which expands existing datasets by cropping solid waste in images taken on a uniform (white) background and superimposing it on more realistic backgrounds. For our purpose, realistic backgrounds are those representing places where solid waste is usually littered. To experiment with our data-augmentation procedure, we produced a new dataset in realistic settings. We observed that waste recognizers trained on augmented data actually outperform those trained on existing datasets. Hence, our data-augmentation procedure seems a viable approach to support the development of waste recognizers for urban and wild environments
Large-Mass Ultra-Low Noise Germanium Detectors: Performance and Applications in Neutrino and Astroparticle Physics
A new type of radiation detector, a p-type modified electrode germanium
diode, is presented. The prototype displays, for the first time, a combination
of features (mass, energy threshold and background expectation) required for a
measurement of coherent neutrino-nucleus scattering in a nuclear reactor
experiment. The device hybridizes the mass and energy resolution of a
conventional HPGe coaxial gamma spectrometer with the low electronic noise and
threshold of a small x-ray semiconductor detector, also displaying an intrinsic
ability to distinguish multiple from single-site particle interactions. The
present performance of the prototype and possible further improvements are
discussed, as well as other applications for this new type of device in
neutrino and astroparticle physics (double-beta decay, neutrino magnetic moment
and WIMP searches).Comment: submitted to Phys. Rev.
PortraitNet:Real-time portrait segmentation network for mobile device
Real-time portrait segmentation plays a significant role in many applications on mobile device, such as background replacement in video chat or teleconference. In this paper, we propose a real-time portrait segmentation model, called PortraitNet, that can run effectively and efficiently on mobile device. PortraitNet is based on a lightweight U-shape architecture with two auxiliary losses at the training stage, while no additional cost is required at the testing stage for portrait inference. The two auxiliary losses are boundary loss and consistency constraint loss. The former improves the accuracy of boundary pixels, and the latter enhances the robustness in complex lighting environment. We evaluate PortraitNet on portrait segmentation dataset EG1800 and Supervise-Portrait. Compared with the state-of-the-art methods, our approach achieves remarkable performance in terms of both accuracy and efficiency, especially for generating results with sharper boundaries and under severe illumination conditions. Meanwhile, PortraitNet is capable of processing 224 × 224 RGB images at 30 FPS on iPhone 7
LayerDiffusion: Layered Controlled Image Editing with Diffusion Models
Text-guided image editing has recently experienced rapid development.
However, simultaneously performing multiple editing actions on a single image,
such as background replacement and specific subject attribute changes, while
maintaining consistency between the subject and the background remains
challenging. In this paper, we propose LayerDiffusion, a semantic-based layered
controlled image editing method. Our method enables non-rigid editing and
attribute modification of specific subjects while preserving their unique
characteristics and seamlessly integrating them into new backgrounds. We
leverage a large-scale text-to-image model and employ a layered controlled
optimization strategy combined with layered diffusion training. During the
diffusion process, an iterative guidance strategy is used to generate a final
image that aligns with the textual description. Experimental results
demonstrate the effectiveness of our method in generating highly coherent
images that closely align with the given textual description. The edited images
maintain a high similarity to the features of the input image and surpass the
performance of current leading image editing methods. LayerDiffusion opens up
new possibilities for controllable image editing.Comment: 17 pages, 14 figure
PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing
Diffusion models have showcased their remarkable capability to synthesize
diverse and high-quality images, sparking interest in their application for
real image editing. However, existing diffusion-based approaches for local
image editing often suffer from undesired artifacts due to the pixel-level
blending of the noised target images and diffusion latent variables, which lack
the necessary semantics for maintaining image consistency. To address these
issues, we propose PFB-Diff, a Progressive Feature Blending method for
Diffusion-based image editing. Unlike previous methods, PFB-Diff seamlessly
integrates text-guided generated content into the target image through
multi-level feature blending. The rich semantics encoded in deep features and
the progressive blending scheme from high to low levels ensure semantic
coherence and high quality in edited images. Additionally, we introduce an
attention masking mechanism in the cross-attention layers to confine the impact
of specific words to desired regions, further improving the performance of
background editing. PFB-Diff can effectively address various editing tasks,
including object/background replacement and object attribute editing. Our
method demonstrates its superior performance in terms of image fidelity,
editing accuracy, efficiency, and faithfulness to the original image, without
the need for fine-tuning or training.Comment: 18 pages, 15 figure
HeadOn: Real-time Reenactment of Human Portrait Videos
We propose HeadOn, the first real-time source-to-target reenactment approach
for complete human portrait videos that enables transfer of torso and head
motion, face expression, and eye gaze. Given a short RGB-D video of the target
actor, we automatically construct a personalized geometry proxy that embeds a
parametric head, eye, and kinematic torso model. A novel real-time reenactment
algorithm employs this proxy to photo-realistically map the captured motion
from the source actor to the target actor. On top of the coarse geometric
proxy, we propose a video-based rendering technique that composites the
modified target portrait video via view- and pose-dependent texturing, and
creates photo-realistic imagery of the target actor under novel torso and head
poses, facial expressions, and gaze directions. To this end, we propose a
robust tracking of the face and torso of the source actor. We extensively
evaluate our approach and show significant improvements in enabling much
greater flexibility in creating realistic reenacted output videos.Comment: Video: https://www.youtube.com/watch?v=7Dg49wv2c_g Presented at
Siggraph'1
- …