Region-Aware Portrait Retouching with Sparse Interactive Guidance
Portrait retouching aims to improve the aesthetic quality of input portrait
photos and especially requires human-region priority. Deep learning-based
methods have greatly improved retouching efficiency and produce promising
results. However, existing portrait retouching methods focus on automatic
retouching, which treats all human regions equally and ignores users'
preferences for specific individuals, and thus suffer from limited
flexibility in interactive scenarios. In this work, we emphasize the
importance of users' intents and explore the interactive portrait retouching
task. Specifically, we propose a region-aware retouching framework with two
branches: an automatic branch and an interactive branch. The automatic
branch involves an encoding-decoding process, which searches region candidates
and performs automatic region-aware retouching without user guidance. The
interactive branch encodes sparse user guidance into a priority condition
vector and modulates latent features with a region selection module to further
emphasize the user-specified regions. Experimental results show that our
interactive branch effectively captures users' intents and generalizes well to
unseen scenes with sparse user guidance, while our automatic branch also
outperforms the state-of-the-art retouching methods due to improved
region-awareness.
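As a minimal sketch of the condition-vector modulation the interactive branch describes, consider the toy Python below: sparse user clicks are encoded into a priority vector, which then scales latent features belonging to the user-selected region. The function names, the one-hot encoding, and the scale-only modulation form are illustrative assumptions, not the authors' actual network design.

```python
# Hypothetical sketch: encode sparse guidance into a priority condition
# vector, then modulate latent features region-by-region.

def encode_guidance(clicks, num_regions):
    """Turn sparse user clicks (region indices) into a priority vector."""
    priority = [0.0] * num_regions
    for region_id in clicks:
        priority[region_id] = 1.0          # user-emphasized region
    return priority

def modulate(latent, region_map, priority, gain=0.5):
    """Scale each latent feature by the priority of its region."""
    out = []
    for feat, region_id in zip(latent, region_map):
        out.append(feat * (1.0 + gain * priority[region_id]))
    return out

latent = [0.2, 0.8, 0.5, 0.1]       # toy per-pixel latent features
region_map = [0, 1, 1, 2]           # region-candidate id per pixel
priority = encode_guidance(clicks=[1], num_regions=3)
print(modulate(latent, region_map, priority))
# features in region 1 are amplified; other regions pass through unchanged
```

In a real network the modulation would act on feature maps inside the decoder; the list-based version above only illustrates the region-selection idea.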
Best-$k$ Search Algorithm for Neural Text Generation
Modern natural language generation paradigms require a good decoding strategy
to obtain quality sequences out of the model. Beam search yields high-quality
but low diversity outputs; stochastic approaches suffer from high variance and
sometimes low quality, but the outputs tend to be more natural and creative. In
this work, we propose a deterministic search algorithm balancing both quality
and diversity. We first investigate the vanilla best-first search (BFS)
algorithm and then propose the Best-$k$ Search algorithm. Inspired by BFS, we
greedily expand the top-$k$ nodes, instead of only the first node, to boost
efficiency and diversity. Upweighting recently discovered nodes accompanied by
heap pruning ensures the completeness of the search procedure. Experiments on
four NLG tasks, including question generation, commonsense generation, text
summarization, and translation, show that best-$k$ search yields more diverse
and natural outputs compared to strong baselines, while our approach maintains
high text quality. The proposed algorithm is parameter-free, lightweight,
efficient, and easy to use. Comment: 17 pages.
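The search procedure described above can be sketched with a toy scoring model: pop the top-$k$ frontier nodes per step instead of only the best one, give recently created nodes a small priority bonus, and prune the heap to a fixed size. The fake successor scores, the bonus form, and the cap below are illustrative assumptions, not the paper's exact formulation.

```python
# Toy best-k style search over a tiny vocabulary with fabricated scores.
import heapq

def expand(seq, vocab, score):
    """Toy successor function: append each token with a fake log-prob."""
    return [(seq + [t], score - 0.1 * (t + 1)) for t in vocab]

def best_k_search(k=2, max_len=3, beam_cap=8, recency_bonus=0.05):
    vocab = [0, 1, 2]
    heap = [(-0.0, 0, [])]              # (-priority, birth_step, sequence)
    finished, step = [], 0
    while heap and len(finished) < 4:
        step += 1
        # pop the top-k nodes, not just the single best (key difference from BFS)
        top = [heapq.heappop(heap) for _ in range(min(k, len(heap)))]
        for neg_p, born, seq in top:
            for child, s in expand(seq, vocab, -neg_p):
                if len(child) == max_len:
                    finished.append((s, child))
                else:                   # newer nodes get a small recency bonus
                    heapq.heappush(
                        heap, (-(s + recency_bonus * step), step, child))
        heap = heapq.nsmallest(beam_cap, heap)  # heap pruning keeps search cheap
        heapq.heapify(heap)
    return [seq for _, seq in sorted(finished, reverse=True)]

print(best_k_search())
```

Because several frontier nodes are expanded per step, the completed hypotheses diverge earlier than in greedy or beam search, which is the diversity effect the abstract describes.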
Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight
This paper studies the sample-efficiency of learning in Partially Observable
Markov Decision Processes (POMDPs), a challenging problem in reinforcement
learning that is known to be exponentially hard in the worst-case. Motivated by
real-world settings such as loading saved game states in game playing, we
propose an enhanced
feedback model called ``multiple observations in hindsight'', where after each
episode of interaction with the POMDP, the learner may collect multiple
additional observations emitted from the encountered latent states, but may not
observe the latent states themselves. We show that sample-efficient learning
under this feedback model is possible for two new subclasses of POMDPs:
\emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}.
Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a
widely studied subclass for which sample-efficient learning is possible under
standard trajectory feedback. Notably, distinguishable POMDPs only require the
emission distributions from different latent states to be \emph{different}
instead of \emph{linearly independent}, as required in revealing POMDPs.
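To make the gap between the two conditions concrete, here is a toy check in plain Python: three latent states with two possible observations, whose emission rows are pairwise different (the distinguishability condition) yet necessarily linearly dependent (failing the revealing condition), since three vectors in a 2-dimensional space cannot be linearly independent.

```python
# Toy emission matrix: 3 latent states (rows) x 2 observations (columns).

def pairwise_different(rows, tol=1e-9):
    """Distinguishable: every pair of emission distributions differs."""
    n = len(rows)
    return all(
        sum(abs(a - b) for a, b in zip(rows[i], rows[j])) > tol
        for i in range(n) for j in range(i + 1, n)
    )

def rank(rows, tol=1e-9):
    """Rank via Gaussian elimination (revealing needs full row rank)."""
    m = [list(r) for r in rows]
    r = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(r, len(m)) if abs(m[i][c]) > tol), None)
        if piv is None:
            continue
        m[r], m[piv] = m[piv], m[r]
        for i in range(len(m)):
            if i != r and abs(m[i][c]) > tol:
                f = m[i][c] / m[r][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

emissions = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]   # each row sums to 1
print(pairwise_different(emissions))               # True: distinguishable
print(rank(emissions) == len(emissions))           # False: not revealing
```

The third row is the average of the first two, so no observation matrix can invert the emissions, yet every pair of states still produces different observation statistics.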
Coastal Aquaculture Extraction Using GF-3 Fully Polarimetric SAR Imagery: A Framework Integrating UNet++ with Marker-Controlled Watershed Segmentation
Coastal aquaculture monitoring is vital for sustainable offshore aquaculture management. However, the dense distribution and various sizes of aquacultures make it challenging to accurately extract the boundaries of aquaculture ponds. In this study, we develop a novel combined framework that integrates UNet++ with a marker-controlled watershed segmentation strategy to facilitate aquaculture boundary extraction from fully polarimetric GaoFen-3 SAR imagery. First, four polarimetric decomposition algorithms were applied to extract 13 polarimetric scattering features. Together with the nine other polarisation and texture features, a total of 22 polarimetric features were then extracted, among which four were optimised according to the separability index. Subsequently, to reduce the “adhesion” phenomenon and separate adjacent and even adhering ponds into individual aquaculture units, two UNet++ subnetworks were utilised to construct the marker and foreground functions, the results of which were then used in the marker-controlled watershed algorithm to obtain refined aquaculture results. A multiclass segmentation strategy that divides the intermediate markers into three categories (aquaculture, background and dikes) was applied to the marker function. In addition, a boundary patch refinement postprocessing strategy was applied to the two subnetworks to extract and repair the complex/error-prone boundaries of the aquaculture ponds, followed by a morphological operation that was conducted for label augmentation. An experimental investigation performed to extract individual aquacultures in the Yancheng Coastal Wetlands indicated that the crucial features for aquacultures are Shannon entropy (SE), the intensity component of SE (SE_I) and the corresponding mean texture features (Mean_SE and Mean_SE_I). 
When the optimal features were introduced, our proposed method performed better than standard UNet++ in aquaculture extraction, achieving improvements of 1.8%, 3.2%, 21.7% and 12.1% in F1, IoU, MR and insF1, respectively. The experimental results indicate that the proposed method can effectively handle the adhesion of adjacent objects and unclear boundaries, and capture clear and refined aquaculture boundaries.
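The feature-screening step above can be illustrated with a separability index. A common form is |mean_a - mean_b| / (std_a + std_b) between the two classes; the exact definition the authors use may differ, and the feature values below are invented for the example.

```python
# Hedged sketch: rank candidate features by a class-separability index.
import statistics as st

def separability(class_a, class_b):
    """Larger SI => the feature separates the two classes better."""
    mu_a, mu_b = st.mean(class_a), st.mean(class_b)
    sd_a, sd_b = st.pstdev(class_a), st.pstdev(class_b)
    return abs(mu_a - mu_b) / (sd_a + sd_b + 1e-12)

# toy per-pixel feature values for aquaculture vs background classes
features = {
    "SE":      ([0.9, 1.0, 1.1], [0.1, 0.2, 0.15]),   # well separated
    "texture": ([0.5, 0.6, 0.4], [0.45, 0.55, 0.5]),  # heavily overlapping
}
ranked = sorted(features, key=lambda f: separability(*features[f]),
                reverse=True)
print(ranked)  # the well-separated feature ranks first
```

Applied to all 22 polarimetric features, keeping the top-ranked few mirrors the paper's selection of four optimal features such as SE and SE_I.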
AI-Generated Incentive Mechanism and Full-Duplex Semantic Communications for Information Sharing
The next generation of Internet services, such as Metaverse, relies on mixed
reality (MR) technology to provide immersive user experiences. However, the
limited computation power of MR headset-mounted devices (HMDs) hinders the
deployment of such services. Therefore, we propose an efficient information
sharing scheme based on full-duplex device-to-device (D2D) semantic
communications to address this issue. Our approach enables users to avoid heavy
and repetitive computational tasks, such as artificial intelligence-generated
content (AIGC) in the view images of all MR users. Specifically, a user can
transmit the generated content and semantic information extracted from their
view image to nearby users, who can then use this information to obtain the
spatial matching of computation results under their view images. We analyze the
performance of full-duplex D2D communications, including the achievable rate
and bit error probability, by using generalized small-scale fading models. To
facilitate semantic information sharing among users, we design a contract
theoretic AI-generated incentive mechanism. The proposed diffusion model
generates the optimal contract design, outperforming two deep reinforcement
learning algorithms, i.e., proximal policy optimization and soft actor-critic
algorithms. Our numerical analysis experiments demonstrate the effectiveness of our
proposed methods. The code for this paper is available at
https://github.com/HongyangDu/SemSharing. Comment: Accepted by IEEE JSAC.
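The achievable-rate analysis mentioned above can be approximated numerically. The Monte-Carlo sketch below estimates the ergodic rate of a D2D link under Rayleigh small-scale fading; the paper uses more general fading models, and Rayleigh is chosen here only because it is easy to sample.

```python
# Monte-Carlo estimate of ergodic achievable rate under Rayleigh fading.
import math
import random

def ergodic_rate(snr_db, trials=100_000, seed=0):
    """Average of log2(1 + SNR * |h|^2) over fading realizations."""
    rng = random.Random(seed)
    snr = 10 ** (snr_db / 10)
    total = 0.0
    for _ in range(trials):
        gain = rng.expovariate(1.0)     # Rayleigh: |h|^2 ~ Exp(1)
        total += math.log2(1 + snr * gain)
    return total / trials               # bits/s/Hz

print(ergodic_rate(10))                 # ergodic rate at 10 dB average SNR
```

At 10 dB the estimate converges to roughly 2.9 bits/s/Hz, matching the known closed form for Rayleigh ergodic capacity.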
Semantic Communications for Wireless Sensing: RIS-aided Encoding and Self-supervised Decoding
Semantic communications can reduce the resource consumption by transmitting
task-related semantic information extracted from source messages. However,
when the source messages are utilized for various tasks, e.g., wireless
sensing data for localization and activity detection, semantic communication
techniques are difficult to implement because of the increased processing
complexity. In this
this paper, we propose the inverse semantic communications as a new paradigm.
Instead of extracting semantic information from messages, we aim to encode the
task-related source messages into a hyper-source message for data transmission
or storage. Following this paradigm, we design an inverse semantic-aware
wireless sensing framework with three algorithms for data sampling,
reconfigurable intelligent surface (RIS)-aided encoding, and self-supervised
decoding, respectively. Specifically, on the one hand, we propose a novel RIS
hardware design for encoding several signal spectrums into one MetaSpectrum. To
select the task-related signal spectrums for achieving efficient encoding, a
semantic hash sampling method is introduced. On the other hand, we propose a
self-supervised learning method for decoding the MetaSpectrums to obtain the
original signal spectrums. Using sensing data collected from the real world, we
show that our framework can reduce the data volume by 95% compared to that
before encoding, without affecting the accomplishment of sensing tasks.
Moreover, compared with the typically used uniform sampling scheme, the
proposed semantic hash sampling scheme can achieve 67% lower mean squared error
in recovering the sensing parameters. In addition, experimental results
demonstrate that the amplitude response matrix of the RIS enables the
encryption of the sensing data.
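The "semantic hash sampling" idea can be illustrated in a few lines: instead of uniformly subsampling spectrums, hash a coarse task-related signature of each spectrum and keep one representative per hash bucket, so near-duplicate spectrums are encoded only once. The signature and hashing below are hypothetical stand-ins for the paper's method.

```python
# Illustrative semantic hash sampling: collapse near-duplicate spectrums.

def semantic_summary(spectrum, levels=4):
    """Quantize a spectrum into a coarse, task-related signature."""
    lo, hi = min(spectrum), max(spectrum)
    span = (hi - lo) or 1.0
    return tuple(int((x - lo) / span * (levels - 1)) for x in spectrum)

def semantic_hash_sample(spectrums):
    kept, seen = [], set()
    for s in spectrums:
        sig = hash(semantic_summary(s))
        if sig not in seen:            # first spectrum in this bucket wins
            seen.add(sig)
            kept.append(s)
    return kept

spectrums = [
    [0.1, 0.9, 0.2], [0.1, 0.9, 0.2],   # exact duplicate
    [0.12, 0.91, 0.19],                 # near-duplicate, same bucket
    [0.8, 0.1, 0.7],                    # genuinely new content
]
print(len(semantic_hash_sample(spectrums)))  # duplicates collapsed
```

Compared with uniform sampling, this keeps only spectrums that carry new task-related content, which is the mechanism behind the 95% data-volume reduction the abstract reports.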
A Unified Framework for Guiding Generative AI with Wireless Perception in Resource Constrained Mobile Edge Networks
With the significant advancements in artificial intelligence (AI)
technologies and powerful computational capabilities, generative AI (GAI) has
become a pivotal digital content generation technique for offering superior
digital services. However, directing GAI towards desired outputs still suffers
from the inherent instability of the AI model. In this paper, we design a novel
framework that utilizes wireless perception to guide GAI (WiPe-GAI) for
providing digital content generation service, i.e., AI-generated content
(AIGC), in resource-constrained mobile edge networks. Specifically, we first
propose a new sequential multi-scale perception (SMSP) algorithm to predict
user skeleton based on the channel state information (CSI) extracted from
wireless signals. This prediction then guides GAI to provide users with AIGC,
such as virtual character generation. To ensure the efficient operation of the
proposed framework in resource-constrained networks, we further design a
pricing-based incentive mechanism and introduce a diffusion model based
approach to generate an optimal pricing strategy for the service provisioning.
The strategy maximizes the user's utility while enhancing the participation of
the virtual service provider (VSP) in AIGC provision. The experimental results
demonstrate the effectiveness of the designed framework in terms of skeleton
prediction and optimal pricing strategy generation compared with other
existing solutions.
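The pricing-based incentive can be sketched as a constrained optimization: choose the service price that maximizes the user's utility while keeping the virtual service provider (VSP) willing to participate. The utility forms and constants below are invented for illustration, and the paper generates the strategy with a diffusion model rather than the grid search used here.

```python
# Toy pricing incentive: maximize user utility subject to VSP participation.
import math

def user_utility(price, quality=5.0):
    return math.log(1 + quality) - price    # value of AIGC minus payment

def vsp_utility(price, cost=0.8):
    return price - cost                     # revenue minus serving cost

def optimal_price(grid_step=0.01):
    best_p, best_u = None, float("-inf")
    for i in range(int(3.0 / grid_step) + 1):
        p = i * grid_step
        # participation constraint: the VSP must not lose money
        if vsp_utility(p) >= 0 and user_utility(p) > best_u:
            best_p, best_u = p, user_utility(p)
    return best_p

print(optimal_price())   # lowest price the VSP will accept
```

Because user utility falls with price, the optimum sits exactly at the VSP's participation threshold; richer contract-theoretic designs add quality levels and information asymmetry on top of this basic trade-off.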
Performance Analysis of Free-Space Information Sharing in Full-Duplex Semantic Communications
In next-generation Internet services, such as Metaverse, the mixed reality
(MR) technique plays a vital role. Yet the limited computing capacity of the
user-side MR headset-mounted device (HMD) prevents its further application,
especially in scenarios that require heavy computation. One way out of this
dilemma is to design an efficient information sharing scheme among users to
replace the heavy and repetitive computation. In this paper, we propose a
free-space information sharing mechanism based on full-duplex device-to-device
(D2D) semantic communications. Specifically, the view images of MR users in the
same real-world scenario may be analogous. Therefore, when one user (i.e., a
device) completes some computation tasks, the user can send their own calculation
results and the semantic features extracted from their own view image to
nearby users (i.e., other devices). On this basis, other users can use the
received semantic features to obtain the spatial matching of the computational
results under their own view images without repeating the computation. Using
generalized small-scale fading models, we analyze the key performance
indicators of full-duplex D2D communications, including channel capacity and
bit error probability, which directly affect the transmission of semantic
information. Finally, the numerical analysis experiments demonstrate the
effectiveness of our proposed methods.
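The second performance indicator mentioned above, bit error probability, can also be checked numerically. The sketch below compares a Monte-Carlo estimate for BPSK over a Rayleigh-faded link against the standard closed form Pb = (1 - sqrt(snr/(1+snr))) / 2; the paper covers more general small-scale fading models, and Rayleigh is used here only for brevity.

```python
# BER of BPSK over Rayleigh fading: Monte Carlo vs. closed form.
import math
import random

def ber_closed_form(snr):
    return 0.5 * (1 - math.sqrt(snr / (1 + snr)))

def ber_monte_carlo(snr, trials=100_000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        gain = rng.expovariate(1.0)          # |h|^2 ~ Exp(1)
        inst_snr = snr * gain
        # BPSK over AWGN: Pb = Q(sqrt(2*snr)) = 0.5 * erfc(sqrt(snr))
        total += 0.5 * math.erfc(math.sqrt(inst_snr))
    return total / trials

snr = 10 ** (10 / 10)                        # 10 dB average SNR
print(ber_closed_form(snr), ber_monte_carlo(snr))
```

Averaging the conditional error probability over fading realizations (rather than simulating individual bits) keeps the estimator's variance low, so even modest trial counts match the closed form closely.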