45 research outputs found

    Region-Aware Portrait Retouching with Sparse Interactive Guidance

    Full text link
    Portrait retouching aims to improve the aesthetic quality of input portrait photos and especially requires human-region priority. \pink{The deep learning-based methods largely elevate the retouching efficiency and provide promising retouched results. However, existing portrait retouching methods focus on automatic retouching, which treats all human-regions equally and ignores users' preferences for specific individuals,} thus suffering from limited flexibility in interactive scenarios. In this work, we emphasize the importance of users' intents and explore the interactive portrait retouching task. Specifically, we propose a region-aware retouching framework with two branches: an automatic branch and an interactive branch. \pink{The automatic branch involves an encoding-decoding process, which searches region candidates and performs automatic region-aware retouching without user guidance. The interactive branch encodes sparse user guidance into a priority condition vector and modulates latent features with a region selection module to further emphasize the user-specified regions. Experimental results show that our interactive branch effectively captures users' intents and generalizes well to unseen scenes with sparse user guidance, while our automatic branch also outperforms the state-of-the-art retouching methods due to improved region-awareness.

    Best-kk Search Algorithm for Neural Text Generation

    Full text link
    Modern natural language generation paradigms require a good decoding strategy to obtain quality sequences out of the model. Beam search yields high-quality but low diversity outputs; stochastic approaches suffer from high variance and sometimes low quality, but the outputs tend to be more natural and creative. In this work, we propose a deterministic search algorithm balancing both quality and diversity. We first investigate the vanilla best-first search (BFS) algorithm and then propose the Best-kk Search algorithm. Inspired by BFS, we greedily expand the top kk nodes, instead of only the first node, to boost efficiency and diversity. Upweighting recently discovered nodes accompanied by heap pruning ensures the completeness of the search procedure. Experiments on four NLG tasks, including question generation, commonsense generation, text summarization, and translation, show that best-kk search yields more diverse and natural outputs compared to strong baselines, while our approach maintains high text quality. The proposed algorithm is parameter-free, lightweight, efficient, and easy to use.Comment: 17 page

    Sample-Efficient Learning of POMDPs with Multiple Observations In Hindsight

    Full text link
    This paper studies the sample-efficiency of learning in Partially Observable Markov Decision Processes (POMDPs), a challenging problem in reinforcement learning that is known to be exponentially hard in the worst-case. Motivated by real-world settings such as loading in game playing, we propose an enhanced feedback model called ``multiple observations in hindsight'', where after each episode of interaction with the POMDP, the learner may collect multiple additional observations emitted from the encountered latent states, but may not observe the latent states themselves. We show that sample-efficient learning under this feedback model is possible for two new subclasses of POMDPs: \emph{multi-observation revealing POMDPs} and \emph{distinguishable POMDPs}. Both subclasses generalize and substantially relax \emph{revealing POMDPs} -- a widely studied subclass for which sample-efficient learning is possible under standard trajectory feedback. Notably, distinguishable POMDPs only require the emission distributions from different latent states to be \emph{different} instead of \emph{linearly independent} as required in revealing POMDPs

    Coastal Aquaculture Extraction Using GF-3 Fully Polarimetric SAR Imagery: A Framework Integrating UNet++ with Marker-Controlled Watershed Segmentation

    Get PDF
    Coastal aquaculture monitoring is vital for sustainable offshore aquaculture management. However, the dense distribution and various sizes of aquacultures make it challenging to accurately extract the boundaries of aquaculture ponds. In this study, we develop a novel combined framework that integrates UNet++ with a marker-controlled watershed segmentation strategy to facilitate aquaculture boundary extraction from fully polarimetric GaoFen-3 SAR imagery. First, four polarimetric decomposition algorithms were applied to extract 13 polarimetric scattering features. Together with the nine other polarisation and texture features, a total of 22 polarimetric features were then extracted, among which four were optimised according to the separability index. Subsequently, to reduce the “adhesion” phenomenon and separate adjacent and even adhering ponds into individual aquaculture units, two UNet++ subnetworks were utilised to construct the marker and foreground functions, the results of which were then used in the marker-controlled watershed algorithm to obtain refined aquaculture results. A multiclass segmentation strategy that divides the intermediate markers into three categories (aquaculture, background and dikes) was applied to the marker function. In addition, a boundary patch refinement postprocessing strategy was applied to the two subnetworks to extract and repair the complex/error-prone boundaries of the aquaculture ponds, followed by a morphological operation that was conducted for label augmentation. An experimental investigation performed to extract individual aquacultures in the Yancheng Coastal Wetlands indicated that the crucial features for aquacultures are Shannon entropy (SE), the intensity component of SE (SE_I) and the corresponding mean texture features (Mean_SE and Mean_SE_I). When the optimal features were introduced, our proposed method performed better than standard UNet++ in aquaculture extraction, achieving improvements of 1.8%, 3.2%, 21.7% and 12.1% in F1, IoU, MR and insF1, respectively. The experimental results indicate that the proposed method can handle the adhesion of both adjacent objects and unclear boundaries effectively and capture clear and refined aquaculture boundaries

    AI-Generated Incentive Mechanism and Full-Duplex Semantic Communications for Information Sharing

    Full text link
    The next generation of Internet services, such as Metaverse, rely on mixed reality (MR) technology to provide immersive user experiences. However, the limited computation power of MR headset-mounted devices (HMDs) hinders the deployment of such services. Therefore, we propose an efficient information sharing scheme based on full-duplex device-to-device (D2D) semantic communications to address this issue. Our approach enables users to avoid heavy and repetitive computational tasks, such as artificial intelligence-generated content (AIGC) in the view images of all MR users. Specifically, a user can transmit the generated content and semantic information extracted from their view image to nearby users, who can then use this information to obtain the spatial matching of computation results under their view images. We analyze the performance of full-duplex D2D communications, including the achievable rate and bit error probability, by using generalized small-scale fading models. To facilitate semantic information sharing among users, we design a contract theoretic AI-generated incentive mechanism. The proposed diffusion model generates the optimal contract design, outperforming two deep reinforcement learning algorithms, i.e., proximal policy optimization and soft actor-critic algorithms. Our numerical analysis experiment proves the effectiveness of our proposed methods. The code for this paper is available at https://github.com/HongyangDu/SemSharingComment: Accepted by IEEE JSA

    Semantic Communications for Wireless Sensing: RIS-aided Encoding and Self-supervised Decoding

    Full text link
    Semantic communications can reduce the resource consumption by transmitting task-related semantic information extracted from source messages. However, when the source messages are utilized for various tasks, e.g., wireless sensing data for localization and activities detection, semantic communication technique is difficult to be implemented because of the increased processing complexity. In this paper, we propose the inverse semantic communications as a new paradigm. Instead of extracting semantic information from messages, we aim to encode the task-related source messages into a hyper-source message for data transmission or storage. Following this paradigm, we design an inverse semantic-aware wireless sensing framework with three algorithms for data sampling, reconfigurable intelligent surface (RIS)-aided encoding, and self-supervised decoding, respectively. Specifically, on the one hand, we propose a novel RIS hardware design for encoding several signal spectrums into one MetaSpectrum. To select the task-related signal spectrums for achieving efficient encoding, a semantic hash sampling method is introduced. On the other hand, we propose a self-supervised learning method for decoding the MetaSpectrums to obtain the original signal spectrums. Using the sensing data collected from real-world, we show that our framework can reduce the data volume by 95% compared to that before encoding, without affecting the accomplishment of sensing tasks. Moreover, compared with the typically used uniform sampling scheme, the proposed semantic hash sampling scheme can achieve 67% lower mean squared error in recovering the sensing parameters. In addition, experiment results demonstrate that the amplitude response matrix of the RIS enables the encryption of the sensing data

    A Unified Framework for Guiding Generative AI with Wireless Perception in Resource Constrained Mobile Edge Networks

    Full text link
    With the significant advancements in artificial intelligence (AI) technologies and powerful computational capabilities, generative AI (GAI) has become a pivotal digital content generation technique for offering superior digital services. However, directing GAI towards desired outputs still suffer the inherent instability of the AI model. In this paper, we design a novel framework that utilizes wireless perception to guide GAI (WiPe-GAI) for providing digital content generation service, i.e., AI-generated content (AIGC), in resource-constrained mobile edge networks. Specifically, we first propose a new sequential multi-scale perception (SMSP) algorithm to predict user skeleton based on the channel state information (CSI) extracted from wireless signals. This prediction then guides GAI to provide users with AIGC, such as virtual character generation. To ensure the efficient operation of the proposed framework in resource constrained networks, we further design a pricing-based incentive mechanism and introduce a diffusion model based approach to generate an optimal pricing strategy for the service provisioning. The strategy maximizes the user's utility while enhancing the participation of the virtual service provider (VSP) in AIGC provision. The experimental results demonstrate the effectiveness of the designed framework in terms of skeleton prediction and optimal pricing strategy generation comparing with other existing solutions

    Performance Analysis of Free-Space Information Sharing in Full-Duplex Semantic Communications

    Full text link
    In next-generation Internet services, such as Metaverse, the mixed reality (MR) technique plays a vital role. Yet the limited computing capacity of the user-side MR headset-mounted device (HMD) prevents its further application, especially in scenarios that require a lot of computation. One way out of this dilemma is to design an efficient information sharing scheme among users to replace the heavy and repetitive computation. In this paper, we propose a free-space information sharing mechanism based on full-duplex device-to-device (D2D) semantic communications. Specifically, the view images of MR users in the same real-world scenario may be analogous. Therefore, when one user (i.e., a device) completes some computation tasks, the user can send his own calculation results and the semantic features extracted from the user's own view image to nearby users (i.e., other devices). On this basis, other users can use the received semantic features to obtain the spatial matching of the computational results under their own view images without repeating the computation. Using generalized small-scale fading models, we analyze the key performance indicators of full-duplex D2D communications, including channel capacity and bit error probability, which directly affect the transmission of semantic information. Finally, the numerical analysis experiment proves the effectiveness of our proposed methods
    corecore