9 research outputs found

    Sequential Semantic Generative Communication for Progressive Text-to-Image Generation

    This paper proposes a new communication system framework that leverages the promising generation capabilities of multi-modal generative models. In today's smart applications, successful communication can be achieved by conveying the perceptual meaning, which we set as a text prompt. Text serves as a suitable semantic representation of image data, as it has evolved to instruct or generate an image through multi-modal techniques, being interpreted in a manner similar to human cognition. Using text can also reduce the overhead compared to transmitting the intact data itself. The transmitter converts the objective image to text through a multi-modal generation process, and the receiver reconstructs the image using the reverse process. Each word in the text sentence has its own syntactic role and is responsible for a particular piece of the information the text contains. For further efficiency in communication load, the transmitter sends words sequentially, prioritizing those that carry the most information, until communication succeeds. Our primary focus is therefore on the design of a communication system based on image-to-text transformation and on the proposed schemes for sequentially transmitting word tokens. Our work is expected to pave a new road toward applying state-of-the-art generative models to real communication systems.
    Comment: 4 pages, 2 figures, to be published in IEEE International Conference on Sensing, Communication, and Networking, Workshop on Semantic Communication for 6G (SC6G-SECON23)

    Generative AI Meets Semantic Communication: Evolution and Revolution of Communication Tasks

    While deep generative models are showing exciting abilities in computer vision and natural language processing, their adoption in communication frameworks is still far underestimated. These methods have been shown to improve solutions to classic communication problems such as denoising, restoration, or compression. Nevertheless, generative models can unveil their real potential in semantic communication frameworks, in which the receiver is not asked to recover the sequence of bits used to encode the transmitted (semantic) message, but only to regenerate content that is semantically consistent with the transmitted message. Unlocking the capabilities of generative models in semantic communication paves the way for a paradigm shift with respect to conventional communication systems, one that has great potential to reduce the amount of data traffic and offers revolutionary versatility for novel tasks and applications that were not even conceivable a few years ago. In this paper, we present a unified perspective of deep generative models in semantic communication and we unveil their revolutionary role in future communication frameworks, enabling emerging applications and tasks. Finally, we analyze the challenges and opportunities in developing generative models specifically tailored for communication systems.
    Comment: Under consideration in IEEE Network Special Issue "The Interplay Between Generative AI and 5G-Advanced toward 6G"

    Image synthesis based on a model of human vision

    Modern computer graphics systems are able to construct renderings of such high quality that viewers are deceived into regarding the images as coming from a photographic source. Large amounts of computing resources are expended in this rendering process, using complex mathematical models of lighting and shading. However, psychophysical experiments have revealed that viewers only regard certain informative regions within a presented image. Furthermore, it has been shown that these visually important regions contain low-level visual feature differences that attract the attention of the viewer. This thesis will present a new approach to image synthesis that exploits these experimental findings by modulating the spatial quality of image regions by their visual importance. Efficiency gains are therefore reaped, without sacrificing much of the perceived quality of the image. Two tasks must be undertaken to achieve this goal. Firstly, the design of an appropriate region-based model of visual importance, and secondly, the modification of progressive rendering techniques to effect an importance-based rendering approach. A rule-based fuzzy logic model is presented that computes, using spatial feature differences, the relative visual importance of regions in an image. This model improves upon previous work by incorporating threshold effects induced by global feature difference distributions and by using texture concentration measures. A modified approach to progressive ray-tracing is also presented. This new approach uses the visual importance model to guide the progressive refinement of an image. In addition, this concept of visual importance has been incorporated into supersampling, texture mapping and computer animation techniques. Experimental results are presented, illustrating the efficiency gains reaped from using this method of progressive rendering. 
This visual importance-based rendering approach is expected to have applications in the entertainment industry, where image fidelity may be sacrificed for efficiency purposes, as long as the overall visual impression of the scene is maintained. Different aspects of the approach should find many other applications in image compression, image retrieval, progressive data transmission and active robotic vision.

    Out of this word: the effect of parafoveal orthographic information on central word processing

    The aim of this thesis is to investigate the effect of parafoveal information on central word processing. This topic impacts on two controversial areas of research: the allocation of attention during reading, and letter processing during word recognition. Researchers into the role of attention during reading are split into two camps, with some believing that attention is allocated serially to consecutive words and others that it is spread across multiple words in parallel. This debate has been informed by the results of recent experiments that test a key prediction of the parallel processing theory that parafoveal and foveal processing occur concurrently. However, there is a gap in the literature for tightly-controlled experiments to further test this prediction. In contrast, the study of the processing that letters undergo during word recognition has a long history, with many researchers concluding that letter identity is processed only conjointly with letter ‘slot’ position within a word, known as ‘slot-based’ coding. However, recent innovative studies have demonstrated that more word priming is produced from prime letter strings containing letter transpositions than from primes containing letter substitutions, although this work has not been extended to parafoveal letter prime presentations. This thesis will also discuss the neglected subject of how research into these separate topics of text reading and isolated word recognition can be integrated via parafoveal processing. It presents six experiments designed to investigate how our responses to a central word are affected by varying its relationship with simultaneously presented parafoveal information. 
Experiment 1 introduced the Flanking Letters Lexical Decision task in which a lexical decision was made to words flanked by bigrams either orthographically related or unrelated to the response word; the results indicated that there is parafoveal orthographic priming but did not support the ‘slot-based’ coding theory as letter order was unimportant. Experiments 2-4 involved eye-tracking of participants who read sentences containing a boundary change that allowed the presentation of an orthographically related word in parafoveal vision. Experiment 2 demonstrated that an orthographically related word at position n+1 reduces first-pass fixations on word n, indicating parallel processing of these words. Experiment 4 replicated this result, and also showed that altering the letter identity of word n+1 reduced orthographic priming whereas altering letter order did not, indicating that slot-based coding of letters does not occur during reading. However, Experiment 3 found that an orthographically related word presented at position n-1 did not prime word n, signifying the influence of reading direction on parafoveal processing. Experiment 5 investigated whether the parallel processing that words undergo during text reading conditions our representations of isolated words; lexical decision times to words flanked by bigrams that formed plausible or implausible contexts did not differ. Lastly, one possible cause of the reading disorder dyslexia is under- or over- processing of parafoveal information. Experiment 6 therefore replicated Experiment 1 including a sample of dyslexia sufferers but found no interaction between reading ability and parafoveal processing. 
Overall, the results of this thesis lead to the conclusion that there is extensive processing of parafoveal information during both reading (indicating parallel processing) and word recognition (contraindicating slot-based coding), and that underpinning both our reading and word recognition processes is the flexibility of our information-gathering mechanisms.