Sequential Semantic Generative Communication for Progressive Text-to-Image Generation
This paper proposes a new communication-system framework that leverages the
promising generation capabilities of multi-modal generative models. For
today's smart applications, successful communication can be achieved by
conveying the perceptual meaning, which we represent as a text prompt. Text is a
suitable semantic representation of image data because it has evolved to
describe images, or to generate them through multi-modal techniques, and it is
interpreted in a manner similar to human cognition. Using text can also reduce
the communication load compared with transmitting the intact data itself. The
transmitter converts the target image to text through a multi-modal generation
process, and the receiver reconstructs the image using the reverse process.
Each word in the text sentence has its own syntactic role and is responsible for
a particular piece of the information the text contains. To further reduce the
communication load, the transmitter sends words sequentially, in order of how
much information they carry, until communication succeeds. Our primary focus
is therefore on the design of a communication system based on image-to-text
transformation and on the proposed schemes for sequentially transmitting word
tokens. Our work is expected to pave a new road for applying state-of-the-art
generative models to real communication systems.
Comment: 4 pages, 2 figures, to be published in IEEE International Conference
on Sensing, Communication, and Networking, Workshop on Semantic Communication
for 6G (SC6G-SECON23)
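The sequential transmission scheme described above can be illustrated with a minimal Python sketch. The abstract does not say how word priority is computed or how the receiver decides that communication has succeeded, so both are assumptions here: ROLE_PRIORITY is a hypothetical ranking by syntactic role, and is_successful is a hypothetical receiver-side check supplied by the caller.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical priority per syntactic role (higher = sent earlier).
# The abstract does not state the actual ranking criterion.
ROLE_PRIORITY: Dict[str, int] = {"noun": 3, "verb": 2, "adjective": 1}


def transmit_progressively(
    tagged_words: List[Tuple[str, str]],
    is_successful: Callable[[List[str]], bool],
) -> List[str]:
    """Send words in descending priority until the receiver reports success.

    tagged_words: (word, role) pairs produced by the image-to-text step.
    is_successful: hypothetical receiver-side check on the words received so far.
    Returns the words actually sent, in transmission order.
    """
    # Roles missing from ROLE_PRIORITY get the lowest priority (0).
    # sorted() is stable, so equal-priority words keep their sentence order.
    ranked = sorted(
        tagged_words,
        key=lambda pair: ROLE_PRIORITY.get(pair[1], 0),
        reverse=True,
    )
    sent: List[str] = []
    for word, _role in ranked:
        sent.append(word)
        # Stop as soon as the receiver can regenerate an acceptable image.
        if is_successful(list(sent)):
            break
    return sent
```

For example, with the caption "a red car parked outside" tagged by role and a check that succeeds once both "car" and "red" have arrived, the sketch sends "car", "parked" and "red" and then stops, leaving the two lowest-priority words untransmitted.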
Generative AI Meets Semantic Communication: Evolution and Revolution of Communication Tasks
While deep generative models are showing exciting abilities in computer
vision and natural language processing, their potential in communication
frameworks is still largely underestimated. These methods have been shown to
improve solutions to classic communication problems such as denoising,
restoration, and compression. Nevertheless, generative models can unveil their
real potential in semantic communication frameworks, in which the receiver is
not asked to recover the sequence of bits used to encode the transmitted
(semantic) message, but only to regenerate content that is semantically
consistent with the transmitted message. Unlocking the capabilities of
generative models in semantic communication paves the way for a paradigm shift
with respect to conventional communication systems: it has great potential to
reduce data traffic and offers revolutionary versatility for novel tasks and
applications that were not even conceivable a few years ago. In this paper, we
present a unified perspective of deep generative models in semantic
communication and unveil their revolutionary role in future communication
frameworks, enabling emerging applications and tasks. Finally, we analyze the
challenges and opportunities in developing generative models specifically
tailored for communication systems.
Comment: Under consideration in IEEE Network Special Issue "The Interplay
Between Generative AI and 5G-Advanced toward 6G"
Image synthesis based on a model of human vision
Modern computer graphics systems are able to construct renderings of such high quality that viewers are deceived into regarding the images as coming from a photographic source. Large amounts of computing resources are expended in this rendering process, using complex mathematical models of lighting and shading.
However, psychophysical experiments have revealed that viewers attend only to certain informative regions within a presented image. Furthermore, it has been shown that these visually important regions contain low-level visual feature differences that attract the viewer's attention.
This thesis presents a new approach to image synthesis that exploits these experimental findings by modulating the spatial quality of image regions according to their visual importance. Efficiency gains are therefore achieved without sacrificing much of the image's perceived quality. Two tasks must be undertaken to achieve this goal: first, designing an appropriate region-based model of visual importance, and second, modifying progressive rendering techniques to implement an importance-based rendering approach.
A rule-based fuzzy logic model is presented that computes, using spatial feature differences, the relative visual importance of regions in an image. This model improves upon previous work by incorporating threshold effects induced by global feature difference distributions and by using texture concentration measures.
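A rule-based fuzzy importance score of the kind described above can be sketched as follows. The thesis's actual features, membership functions and rule base are not given in this abstract. The two inputs (contrast_diff and color_diff), the linear memberships, the three rules and the weighted-average (zero-order Sugeno) defuzzification are all illustrative assumptions, and normalise is only a crude stand-in for the threshold effects of the global feature-difference distribution.

```python
from typing import List, Tuple


def low(x: float) -> float:
    """Linear membership of x in the fuzzy set 'low', clamped to [0, 1]."""
    return max(0.0, min(1.0, 1.0 - x))


def high(x: float) -> float:
    """Linear membership of x in the fuzzy set 'high', clamped to [0, 1]."""
    return max(0.0, min(1.0, x))


def normalise(raw_diffs: List[float]) -> List[float]:
    """Scale raw per-region feature differences by the image-wide maximum.

    Assumed, crude stand-in for threshold effects from the global distribution.
    """
    peak = max(raw_diffs, default=0.0)
    return [d / peak if peak > 0 else 0.0 for d in raw_diffs]


def region_importance(contrast_diff: float, color_diff: float) -> float:
    """Zero-order Sugeno inference over three illustrative rules; result in [0, 1]."""
    c, k = contrast_diff, color_diff
    rules: List[Tuple[float, float]] = [
        # IF contrast is high AND colour is high THEN importance = 1.0 (AND = min)
        (min(high(c), high(k)), 1.0),
        # IF one feature is high and the other low THEN importance = 0.5 (OR = max)
        (max(min(high(c), low(k)), min(low(c), high(k))), 0.5),
        # IF both features are low THEN importance = 0.0
        (min(low(c), low(k)), 0.0),
    ]
    total_weight = sum(w for w, _ in rules)
    if total_weight == 0:
        return 0.0
    # Defuzzify: firing-strength-weighted average of the rule outputs.
    return sum(w * z for w, z in rules) / total_weight
```

A region whose normalised contrast and colour differences are both maximal scores 1.0, one with neither scores 0.0, and one with only a strong contrast difference scores 0.5.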
A modified approach to progressive ray-tracing is also presented. This new approach uses the visual importance model to guide the progressive refinement of an image. In addition, this concept of visual importance has been incorporated into supersampling, texture mapping and computer animation techniques. Experimental results are presented, illustrating the efficiency gains reaped from using this method of progressive rendering.
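Importance-guided progressive refinement can likewise be sketched as a greedy sample scheduler. This is an assumed scheme, not the thesis's ray-tracing method. At each step, the region with the highest importance per sample taken so far gets one more sample. Important regions are therefore refined first, while every region with nonzero importance is eventually revisited.

```python
import heapq
from typing import List, Tuple


def progressive_sample_order(importance: List[float], budget: int) -> List[int]:
    """Return the sequence of region indices that each receive one extra sample.

    Hypothetical greedy scheduler: a region's priority is its importance
    divided by (samples already taken + 1), so important regions are refined
    first and refinement spreads out as their priority decays.
    """
    counts = [0] * len(importance)
    # heapq is a min-heap, so priorities are stored negated; the index breaks ties.
    heap: List[Tuple[float, int]] = [(-imp, idx) for idx, imp in enumerate(importance)]
    heapq.heapify(heap)
    order: List[int] = []
    for _ in range(budget):
        if not heap:
            break
        _, idx = heapq.heappop(heap)
        order.append(idx)  # e.g. trace more rays through this region next
        counts[idx] += 1
        heapq.heappush(heap, (-importance[idx] / (counts[idx] + 1), idx))
    return order
```

With importances [0.8, 0.1, 0.5] and a budget of four samples, the schedule is [0, 2, 0, 0]. The least important region, 1, receives no samples until the other regions' per-sample priority drops below 0.1.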
This visual importance-based rendering approach is expected to have applications in the entertainment industry, where image fidelity may be sacrificed for efficiency as long as the overall visual impression of the scene is maintained. Different aspects of the approach should find many other applications in image compression, image retrieval, progressive data transmission and active robotic vision.
Local Features to a Global View: Recognition of Occluded Objects by Spectral Matching Using Pairwise Feature Relationships
Ph.D. (Doctor of Philosophy)
Out of this word: the effect of parafoveal orthographic information on central word processing
The aim of this thesis is to investigate the effect of parafoveal information on central
word processing. This topic bears on two controversial areas of research: the
allocation of attention during reading, and letter processing during word recognition.
Researchers into the role of attention during reading are split into two camps: some
believe that attention is allocated serially to consecutive words, and others that it is
spread across multiple words in parallel. This debate has been informed by the results of
recent experiments testing a key prediction of the parallel processing theory, namely that
parafoveal and foveal processing occur concurrently. However, the literature still lacks
tightly controlled experiments that test this prediction further. In contrast, the
study of the processing that letters undergo during word recognition has a long history,
with many researchers concluding that letter identity is processed only conjointly with
letter ‘slot’ position within a word, known as ‘slot-based’ coding. However, recent
innovative studies have demonstrated that primes containing letter transpositions produce
more word priming than primes containing letter substitutions, although this work has not
been extended to parafoveal letter primes. This thesis also discusses the neglected
question of how research into these separate topics of text reading and isolated word
recognition can be integrated via parafoveal processing.
It presents six experiments designed to investigate how our responses to a central word
are affected by varying its relationship with simultaneously presented parafoveal
information. Experiment 1 introduced the Flanking Letters Lexical Decision task, in
which a lexical decision was made to words flanked by bigrams that were either
orthographically related or unrelated to the response word; the results indicated that
there is parafoveal orthographic priming but did not support the ‘slot-based’ coding
theory, because letter order was unimportant. Experiments 2-4 involved eye-tracking of
participants who read sentences containing a boundary change that allowed the
presentation of an orthographically related word in parafoveal vision. Experiment 2
demonstrated that an orthographically related word at position n+1 reduces first-pass
fixations on word n, indicating parallel processing of these words. Experiment 4
replicated this result and also showed that altering the letter identity of word n+1
reduced orthographic priming whereas altering letter order did not, indicating that
slot-based coding of letters does not occur during reading. However, Experiment 3 found
that an orthographically related word presented at position n-1 did not prime word n,
signifying the influence of reading direction on parafoveal processing. Experiment 5
investigated whether the parallel processing that words undergo during text reading
conditions our representations of isolated words; lexical decision times to words
flanked by bigrams that formed plausible or implausible contexts did not differ. Lastly,
one possible cause of the reading disorder dyslexia is under- or over-processing of
parafoveal information. Experiment 6 therefore replicated Experiment 1 with a sample that
included participants with dyslexia, but found no interaction between reading ability
and parafoveal processing. Overall, the results of this thesis lead to the conclusion
that there is extensive processing of parafoveal information during both reading
(indicating parallel processing) and word recognition (contraindicating slot-based
coding), and that the flexibility of our information-gathering mechanisms underpins both
our reading and word recognition processes.