    Video coding based on fractals and sparse representations

    Advisor: Hélio Pedrini. Master's dissertation in Computer Science, Universidade Estadual de Campinas, Instituto de Computação.

Abstract: A video is a sequence of still images representing scenes in motion; in practice, it consists of runs of extremely similar images separated by abrupt changes in content. Transmitting and storing these images without any kind of preprocessing would require communication channels with very high bandwidth and a massive amount of storage space. Lossy compression methods were created to reduce the number of bits needed to represent this kind of data. These methods generally consist of an encoder and a decoder: the encoder generates a bit sequence that represents an acceptable approximation of the video in a predefined format, and the decoder reads this sequence and converts it back into a series of images. Transmitting video under extremely limited bandwidth has important applications such as videoconferencing and closed-circuit television. Two approaches are explored in this work: decomposition based on sparse representations, and fractal coding. Most video coders are based on invertible transforms capable of representing spatially smooth images with few non-zero coefficients.
Sparse representations generalize this idea by using a transform whose basis is an overcomplete dictionary, that is, a set with more elements than the dimension of the vector space in which the transform operates. The data can be projected onto this dictionary using a fast heuristic called Matching Pursuit. A video encoder combining this heuristic with a machine learning algorithm that constructs the overcomplete dictionary is proposed. Fractal encoders represent an approximation of the image as an iterated function system. To do so, they create and transmit a sequence of instructions, called a collage, that can construct an approximation of the image at its original scale given a smaller-scale version of the same image. The collage is built so that, when applied repeatedly to any initial image, with the image contracted before each iteration, it converges to an approximation of the encoded image. Simpler and faster methods for creating a collage, and a generalization of these methods to video compression, are presented. Instead of constructing the collage by matching any block from the smaller scale to the original scale, only a small subset of possible matches is considered. The proposed video encoding method groups consecutive frames into a volumetric fractal. The collage maps three-dimensional blocks between the scales, using a smaller scale in both space and time. An improved version of this algorithm designed for communication channels with variable bandwidth is also presented.
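The Matching Pursuit heuristic mentioned in this abstract can be sketched in a few lines. This is a minimal, generic version assuming a dictionary of unit-norm atoms stored as plain Python lists; it is not the encoder from the dissertation, the dictionary-learning step is not shown, and the tiny overcomplete dictionary `D` (three atoms in a two-dimensional space) as well as all function names are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit Euclidean norm (MP assumes unit-norm atoms)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(signal: Vector, dictionary: List[Vector], max_atoms: int,
                     tol: float = 1e-9) -> Tuple[List[Tuple[int, float]], Vector]:
    """Greedy Matching Pursuit over a (possibly overcomplete) dictionary.

    Returns the chosen (atom index, coefficient) pairs and the final residual.
    """
    residual = list(signal)
    coeffs: List[Tuple[int, float]] = []
    for _ in range(max_atoms):
        # Inner product of the residual with every atom.
        scores = [dot(residual, atom) for atom in dictionary]
        best = max(range(len(dictionary)), key=lambda k: abs(scores[k]))
        c = scores[best]
        if abs(c) <= tol:
            break  # no atom explains any remaining energy
        coeffs.append((best, c))
        # Subtract the projection onto the selected atom from the residual.
        residual = [r - c * a for r, a in zip(residual, dictionary[best])]
    return coeffs, residual


def reconstruct(coeffs: List[Tuple[int, float]], dictionary: List[Vector],
                dim: int) -> Vector:
    """Sum of the selected atoms weighted by their coefficients."""
    out = [0.0] * dim
    for k, c in coeffs:
        out = [o + c * a for o, a in zip(out, dictionary[k])]
    return out


# Hypothetical overcomplete dictionary: three unit atoms in a 2-D space.
D = [normalize([1.0, 0.0]), normalize([0.0, 1.0]), normalize([1.0, 1.0])]
coeffs, residual = matching_pursuit([3.0, 3.0], D, max_atoms=3)
```

For the signal `[3.0, 3.0]` the diagonal atom alone captures the whole signal, so a single coefficient suffices where the standard basis would need two; this is the kind of sparsity an overcomplete dictionary is meant to provide.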

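The fractal collage idea from the video-coding abstract above can be illustrated on a one-dimensional signal. In this minimal sketch (a toy under stated assumptions, not the dissertation's encoder), the signal is split into range blocks, a half-resolution copy of the signal supplies the domain blocks, and each range block is approximated as `s * domain + o` by least squares, with `|s|` clamped below 1 so the collage is contractive. Searching only the domain blocks near each range block's own position stands in for the "small subset of possible matches"; that neighbourhood rule, the block size, the clamp value `s_max` and all names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Signal = List[float]
Transform = Tuple[int, float, float]  # (domain block index, scale s, offset o)


def downscale(x: Sequence[float]) -> Signal:
    """Halve the resolution by averaging neighbouring pairs of samples."""
    return [(x[i] + x[i + 1]) / 2.0 for i in range(0, len(x) - 1, 2)]


def fit(d: Signal, r: Signal, s_max: float) -> Tuple[float, float, float]:
    """Least-squares fit r ~ s*d + o, with |s| clamped to s_max (< 1)."""
    n = len(r)
    md, mr = sum(d) / n, sum(r) / n
    var = sum((a - md) ** 2 for a in d)
    cov = sum((a - md) * (b - mr) for a, b in zip(d, r))
    s = cov / var if var > 0 else 0.0
    s = max(-s_max, min(s_max, s))
    o = mr - s * md
    err = sum((s * a + o - b) ** 2 for a, b in zip(d, r))
    return s, o, err


def encode(x: Signal, block: int = 4, search: int = 1,
           s_max: float = 0.9) -> List[Transform]:
    """Build a collage; len(x) is assumed to be a multiple of 2 * block."""
    small = downscale(x)
    n_dom = len(small) // block
    collage: List[Transform] = []
    for ri in range(len(x) // block):
        r = x[ri * block:(ri + 1) * block]
        # Hypothetical restricted search: only domain blocks near the range
        # block's own position in the half-scale signal are tried.
        centre = ri // 2
        best: Optional[Tuple[float, int, float, float]] = None
        for di in range(max(0, centre - search), min(n_dom, centre + search + 1)):
            s, o, err = fit(small[di * block:(di + 1) * block], r, s_max)
            if best is None or err < best[0]:
                best = (err, di, s, o)
        assert best is not None
        _, di, s, o = best
        collage.append((di, s, o))
    return collage


def decode(collage: List[Transform], length: int, block: int = 4,
           iterations: int = 200, start: Optional[Signal] = None) -> Signal:
    """Apply the collage repeatedly, downscaling before each iteration."""
    x = list(start) if start is not None else [0.0] * length
    for _ in range(iterations):
        small = downscale(x)
        x = [s * v + o
             for di, s, o in collage
             for v in small[di * block:(di + 1) * block]]
    return x
```

Because pair-averaging never increases the largest sample difference and every map scales differences by at most `s_max`, decodes started from different initial signals approach the same fixed point, which is the convergence property the abstract relies on.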
    Low-power CMOS digital-pixel Imagers for high-speed uncooled PbSe IR applications

    This PhD dissertation describes the research and development of a new low-cost, monolithic medium-wavelength infrared (MWIR) imager technology for high-speed uncooled industrial applications. It builds on the latest technological advances in vapour phase deposition (VPD) of PbSe-based MWIR detectors achieved by the industrial partner NIT S.L., adding fundamental knowledge through the investigation of novel VLSI analog and mixed-signal design techniques, at circuit and system levels, for the readout integrated device attached to the detector. The work rests on the hypothesis that, using these design techniques, current standard, inexpensive CMOS technologies fulfill all operational requirements of the VPD PbSe detector in terms of connectivity, reliability, functionality and scalability for integrating the device. The resulting monolithic PbSe-CMOS camera must consume very little power, operate at kHz frequencies, exhibit good uniformity and fit the CMOS readout active pixels within the compact pitch of the focal plane, all while coping with the particular characteristics of the MWIR detector: high dark-to-signal ratios, large input parasitic capacitance values and significant mismatch in PbSe integration. To meet these demands, this thesis proposes vision sensor architectures with null inter-pixel crosstalk, based on a digital-only focal plane array (FPA) of configurable pixel sensors. Each digital pixel sensor (DPS) cell is equipped with fast communication modules, self-biasing, offset cancellation, an analog-to-digital converter (ADC) and fixed pattern noise (FPN) correction. In-pixel power consumption is minimized through extensive use of MOSFET subthreshold operation. The main aim is to promote the integration of PbSe-based infrared (IR) image sensing technologies so as to widen their use, not only in distinct scenarios but also at different stages of PbSe-CMOS integration maturity.
For this purpose, we propose to investigate a comprehensive set of functional blocks distributed across two parallel approaches:
• Frame-based “Smart” MWIR imaging based on new DPS circuit topologies with gain and offset FPN correction capabilities. This research line exploits the detector pitch to offer fully digital programmability at pixel level and complete functionality, with input parasitic capacitance compensation and internal frame memory.
• Frame-free “Compact”-pitch MWIR vision based on a novel DPS lossless analog integrator and configurable temporal difference, combined with asynchronous communication protocols inside the focal plane. This strategy is conceived to allow extensive pitch compaction and higher readout speed by suppressing in-pixel digital filtering and by dynamically allocating bandwidth to each pixel of the FPA.
To make the electrical validation of the first prototypes independent of expensive wafer-level PbSe deposition processes, the investigation is also extended to affordable sensor emulation strategies and integrated test platforms specifically oriented to image readout integrated circuits. DPS cells, imagers and test chips have been fabricated and characterized in standard 0.15μm 1P6M, 0.35μm 2P4M and 2.5μm 2P1M CMOS technologies, all as part of research projects with industrial partners. The research has led to the first high-speed uncooled frame-based IR quantum imager monolithically fabricated in a standard VLSI CMOS technology, and has given rise to the Tachyon series [1], a new line of commercial IR cameras used in real-time industrial, environmental and transportation control systems.
The frame-free architectures investigated in this work are a firm step toward pushing pixel pitch and system bandwidth to the limits imposed by the evolving PbSe detector in future generations of the device.
Postprint (published version)
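The gain and offset FPN correction of the frame-based line can be illustrated with the textbook two-point (linear) non-uniformity correction. This sketch assumes that each pixel responds linearly (raw = g * signal + o) and that two uniform reference scenes of known level are available for calibration; it runs off-chip in floating point and is not the in-pixel digital circuit described in the thesis. Function names and the example gains and offsets are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]


def two_point_calibration(low: Frame, high: Frame, level_low: float,
                          level_high: float) -> Tuple[Frame, Frame]:
    """Per-pixel gain and offset mapping raw readings onto a common scale.

    Assumes a linear pixel response. `low` and `high` are raw frames captured
    while the array views two uniform scenes of known levels.
    """
    gains: Frame = []
    offsets: Frame = []
    for row_lo, row_hi in zip(low, high):
        g_row, o_row = [], []
        for y_lo, y_hi in zip(row_lo, row_hi):
            if y_hi == y_lo:
                raise ValueError("pixel shows no response; treat it as defective")
            g = (level_high - level_low) / (y_hi - y_lo)
            g_row.append(g)
            o_row.append(level_low - g * y_lo)
        gains.append(g_row)
        offsets.append(o_row)
    return gains, offsets


def correct(frame: Frame, gains: Frame, offsets: Frame) -> Frame:
    """Apply the per-pixel linear correction: corrected = gain * raw + offset."""
    return [[g * y + o for y, g, o in zip(r_f, r_g, r_o)]
            for r_f, r_g, r_o in zip(frame, gains, offsets)]
```

Once calibrated, pixels with different gains and offsets viewing the same uniform scene report the same corrected value, which is what removes the fixed pattern.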

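The frame-free, temporal-difference readout of the PbSe imager abstract above can be pictured as a pixel that reports only when its value has moved by at least a configurable threshold since its last report, so static pixels consume no bandwidth. The sketch below is a discrete-time simulation under that assumption, using a generic address-event style record (`t`, `x`, `y`, polarity, value); it does not model the thesis's analog integrator or its asynchronous protocol, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Event:
    t: int          # sample index at which the pixel fired
    x: int          # column address
    y: int          # row address
    polarity: int   # +1 for an increase, -1 for a decrease
    value: float    # new reference value, so a receiver can track the pixel


def temporal_difference_events(frames: Sequence[Sequence[Sequence[float]]],
                               threshold: float) -> List[Event]:
    """Emit an event whenever a pixel has moved by at least `threshold`
    since its last report; unchanged pixels produce no traffic.

    Hypothetical discrete-time model: real asynchronous pixels fire in
    continuous time rather than at sampled frame indices.
    """
    reference = [list(row) for row in frames[0]]  # last reported values
    events: List[Event] = []
    for t in range(1, len(frames)):
        for y, row in enumerate(frames[t]):
            for x, value in enumerate(row):
                diff = value - reference[y][x]
                if abs(diff) >= threshold:
                    events.append(Event(t, x, y, 1 if diff > 0 else -1, value))
                    reference[y][x] = value  # reset this pixel's reference
    return events
```

With a threshold of 1.0, a pixel that drifts by 0.5 stays silent until its accumulated change reaches the threshold, and a frame identical to the previous one generates no events at all; readout bandwidth therefore follows scene activity rather than frame rate.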
    More than meets the eye: the conceptual essence of intrinsic memorability

    In a world where sensory threads weave an endless tapestry of multi-modal data, the human brain stands as the masterful weaver of meaning. As we wade through this tempest of input, our brain spins these threads into an intelligible internal representation and holds on tight to what it deems important. But what, exactly, makes certain threads more important than others? And how can we predict their significance? Memorability is the tensile strength of the threads that tie us to the world. It is a proxy for human importance, indicating which threads the human brain will curate and retain with exceptional fidelity. This research investigates these multisensory threads by exploring the influence of audio, visual, and textual modalities on predicting video memorability, and how the interplay between them can shape the overall memorability of a given piece of content. The findings suggest that, while visual data may dominate our sensory experience, it is the underlying conceptual essence that truly holds the key to memorability. This thesis leverages state-of-the-art image synthesis techniques to distill and examine this essence, creating surrogate dreams of video scenes to facilitate the disentanglement of conceptual and perceptual elements of memorability. The work also leverages human EEG data to explore the possibility of a moment of memorability (a moment of encoding that corresponds to a later moment of remembering), which we expect to exist due to the temporal nature of the world and the natural encoding limits of our brains. Recognition and recall, the two core means of remembrance, whose relationship was previously murky, are reconciled by means of a novel video memorability drawing task. The research sheds new light on the nature of multi-modal memorability, providing a deeper understanding of how our brain processes and retains information in a complex sensory world.
By uncovering the conceptual essence that lies at the heart of memorability, it opens up new avenues for predicting and curating more meaningful media content, and ultimately for deepening our connection to the world around us.

    Linguistic Competence and New Empiricism in Philosophy and Science

    The topic of this dissertation is the nature of linguistic competence, the capacity to understand and produce sentences of natural language. I defend the empiricist account of linguistic competence embedded in connectionist cognitive science. This strand of cognitive science has been opposed to traditional symbolic cognitive science which, coupled with transformational-generative grammar, was committed to nativism on the view that human cognition, including language capacity, should be construed in terms of symbolic representations and hardwired rules. Accordingly, linguistic competence in this framework was regarded as innate, rule-governed, domain-specific, and fundamentally different from performance, i.e., the idiosyncrasies and factors governing linguistic behavior. I analyze state-of-the-art connectionist, deep learning models of natural language processing, most notably large language models, to see what they can tell us about linguistic competence. Deep learning is a statistical technique for pattern classification in which artificial intelligence researchers train multi-layer artificial neural networks on gargantuan amounts of textual and/or visual data. I argue that these models suggest that linguistic competence should be construed as stochastic, pattern-based, and stemming from domain-general mechanisms. Moreover, I distinguish syntactic from semantic competence, and for each I show the ramifications of endorsing a connectionist research program as opposed to traditional symbolic cognitive science and transformational-generative grammar. I provide a unifying front, consisting of usage-based theories, a construction grammar approach, and an embodied approach to cognition, to show that the more multimodal and diverse the models are in terms of architectural features and training data, the stronger the case for a connectionist account of linguistic competence.
I also propose discarding the competence vs. performance distinction as theoretically inferior, so that the novel, integrative account of linguistic competence rooted in connectionism and empiricism that I propose and defend in the dissertation can be put forward in the scientific and philosophical literature.

    Human Computation and Human Subject Tasks in Social Network Playful Applications

    Universal connectivity has made crowdsourcing (an online activity in which a crowd works toward a goal requested by someone in an open call) possible. The question arises whether users can be motivated to perform such tasks by intrinsic rather than extrinsic factors (money, valuables). The current work explores gamification as a way to appeal to the intrinsic motivation of players. Namely, instead of placing the serious task at the centre of the contributors' attention, it proposes using storytelling and playful metaphors as elements that mask the serious tasks and, at the same time, may attract potential contributors. Furthermore, it explores the possibility of building such systems as playful social network applications, using Facebook as the distribution platform. The results show positive feedback from the players. Differences between female and male players' attitudes were also identified, leaving room for deeper research into player profiling and motivation in the future.

    ESCOM 2017 Book of Abstracts

    Circuits to control--learning engineering by designing LEGO robots

    Thesis (Ph.D.), Massachusetts Institute of Technology, Program in Media Arts & Sciences, 1994. Includes bibliographical references (leaves 251-255). By Fred Garth Martin.

    'Communicating adventure' : a semiotic investigation of the UK adventure subculture of motorcycling consumption

    Changing cultural trends and increasing pressures and constraints on everyday life have led to a proliferation in the uptake of adventure pursuits in Western society. People are increasingly drawn to subcultures of high-risk extremity and adventure, and manufacturers, marketers and the media commonly reflect a discourse that ‘commodifies’ adventure experience in their wider cultural products and brands. This growth in the consumption of adventure has created an opportunity, and a necessity, for researchers, academics and practitioners alike to become involved in developing adventure-leisure research and theory. This study takes the UK motorcycling subculture of adventure consumption as its unit of analysis and employs a ‘holistic’ cultural approach to investigate meaningful consumption processes within, and relative to, it. Specifically, it focuses on the role of consumers in contributing to the cultural world of motorcycling adventure consumption, as well as the significance of manufacturers, service suppliers and marketers in producing and conveying it. This is achieved through an ‘interpretive semiology’ research philosophy, in which a number of pioneering semiotic and narrative techniques are used and developed to identify the key communication codes and myths that drive the construction and movement of meaning within, and relative to, this consumption subculture. An ‘outside-in’ approach is employed to understand the subculture from a wide cross-section of related discourse, and this is combined with an ‘inside-out’ approach, which focuses on the motorcyclist consumer psyche, on consumer involvement in motorcycling activity, and on the use of signifying props, spaces and stories for constructing and signifying a meaningful motorcyclist self-identity.
This approach also examines the role of manufacturers, service suppliers and marketers in constructing and signifying brands that purvey cultural messages and construct categories of motorcycling subculture. The results highlight that, although the UK motorcycling adventure subculture is steeped in a very rich cultural heritage, it is dynamic in nature, and cultural changes can be identified by analysing key cultural communication codes and myths. These codes and myths are influenced and driven by an interrelationship between consumers, manufacturers, service suppliers, marketers and wider popular cultural discourse and media. They all exist in the same culturally constituted world, and meaning is generated and signified through common marketplaces and market stimuli. Overall, this study contributes to adventure-leisure research and to interpretive, cultural consumer behaviour research, and it employs and develops pioneering semiotic and narrative methodologies. It demonstrates how the field of semiotics, with its rich and sometimes complicated theoretical underpinnings, can be applied in this context with significant theoretical and practical implications.
EThOS - Electronic Theses Online Service (GB, United Kingdom)


    PSA 2018

