
    Human-artificial intelligence approaches for secure analysis in CAPTCHA codes

    CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) has long been used to keep automated bots from misusing web services by leveraging human-artificial intelligence (HAI) interactions to distinguish whether the user is a human or a computer program. Various CAPTCHA schemes have been proposed over the years, principally to increase usability and security against emerging bots and hackers performing malicious operations. However, automated attacks have effectively cracked all common conventional schemes, and the majority of present CAPTCHA methods are also vulnerable to human-assisted relay attacks. Invisible reCAPTCHA and some related approaches have not yet been cracked. However, with the introduction of fourth-generation bots that accurately mimic human behavior, a secure CAPTCHA can hardly be designed without additional special devices. Almost all cognitive-based CAPTCHAs with sensor support have not yet been compromised by automated attacks; however, they remain vulnerable to human-assisted relay attacks because they offer a limited number of challenges and can only be solved using trusted devices. Cognitive-based CAPTCHA schemes therefore have a clear advantage over other schemes in the race against security attacks. In this study, as a strong starting point for creating future secure and usable CAPTCHA schemes, we offer an overview analysis of HAI between computer users and computers under the security aspects of open problems, difficulties, and opportunities of current CAPTCHA schemes.

    Be(ing)Dazzled: Living in machinima

    In the creative function of language non-truth or less-than-truth is, we have seen, a primary device. The relevant framework is not one of morality but of survival. At one level, from brute camouflage to poetic vision, the linguistic capacity to conceal, misinform, leave ambiguous, hypothesize, invent, is indispensable to the equilibrium of human consciousness and to the development of mankind in society. ~ George Steiner (1998, 239) Consider as an aspect of the avant-garde art cinema movement (as characterized by P. Adams Sitney (1979) and represented by artists such as Maya Deren, Stan Brakhage, Hollis Frampton, etc.) the denormativizing of cinema. These artists/filmmakers took accruing cinematic sensibilities and built alternative artifices that evaded, glancingly addressed, or completely ignored pluralistic cinema. Works had glaring apparency, enacting their own type of dazzle, such that some were difficult to watch while still commanding attention - i.e. http://youtu.be/mTGdGgQtZic. Their status as cinematic spectacle could elicit thoughtfulness in an age when one cultural act occurred at a time. Now the focused attention that these avant-garde strategies required might not stand a chance. Michael Snow's Wavelength (1967) https://www.youtube.com/watch?v=aBOzOVLxbCE becomes another ambient channel of media decoration, helping to illuminate our smartphone keyboards while we catch up on our Facebook posts (at least until it is re-scripted: http://youtu.be/AhN9RS60QRc). We aren't teasingly vexed by its set-up and inevitable conclusion, attentive to occurrences that might undermine its own bet, such as the changing of film in the camera when it runs out. We used to go along with this (at least some of us did). Those who did had a belief, or at least a hope, that their efforts might have a revelatory reward. It was a cinematic aerobic workout right at our lactic acid threshold of attention span, spurring new questions about a changing relationship to cinema. The mid-20th-century media artist's agency was to be a radical alternative to an aesthetically and ideologically conservative, slow-moving and entrenched status quo of mass culture. But with the digitization of media forms, the invention of new mediums is as available a site of action for the artist as existing forms were to their content. Contemporary media art production now has the creation of its form as an aspect of its stakes (even when this isn't undertaken, it is still present as an option not exercised). What might be the possibilities that media artists have when new mediums, such as machinima, emerge? How does the cultural role of this nascent (and possibly transient) form inform the types of engagements that the artist might employ? Machinima has already achieved a certain level of codification such that aficionados recognize it when they see it. It operates as a sub-cultural expression - which can be simply … At least, that is how we came to understand the development of media forms in their pre-digital days, when they emerged every few decades or centuries. Now the digitization of media production and distribution brings a continuous proliferation and hybridization of media forms. So what is the use of concocting the neologism "machinima" right now?
Can there be a productive tension in having some common, recognizable forms of its practice, along with an imaginative and expansive application of its underlying methods? Cinema as a concept has proven to be a useful idea, both allowing for the development of complex semantic methods that have broad cultural legibility and providing a platform for gestures that radically reconsider the basis by which its underlying qualities can be employed for meaning making. In the history of cinema, we have seen this take place through activities that build its methods from within its normative tradition as well as by …

Art provocations towards a messier future

Whereas examples from mainstream cinema display anxiety about cultural developments which might obviate its usefulness, works of art I created around the same time were identifying a liminal space between cinema and virtual worlds as a zone of generative tension. This attitude differs from the situation of the post-war avant-garde filmmakers mentioned above, in that I was anticipating a transformation (or an end) of the current state of cinema, and my gestures were stabs at making something from its aftermath. The artwork was not taking a dialectical stance between cinema and art, but its verve arose from the dissonance of cinema's transformations by digital processes. These early artworks undertook concerns about what would become part of the more general phenomena of machinima. It was not the purpose of these works to predict, describe or specifically develop machinima; rather, they addressed cultural developments that would later prove to be aspects of machinima, such as how the ubiquity of cinema has blurred distinctions and created new possibilities for the roles of spectator, actor and creator, and the legacy of antecedent cinematic machines to machinima's gestalt. As cinema becomes a native digital medium rather than merely a form translated into digital methods, fundamental changes likewise occur in its ontology: what is cinematic representation, and who are we as viewers, creators and participants in cinema? This is an expansion of the typical use of the category of machinima as movies made with video game engines, whose mode, methods and implicit semantics can seem to narrow its purview to an examination of being that arises from video games or virtual space. Instead, it might be useful to see machinima as a sense-making schema operating in a manner that isn't confined to virtual realms, but is attuned to our general state as inhabitants of physical and virtual worlds which are both scripted and each day become increasingly intertwined. The concept of machinima revises relationships between authorship, viewer and cultural artifact, providing an expression of the complex agency we have in a post-cinematic world in which we exist in coded spaces - a condition that emerged in the late 20th century and that is now pervasive.

Art anticipating machinima

In the work "MetaStasis/MediaStatic" (1989), http://www.sheldonbrown.net/metastasis/index.html, emergent computational affordances operate as a cybernetic kludge onto the apparatus of the cinematic mundane, giving a new choreography of the world. A "home-made" video projector, built using a black and white TV tube, a lens, structural tubing, cement and motors, is spun at 300 rpm in an immersive "video-shack".
A digital control system choreographs the elements in the installation, exploiting our perceptual biology to take apart and re-assemble the phenomenon of television such that it becomes an all-encompassing field of imagery which the viewer becomes a part of. At the core of this image space is the ominous whir of a new cinematic machine - a physical manifestation of a transforming cinema getting more pervasive and ubiquitous as it becomes fodder for digitally-based processes. The intent of the "MetaStasis/MediaStatic" artwork is to make this cultural transformation palpable via the atmospheric disturbance created by the vortex of the machine and the distortions of the architectural environment that is necessitated to create the effects. The cinematic narrative of "MetaStasis/MediaStatic" is a product of automated editing. Its compositional method is an algorithmic cut-up of the found objects of tele-cinematic broadcast, à la Eisenstein. Its edits start at a pace that is familiar to any channel surfer, but ramp up to a fervor where the editing becomes part of the logic of the time of the frame. This speedrun through the channel space ascends to produce the climax of one's time in the "MetaStasis/MediaStatic" video shack, before ejecting the viewers out through its automated portal.

Montage as cinema machine

In "MetaStasis/MediaStatic", the exterior of the immersive chamber suggests that there are implications to the ubiquity of mediation. A follow-up work, "The Vorkapitchulator", … scout handbooks, with the expected bias found in each: the girls are encouraged to co-operate while the boys are spurred on to compete. The zoetrope is a frame-by-frame breakdown of gender reassignment surgery, while the exercise machine is the device to reshape one's physical body - each of the elements speaks to an aspect of the ways we are already invented and re-invented, while our digital identity presents us with radical new fluidity in these processes. This re-use of cinematic assets gives us an idea about the coming of machinima, in which new scripts are authored with given assets of a virtual world. While new machinimatic narratives will range from those closely related to or spun off from the original game engine to those that are surrealistically orthogonal, the use of the asset or code base creates an inescapable relationship to its original. Contemporary cinema still uses the montage sequence, albeit often with less dependency on the visual surrealism of Vorkapich (although the "Gutterballs" sequence in The Big Lebowski (1998) is a notable homage: http://youtu.be/mHAGbD3dlhE), but also with … Experience has achieved an optimal flow (Csikszentmihalyi 1990), and the confines of the typical narrative structure are no longer required. The Vorkapitchulator is thus a cinematic machine, producing digital cinema by physically manifesting the tropes of digital cinema (3D graphic logos, morphing image sequences, 3D stereography, interactive interfaces, computer controlled camera choreography and procedurally generated graphics, to name a few) and capturing them with analog, robotic video cameras.
This shifting of the site of the (then) digitization process, from where it is invisibly contained within the image to an embodied apparatus which is viscerally felt, can be seen as a way of pointing to the broader condition that machine cinema was likely to accelerate - namely, the cinematization of experience, or our search for peak experience in which everything seems to have that "like I was in a movie" flow, or a perceived aesthetic order or narrative structure that is typically absent or wanting in the everyday. The montage sequence's collapse of temporality and spatiality into a collage of associative image sequences, often rhythmically paired with music or a narrative voice-over, gives us the cultural template of this desire. If we've strived to live life as if we are in a movie, we have also increasingly made our world into a site of cinematic apparatuses. The car radio and the personal portable stereo were perhaps some of the first cinematic experiential generation systems, and now we have screens everywhere in the world, cladding the sides of our buildings, embedded in the furniture of our cars and airplanes, and carried in our pockets on various mobile devices, some of which are still referred to anachronistically as "phones". We create mental montages as we drive around the streets or ride the subway or jog down the beach. Our (day)dreams locate us as the star in a solipsistic, automatic and pervasive theater, forming a transcendent relationship to the world that we glide through to our own supercharged soundtracks, living our own "last days as a wiseguy" http://youtu.be/sJQ8tjroAfE with the thinnest of artifice required. We re-frame our own movements through the world by the application of these templates to cohere and manufacture a meaningful experience of living in the world. From … Tap (1984) to Jersey Shore (2009), these winking odes to the cinematic are testament to our willing complicity in deception as long as it is entertaining. It may even reveal some essential aspect of mimicry and socialization in our biological capacity and need for empathetic experience. We might consider this kind of reflective logic as an aspect of contemporary machinima, to see how its mirror reflects on aspects of our condition. A function of machinima is arguably its mediation of virtual experience to human consciousness. How are we to consider the experiences of the virtual? Do we know when we are the reader, viewer or product of culture? Are virtual realms sites where we have vital experiences? Or is there some new hybrid of being and reading, creating and being consumed which is taking place here? Looking at machinima from perspectives of aesthetics and method, we can see that what is now recognized as machinima is due to a dissonance between things that look like video games but act like cinema, an equation that, so far, lacks commutativity - it doesn't work in reverse - but this visual signifier might just be a temporary form.

Digital image aesthetics in machinima and cinema

Aspects of computer games and virtual space seem to beg to be considered cinema - they are experienced on similar screens and they can use similar pictorial, temporal and auditory compositions. Yet in the mainstream of both these forms, they create experiences which are best not compared. Thinking about video games as cinema is as useful as thinking about music as literature.
There might be some parallels, but they are generally aimed at different things. This difference gives clarity to machinima as distinct from the video game. Machinima is considered in most aspects similar to any other cinematic form, or as a pastiche of a cinematic model. We recognize its form as being "like a sitcom" or "like a movie", but machinima gets its profile by its distinction from these media. It is like an episode of Friends (1994-2004) … After all, cinema is now a digital medium - from movies to television, digital methods are the de facto norm in its production and distribution. The few pieces that are still occasionally shot on film do so as a kind of arcane affectation. While analog video has long been cheaper and quicker to produce than film-based cinema, digital cinema achieved what analog video never could - being indistinguishable from, and even improved over, chemical film. Digital cinema often has an aesthetic of invisibility: we aren't aware or concerned that the movie is shot on digital cameras, edited as digital files and sent on hard drives to movie theaters for liquid crystal panels to reflect onto movie screens. The aesthetic of digital cinema is mostly that of photo- or hyper-realism - making the real more idealized or more fantastic, but within a vocabulary of realism. When digital extensions are added to movies, they often extend or blend into a photographic record. They don't carry the entirety of the representation on their own basis as elements that may have a digital aesthetic, but are cast in the light of elements such as the visual authority of human forms. This is all delivered by the interaction between many algorithms and pieces of data (which is usually in the form of images that undergo significant transformations). For instance, the way a piece of metal looks - its reflective qualities, its diffusion of light, its surface texture - is described in algorithms that have been developed to provide a visual simulation of how metal often looks in … It can be the case that algorithms exist for computing a particular visual characteristic to produce a more "naturalistic" result than those used in a game, but often with a computational cost that is too high for viable use in an interactive graphic. These types of algorithms were first implemented in renderers for use in producing non-interactive computer graphics for movies (for instance, the digital character of Gollum in The Lord of the Rings (2001-2003) movies has skin that is rendered using a photon mapping technique (Jensen 2001)). Versions of these algorithms, in turn, make their way into games when the ongoing speed of computers meets the re-engineering of a more efficient algorithm. These digital cinema processes are developed with the impulse to make the fantastic more believable - the production of a contemporary film as a collage of dozens of separate files into the final frame - or the ordinary just a bit more idealized, retouching images with a mark that is finer than the final resolution of the image. The flip side of this "naturalistic" digital cinema is the 3D computer-animated movie. This form has developed its genre tropes: toylike aesthetics of big eyes, large heads, and children's stories from Toy Story (1995) to Avatar (2009) - simple tales of good and bad, loss and underdog heroics.
On the other hand, machinima, for the time being at least, trades on its aesthetic by clearly evoking the synthetic and digital world of the video game. Digital bits are not put in the service of extending the illusory cinematic veil, but instead celebrate the artificial realm of the algorithm. More than that, the visual signifiers place the works in a particular technological moment or situate the form in relation to latent readings found in a particular platform. We can usually pin down the date of most machinima productions to a few years around the release of a particular game. Consider this aspect of machinima as a contrary sensibility to digital cinema's industry-dominated aesthetic urges. While the game technology industry touts its latest progress towards some platonic notion of photorealism, machinima utilizes the visual artifacts of the game engine as a necessary signifier of the work, even as it no longer requires the computational dependency on real-time game engine rendering. Machinima productions could be rendered by a different renderer than the one in which the actions are captured and saved (described below in work that I've done). The visual aesthetic thus deliberately points to the initial game engine and often engages disjunctions of narrative content and visual form. It disrupts the "progress myth" of the game industry, which equates improvements in the medium of video games to achievements in photorealistic aesthetics. Here the visual vocabulary embraces the artifacts at hand as significant elements of the vocabulary of the form. If we reflect on the gestures of the avant-garde filmmakers which began this discussion, we see how visible sprocket holes, overexposed film, scratches and "mishandled" film stock produced a new vocabulary for cinema that was eventually taken into pluralistic cinema and extended its gestures. By mining what are often considered to be the shortcomings of the visual forms produced by game engines, paired with radically different content, we get to see how far that visual language can be stretched, and what happens when i…

    Toward Robust Video Event Detection and Retrieval Under Adversarial Constraints

    The continuous stream of videos that are uploaded and shared on the Internet has been leveraged by computer vision researchers for a myriad of detection and retrieval tasks, including gesture detection, copy detection, face authentication, etc. However, existing state-of-the-art event detection and retrieval techniques fail to deal with several real-world challenges (e.g., low resolution, low brightness and noise) under adversarial constraints. This dissertation focuses on these challenges in realistic scenarios and demonstrates practical methods to address the problem of robustness and efficiency within video event detection and retrieval systems in five application settings (namely, CAPTCHA decoding, face liveness detection, reconstructing typed input on mobile devices, video confirmation attack, and content-based copy detection). Specifically, for CAPTCHA decoding, I propose an automated approach which can decode moving-image object recognition (MIOR) CAPTCHAs faster than humans. I show that not only are there inherent weaknesses in current MIOR CAPTCHA designs, but that several obvious countermeasures (e.g., extending the length of the codeword) are not viable. More importantly, my work highlights the fact that the choice of underlying hard problem selected by the designers of a leading commercial solution falls into a solvable subclass of computer vision problems. For face liveness detection, I introduce a novel approach to bypass modern face authentication systems. More specifically, by leveraging a handful of pictures of the target user taken from social media, I show how to create realistic, textured, 3D facial models that undermine the security of widely used face authentication solutions. My framework makes use of virtual reality (VR) systems, incorporating along the way the ability to perform animations (e.g., raising an eyebrow or smiling) of the facial model, in order to trick liveness detectors into believing that the 3D model is a real human face. I demonstrate that such VR-based spoofing attacks constitute a fundamentally new class of attacks that point to serious weaknesses in camera-based authentication systems. For reconstructing typed input on mobile devices, I propose a method that successfully transcribes the text typed on a keyboard by exploiting video of the user typing, even from significant distances and from repeated reflections. This feat allows us to reconstruct typed input from the image of a mobile phone's screen on a user's eyeball as reflected through a nearby mirror, extending the privacy threat to include situations where the adversary is located around a corner from the user. To assess the viability of a video confirmation attack, I explore a technique that exploits the emanations of changes in light to reveal the programs being watched. I leverage the key insight that the observable emanations of a display (e.g., a TV or monitor) during presentation of the viewing content induce a distinctive flicker pattern that can be exploited by an adversary. My proposed approach works successfully in a number of practical scenarios, including (but not limited to) observations of light effusions through the windows, on the back wall, or off the victim's face. My empirical results show that I can successfully confirm hypotheses while capturing short recordings (typically less than 4 minutes long) of the changes in brightness from the victim's display from a distance of 70 meters.
Lastly, for content-based copy detection, I take advantage of a new temporal feature to index a reference library in a manner that is robust to the popular spatial and temporal transformations in pirated videos. My technique narrows the detection gap in the important area of temporal transformations applied by would-be pirates. My large-scale evaluation on real-world data shows that I can successfully detect infringing content from movies and sports clips with 90.0% precision at a 71.1% recall rate, and can achieve that accuracy at an average time expense of merely 5.3 seconds, outperforming the state of the art by an order of magnitude.
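The video confirmation attack described above ultimately rests on a signal-matching step: the brightness of light leaking from a display over time is compared against the expected per-frame luminance of candidate content. The sketch below illustrates only that matching step and is not the dissertation's actual pipeline; the function names, the normalized cross-correlation formulation and the 0.7 decision threshold are all assumptions made for illustration.

```python
import numpy as np

def zscore(x):
    """Normalize a 1-D signal to zero mean and unit variance."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / (x.std() + 1e-9)

def confirm_content(observed, reference, threshold=0.7):
    """Test whether an observed brightness trace matches a candidate video.

    observed:  per-frame mean brightness captured from the victim's
               environment (e.g., light on a wall), resampled to the
               reference frame rate; assumed no longer than the reference.
    reference: per-frame mean luminance precomputed from the candidate video.
    Returns (best correlation score, best offset, hypothesis confirmed?).
    """
    obs, ref = zscore(observed), zscore(reference)
    n = len(obs)
    # Slide the observation across the reference and record the normalized
    # correlation at each alignment (a basic matched-filter search).
    scores = [np.dot(obs, ref[i:i + n]) / n for i in range(len(ref) - n + 1)]
    best = int(np.argmax(scores))
    return scores[best], best, scores[best] >= threshold
```

In this toy formulation, a strong peak at some offset says the recorded flicker is consistent with the candidate program starting at that point; a real system would also need to handle resampling error and ambient-light noise.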

    2020 Huskies Showcase Abstracts

    The 2020 Huskies Showcase abstracts are arranged in the following order: Applied Experience Displays; Artistic Performances; Demonstrations; Gallery Exhibits; Oral Presentations; Poster Presentations.

    Face recognition using statistical adapted local binary patterns.

    Biometrics is the study of methods of recognizing humans based on their behavioral and physical characteristics or traits. Face recognition is one of the biometric modalities that has received a great amount of attention from many researchers during the past few decades because of its potential applications in a variety of security domains. Face recognition, however, is not only concerned with recognizing human faces, but also with recognizing faces of non-biological entities or avatars. Fortunately, the need for secure and affordable virtual worlds is attracting the attention of many researchers who seek to find fast, automatic and reliable ways to identify virtual worlds' avatars. In this work, I propose new techniques for recognizing avatar faces, which can also be applied to recognize human faces. The proposed methods are based mainly on a well-known and efficient local texture descriptor, the Local Binary Pattern (LBP). I apply different versions of LBP, such as Hierarchical Multi-scale Local Binary Patterns and Adaptive Local Binary Pattern with Directional Statistical Features, in the wavelet space and discuss the effect of this application on the performance of each LBP version. In addition, I use a new version of LBP called the Local Difference Pattern (LDP) with other well-known descriptors and classifiers to differentiate between human and avatar face images. The original LBP achieves a high recognition rate if the tested images are pure, but its performance gets worse if these images are corrupted by noise. To deal with this problem I propose a new definition of the original LBP in which the descriptor does not threshold all the neighborhood pixels based on the central pixel value. Instead, a weight for each pixel in the neighborhood is computed, a new value for each pixel is calculated, and then simple statistical operations are used to compute the new threshold, which changes automatically based on the pixels' values. This threshold can be applied with the original LBP or any other version of LBP, and can be extended to work with the Local Ternary Pattern (LTP) or any version of LTP to produce different versions of LTP for recognizing noisy avatar and human face images.
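To make the adaptive-threshold idea above concrete, the sketch below replaces the central-pixel threshold of classic LBP with a simple neighborhood statistic. The abstract does not give the exact per-pixel weighting scheme, so the plain 3x3 mean stands in for the weighted statistic; all names here are illustrative, not the author's implementation.

```python
import numpy as np

# Offsets of the 8 neighbors in a 3x3 neighborhood, clockwise from top-left.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]

def adaptive_lbp(image):
    """Compute an LBP map whose threshold adapts to each neighborhood.

    Classic LBP thresholds the 8 neighbors against the central pixel; here
    the threshold is instead a statistic (the mean) of the whole 3x3 patch,
    so it shifts automatically with local intensity and is less sensitive to
    noise corrupting the central pixel alone.
    """
    img = np.asarray(image, dtype=float)
    h, w = img.shape
    codes = np.zeros((h - 2, w - 2), dtype=np.uint8)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            threshold = patch.mean()  # adaptive, statistics-based threshold
            code = 0
            for bit, (dy, dx) in enumerate(OFFSETS):
                if img[y + dy, x + dx] >= threshold:
                    code |= 1 << bit
            codes[y - 1, x - 1] = code
    return codes

def lbp_histogram(codes):
    """256-bin histogram of LBP codes, the usual texture feature vector."""
    return np.bincount(codes.ravel(), minlength=256) / codes.size
```

The same substitution extends naturally to LTP by comparing neighbors against threshold bands above and below the adaptive value instead of a single cut.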

    Practical, appropriate, empirically-validated guidelines for designing educational games

    There has recently been a great deal of interest in the potential of computer games to function as innovative educational tools. However, there is very little evidence of games fulfilling that potential. Indeed, the process of merging the disparate goals of education and games design appears problematic, and there are currently no practical guidelines for how to do so in a coherent manner. In this paper, we describe the successful, empirically validated teaching methods developed by behavioural psychologists and point out how they are uniquely suited to take advantage of the benefits that games offer to education. We conclude by proposing some practical steps for designing educational games, based on the techniques of Applied Behaviour Analysis. It is intended that this paper can both focus educational games designers on the features of games that are genuinely useful for education, and also introduce a successful form of teaching that this audience may not yet be familiar with.

    Sensing and awareness of 360º immersive videos on the move

    Master's thesis in Informatics Engineering, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2013. By appealing to several senses and conveying a very rich set of information, video has the potential to cause a strong emotional impact on viewers, as well as to create a strong sense of presence and connection with the video. These potentials can be extended through multimedia sensing and the flexibility of mobility. With the popularity of mobile devices and the growing variety of sensors and actuators they include, there is ever more potential for the capture and visualization of 360º video enriched with extra information (metadata), thus creating the conditions to provide more immersive video viewing experiences to the user. This work explores the immersive potential of 360º video. The problem is approached in the context of mobile environments, as well as in the context of interaction with larger screens, taking advantage of second screens to interact with the video. It should be noted that, in both cases, the video being played is augmented with several types of information. Several functionalities were thus designed for the capture, search, visualization and navigation of 360º video. The results confirmed the existence of advantages in using multisensory approaches as a way to improve the immersive characteristics of a video environment. Certain properties and parameters that obtain better results in particular situations were also identified. Video makes it possible to capture and present events and scenarios with great authenticity, realism and emotional impact. Moreover, it has become increasingly pervasive in everyday life, with personal capture and playback devices, the Internet, social networks, and iTV being examples of the means through which video reaches users (Neng & Chambel, 2010; Noronha et al, 2012). Immersion in video thus has the potential to cause a strong emotional impact on viewers, as well as to create a strong sense of presence and connection with the video (Douglas & Hargadon, 2000; Visch et al, 2010). However, in traditional video the viewers' experience is limited to the angle at which the camera was pointing during capture. The introduction of 360º video overcame that restriction. In the quest to further improve the immersive capabilities of video, topics such as multimedia sensing and mobility can be considered. Mobile devices have become increasingly omnipresent in modern society and, given the wide variety of sensors and actuators they include, offer a broad spectrum of opportunities for the capture and playback of 360º video enriched with extra information (metadata), and therefore have the potential to improve the interaction paradigm and to support more powerful and immersive video viewing experiences. However, there are challenges related to the design of effective environments that take advantage of this immersion potential. Panoramic screens and CAVEs are examples of environments that move in the direction of total immersion and provide privileged conditions for the playback of immersive video. However, they are not very convenient and, especially in the case of CAVEs, not easily accessible.
On the other hand, the flexibility associated with mobile devices could allow users to take advantage of them, for example, as a (mobile) window into the video in which they are immersed. More than that, following this approach users could take these viewing experiences with them anywhere. As second screens, mobile devices can be used as navigation aids for the content presented on the main screen (be it a panoramic screen or a CAVE), also representing an opportunity to bring additional information to the user while removing from the main screen any information extraneous to the base content, which provides a better sense of immersion and flexibility. This work explores the immersive potential of 360º video in mobile environments, augmented with several types of information. To that end, and extending previous work (Neng, 2010; Noronha, 2012; Álvares, 2012) that focused mainly on the participatory dimension of immersion, the present approach centered on the perceptual dimension of immersion. In this scope, several functionalities were designed, developed and tested, grouped into a 360º video viewing application - Windy Sight Surfers. Considering the growing popularity of mobile devices in society and the characteristics that make them an opportunity to improve human-computer interaction and, more specifically, to support more immersive video viewing experiences, the Windy Sight Surfers application is strongly tied to mobile environments. Considering the interaction possibilities that the use of second screens introduces, a component of Windy Sight Surfers was designed around interaction with larger screens. The videos used in Windy Sight Surfers are 360º videos, augmented with a series of pieces of information recorded by Windy Sight Surfers during their capture. That is, while the camera captures the videos, the application records additional information - metadata - obtained from several of the device's sensors, which complements and enriches the videos. Namely, the geographic coordinates and the speed of movement are captured from the GPS, the user's orientation from the digital compass, and the G-forces acting on the device from the accelerometer, while the weather conditions are retrieved through a web service. Once captured, the videos, as well as their metadata, can be submitted to the system. Once submitted, videos can be searched through the more traditional set of keywords, through filters related to the nature of the application (e.g. speed, time of day, weather conditions), or through a map, which introduces a geographic component to the search process. The results can be presented in a conventional list, in a cover-flow format, or on the map. Regarding visualization, the videos are mapped around a cylinder, which makes it possible to represent the 360º view and to convey the feeling of being partially surrounded by the video. Since viewing takes place on mobile devices, users can continuously shift the viewing angle of the 360º video to the left or right by moving the device around themselves, as if the device were a window into the 360º video.
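This "window into the video" behaviour reduces to mapping the device's compass heading onto a horizontal offset within the panorama. A minimal sketch of that mapping, assuming an equirectangular 360º strip and a heading reported in degrees (the names are hypothetical, not from the thesis):

```python
def viewport_offset(heading_deg, panorama_width_px):
    """Map a compass heading (0-360º) to the horizontal pixel offset of the
    viewport within a 360º panorama strip, wrapping around at the seam."""
    return int((heading_deg % 360.0) / 360.0 * panorama_width_px) % panorama_width_px

# Example: with a 4096-px-wide panorama, facing due east (90º) places the
# viewport a quarter of the way across the strip.
offset = viewport_offset(90.0, 4096)  # -> 1024
```

The finger-drag interaction described next would then simply add a touch-derived delta to the same heading value before the mapping is applied.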
Additionally, users can change the viewing angle by dragging a finger across the video, since the whole screen acts as a sliding interface during 360º video playback. Several functionalities were also incorporated into the application to give greater realism to video viewing. Namely, a wind accessory was developed on the Arduino platform that takes each video's metadata into account to produce wind, giving a more realistic sensation of the wind and of the speed of movement during playback. Note that the implemented algorithm takes into account not only the speed of movement, but also the weather in terms of wind (force and orientation) at the time of capture, and the user's orientation according to the angle of the video being watched during playback. Regarding the audio component, in this system the audio of each video is mapped into a three-dimensional sound space, which can be reproduced on a pair of stereo headphones. In this sound space, the position of the sound sources is tied to the frontal angle of the video and, as such, changes according to the angle being watched. That is, if the user is watching the frontal angle of the video, the sound sources are located in front of the user's head; if the user is watching the rear angle, the sound sources are located behind the user's head. Since the videos span 360º, the position of the sound sources varies along a circle around the user's head, the intent being to give additional orientation within the video being watched. To increase the sensation of movement through audio, the Doppler effect was explored. This effect can be described as the change in the observed frequency of a wave that occurs when the source and the observer are moving relative to each other. Because this effect is associated with the notion of movement, an experiment was conducted to analyze whether the controlled use of the Doppler effect can increase the sensation of movement during video viewing. To this end, a second sound layer was added whose function is to reproduce the Doppler effect cyclically and in a controlled manner. This reproduction was related to the speed of movement in the video in the following proportion: the higher the speed, the more frequently the effect is played. These functionalities concern improving the immersive capabilities of the system through the sensory stimulation of users. Additionally, Windy Sight Surfers includes a set of functionalities whose objective centers on improving the immersive capabilities of the system by providing information that makes the user aware of the video's context, thus allowing the user to better perceive what is happening in the video. More specifically, these functionalities are laid out in a layer on top of the video and provide information such as the current speed, the orientation of the video angle being watched, or the instantaneous G-force.
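The description above names the wind algorithm's inputs (travel speed, the wind's force and orientation at capture time, and the viewer's current angle) but not its formula. The sketch below is one plausible way to combine them into a fan duty cycle; the blending scheme and every name in it are assumptions for illustration, not the thesis's implementation.

```python
import math

def fan_intensity(speed_kmh, wind_force_kmh, wind_dir_deg, view_dir_deg,
                  max_kmh=100.0):
    """Blend motion-induced headwind with the recorded weather wind,
    projected onto the viewer's current facing, into a 0..1 fan duty cycle.

    speed_kmh:      travel speed recorded with the video (from GPS).
    wind_force_kmh: wind strength at capture time (from the weather service).
    wind_dir_deg:   direction the wind blows from, in degrees.
    view_dir_deg:   angle of the video the user is currently facing
                    (0º = frontal angle, i.e., the direction of travel).
    """
    # Headwind is strongest when looking forward (0º) and vanishes at 180º.
    headwind = speed_kmh * max(0.0, math.cos(math.radians(view_dir_deg)))
    # The weather wind contributes by how directly it hits the facing.
    angle = math.radians(wind_dir_deg - view_dir_deg)
    weather = wind_force_kmh * max(0.0, math.cos(angle))
    return min(1.0, (headwind + weather) / max_kmh)

# Example: riding at 40 km/h while facing a 20 km/h head-on wind.
duty = fan_intensity(40.0, 20.0, 0.0, 0.0)  # -> 0.6
```

The resulting duty cycle could then drive the Arduino fan via PWM, updating as the viewer rotates the device.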
In addition, the functionalities are divided into one category of information that is made available permanently during video playback, and a second, complementary category of information that is made available momentarily, relating to particular portions of the video. Seeking to design a more engaging experience for the user, an emotion recognizer based on facial expression recognition was incorporated into Windy Sight Surfers. Users' facial expressions are analyzed during video playback, and the results of this analysis are used in different functionalities of the application. At present, the emotional information has three applications in the developed environment: in video cataloguing and search functionalities; in functionalities that influence the application's flow control; and in the evaluation of the system itself. Considering the context of the ImTV research project (url-ImTV), and with the aim of making the application as flexible as possible, Windy Sight Surfers has a second-screen component allowing interaction with larger screens, such as televisions. It is thus possible to use the two devices together so as to get the best out of each, with the objective of increasing the immersive capabilities of the system. In this context, the videos are played on the connected screen, while the mobile application takes on the functionalities of controlling the content presented on that screen and making a set of additional information available, such as a minimap presenting a planar projection of the video's 360º, and a map of the geographic area associated with the video, on which the route being watched is shown in real time together with additional routes belonging to videos from the same geographic area. A usability evaluation was carried out with users, based on the USE questionnaire and the Self-Assessment Manikin (SAM), coupled with two additional parameters concerning presence and realism. Based on observation of the users as they carried out tasks, interviews were conducted to gather comments, suggestions and concerns about the functionalities tested. Additionally, the emotional evaluation tool developed was used to record which emotions were most prevalent during use of the application. Finally, the overall immersive potential of Windy Sight Surfers was evaluated through the Immersive Tendencies Questionnaire (ITQ) and an adapted version of the Presence Questionnaire (PQ). The results confirmed the existence of advantages in using multisensory approaches as a way to improve the immersive characteristics of a video environment. Furthermore, certain properties and parameters were identified that obtain better results and are more satisfactory under particular conditions, so these results can serve as guidelines for future environments related to immersive video.

By appealing to several senses and conveying very rich information, video has the potential for a strong emotional impact on viewers, greatly influencing their sense of presence and engagement. This potential may be extended even further with multimedia sensing and the flexibility of mobility.
Mobile devices are commonly used and increasingly incorporate a wide range of sensors and actuators with the potential to capture and display 360º video and metadata, thus supporting more powerful and immersive video user experiences. This work was carried out in the context of the ImTV research project (url-ImTV), and explores the immersion potential of 360º video. The matter is approached in a mobile environment context, and in a context of interaction with wider screens, using second screens in order to interact with video. It must be emphasized that, in both situations, the videos are augmented with several types of information. Therefore, several functionalities were designed regarding the capture, search, visualization and navigation of 360º video. Results confirmed advantages in using a multisensory approach as a means to increase immersion in a video environment. Furthermore, specific properties and parameters that worked better in different conditions have been identified, thus enabling these results to serve as guidelines for future environments related to immersive video.

    Improving elderly access to audiovisual and social media, using a multimodal human-computer interface

    With the growth of the Internet and, especially, the proliferation of social media services, an opportunity has emerged for greater social and technological integration of the elderly. However, the adoption of new technologies by this segment of the population is not always straightforward, mainly due to the physical and cognitive difficulties that are typically associated with ageing. Thus, for the elderly to take advantage of new technologies and services that can help improve their quality of life, barriers must be broken by designing solutions with those needs in mind from the start. The aim of this work is to verify whether Multimodal Human-Computer Interaction (MHCI) systems designed with Universal Accessibility principles, taking into account elderly-specific requirements, facilitate the adoption of and access to popular Social Media Services (SMSs) and Audiovisual Communication Services, thus potentially contributing to the elderly's social and technological integration. A user study was initially conducted in order to learn about the limitations and requirements of elderly people with existing HCI, concerning particularly SMSs and Audiovisual Communication Services, such as Facebook or Windows Live Messenger (WLM). The results of the study, basically a set of new MHCI requirements, were used to inform further development and enhancement of a multimodal prototype previously proposed for mobility-impaired individuals, now targeting the elderly. The prototype allows connecting users with their social networks through a text, audio and video communication service and integrates with SMSs, using natural interaction modalities, like speech, touch and gesture. After the development stage a usability evaluation study was conducted. The study reveals that such a multimodal solution could simplify accessibility to the supported services, through the provision of simpler-to-use interfaces, by adopting natural interaction modalities, and by being more satisfying for the elderly population to use than most of the current graphical user interfaces for those same services, such as Facebook.