
    Examining the role of smart TVs and VR HMDs in synchronous at-a-distance media consumption

    This article examines synchronous at-a-distance media consumption from two perspectives: how it can be facilitated using existing consumer displays (TVs combined with smartphones), and how it might be supported by imminently available consumer displays (virtual reality (VR) HMDs combined with RGBD sensing). First, we discuss results from an initial evaluation of CastAway, a synchronous, shared, at-a-distance smart TV system. Through week-long in-home deployments with five couples, we gain formative insights into the adoption and usage of at-a-distance media consumption and into how couples communicated while consuming media together. We then examine, in a laboratory study of 12 pairs, how the imminent availability and potential adoption of consumer VR HMDs could affect preferences toward how synchronous at-a-distance media consumption is conducted, by enhancing media immersion and supporting embodied telepresence for communication. Finally, we discuss the implications these studies have for the near future of consumer synchronous at-a-distance media consumption. Taken together, these studies begin to explore a design space covering the varying ways in which at-a-distance media consumption can be supported and experienced (through music, TV content, existing TV content augmented for immersion, and immersive VR content), the factors that might influence usage and adoption, and the implications for supporting communication and telepresence during media consumption.

    BOLA360: Near-optimal View and Bitrate Adaptation for 360-degree Video Streaming

    Recent advances in omnidirectional cameras and AR/VR headsets have spurred the adoption of 360-degree videos, which are widely believed to be the future of online video streaming. 360-degree videos allow users to wear a head-mounted display (HMD) and experience the video as if they were physically present in the scene. Streaming high-quality 360-degree videos at scale is an unsolved problem that is more challenging than traditional (2D) video delivery. The data rate required to stream 360-degree videos is an order of magnitude higher than for traditional videos. Further, the penalty for rebuffering events, where the video freezes or displays a blank screen, is more severe, as it may cause cybersickness. We propose an online adaptive bitrate (ABR) algorithm for 360-degree videos called BOLA360 that runs inside the client's video player and orchestrates the download of video segments from the server so as to maximize the quality-of-experience (QoE) of the user. BOLA360 conserves bandwidth by downloading only those video segments that are likely to fall within the field-of-view (FOV) of the user. In addition, BOLA360 continually adapts the bitrate of the downloaded video segments so as to enable smooth playback without rebuffering. We prove that BOLA360 is near-optimal with respect to an optimal offline algorithm that maximizes QoE. Further, we evaluate BOLA360 on a wide range of network and user head movement profiles and show that it provides 13.6% to 372.5% more QoE than state-of-the-art algorithms. While ABR algorithms for traditional (2D) videos have been well studied over the last decade, our work is the first ABR algorithm for 360-degree videos with both theoretical and empirical guarantees on its performance. Comment: 25 pages
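    To make the flavour of such a buffer-based rule concrete, below is a minimal, hypothetical Python sketch in the spirit of a BOLA-style ABR decision, with each segment's utility weighted by the probability that its view falls within the FOV. The names, parameter values, and exact objective are illustrative assumptions, not the authors' algorithm.

```python
import math

# Illustrative, assumed values -- not taken from the paper.
BITRATES = [1.0, 2.5, 5.0, 10.0]                 # Mbps renditions
SEGMENT_SECONDS = 4.0
SIZES = [r * SEGMENT_SECONDS for r in BITRATES]  # megabits per segment
UTILITIES = [math.log(r / BITRATES[0]) for r in BITRATES]
V = 5.0        # control parameter: trades buffer slack against quality
GAMMA_P = 5.0  # weight on smooth, rebuffer-free playback

def choose_rendition(buffer_level, view_prob):
    """Return the bitrate index to download for one segment of one view,
    or None to defer the download (the view is unlikely to be watched)."""
    best, best_score = None, 0.0
    for m, (size, util) in enumerate(zip(SIZES, UTILITIES)):
        # Buffer-based score: quality-per-bit, discounted by buffer level.
        score = (V * view_prob * (util + GAMMA_P) - buffer_level) / size
        if score > best_score:
            best, best_score = m, score
    return best

print(choose_rendition(buffer_level=10.0, view_prob=0.9))  # low buffer -> 0
print(choose_rendition(buffer_level=30.0, view_prob=0.9))  # full buffer -> 3
print(choose_rendition(buffer_level=20.0, view_prob=0.1))  # unlikely view -> None
```

    A positive score means "download this rendition now"; when every option scores non-positive, as for a view the user is unlikely to look at, the sketch defers the download, which is one way the FOV-based bandwidth saving described above could be realized.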

    PEA265: Perceptual Assessment of Video Compression Artifacts

    The most widely used video encoders share a common hybrid coding framework that includes block-based motion estimation/compensation and block-based transform coding. Despite their high coding efficiency, the encoded videos often exhibit visually annoying artifacts, denoted as Perceivable Encoding Artifacts (PEAs), which significantly degrade the visual Quality-of-Experience (QoE) of end users. To monitor and improve visual QoE, it is crucial to develop subjective and objective measures that can identify and quantify various types of PEAs. In this work, we make the first attempt to build a large-scale subject-labelled database composed of H.265/HEVC compressed videos containing various PEAs. The database, namely the PEA265 database, includes 4 types of spatial PEAs (i.e., blurring, blocking, ringing and color bleeding) and 2 types of temporal PEAs (i.e., flickering and floating). Each type contains at least 60,000 image or video patches with positive and negative labels. To objectively identify these PEAs, we train Convolutional Neural Networks (CNNs) using the PEA265 database. It appears that the state-of-the-art ResNeXt is capable of identifying each type of PEA with high accuracy. Furthermore, we define PEA pattern and PEA intensity measures to quantify the PEA levels of a compressed video sequence. We believe that the PEA265 database and our findings will benefit the future development of video quality assessment methods and perceptually motivated video encoders. Comment: 10 pages, 15 figures, 4 tables
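    As an illustration of the objective identification step, here is a hedged PyTorch sketch that fine-tunes a ResNeXt as a binary detector for a single PEA type. The dataset paths, hyperparameters, and the specific ResNeXt variant are assumptions; the paper does not prescribe this exact setup.

```python
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
# Assumed layout: one sub-folder per label, e.g. positive/ and negative/
# patches for the "blocking" artifact.
train_set = datasets.ImageFolder("pea265/blocking/train", transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=64, shuffle=True)

model = models.resnext50_32x4d(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, 2)  # positive vs. negative patch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch shown for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```

    One such detector per PEA type, as sketched here, mirrors the per-artifact identification the abstract describes; a multi-label head over all six artifact types would be an equally plausible design.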

    Virtual reality tools in developing industrial training for additive manufacturing

    Additive manufacturing (commonly known as 3D printing) is experiencing increasing global popularity in the manufacturing industry. The technology has been adopted by large companies, and additive manufacturing services have been outsourced by smaller ones, but first-party adoption of the technology among small and medium-sized enterprises has been slow. Additive manufacturing provides new opportunities for manufacturing but also requires specialized expertise among users of the technology. Studies indicate that modern digital learning techniques such as microlearning and the use of virtual reality and 360° video can provide effective means of learning industrial skills. The purpose of this thesis was to examine digital learning techniques, 360° video and virtual reality, as well as various additive manufacturing technologies, in order to produce a virtual reality-based learning application for industrial training in additive manufacturing. It was also necessary to test and validate the effectiveness of the training application and derive future considerations for more advanced iterations. The value of stereoscopic 360° virtual reality video was also examined. The first version of the training application was successfully completed, and user tests were conducted. A mix of quantitative feedback in the form of a survey and qualitative feedback in the form of interviews was gathered from a number of test users. Feedback was overall positive, but some user interface issues and technical shortcomings were highlighted. Qualitative feedback regarding stereoscopic 360° video indicated that the technique has additional value for learning purposes in virtual reality. The results of these tests will be taken into consideration in the design of a second version of the training application. Topics for further studies were also proposed.

    Self-Supervised Visual Learning by Variable Playback Speeds Prediction of a Video

    We propose a self-supervised visual learning method based on predicting the variable playback speeds of a video. Without semantic labels, we learn the spatio-temporal visual representation of the video by leveraging the variations in visual appearance across different playback speeds, under the assumption of temporal coherence. To learn the spatio-temporal visual variations in the entire video, we not only predict a single playback speed but also generate clips of various playback speeds and directions with randomized starting points. The visual representation can thus be learned from the meta information (playback speeds and directions) of the video alone. We also propose a new layer-dependable temporal group normalization method that can be applied to 3D convolutional networks to improve representation learning performance: we divide the temporal features into several groups and normalize each one using different corresponding parameters. We validate the effectiveness of our method by fine-tuning it for the action recognition and video retrieval tasks on UCF-101 and HMDB-51. Comment: Accepted by IEEE Access on May 19, 202
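    A minimal sketch of the clip-sampling side of this pretext task follows, assuming illustrative speed choices and clip length (the paper's exact settings may differ). Each sampled clip is paired with a (speed, direction) class index that the network must predict.

```python
import random

SPEEDS = [1, 2, 4, 8]  # assumed frame-step options (1 = normal speed)
CLIP_LEN = 16          # assumed number of frames per clip

def sample_clip(frames):
    """frames: a decoded video as a list of frames; must be longer than
    max(SPEEDS) * (CLIP_LEN - 1). Returns (clip, label)."""
    speed_idx = random.randrange(len(SPEEDS))
    forward = random.random() < 0.5
    step = SPEEDS[speed_idx]
    span = step * (CLIP_LEN - 1)
    start = random.randrange(len(frames) - span)   # randomized starting point
    clip = [frames[i] for i in range(start, start + span + 1, step)]
    if not forward:
        clip.reverse()                             # reversed playback direction
    label = speed_idx * 2 + (0 if forward else 1)  # one class per (speed, dir)
    return clip, label
```

    The 3D CNN then receives the clip and is trained with a standard cross-entropy loss on the label, so no human annotation is required.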

    New interaction models for 360º video

    Today, fast technological evolution and the significant increase in demand for multimedia content have boosted the development of the transmission mechanisms used for this purpose. This development has had repercussions in several areas, such as immersive experiences that include 360º content. Whether delivered through live streaming or on-demand services, quality of service and quality of experience have become two points whose development has assumed high importance. The capture and reproduction of 360º content make it possible to transmit an immersive view of reality at a given moment. With this approach, the industry intends to provide a product with better audiovisual quality, more comfortable for the user, and allowing better interaction, an example being the choice of the view that most appeals to us at a given event (for example, football matches or concerts). This dissertation has as its main objective the incorporation of a buffering mechanism into a multimedia system able to offer adaptive multiview experiences. The system uses the MPEG-DASH protocol for efficient use of network resources and a conventional camera for detecting the movements of the user's head, selecting the points of view to be visualized in real time. The system also incorporates an automatic quality-adaptation mechanism that adjusts to network conditions. The buffering mechanism is intended to increase the quality of experience and the quality of service by minimizing the delay in the transition between views. The mechanism consists of a proxy capable of sending three views simultaneously: two are sent in low quality, while the main view is sent and presented to the user in high quality. Whenever there is a new request from the user, the mechanism switches between the sent views until it receives the response from the server. Based on these assumptions, the dissertation aims to identify the challenges posed by the availability and efficient transmission of 360º content, as well as the necessary trade-offs regarding the quality of user experience. This last point is particularly significant given the network requirements and the volume of data involved in transmitting this type of content.
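    To illustrate the proxy's switching idea in code, here is a minimal, hypothetical sketch; the class and function names are assumptions, and real MPEG-DASH manifest and segment handling is elided. The active view is fetched in high quality and its neighbouring views in low quality, so a head-movement-triggered switch can be served from the prefetched low-quality copy until the server responds with the high-quality stream.

```python
LOW, HIGH = "low", "high"

class ViewProxy:
    def __init__(self, fetch_segment, views, active):
        self.fetch = fetch_segment  # callable: (view, quality) -> segment
        self.views = views          # ordered list of available views
        self.active = active        # view currently shown in high quality

    def neighbours(self, view):
        i = self.views.index(view)
        return [self.views[j] for j in (i - 1, i + 1)
                if 0 <= j < len(self.views)]

    def next_segments(self):
        """Segments forwarded downstream for one segment duration:
        three views in total, two of them in low quality."""
        out = {self.active: self.fetch(self.active, HIGH)}
        for v in self.neighbours(self.active):
            out[v] = self.fetch(v, LOW)  # prefetched fallback views
        return out

    def switch_view(self, new_view):
        """The user turned their head: show the low-quality copy at once,
        then promote the new view to high quality on the next fetch."""
        fallback = self.fetch(new_view, LOW)  # already buffered in practice
        self.active = new_view
        return fallback
```

    The quality upgrade happens implicitly on the next call to next_segments(), which matches the behaviour the dissertation describes of switching between the sent views until the server's response arrives.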

    Spatial representation and low vision: two studies on the content, accuracy and utility of mental representations

    The paper reports on two studies being conducted with students from Dorton College - Royal London Society for the Blind (RLSB) in Kent. The first experiment will examine the content and accuracy of mental representations of a well-known environment. Students will walk a route around the college campus and learn the position of ten buildings or structures. They will then be asked to make heading judgments, estimate distances, complete a spatial cued model and sequentially visit a series of locations. The second experiment will examine the strategies and coding heuristics used to explore a complex novel environment. Students will be asked to explore a maze and learn the location of different places. Their search patterns will be digitally tracked, coded and analyzed using GIS software. Students will be tested using the same methods as in the first experiment, and their performance level will be correlated with their exploratory patterns. Throughout the paper we are reminded that construct validity can only be secured by employing multiple converging techniques in the collection and analysis of cognitive data. Methods should be designed to test content and accuracy as well as the utility of mental representations.

    Perceptual strategies in active and passive hearing of neotropical bats

    Basic spectral and temporal sound properties, such as frequency content and timing, are evaluated by the auditory system to build an internal representation of the external world and to generate auditory-guided behaviour. Using echolocating bats as a model system, I investigated aspects of spectral and temporal processing during echolocation and in relation to passive listening, as well as echo-acoustic object recognition for navigation.

    In the first project (chapter 2), spectral processing during passive and active hearing was compared in the echolocating bat Phyllostomus discolor. Sounds are ubiquitously used for many vital behaviours, such as communication, predator and prey detection, or echolocation. The frequency content of a sound is one major component for the correct perception of the transmitted information, but it is distorted while travelling from the sound source to the receiver. To correctly determine the frequency content of an acoustic signal, the receiver needs to compensate for these distortions. We first investigated whether P. discolor compensates for distortions of the spectral shape of transmitted sounds during passive listening. Bats were trained to discriminate lowpass-filtered from highpass-filtered acoustic impulses while hearing a continuous white-noise background with a flat spectral shape. We then assessed their spontaneous classification of acoustic impulses with varying spectral content depending on the background's spectral shape (flat or lowpass-filtered). A lowpass-filtered noise background increased the proportion of highpass classifications of the same filtered impulses, compared to a white-noise background. Like humans, the bats thus compensated for the background's spectral shape. In an active-acoustic version of the identical experiment, the bats had to classify filtered playbacks of their emitted echolocation calls instead of passively presented impulses. During echolocation, the classification of the filtered echoes was independent of the spectral shape of the passively presented background noise. Likewise, call structure did not change to compensate for the background's spectral shape. Hence, auditory processing differs between passive and active hearing, with echolocation representing an independent mode with its own rules of auditory spectral analysis.

    The second project (chapter 3) was concerned with the accurate measurement of the time of occurrence of auditory signals, and as such also of distance in echolocation. In addition, the importance of passive listening compared to echolocation turned out to be an unexpected factor in this study. To measure the distance to objects, called ranging, bats measure the time delay between an outgoing call and its returning echo. Ranging accuracy has received considerable interest in echolocation research for several reasons: (i) behaviourally, it is important for the bat's ability to locate objects and navigate its surroundings; (ii) physiologically, the neuronal implementation of precise measurements of very short time intervals is a challenge; and (iii) the conjectured echo-acoustic receiver of bats is of interest for signal processing. Here, I trained the nectarivorous bat Glossophaga soricina to detect a jittering real target and found a biologically plausible distance accuracy of 4–7 mm, corresponding to a temporal accuracy of 20–40 μs.
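    As a plausibility check on these figures, with the speed of sound in air taken as roughly 343 m/s and a factor of one half because the echo travels out and back:

```latex
\Delta d = \frac{c\,\Delta t}{2}, \qquad
\frac{343\ \mathrm{m/s} \times 20\ \mu\mathrm{s}}{2} \approx 3.4\ \mathrm{mm}, \qquad
\frac{343\ \mathrm{m/s} \times 40\ \mu\mathrm{s}}{2} \approx 6.9\ \mathrm{mm},
```

    which is consistent with the reported 4–7 mm range.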
    However, the bats presumably did not learn to use the jittering echo delay as the first and most prominent cue, but initially relied on passive acoustic listening, which could only be prevented by the playback of masking noise. This shows that even a non-gleaning bat relies heavily on passive acoustic cues and that measuring short time intervals is difficult. This result calls into question other studies reporting a sub-microsecond time-jitter threshold.

    The third project (chapter 4) linked the perception of echo-acoustic stimuli to the appropriate behavioural reactions, namely evasive flight manoeuvres around virtual objects presented in the flight paths of wild, untrained bats. Echolocating bats are able to orient in complete darkness solely by analysing the echoes of their emitted calls. They detect, recognize and classify objects based on the spectro-temporal reflection pattern received at the two ears. Auditory object analysis, however, is inevitably more complicated than visual object analysis, because the one-dimensional acoustic time signal transmits only range information, i.e., the object's distance and its longitudinal extent. All other object dimensions, like width and height, have to be inferred from comparative analysis of the signals at both ears and over time. The purpose of this study was to measure perceived object dimensions in wild, experimentally naïve bats by video-recording and analysing the bats' evasive flight manoeuvres in response to the presentation of virtual echo-acoustic objects with independently manipulated acoustic parameters. Flight manoeuvres were analysed by extracting the flight paths of all passing bats. As a control for our method, we also recorded the flight paths of bats in response to a real object. Bats avoided the real object by flying around it. However, we did not find any flight-path changes in response to the presentation of several virtual objects. We assume that the missing spatial extent of the virtual echo-acoustic objects, due to playback from only one loudspeaker, was the main reason for the failure to evoke evasive flight manoeuvres. This study therefore emphasises for the first time the importance of the spatial dimension of virtual objects, which has up to now been neglected in virtual object presentations.