
    Perceptual modelling for 2D and 3D

    Deliverable D1.1 of the ANR PERSEE project. This report was produced within the framework of the ANR PERSEE project (no. ANR-09-BLAN-0170); specifically, it corresponds to deliverable D1.1 of the project.

    Saliency-aware Stereoscopic Video Retargeting

    Stereo video retargeting aims to resize a stereo video to a desired aspect ratio. The quality of retargeted videos depends heavily on preserving the spatial, temporal, and disparity coherence of the stereo video, all of which can be degraded by the retargeting process. Due to the lack of a publicly accessible annotated dataset, there is little research on deep learning-based methods for stereo video retargeting. This paper proposes an unsupervised deep learning-based stereo video retargeting network. Our model first detects the salient objects, then shifts and warps all objects so as to minimize distortion of the salient parts of the stereo frames. We use 1D convolutions to shift the salient objects and design a stereo video Transformer to assist the retargeting process. To train the network, we use the parallax attention mechanism to fuse the left and right views and feed the retargeted frames to a reconstruction module that reverses the retargeted frames back to the input frames; the network is therefore trained in an unsupervised manner. Extensive qualitative and quantitative experiments and ablation studies on the KITTI stereo 2012 and 2015 datasets demonstrate the efficiency of the proposed method over existing state-of-the-art methods. The code is available at https://github.com/z65451/SVR/. Comment: 8 pages excluding references. CVPRW conference
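
    As a rough illustration of the unsupervised retarget-and-reconstruct idea sketched in the abstract, the following PyTorch snippet shows per-column offsets predicted with 1D convolutions, a warp to the target width, and an L1 reconstruction loss against the input frame. Module names, tensor shapes, and the bilinear "reverse" step are illustrative assumptions, not the authors' implementation; see the linked repository for the actual code.

        # Hypothetical sketch of saliency-guided column shifting + unsupervised
        # reconstruction loss; not the authors' implementation.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ColumnShiftRetargeter(nn.Module):
            """Predicts per-column horizontal offsets with 1D convolutions and
            resamples the frame to a narrower target width."""
            def __init__(self, in_channels=3, target_width=640):
                super().__init__()
                self.target_width = target_width
                self.offset_net = nn.Sequential(
                    nn.Conv1d(in_channels, 32, kernel_size=5, padding=2),
                    nn.ReLU(),
                    nn.Conv1d(32, 1, kernel_size=5, padding=2),
                )

            def forward(self, frame, saliency):
                # frame: (B, 3, H, W); saliency: (B, 1, H, W) in [0, 1]
                b, _, h, w = frame.shape
                # Summarize each column, weighted by saliency, and predict offsets.
                col_feat = (frame * saliency).mean(dim=2)            # (B, 3, W)
                offsets = torch.tanh(self.offset_net(col_feat))      # (B, 1, W)
                # Sampling positions for the narrower output, nudged by the offsets.
                xs = torch.linspace(-1, 1, self.target_width, device=frame.device)
                xs = xs.view(1, 1, -1).expand(b, 1, -1)
                off = F.interpolate(offsets, size=self.target_width,
                                    mode="linear", align_corners=True)
                xs = (xs + 0.1 * off).clamp(-1, 1)                   # (B, 1, W_out)
                ys = torch.linspace(-1, 1, h, device=frame.device)
                grid_y = ys.view(1, h, 1).expand(b, h, self.target_width)
                grid_x = xs.squeeze(1).unsqueeze(1).expand(b, h, self.target_width)
                grid = torch.stack([grid_x, grid_y], dim=-1)         # (B, H, W_out, 2)
                return F.grid_sample(frame, grid, align_corners=True)

        # Unsupervised cycle: retarget, reverse to the input width, compare to input.
        retargeter = ColumnShiftRetargeter(target_width=640)
        frame = torch.rand(1, 3, 256, 960)       # dummy left-view frame
        saliency = torch.rand(1, 1, 256, 960)    # dummy saliency map
        retargeted = retargeter(frame, saliency)              # (1, 3, 256, 640)
        reconstructed = F.interpolate(retargeted, size=(256, 960),
                                      mode="bilinear", align_corners=True)
        loss = F.l1_loss(reconstructed, frame)
        loss.backward()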

    Study of depth bias of observers in free viewing of still stereoscopic synthetic stimuli

    Observers’ fixations exhibit a marked bias towards certain areas of the screen when viewing scenes on computer monitors. For instance, there is a well-known “center bias”: fixations are drawn towards the center of the screen when viewing 2D still images. For 3D content, stereoscopic displays enhance depth perception by means of binocular parallax. This additional depth cue has a strong influence on guiding eye movements, yet relatively little is known about its impact on visual attention for 3D content displayed on a stereoscopic screen. Several studies have mentioned that people tend to look preferentially at objects located at certain positions in depth, but studies proving or quantifying this depth bias remain limited. In this paper, we conducted a binocular eye-tracking experiment by showing synthetic stimuli on a stereoscopic display. Observers performed a free-viewing task through passive polarized glasses. Gaze positions of both eyes were recorded and the depth of fixation was determined. The stimuli were designed so that the center bias and the depth bias affect eye movements independently. Results indicate the existence of a depth bias: objects closer to the viewer attract attention earlier than distant objects, and the number of fixations on an object varies as a function of its depth. The closest object in a scene always attracts the most fixations. The fixation distribution along depth also shows a convergent behavior as viewing time increases.
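
    The depth of a binocular fixation can be recovered from the on-screen gaze positions of the two eyes using standard stereoscopic geometry; since the abstract does not spell out the exact procedure used in the study, the short function below is only an assumed illustration, with placeholder viewing-distance and interocular values.

        # Hedged illustration: fixation depth from binocular screen parallax.
        def fixation_depth(x_left, x_right, viewing_distance_cm=90.0, iod_cm=6.5):
            """Perceived fixation distance (cm from the eyes) given the horizontal
            on-screen gaze positions of the left and right eye (cm, same axis).
            Positive parallax (right-eye point to the right) pushes the fixation
            behind the screen; negative (crossed) parallax pulls it in front."""
            parallax = x_right - x_left
            if parallax >= iod_cm:
                return float("inf")   # lines of sight no longer converge
            return viewing_distance_cm * iod_cm / (iod_cm - parallax)

        # 1 cm of crossed parallax on a screen viewed at 90 cm brings the
        # fixation to ~78 cm, i.e. about 12 cm in front of the screen plane.
        print(fixation_depth(x_left=0.5, x_right=-0.5))   # 78.0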

    Optical techniques for 3D surface reconstruction in computer-assisted laparoscopic surgery

    One of the main challenges for computer-assisted surgery (CAS) is to determine the intra-operative morphology and motion of soft tissues. This information is a prerequisite for registering multi-modal patient-specific data, both to enhance the surgeon’s navigation capabilities by observing beyond exposed tissue surfaces and to provide intelligent control of robotic-assisted instruments. In minimally invasive surgery (MIS), optical techniques are an increasingly attractive approach for in vivo 3D reconstruction of the soft-tissue surface geometry. This paper reviews the state-of-the-art methods for optical intra-operative 3D reconstruction in laparoscopic surgery and discusses the technical challenges and future perspectives towards clinical translation. With the recent paradigm shift of surgical practice towards MIS and new developments in 3D optical imaging, this is a timely discussion about technologies that could facilitate complex CAS procedures in dynamic and deformable anatomical regions.
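
    Many of the surveyed approaches, stereo endoscopy in particular, ultimately rest on triangulating depth from the disparity between two rectified views. The snippet below is a generic illustration of that relationship; the focal length, baseline, and disparity values are placeholders, not parameters of any specific system discussed in the paper.

        # Depth from disparity for a rectified stereo pair: Z = f * b / d.
        def depth_from_disparity(disparity_px, focal_length_px, baseline_mm):
            """Depth (mm) of a point with the given disparity (pixels), for a
            rectified stereo rig with focal length in pixels and baseline in mm."""
            if disparity_px <= 0:
                raise ValueError("disparity must be positive for a visible point")
            return focal_length_px * baseline_mm / disparity_px

        # Example: a 4.5 mm baseline and f = 570 px give ~107 mm of depth for a
        # tissue point observed at 24 px of disparity.
        print(depth_from_disparity(24, 570, 4.5))   # ~106.9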

    D-SAV360: A Dataset of Gaze Scanpaths on 360° Ambisonic Videos

    Understanding human visual behavior within virtual reality environments is crucial to fully leverage their potential. While previous research has provided rich visual data from human observers, existing gaze datasets often suffer from the absence of multimodal stimuli. Moreover, no dataset has yet gathered eye gaze trajectories (i.e., scanpaths) for dynamic content with directional ambisonic sound, which is a critical aspect of sound perception by humans. To address this gap, we introduce D-SAV360, a dataset of 4,609 head and eye scanpaths for 360° videos with first-order ambisonics. This dataset enables a more comprehensive study of multimodal interaction on visual behavior in virtual reality environments. We analyze our collected scanpaths from a total of 87 participants viewing 85 different videos and show that various factors such as viewing mode, content type, and gender significantly impact eye movement statistics. We demonstrate the potential of D-SAV360 as a benchmarking resource for state-of-the-art attention prediction models and discuss its possible applications in further research. By providing a comprehensive dataset of eye movement data for dynamic, multimodal virtual environments, our work can facilitate future investigations of visual behavior and attention in virtual reality
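
    One of the simplest eye-movement statistics such scanpath data supports is the amplitude of gaze shifts on the viewing sphere. The sketch below assumes gaze samples given as (yaw, pitch) angles in degrees; this layout is hypothetical and does not reflect the actual D-SAV360 file format.

        # Great-circle amplitude (degrees) between successive gaze directions.
        import math

        def angular_distance_deg(yaw1, pitch1, yaw2, pitch2):
            y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
            cos_angle = (math.sin(p1) * math.sin(p2)
                         + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
            return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

        # Example scanpath as (yaw, pitch) samples: two small steps, one large shift.
        scanpath = [(0, 0), (12, 3), (15, 2), (80, -5)]
        amplitudes = [angular_distance_deg(*a, *b)
                      for a, b in zip(scanpath, scanpath[1:])]
        print(amplitudes)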

    Predictive Model of Driver’s Eye Fixation for Maneuver Prediction in the Design of Advanced Driving Assistance Systems

    Over the last few years, Advanced Driver Assistance Systems (ADAS) have been shown to significantly reduce the number of vehicle accidents. According to the National Highway Traffic Safety Administration (NHTSA), driver errors contribute to 94% of road collisions. This research aims to develop a predictive model of driver eye fixation by analyzing the driver’s eye and head (cephalo-ocular) information for maneuver prediction in an Advanced Driving Assistance System (ADAS). Several ADASs have been developed to help drivers perform driving tasks in complex environments, and many studies have been conducted on improving automated systems. Some research has relied on the fact that the driver plays a crucial role in most driving scenarios, recognizing the driver’s role as the central element in ADASs. The way in which a driver monitors the surrounding environment is at least partially descriptive of the driver’s situation awareness. This thesis’s primary goal is the quantitative and qualitative analysis of driver behavior to determine the relationship between driver intent and actions. The RoadLab initiative provided an instrumented vehicle equipped with an on-board diagnostic system, an eye-gaze tracker, and a stereo vision system for the extraction of relevant features from the driver, the vehicle, and the environment. Several driver behavioral features are investigated to determine whether there is a relevant relationship between the driver’s eye fixations and the prediction of driving maneuvers.
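
    As a deliberately simplified picture of the kind of model the abstract describes, the sketch below maps a window of cephalo-ocular features to a maneuver label with an off-the-shelf classifier. The feature set, maneuver labels, and choice of a random forest are illustrative assumptions, not the thesis's actual pipeline.

        # Hypothetical maneuver prediction from gaze/head features (synthetic data).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        # Per-window features: [mean gaze yaw, gaze yaw variance,
        # mirror-fixation count, mean head yaw, vehicle speed]
        X = rng.normal(size=(200, 5))
        y = rng.choice(["keep_lane", "turn_left", "turn_right"], size=200)

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        window = np.array([[15.0, 4.0, 2.0, 10.0, 42.0]])   # one new feature window
        print(clf.predict(window))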

    Multimodal Stereoscopic Movie Summarization Conforming to Narrative Characteristics

    Video summarization is a timely and rapidly developing research field with broad commercial interest, due to the increasing availability of massive video data. Relevant algorithms face the challenge of achieving a careful balance between summary compactness, enjoyability, and content coverage. The specific case of stereoscopic 3D theatrical films has become more important over the past years, but has not received corresponding research attention. In this paper, a multi-stage, multimodal summarization process for such stereoscopic movies is proposed that is able to extract a short, representative video skim conforming to narrative characteristics from a 3D film. At the initial stage, a novel, low-level video frame description method is introduced (the frame moments descriptor) that compactly captures informative image statistics from luminance, color, optical flow, and stereoscopic disparity video data, at both a global and a local scale. Thus, scene texture, illumination, motion, and geometry properties can succinctly be contained within a single frame feature descriptor, which can subsequently be employed as a building block in any key-frame extraction scheme, e.g., for intra-shot frame clustering. The computed key-frames are then used to construct a movie summary in the form of a video skim, which is post-processed in a manner that also considers the audio modality. The next stage of the proposed summarization pipeline essentially performs shot pruning, controlled by a user-provided shot retention parameter, that removes segments from the skim based on the narrative prominence of movie characters in both the visual and the audio modalities. This novel process (multimodal shot pruning) is algebraically modeled as a multimodal matrix column subset selection problem, which is solved using an evolutionary computing approach. Subsequently, disorienting editing effects induced by summarization are dealt with through manipulation of the video skim. At the last step, the skim is suitably post-processed in order to reduce stereoscopic video defects that may cause visual fatigue.
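
    The frame moments descriptor can be pictured as low-order statistics of several per-pixel channels, gathered both globally and over a grid of local blocks and then concatenated; the channel set, grid size, and choice of moments below are assumptions for illustration rather than the paper's exact formulation.

        # Rough sketch of a global + local "frame moments" style descriptor.
        import numpy as np

        def moments(values):
            """Mean, variance, and skewness of a flattened array."""
            v = values.reshape(-1).astype(np.float64)
            mean, std = v.mean(), v.std() + 1e-8
            skew = np.mean(((v - mean) / std) ** 3)
            return [mean, std ** 2, skew]

        def frame_moments_descriptor(channels, grid=4):
            """channels: dict of name -> 2D array (e.g. luminance, disparity,
            flow magnitude). Returns global + per-block moments as one vector."""
            feats = []
            for ch in channels.values():
                feats += moments(ch)                              # global moments
                h, w = ch.shape
                for by in range(grid):                            # local moments
                    for bx in range(grid):
                        block = ch[by * h // grid:(by + 1) * h // grid,
                                   bx * w // grid:(bx + 1) * w // grid]
                        feats += moments(block)
            return np.array(feats)

        desc = frame_moments_descriptor({"luminance": np.random.rand(120, 160),
                                         "disparity": np.random.rand(120, 160)})
        print(desc.shape)   # 2 channels x (1 global + 16 blocks) x 3 moments = (102,)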

    Perceptually driven stereoscopic camera control in 3D virtual environments

    Ankara : The Department of Computer Engineering and the Graduate School of Engineering and Science of Bilkent University, 2013. Thesis (Master's) -- Bilkent University, 2013. Includes bibliographical references (leaves 56-59). The notion of depth and how depth is perceived have long been studied in the fields of psychology, physiology, and even art. Human visual perception makes it possible to perceive the spatial layout of the outside world by using visual depth cues. Binocular disparity, among these depth cues, is based on the separation between the two different views observed by the two eyes. The disparity concept constitutes the basis of stereoscopic vision. Emerging technologies try to replicate binocular disparity principles in order to provide a 3D illusion and stereoscopic vision. However, the complexity of applying the underlying principles of 3D perception confronts researchers with the problem of wrongly produced stereoscopic content. It is still a great challenge to provide a realistic but also comfortable 3D experience. In this work, we present a camera control mechanism: a novel approach for disparity control and a model for path generation. We try to address the challenges of stereoscopic 3D production by presenting a comfortable viewing experience to users. Therefore, our disparity system addresses the accommodation/convergence conflict problem, the best-known cause of visual fatigue in stereo systems, by taking objects' importance into consideration. Stereo camera parameters are calculated automatically through an optimization process. In the second part of our control mechanism, the camera path is constructed for a given 3D environment and scene elements. Moving around important regions of objects is a desired scene-exploration task; in this respect, object saliencies are used for viewpoint selection around scene elements. The path structure is generated using linked Bézier curves, which ensures that it passes through pre-determined viewpoints. Though there is a considerable amount of research in the field of stereo creation, we believe that approaching this problem from the scene-content aspect provides a uniquely promising experience. We validate our assumption with user studies in which our method and two other existing disparity control models are compared. The study results show that our method gives superior results in quality, depth, and comfort. Kevinç, Elif Bengü. M.S.
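
    The disparity-control part of the thesis can be pictured with the usual stereography relationship for parallel cameras with sensor shift, where the image-space disparity of a point at depth Z is d(Z) = f * b * (1/C - 1/Z) for interaxial b and convergence distance C. The closed-form helper below fits b and C to a comfortable disparity budget; it is a simplification for illustration, not the optimization described in the thesis, and all values are assumed.

        # Closed-form interaxial/convergence from a scene depth range and a
        # comfortable disparity budget (all lengths in consistent units).
        def stereo_camera_parameters(z_near, z_far, focal_length,
                                     d_neg_max, d_pos_max):
            """Return (interaxial b, convergence distance C) such that the nearest
            point maps to -d_neg_max (crossed) and the farthest to +d_pos_max
            (uncrossed) image disparity."""
            budget = d_neg_max + d_pos_max
            b = budget / (focal_length * (1.0 / z_near - 1.0 / z_far))
            inv_c = 1.0 / z_near - d_neg_max / (focal_length * b)
            return b, 1.0 / inv_c

        # Scene spanning 2..20 m, 35 mm focal length, +/-0.5 mm disparity on a
        # 36 mm-wide sensor (~1.4% of image width each way): interaxial ~6.3 cm,
        # convergence distance ~3.6 m.
        b, c = stereo_camera_parameters(2.0, 20.0, 0.035, 0.0005, 0.0005)
        print(round(b, 3), round(c, 2))   # 0.063 3.64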