Estimation of a 3D motion field from a multi-camera array using a multiresolution Gaussian mixture model
The problem of modelling geometry for video-based rendering has been much studied in recent years, due to the growing interest in 'free viewpoint' video and similar applications. Common approaches fall into two categories: those which approximate surfaces from dense depth maps obtained by generalisations of stereopsis, and those which employ an explicit geometric representation such as a mesh. While the former have generality with respect to geometry, they are limited in terms of viewpoint; the latter, on the other hand, sacrifice generality of geometry for the freedom to pick an arbitrary viewpoint. The purpose of the work reported here is to bridge this gap in object representation by employing a stochastic model of object structure: a multiresolution Gaussian mixture. Estimating the model, and tracking it through time from multiple cameras, is achieved by multiresolution stochastic simulation. After a brief outline of the method, its use in modelling human motion using data from local and other sources is presented to illustrate its effectiveness compared to the current state of the art.
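The multiresolution scheme in the abstract builds on the plain Gaussian mixture model. The sketch below is not the paper's algorithm; it is a minimal EM fit of a two-component Gaussian mixture to synthetic 3D points, illustrating the kind of stochastic object representation the work extends. All names and data here are illustrative assumptions.

```python
# Minimal sketch: EM for a Gaussian mixture over 3D points.
# NOT the paper's multiresolution algorithm; it shows the base model only.
import numpy as np

def fit_gmm(points, k=2, iters=50):
    n, d = points.shape
    # Deterministic spread-out initialisation (an assumption for this sketch)
    means = points[np.linspace(0, n - 1, k).astype(int)].copy()
    covs = np.array([np.eye(d)] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        # E-step: responsibility of each component for each point
        resp = np.empty((n, k))
        for j in range(k):
            diff = points - means[j]
            inv = np.linalg.inv(covs[j])
            norm = 1.0 / np.sqrt(((2 * np.pi) ** d) * np.linalg.det(covs[j]))
            resp[:, j] = weights[j] * norm * np.exp(
                -0.5 * np.einsum('ni,ij,nj->n', diff, inv, diff))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means, covariances
        nk = resp.sum(axis=0)
        weights = nk / n
        for j in range(k):
            means[j] = resp[:, j] @ points / nk[j]
            diff = points - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] \
                      + 1e-6 * np.eye(d)
    return weights, means, covs

# Two well-separated synthetic 3D clusters
rng = np.random.default_rng(1)
pts = np.vstack([rng.normal(0.0, 0.1, (200, 3)),
                 rng.normal(5.0, 0.1, (200, 3))])
w, mu, _ = fit_gmm(pts, k=2)
```

In the paper's setting, a hierarchy of such mixtures at different resolutions would be estimated and tracked over time via stochastic simulation rather than batch EM.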
The Incremental Multiresolution Matrix Factorization Algorithm
Multiresolution analysis and matrix factorization are foundational tools in
computer vision. In this work, we study the interface between these two
distinct topics and obtain techniques to uncover hierarchical block structure
in symmetric matrices -- an important aspect in the success of many vision
problems. Our new algorithm, the incremental multiresolution matrix
factorization, uncovers such structure one feature at a time, and hence scales
well to large matrices. We describe how this multiscale analysis goes much
farther than what a direct global factorization of the data can identify. We
evaluate the efficacy of the resulting factorizations for relative leveraging
within regression tasks using medical imaging data. We also use the
factorization on representations learned by popular deep networks, providing
evidence of their ability to infer semantic relationships even when they are
not explicitly trained to do so. We show that this algorithm can be used as an
exploratory tool to improve the network architecture, and within numerous other
settings in vision.
Comment: Computer Vision and Pattern Recognition (CVPR) 2017, 10 pages
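The abstract above is about uncovering hierarchical block structure in symmetric matrices. The toy below is not the incremental MMF algorithm; it illustrates the kind of hidden block structure such factorizations target, recovered here by a simple spectral grouping (an assumed stand-in technique) on a synthetic matrix with two noisy diagonal blocks of different strengths.

```python
# Toy illustration: hidden block structure in a symmetric matrix,
# recovered via the leading eigenvector (NOT the incremental MMF algorithm).
import numpy as np

rng = np.random.default_rng(0)
n = 8
A = 0.05 * rng.standard_normal((n, n))
A[:4, :4] += 1.0    # weaker diagonal block
A[4:, 4:] += 1.5    # stronger diagonal block
A = (A + A.T) / 2   # symmetrize

perm = rng.permutation(n)        # hide the blocks by permuting rows/columns
B = A[np.ix_(perm, perm)]

# eigh returns eigenvalues in ascending order; take the leading eigenvector.
# Its large-magnitude entries lie in the stronger block.
vals, vecs = np.linalg.eigh(B)
v = vecs[:, -1]
groups = (np.abs(v) > 0.25).astype(int)
```

A multiresolution factorization would go further, peeling off such structure level by level (and, in the incremental variant, one feature at a time) rather than from a single global eigendecomposition.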
A perceptual comparison of empirical and predictive region-of-interest video
When viewing multimedia presentations, a user attends to only a relatively small part of the video display at any one point in time. By shifting the allocation of bandwidth from peripheral areas to those locations where a user's gaze is more likely to rest, attentive displays can be produced. Attentive displays aim to reduce resource requirements while minimizing negative user perception, understood in this paper as not only a user's ability to assimilate and understand information but also his/her subjective satisfaction with the video content. This paper introduces and discusses a perceptual comparison between two region-of-interest display (RoID) adaptation techniques. A RoID is an attentive display in which bandwidth has been preallocated around measured or highly probable areas of user gaze. In this paper, video content was manipulated using two sources of data: empirically measured data (captured using eye-tracking technology) and predictive data (calculated from the physical characteristics of the video data). Results show that display adaptation causes significant variation in users' understanding of specific multimedia content. Interestingly, RoID adaptation and the type of video being presented both affect user perception of video quality. Moreover, the use of frame rates below 15 frames per second, for any video adaptation technique, caused a significant reduction in user-perceived quality, suggesting that users are not only aware of the video quality reduction but that it also impacts their level of information assimilation and understanding. Results also highlight that a user's level of enjoyment is significantly affected by the type of video, yet is not as affected by the quality or type of video adaptation: an interesting implication for the field of entertainment.
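The core idea of a RoID, allocating more bandwidth near likely gaze locations and less in the periphery, can be sketched as a per-macroblock quality map. This is an illustrative assumption, not the paper's encoding pipeline: the paper drives the region of interest from eye-tracking measurements or content-derived predictions, whereas this toy uses a fixed gaze point and a linear distance falloff.

```python
# Sketch: distance-based quality allocation around a gaze point
# (assumed falloff model; the paper uses measured or predicted gaze regions).
import numpy as np

def roi_quality_map(width_blocks, height_blocks, gaze,
                    q_max=1.0, q_min=0.2, radius=3.0):
    """Assign each macroblock a relative quality: full quality at the
    gaze point, falling off linearly to a peripheral floor q_min."""
    ys, xs = np.mgrid[0:height_blocks, 0:width_blocks]
    dist = np.hypot(xs - gaze[0], ys - gaze[1])      # blocks from gaze
    falloff = np.clip(1.0 - dist / radius, 0.0, 1.0)
    return q_min + (q_max - q_min) * falloff

# 8x6 macroblock grid with gaze at block (x=4, y=3)
qmap = roi_quality_map(8, 6, gaze=(4, 3))
```

An encoder could scale each block's bit budget (or invert the map into a quantization step) by these values, concentrating bandwidth where the gaze is expected to rest.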
Reinventing a teleconferencing system
Thesis (S.M.)--Massachusetts Institute of Technology, Program in Media Arts & Sciences, 2001. Includes bibliographical references (p. 67-71). Looking toward more natural communication, we can anticipate that the teleconferencing system of the future will enable participants at distant locations to share the same virtual space. The visual object of each participant can be transmitted to the other sites and rendered from an individual perspective. This thesis presents an effort, X-Conference, to reinvent a teleconferencing system toward the concept of "3-D Virtual Teleconferencing." Several aspects are explored. A multiple-camera calibration approach is implemented and employed to blend the real view and the virtual view effectively. An individualized 3-D head object is built semi-automatically by mapping real texture onto a globally modified generic model. Head motion parameters are extracted by tracking artificial and/or facial features. Without using an articulation model, facial animation is partially achieved by using texture displacement. UDP/IP multicast and TCP/IP unicast are both utilized to implement the networking scheme. by Xin Wang. S.M.