834 research outputs found

Automatic video segmentation employing object/camera modeling techniques

Author: Farin D.S.
Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/2005

Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. If the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process.
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene-background. This approach is fully compatible to the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthetization of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains. 
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, it is exploited that the times at which foreground objects appear in an image region are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is not even possible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower, and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration.
Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building-blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms. The first one is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm.
The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular-paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal-length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. 
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is the projection onto a cylinder surface, which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because they look in all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation enables virtual walk-throughs in the reconstructed room and therefore provides a better orientation for the user. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects.
These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthetization in Part I. The results of this thesis show that implicit descriptions, which extract their definition from video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into the segmentation approaches that are based on implicit models. Instead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research.
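The robust-estimation step mentioned in this abstract (RANSAC separating camera motion from simultaneously visible object motion) can be illustrated with a minimal sketch. It is not the thesis's method: it assumes a pure 2-D translation instead of a projective camera model, and the function name, threshold, iteration count, and sample data are hypothetical choices made for illustration only.

```python
import random

def ransac_translation(matches, threshold=1.0, iterations=200, seed=0):
    """Estimate the dominant 2-D translation from point matches with RANSAC.

    matches: list of ((x, y), (x2, y2)) feature correspondences.
    Returns (dx, dy, inlier_indices).
    Simplifying assumption: camera motion is a pure translation.
    """
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(iterations):
        # A translation has two unknowns, so one match is a minimal sample.
        (x, y), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x, y2 - y
        # Collect all matches that agree with this hypothesis.
        inliers = [
            i for i, ((px, py), (qx, qy)) in enumerate(matches)
            if (px + dx - qx) ** 2 + (py + dy - qy) ** 2 <= threshold ** 2
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set by averaging the inlier displacements.
    n = len(best_inliers)
    dx = sum(matches[i][1][0] - matches[i][0][0] for i in best_inliers) / n
    dy = sum(matches[i][1][1] - matches[i][0][1] for i in best_inliers) / n
    return dx, dy, best_inliers

# Hypothetical data: eight background features shifted by (5, -2),
# followed by three features on an independently moving object.
background = [((float(i), float(2 * i)), (i + 5.0, 2 * i - 2.0)) for i in range(8)]
foreground = [((10.0, 10.0), (30.0, 25.0)),
              ((11.0, 10.0), (31.0, 26.0)),
              ((12.0, 11.0), (33.0, 27.0))]
dx, dy, inliers = ransac_translation(background + foreground)
```

In this sketch the largest consensus set is taken to be the camera motion, so the matches left outside it are candidates for foreground-object motion. The numerical-stability modification described in the abstract is not modeled here.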

Repository TU/e

Pure OAI Repository

Survey of image-based representations and compression techniques

Authors: Chan SC, Kang SB, Shum HY
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2003

In this paper, we survey the techniques for image-based rendering (IBR) and for compressing image-based representations. Unlike traditional three-dimensional (3-D) computer graphics, in which 3-D geometry of the scene is known, IBR techniques render novel views directly from input images. IBR techniques can be classified into three categories according to how much geometric information is used: rendering without geometry, rendering with implicit geometry (i.e., correspondence), and rendering with explicit geometry (either with approximate or accurate geometry). We discuss the characteristics of these categories and their representative techniques. IBR techniques demonstrate a surprisingly diverse range in their extent of use of images and geometry in representing 3-D scenes. We explore the issues in trading off the use of images and geometry by revisiting plenoptic-sampling analysis and the notions of view dependency and geometric proxies. Finally, we highlight compression techniques specifically designed for image-based representations. Such compression techniques are important in making IBR techniques practical.

HKU Scholars Hub

Camera positioning for 3D panoramic image rendering

Author: Audu Abdulkadir Iyyaka
Publication venue: Brunel University London
Publication date: 01/01/2015

This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University London. Virtual camera realisation and the proposition of a trapezoidal camera architecture are the two broad contributions of this thesis. Firstly, multiple cameras and their arrangement constitute a critical component that affects the integrity of visual content acquisition for multi-view video. Currently, linear, convergence, and divergence arrays are the prominent camera topologies adopted. However, the large number of cameras required and their synchronisation are two of the prominent challenges usually encountered. The use of virtual cameras can significantly reduce the number of physical cameras used with respect to any of the known camera structures, hence adequately reducing some of the other implementation issues. This thesis explores the use of image-based rendering with and without geometry in the implementations leading to the realisation of virtual cameras. The virtual camera implementation was carried out from the perspective of a depth map (geometry) and the use of multiple image samples (no geometry). Prior to the virtual camera realisation, the generation of depth maps was investigated using region match measures widely known for solving the image point correspondence problem. The constructed depth maps have been compared with the ones generated using the dynamic programming approach. In both the geometry and no-geometry approaches, the virtual cameras lead to the rendering of views from a textured depth map, the construction of a 3D panoramic image of a scene by stitching multiple image samples and performing superposition on them, and the computation of a virtual scene from a stereo pair of panoramic images. The quality of these rendered images was assessed through the use of either objective or subjective analysis in Imatest software.
Furthermore, metric reconstruction of a scene was performed by re-projection of the pixel points from multiple image samples with a single centre of projection. This was done using the sparse bundle adjustment algorithm. The statistical summary obtained after the application of this algorithm provides a gauge for the efficiency of the optimisation step. The optimised data was then visualised in the Meshlab software environment, hence providing the reconstructed scene. Secondly, with any of the well-established camera arrangements, all cameras are usually constrained to the same horizontal plane. Therefore, occlusion becomes an extremely challenging problem, and a robust camera set-up is required in order to resolve strongly the hidden part of any scene objects. To adequately meet the visibility condition for scene objects, and given that occlusion of the same scene objects can occur, a multi-plane camera structure is highly desirable. Therefore, this thesis also explores a trapezoidal camera structure for image acquisition. The approach here is to assess the feasibility and potential of several physical cameras of the same model being sparsely arranged on the edge of an efficient trapezoid graph. This is implemented in both Matlab and Maya. The quality of the depth maps rendered in Matlab are better in Quality

Brunel University Research Archive

Cubic-panorama image dataset analysis for storage and transmission

Publication venue: SPIE-Intl Soc Optical Eng

Crossref

Sketching space

Authors: Chapman D, Penn A, Turner A
Publication date: 01/01/2000

In this paper, we present a sketch modelling system which we call Stilton. The program resembles a desktop VRML browser, allowing a user to navigate a three-dimensional model in a perspective projection, or panoramic photographs, which the program maps onto the scene as a `floor' and `walls'. We place an imaginary two-dimensional drawing plane in front of the user, and any geometric information that user sketches onto this plane may be reconstructed to form solid objects through an optimization process. We show how the system can be used to reconstruct geometry from panoramic images, or to add new objects to an existing model. While panoramic imaging can greatly assist with some aspects of site familiarization and qualitative assessment of a site, without the addition of some foreground geometry they offer only limited utility in a design context. Therefore, we suggest that the system may be of use in `just-in-time' CAD recovery of complex environments, such as shop floors, or construction sites, by recovering objects through sketched overlays, where other methods such as automatic line-retrieval may be impossible. The result of using the system in this manner is the `sketching of space' - sketching out a volume around the user - and once the geometry has been recovered, the designer is free to quickly sketch design ideas into the newly constructed context, or analyze the space around them. Although end-user trials have not, as yet, been undertaken we believe that this implementation may afford a user-interface that is both accessible and robust, and that the rapid growth of pen-computing devices will further stimulate activity in this area

CiteSeerX

UCL Discovery

Image-Based Rendering Of Real Environments For Virtual Reality

Author: Bertel Tobias
Publication date: 14/02/2022

OPUS

Free Viewpoint Video Based on Stitching Technique

Author: Skaik Rami Othman M.H.
Publication venue: الجامعة الإسلامية - غزة
Publication date: 01/01/2014

Image stitching is a technique used for creating one panoramic scene from multiple images. It is used in panoramic photography and video where the viewer can only scroll horizontally and vertically across the scene. However, stitching has not been used for creating free-viewpoint videos (FVV), where viewers can change their viewing points freely and smoothly while playing the video. The current research implemented an FVV playing system using image stitching; this system allows users to move their viewpoint freely and smoothly. To use this system, the user captures MVV from different viewpoints, with an appropriate region area for each pair of cameras; the system then stitches the overlapped video to create one or more stitched videos and displays them in the FVV playing system, with free and smooth switching and interpolation of viewpoints during video playback. The research evaluated the performance of the video playing system in terms of system idea, system accuracy, smoothness, and user satisfaction. The results of the evaluation were very positive in most aspects.

Institutional Repository of the Islamic University of Gaza

360º hypervideo

Author: Neng Luís António da Rosa
Publication date: 01/01/2011

Master's thesis in Informatics, presented to the Universidade de Lisboa through the Faculdade de Ciências, 2011. In this dissertation we describe an approach to the design and development of an immersive and interactive interface for viewing and navigating 360º hypervideos over the internet. This type of hypervideo allows users to move around an axis to view the video content from different angles and to access it efficiently through hyperlinks. Challenges in presenting this type of hypervideo include: providing users with a suitable interface for exploring 360º content on a normal screen, where the video must change perspective so that users feel they are looking around, and suitable forms of navigation so that they can easily understand the structure of the hypervideo, even when hyperlinks are outside the field of view. Devices for capturing 360º video, as well as ways of making it available on the Web, are increasingly common and accessible to the general public. In this context, it is pertinent to explore navigation forms and techniques for viewing and interacting with 360º hypervideos. Traditionally, to view the content of a video, the user is limited to the region at which the camera was pointed during capture, which means that the resulting video has lateral limits. With 360º video recording these limits no longer exist, opening new directions to explore. A 360º hypervideo player will allow users to move around to view the rest of the content and to easily access the information provided by the hyperlinks. Video is a very rich type of information that presents an enormous amount of information that changes over time. A 360º video presents even more information at the same time and adds challenges, since not everything is within our field of view.
Nevertheless, it offers the user a new, potentially immersive viewing experience. We explore navigation techniques to help users easily understand and navigate a 360º hypervideo space and to provide a viewing experience at another level, through an immersive hypermedia space. Hyperlinks take the user to other related hypermedia content, such as texts, images and videos, or to other Web pages. After the playback or viewing of the related content ends, the user can return to the previous position in the video. Through the use of summarization techniques, we can also provide users with a summary of the entire video content so that they can view and understand it more efficiently and flexibly, without needing to watch the whole video in sequence. Video has proven to be one of the most efficient forms of communication, allowing a huge and varied range of information to be presented in a short period of time. 360º videos can provide even more information and can be mapped onto cylindrical or spherical projections. The cylindrical projection was invented in 1796 by the painter Robert Barker of Edinburgh, who obtained a patent for it. The use of video on the Web has essentially consisted of embedding it in pages, where it is viewed linearly, with interactions generally limited to play and pause, fast forward and reverse. In recent years, the most promising advances towards interactive video seem to come through hypervideo, which provides a true integration of video into hypermedia spaces, where content can be structured and navigated through hyperlinks defined in space and time and through flexible interactive navigation mechanisms. Extending the hypervideo concept to 360º raises new challenges, mainly because a large part of the content is outside the field of view.
The 360º hypervideo player has to provide users with appropriate mechanisms to make the structure of the hypervideo easier to perceive, to navigate the 360º hypervideo space efficiently and, ideally, to provide an immersive experience. To navigate a 360º hypervideo space, we need new navigation mechanisms. We present the main mechanisms designed for viewing this type of hypervideo and solutions to the main challenges in hypermedia, disorientation and cognitive overload, now in the 360º context. We focus essentially on the navigation mechanisms that help users orient themselves in the 360º space. We developed a drag-based interface for navigating 360º video. This interface lets the user move the video to view the content from different angles. The user only needs to drag the cursor to the left or to the right to move the field of view. The user can, however, keep moving towards just one side to go all the way around without any limitation. Perceiving the current location and viewing angle became a problem because of the lack of lateral limits. During our tests, many users felt lost in the 360º space, not knowing which angles they were viewing. In hypervideo, perceiving hyperlinks is more challenging than in traditional hypermedia because hyperlinks can have a duration, can coexist in time and space, and the video changes over time. Special mechanisms are therefore needed to make them perceptible to users. In 360º hypervideo, a large part of the content is invisible to the user because it is not in the field of view, so new approaches and mechanisms for indicating the existence of hyperlinks need to be studied.
We created the Hotspots Availability and Location Indicators to let users know of the existence and location of each hyperlink. The position of the hotspot availability indicators along the vertical axis, on the lateral margins of the video, indicates the vertical position of each hyperlink. The size of the indicator indicates the distance of the hotspot from the viewing angle: the closer the hotspot, the larger the indicator. The indicators are semi-transparent and positioned on the lateral margins to minimize their impact on the video content. The Mini Map also provides information about the existence and location of hotspots, which should contain some information about the destination content so that the user can have some expectation of what they will see after following the hyperlink. A text box styled like a comic-strip speech balloon can accommodate various pieces of relevant information. When users select a hotspot, they can be redirected to a predefined time in the video or to a page with additional information, or the selection can be memorized by the system and its content shown only when the user wishes, depending on the type of application. For example, if the purpose of the video is learning support (e-learning), it may make more sense to open the hyperlink content immediately, since users are used to seeing that type of information step by step. If the video is for entertainment, users probably do not like being interrupted by new content opening, and can choose to have the hyperlink memorized and access it later, whenever they want. In addition to the title and description of the video, the Image Map mode provides a global view of the video content.
The previews (thumbnails) refer to the scenes of the video and are represented through a cylindrical projection, so that all the content over time can be viewed. The mode also shows, in a synchronized way, which scene is current, and offers the user the possibility of navigating to other scenes. The whole preview area is click-sensitive and determines the coordinates of the preview that the user selected. A more condensed version provides only the preview of the central part of each scene. It allows a larger number of scenes to be presented simultaneously, but limits the view and the flexibility to navigate more directly to the desired angle. Some functionality was also added to the timeline, or Progress Bar. In addition to the traditional Play, Pause and Video Time controls, we extended the bar to adapt it to some characteristics of a Web page. Since the Player was developed to work on the internet, we must take into account that loading the video takes time. The bytes loaded bar shows the user the progress of the video loading and does not allow the user to access information that has not yet been loaded. Hyperspace is navigated in spatio-temporal contexts that history records. The Memory Bar gives the user information about the parts of the video that have already been viewed. The Toggle Full Screen button switches the video viewing mode between full and standard screen. Full screen mode takes the user beyond the limitations of the browser and maximizes the video content to the size of the screen. It is one more step towards an immersive viewing mode, for example in a 360º projection inside a Cave, which we are considering exploring in future work. In this dissertation, we present an approach to the viewing of and interaction with 360º videos.
Navigating a 360º video space is a new experience for most people, and there are not yet consistent intuitions about how this type of navigation behaves. Users will very probably experience the problem that initially arose with hypertext, in which the user felt lost in hyperspace. The 360º Hypervideo Player therefore has to be as clear and effective as possible so that users can interact with it easily. The usability test was based on the USE questionnaire and on user interviews, in order to determine usability and experience from users' comments, suggestions and concerns about the functionality, access mechanisms and information-representation mechanisms provided. The test results and the comments obtained gave us more information about the usability of the player and let us identify possible improvements. In summary, the users' comments were very positive and useful, and will help us continue working on research into 360º Hypervideo. Future work consists of carrying out more usability tests and developing different versions of the 360º Hypervideo Player, with revised and extended navigation mechanisms based on the results of the evaluations. The 360º Hypervideo Player should not be only a Web application; it should be possible to integrate it with multimedia kiosks or other immersive installations. New functionality and types of navigation will probably be needed to adapt it to different contexts. The example of the 360º Hypervideo Player presented in this article uses a Web browser and a mouse as the means of presentation and interaction. With the growth of 3D video, multi-touch and eye-tracking technologies, new forms of viewing and interacting with the 360º space may emerge.
These new forms bring new challenges but also increased potential for new experiences to explore.
In traditional video, the user is locked to the angle at which the camera was pointing during the capture of the video. With 360º video recording, these boundaries no longer exist, and 360º video capturing devices are becoming more common and affordable to the general public. Hypervideo stretches boundaries even further, allowing users to explore the video and to navigate to related information. Extending the hypervideo concept to 360º video, which we call 360º hypervideo, raises new challenges. Challenges for presenting this type of hypervideo include: providing users with an appropriate interface for exploring 360º contents, where the video should change perspective so that users actually get the feeling of looking around; and providing the appropriate affordances to understand the hypervideo structure and to navigate it effectively in a 360º hypervideo space, even when link opportunities arise in places outside the current viewport. In this thesis, we describe an approach to the design and development of an immersive and interactive interface for the visualization and navigation of 360º hypervideos. Such an interface allows users to pan around to view the contents from different angles and to access related information effectively through the hyperlinks. A user study was then conducted to evaluate the 360º Hypervideo Player's user interface and functionalities. By collecting specific and global comments, concerns and suggestions about functionalities and access mechanisms, we gained more awareness of the player's usability and identified directions for improvement. Finally, we draw some conclusions and open perspectives for future work

Universidade de Lisboa: Repositório.UL

Multiperspective mosaics and layered representation for scene visualization

Author: Ng Jin-Choon
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2003
Field of study

This thesis documents the efforts made to implement multiperspective mosaicking for the purpose of mosaicking undervehicle and roadside sequences. For the undervehicle sequences, the goal is to create a large, high-resolution mosaic that may be used to quickly inspect the entire scene shot by a camera making a single pass underneath the vehicle. Several constraints are placed on the video data in order to support the assumption that the entire scene in the sequence lies on a single plane. Therefore, a single mosaic is used to represent a single video sequence. Phase correlation is used to perform motion analysis in this case. For roadside video sequences, it is assumed that the scene is composed of several planar layers, as opposed to a single plane. Layer extraction techniques are implemented in order to perform this decomposition. Instead of using phase correlation to perform motion analysis, the Lucas-Kanade motion tracking algorithm is used in order to create dense motion maps. Using these motion maps, spatial support for each layer is determined based on a pre-initialized layer model. By separating the pixels in the scene into motion-specific layers, it is possible to sample each element in the scene correctly while performing multiperspective mosaicking. It is also possible to fill in many gaps in the mosaics caused by occlusions, hence creating more complete representations of the objects of interest. The result is a set of mosaics, each representing a single planar layer of the scene
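
As a rough illustration of the phase correlation step mentioned in the abstract, the sketch below estimates a pure translation between two small grayscale frames. It is not taken from the thesis: the function names (`dft2`, `phase_correlation`, `circular_shift`, `random_image`) are hypothetical, the motion is assumed to be a circular (wrap-around) shift, and a naive DFT stands in for the FFT that a practical implementation would use.

```python
import cmath
import random


def dft2(img, inverse=False):
    """Naive 2-D DFT of a list-of-lists; only suitable for tiny images."""
    sign = 1 if inverse else -1

    def dft1(seq):
        n = len(seq)
        out = [
            sum(seq[k] * cmath.exp(sign * 2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)
        ]
        # The inverse transform carries the 1/n normalisation.
        return [v / n for v in out] if inverse else out

    rows = [dft1(list(row)) for row in img]          # transform along x
    cols = [dft1(list(col)) for col in zip(*rows)]   # transform along y
    return [list(r) for r in zip(*cols)]             # back to [y][x] layout


def phase_correlation(ref, moved):
    """Return (dy, dx) such that `moved` is `ref` circularly shifted by (dy, dx)."""
    h, w = len(ref), len(ref[0])
    f_ref, f_mov = dft2(ref), dft2(moved)
    cross = []
    for y in range(h):
        row = []
        for x in range(w):
            c = f_mov[y][x] * f_ref[y][x].conjugate()
            mag = abs(c)
            # Keep only the phase; skip frequencies with no energy.
            row.append(c / mag if mag > 1e-12 else 0j)
        cross.append(row)
    corr = dft2(cross, inverse=True)
    # The correlation surface peaks at the displacement.
    dy, dx = max(
        ((y, x) for y in range(h) for x in range(w)),
        key=lambda p: corr[p[0]][p[1]].real,
    )
    # Map indices past the midpoint to negative shifts.
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return dy, dx


def circular_shift(img, dy, dx):
    """Shift an image by (dy, dx) with wrap-around (assumed motion model)."""
    h, w = len(img), len(img[0])
    return [[img[(y - dy) % h][(x - dx) % w] for x in range(w)] for y in range(h)]


def random_image(size, seed=0):
    """Deterministic pseudo-random test frame (illustration only)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(size)]
```

For example, `phase_correlation(img, circular_shift(img, 2, -3))` with `img = random_image(8)` recovers the shift applied to the frame. Real frames do not wrap around, so windowing and sub-pixel peak refinement would normally be added.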

University of Tennessee, Knoxville: Trace

Spherical Image Processing for Immersive Visualisation and View Generation

Author: Guan Xiao Yin
Publication venue
Publication date
Field of study

This research presents a study of processing panoramic spherical images for immersive visualisation of real environments and for generating in-between views from two acquired views. For visualisation based on one spherical image, the surrounding environment is modelled by a unit sphere mapped with the spherical image, and the user is then allowed to navigate within the modelled scene. For visualisation based on two spherical images, a view generation algorithm is developed for modelling an indoor man-made environment, and new views can be generated at an arbitrary position with respect to the existing two. This allows the scene to be modelled using multiple spherical images and the user to move smoothly from one sphere-mapped image to another by going through the generated in-between sphere-mapped images
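
To make the idea of a unit sphere mapped with a spherical image more concrete, here is a minimal sketch of the pixel-to-direction mapping. It assumes the panorama is stored in an equirectangular (longitude/latitude) layout, which the abstract does not state; the function names and the axis convention (y up, z forward) are likewise assumptions for illustration.

```python
import math


def pixel_to_direction(u, v, width, height):
    """Map pixel (u, v) of an equirectangular image to a unit vector (x, y, z).

    Assumed convention: the image centre looks along +z and +y points up.
    """
    lon = (u + 0.5) / width * 2.0 * math.pi - math.pi    # longitude in [-pi, pi)
    lat = math.pi / 2.0 - (v + 0.5) / height * math.pi   # latitude in [-pi/2, pi/2]
    x = math.cos(lat) * math.sin(lon)
    y = math.sin(lat)
    z = math.cos(lat) * math.cos(lon)
    return (x, y, z)


def direction_to_pixel(x, y, z, width, height):
    """Inverse mapping: a viewing direction to (fractional) pixel coordinates."""
    norm = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, z)
    lat = math.asin(max(-1.0, min(1.0, y / norm)))
    u = (lon + math.pi) / (2.0 * math.pi) * width - 0.5
    v = (math.pi / 2.0 - lat) / math.pi * height - 0.5
    return (u, v)
```

A viewer would call `direction_to_pixel` for each ray of the virtual camera to look up the colour on the sphere, which is what lets the user look around within the modelled scene.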

CLoK