125 research outputs found

    Postprocessing of images coded using block DCT at low bit rates.

    Get PDF
    Sun, Deqing. Thesis (M.Phil.)--Chinese University of Hong Kong, 2007. Includes bibliographical references (leaves 86-91). Abstracts in English and Chinese.
    Contents: abstract (English and Chinese); contributions; acknowledgement; abbreviations; notations.
    Chapter 1, Introduction: image compression and postprocessing; a brief review of postprocessing; objective and methodology of the research; thesis organization; a note on publication.
    Chapter 2, Background study: image models (minimum edge difference (MED) criterion for block boundaries; van Beek's edge model for an edge; fields of experts (FoE) for an image); degradation models (quantization constraint set (QCS) and uniform noise; narrow quantization constraint set (NQCS); Gaussian noise; edge width enlargement after quantization); use of these models for postprocessing.
    Chapter 3, Postprocessing using MED and edge models: blocking-artifact suppression by coefficient restoration (AC coefficient restoration by MED; general derivation); detailed algorithm (edge identification; region classification; edge reconstruction; image reconstruction); experimental results (results of the proposed method; comparison with one wavelet-based method); on the global minimum of the edge difference (the constrained minimization problem; experimental examination; discussion); conclusions.
    Chapter 4, Postprocessing by the MAP criterion using FoE: the proposed method (the MAP criterion; the optimization problem); experimental results (setting algorithm parameters; results); investigation of the quantization noise model; conclusions.
    Chapter 5, Conclusion: contributions (extension of the DCCR algorithm; examination of the MED criterion; use of the FoE prior in postprocessing; investigation of the quantization noise model); future work (degradation model; efficient implementation of the MAP method; postprocessing of compressed video).
    Appendix A: detailed derivation of coefficient restoration. Appendix B: implementation details of the FoE prior (the FoE prior model; energy function and its gradient; conjugate gradient descent method). Bibliography.

    Image representation and compression using steered Hermite transforms

    Get PDF

    Realistic Visualization of Animated Virtual Cloth

    Get PDF
    Photo-realistic rendering of real-world objects is a broad research area with applications in fields such as computer-generated films, entertainment, and e-commerce. Within photo-realistic rendering, the rendering of cloth is a subarea that involves many important aspects, ranging from material surface reflection properties and macroscopic self-shadowing to animation-sequence generation and compression. This thesis, besides introducing the topic and giving a broad overview of related work, describes methods that handle the major aspects of cloth rendering. Material surface reflection properties play an important part in reproducing the look & feel of materials, that is, in making a material identifiable just by looking at it. The BTF (bidirectional texture function), as a function of viewing and illumination direction, is an appropriate representation of reflection properties. It captures effects caused by the mesostructure of a surface, such as roughness, self-shadowing, occlusion, inter-reflections, subsurface scattering, and color bleeding. Unfortunately, a BTF data set for a single material consists of hundreds to thousands of images, which exceeds the memory size of current personal computers by far. This work describes the first usable method to efficiently compress and decompress BTF data for rendering at interactive to real-time frame rates. It is based on PCA (principal component analysis) of the BTF data set. While preserving the important visual aspects of the BTF, the achieved compression rates allow several data sets to be stored in the main memory of consumer hardware while maintaining high rendering quality. Correct handling of complex illumination conditions plays another key role in the realistic appearance of cloth. Therefore, an upgrade of the BTF compression and rendering algorithm is described that adds support for distant direct HDR (high-dynamic-range) illumination stored in environment maps. To further enhance the appearance, macroscopic self-shadowing has to be taken into account; for the visualization of folds and a life-like 3D impression, this kind of shadow is essential. This work describes two methods to compute these shadows. The first is seamlessly integrated into the illumination part of the rendering algorithm and optimized for static meshes. The second handles dynamic objects and uses hardware-accelerated occlusion queries for the visibility determination. Despite its simplicity, the presented algorithm is fast and produces fewer artifacts than other methods; in addition, it incorporates changeable distant direct high-dynamic-range illumination. The human perception system is the final target of any computer graphics application and can itself be treated as part of the rendering pipeline, so the rendering can be optimized by analyzing how humans perceive certain visual aspects of the image. As part of this thesis, an experiment is introduced that evaluates human shadow perception in order to speed up shadow rendering, and optimization approaches are derived from it. Another subarea of cloth visualization in computer graphics is the animation of cloth and avatars for presentations. This work also describes two new methods for the automatic generation and compression of animation sequences.
The first method, which generates completely new, customizable animation sequences, is based on the concept of finding similarities among the frames of a given basis sequence. Identifying these similarities allows jumps within the basis sequence, from which endless new sequences can be generated. Transmitting animated 3D data over bandwidth-limited channels, such as wide-area networks or links to less powerful clients, requires efficient compression schemes. The second method in the animation field is therefore a geometry-data compression scheme. Like the BTF compression, it uses PCA in combination with clustering algorithms that segment the animated objects into similarly moving parts, achieving high compression rates together with very accurate reconstruction quality (see the PCA sketch below).
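
Both the BTF compression and the animation-geometry compression mentioned above share the same linear-algebra core: projecting high-dimensional samples onto a few principal components. The following is a minimal sketch of that idea; the array shapes, function names, and the use of a plain SVD (instead of the full pipeline with clustering described in the thesis) are illustrative:

```python
import numpy as np

def pca_compress(data, k):
    """Compress rows of `data` onto the top-k principal components.

    data: (n_samples, n_dims) matrix, e.g. one flattened BTF image per
    row, or one animation frame of vertex positions per row.
    Returns the mean, the k basis vectors, and per-sample weights.
    """
    mean = data.mean(axis=0)
    centered = data - mean
    # The rows of vt are the principal directions, ordered by variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]                 # (k, n_dims)
    weights = centered @ basis.T   # (n_samples, k)
    return mean, basis, weights

def pca_decompress(mean, basis, weights):
    """Reconstruct an approximation of the original data."""
    return mean + weights @ basis

# Example: 500 "images" of 4096 values, 8 coefficients per sample.
rng = np.random.default_rng(0)
data = rng.normal(size=(500, 4096))
mean, basis, weights = pca_compress(data, k=8)
approx = pca_decompress(mean, basis, weights)
print(approx.shape)  # (500, 4096), stored as 8 weights per sample
```

Storing only the mean, the basis, and the per-sample weights shrinks the data by roughly a factor of n_dims/k, which is the effect that makes in-memory BTF rendering and compact animation transmission feasible.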

    The 1993 Space and Earth Science Data Compression Workshop

    Get PDF
    The Earth Observing System Data and Information System (EOSDIS) is described in terms of its data volume, data rate, and data distribution requirements. Opportunities for data compression in EOSDIS are discussed.

    Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

    Full text link
    Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, marking a stride towards Artificial General Intelligence (AGI). However, current LLMs are predominantly pretrained on short text snippets, which compromises their effectiveness on the long-context prompts frequently encountered in practical scenarios. This article offers a comprehensive survey of recent advances in Transformer-based LLM architectures that enhance the long-context capabilities of LLMs throughout the entire model lifecycle, from pre-training to inference. We first delineate and analyze the problems of handling long-context input and output with current Transformer-based models. We then provide a taxonomy and a landscape of upgrades to the Transformer architecture that address these problems. Afterwards, we investigate widely used evaluation resources for long-context LLMs, including datasets, metrics, and baseline models, as well as optimization toolkits such as libraries, frameworks, and compilers that boost the efficacy of LLMs across different runtime stages. Finally, we discuss the challenges and potential avenues for future research. A curated repository of relevant literature, continuously updated, is available at https://github.com/Strivin0311/long-llms-learning. Comment: 40 pages, 3 figures, 4 tables

    Model- and image-based scene representation.

    Get PDF
    Lee Kam Sum. Thesis (M.Phil.)--Chinese University of Hong Kong, 1999. Includes bibliographical references (leaves 97-101). Abstracts in English and Chinese.
    Chapter 1, Introduction: video representation using panorama mosaic and 3D face model; mosaic-based video representation; 3D human face modeling.
    Chapter 2, Background: video representation using a mosaic image (traditional video compression); 3D face model reconstruction via multiple views (shape from silhouettes; head and face model reconstruction; reconstruction using a generic model).
    Chapter 3, System overview: panoramic video coding process; 3D face model reconstruction process.
    Chapter 4, Panoramic video representation: mosaic construction (cylindrical panorama mosaic; cylindrical projection of the mosaic image); foreground segmentation and registration (segmentation using the panorama mosaic; determination of background by local processing; segmentation from frame-mosaic comparison); compression of the foreground regions (MPEG-1 compression; MPEG coding method: I/P/B frames); video stream reconstruction.
    Chapter 5, Three-dimensional human face modeling: capturing images for 3D face modeling; shape estimation and model deformation (head shape estimation and model deformation; face organ shaping and positioning; reconstruction with both intrinsic and extrinsic parameters; reconstruction with only intrinsic parameters; essential matrix; estimation of the essential matrix; recovery of 3D coordinates from the essential matrix); integration of head shape and face organs; texture mapping.
    Chapter 6, Experimental results & discussion: panoramic video representation (compression improvement from foreground extraction; video compression performance; quality of the reconstructed video sequence); 3D face model reconstruction.
    Chapter 7, Conclusion and future directions. Bibliography.

    The Telecommunications and Data Acquisition Report

    Get PDF
    This publication, one of a series formerly titled The Deep Space Network Progress Report, documents DSN progress in flight project support, tracking and data acquisition research and technology, network engineering, hardware and software implementation, and operations. In addition, developments in Earth-based radio technology as applied to geodynamics, astrophysics, and the radio search for extraterrestrial intelligence are reported.

    Automatic video segmentation employing object/camera modeling techniques

    Get PDF
    Practically established video compression and storage techniques still process video sequences as rectangular images without any further semantic structure. Humans watching a video sequence, however, immediately recognize the acting objects as semantic units. This semantic object separation is currently not reflected in the technical systems, making it difficult to manipulate a video at the object level. Realizing object-based manipulation would introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently of the background. If the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio, since the sprite image for each scene has to be sent only once. A prerequisite for object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even defining the desired output of a segmentation system is difficult. The subject of this thesis is to provide segmentation algorithms that are applicable to common video material and computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process. Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system described in Part I is based on a background-subtraction technique. The pure background image required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed, since the camera motion is estimated and the input images are aligned into a panoramic scene background. This approach is fully compatible with the MPEG-4 video-encoding framework, so the segmentation system can easily be combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground-object motion. At the same time, the estimation should provide motion parameters accurate enough that all input frames can be combined seamlessly into a background image. The core motion estimation follows a feature-based approach in which the motion parameters are determined with a robust-estimation algorithm (RANSAC), in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance.
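
To make the feature-based robust estimation concrete, below is a minimal sketch of RANSAC fitting a global motion model to point correspondences. For brevity it fits a pure 2-D translation instead of the projective model used in the thesis, and all names and thresholds are illustrative:

```python
import numpy as np

def ransac_translation(src, dst, iters=200, thresh=2.0, seed=0):
    """Robustly fit a 2-D translation to matched feature points.

    src, dst: (n, 2) arrays of corresponding feature positions in two
    frames. Each hypothesis is drawn from a single correspondence;
    matches on independently moving foreground objects become
    outliers, so the estimate follows the dominant camera motion.
    (Illustrative stand-in for the projective model of the thesis.)
    """
    rng = np.random.default_rng(seed)
    n = len(src)
    best_inliers = np.zeros(n, dtype=bool)
    for _ in range(iters):
        i = rng.integers(n)
        t = dst[i] - src[i]                          # 1-point hypothesis
        err = np.linalg.norm(src + t - dst, axis=1)  # residual per match
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refine the estimate on the full consensus set.
    t = (dst[best_inliers] - src[best_inliers]).mean(axis=0)
    return t, best_inliers
```

The same loop structure carries over to the projective case; only the minimal sample size and the model-fitting step change.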
An analysis of this robustness problem revealed that it is caused by numerical instabilities, which can be significantly reduced by a modification that we describe in Chapter 4. The synthesis of static background images is discussed in Chapter 5. In particular, we present a new algorithm for removing the foreground objects from the background image such that a pure scene background remains. The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is visible only for short periods of time. The problem is solved by clustering the image content of each region over time, such that each cluster comprises static content. Furthermore, the algorithm exploits the fact that the times at which foreground objects appear in an image region are similar to those of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. For general camera motion, the construction of a single sprite image is even impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed such that the total area of the resulting sprites is minimized while simultaneously satisfying additional constraints: a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard but provides three advantages. First, arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower. Third, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A particular problem in the change detection is image misregistration. Since the change detection compares co-located pixels in the camera-motion-compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk maps into the segmentation algorithm; these identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values of the pixels marked in the risk map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8.
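
Two of these building blocks, background synthesis and risk-map-guided change detection, can be illustrated with a much simplified sketch. It substitutes a per-pixel temporal median for the region-wise temporal clustering described in the thesis, and all names and thresholds are illustrative:

```python
import numpy as np

def synthesize_background(frames):
    """Estimate a static background from aligned grayscale frames.

    frames: (t, h, w) array of camera-motion-compensated frames.
    The temporal median keeps, per pixel, the value that is visible
    most of the time, i.e. the scene background, provided background
    is visible in more than half of the frames at that pixel.
    """
    return np.median(frames, axis=0)

def change_mask(frame, background, thresh=12.0, risk=None):
    """Mark pixels that differ from the background as foreground.

    risk: optional boolean (h, w) map of pixels where small
    misregistration would cause large differences (e.g. near strong
    edges); difference values there are disregarded.
    """
    diff = np.abs(frame.astype(float) - background)
    mask = diff > thresh
    if risk is not None:
        mask &= ~risk
    return mask
```

In this simplified form the median fails when the background is visible less than half the time, which is exactly the situation the clustering approach of Chapter 5 is designed to handle.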
Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed by the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms: the first is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be directed to extract individual objects, even if the input sequence contains many objects. Chapter 11 proposes an alternative approach to incorporating object models into a segmentation algorithm. The chapter describes a semi-automatic segmentation algorithm in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many related problems such as shape matching. Part III of the thesis elaborates on techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension of the camera-motion estimation that factorizes the motion parameters into physically meaningful ones (rotation angles, focal length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between image coordinates and real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field, and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field.
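
As an illustration of the combinatorial line-matching step, the following sketch assigns detected image lines to playing-field model lines by exhaustively testing assignments and keeping the one that implies the most consistent global rotation. It is a toy version under strong simplifications (lines reduced to their angles, no homography estimation); all names and tolerances are illustrative:

```python
import itertools
import numpy as np

def wrap_line_angle(a):
    """Wrap an angle difference between lines into [-pi/2, pi/2)."""
    return (a + np.pi / 2) % np.pi - np.pi / 2

def match_lines(detected, model, k=4, tol=0.05):
    """Search for correspondences between detected image lines and
    playing-field model lines.

    detected, model: lists of line angles in radians. Tries all
    assignments of k detected lines to k model lines and keeps the
    one whose pairs imply the most consistent rotation between image
    and model (small spread of wrapped angle differences).
    """
    best, best_err = None, np.inf
    for det in itertools.combinations(detected, k):
        for mod in itertools.permutations(model, k):
            diffs = np.array([wrap_line_angle(d - m)
                              for d, m in zip(det, mod)])
            err = diffs.std()
            if err < best_err:
                best, best_err = list(zip(det, mod)), err
    return best if best_err < tol else None
```

A real system would follow this with a geometric verification and a homography estimate from the matched lines, as described above; the exhaustive search shown here grows combinatorially with k and is meant only to convey the idea.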
In Chapter 14, we again consider panoramic background images, focusing on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is the projection onto a cylinder surface, which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image, because it shows all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images, captured in each of the rooms, are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed in which the walls are displayed with textures extracted from the panoramic images. This representation enables virtual walk-throughs of the reconstructed rooms and therefore gives the user a better orientation. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects. These definitions are either explicit, using object models as in Part II of this thesis, or implicit, as in the background synthesis of Part I. The results of this thesis show that implicit descriptions, which extract their definition from the video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into segmentation approaches that are based on implicit models; instead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures, since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables accurate segmentation remains an important yet challenging problem for further research.