675 research outputs found
Shape representation and coding of visual objects in multimedia applications — An overview
Emerging multimedia applications have created the need for new functionalities in digital communications. Whereas existing compression standards only deal with the audio-visual scene at a frame level, it is now necessary to handle individual objects separately, thus allowing scalable transmission as well as interactive scene recomposition by the receiver. The future MPEG-4 standard aims at providing compression tools addressing these functionalities. Unlike existing frame-based standards, the corresponding coding schemes need to encode shape information explicitly. This paper reviews existing solutions to the problem of shape representation and coding. Region and contour coding techniques are presented and their performance is discussed, considering coding efficiency and rate-distortion control capability, as well as flexibility to application requirements such as progressive transmission, low-delay coding, and error robustness.
Multiorder polygonal approximation of digital curves
In this paper, we propose a fast threshold-free algorithm that computes the angular shape of a 2D object from the points of its contour. To that end, we have extended the method defined in [4, 5] to a multiorder analysis. It is based on the arithmetical definition of discrete lines [11] with variable thickness. We provide a framework to analyse a digital curve at different levels of thickness. The extremities of a segment provided at a high resolution are tracked at lower resolutions in order to refine their location. The method is threshold-free and automatically provides a partitioning of a digital curve into its meaningful parts.
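The arithmetical definition of discrete lines that the method builds on can be illustrated with a simple membership test (an illustrative sketch, not the authors' implementation; the function name and example values are hypothetical):

```python
def on_discrete_line(points, a, b, mu, omega):
    """Membership test for the arithmetical discrete line D(a, b, mu, omega):
    a pixel (x, y) belongs to it iff mu <= a*x - b*y < mu + omega.
    A larger omega gives a thicker line, which tolerates noisier curves."""
    return all(mu <= a * x - b * y < mu + omega for x, y in points)

# Pixels of a naive line approximating y = x/2 (a=1, b=2, mu=0, omega=2).
pts = [(0, 0), (1, 0), (2, 1), (3, 1), (4, 2), (5, 2)]
print(on_discrete_line(pts, 1, 2, 0, 2))  # → True
```

Varying the thickness `omega` is what allows the same test to recognize segments at different orders of analysis.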
On the Detection of Visual Features from Digital Curves using a Metaheuristic Approach
In computational shape analysis a crucial step consists in extracting meaningful features from digital curves. Dominant points are the points of curvature extrema on the curve that can suitably describe the curve both for visual perception and for recognition. Many approaches have been developed for detecting dominant points. In this paper we present a novel method that combines dominant point detection with ant colony optimization search. The method is inspired by the ant colony search (ACS) suggested by Yin [1], but results in a much more efficient and effective approximation algorithm. The excellent results have been compared both to works using an optimal search approach and to works based on exact approximation strategies.
Dominant points detection for shape analysis
The growing interest in multimedia in recent years, and the large amount of information exchanged across networks, has driven research in many fields towards methods for automatic identification. One of the main objectives is to associate information content with images, using techniques that identify their constituent objects. Among image descriptors, contours are very important because most of the information can be extracted from them, and contour analysis also has a lower computational cost. Contour analysis can be restricted to the study of a few salient points of high curvature, from which it is possible to reconstruct the original contour. This thesis focuses on the polygonal approximation of closed digital curves. It first gives an overview of the most common shape descriptors, distinguishing between simple descriptors, external methods, which focus on the analysis of the boundary points of objects, and internal methods, which also use the pixels inside the object; it then describes the major dominant point extraction methods studied so far and the metrics typically used to evaluate the goodness of a polygonal approximation. Three novel approaches to the problem are then discussed in detail: a fast iterative method (DPIL), more suitable for real-time processing, and two metaheuristic methods (GAPA, ACOPA), based on genetic algorithms and Ant Colony Optimization (ACO), which are computationally more complex but more precise. These techniques are compared with the other main methods cited in the literature, in order to assess their performance in terms of computational complexity and polygonal approximation error, and against each other, in order to evaluate their robustness to affine transformations and noise. Two new shape matching techniques, i.e. for identifying objects belonging to the same class in an image database, are then described. The first is based on shape alignment and the second on an ACO-based correspondence; both highlight the excellent results, in terms of both computational time and recognition accuracy, obtained through the use of dominant points. In the first matching algorithm the results are compared with a selection of dominant points generated by a human operator, while in the second the dominant points are used instead of the constant sampling of the outline typically used in this kind of approach.
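The dominant-point methods above all build on polygonal approximation. As a generic baseline, the classic Ramer–Douglas–Peucker split scheme (a standard reference method, not the DPIL, GAPA, or ACOPA algorithms described in the thesis; the sample curve is hypothetical) can be sketched as:

```python
import math

def rdp(points, eps):
    """Ramer–Douglas–Peucker: find the point farthest from the chord joining
    the endpoints; if its distance exceeds eps, split there and recurse,
    otherwise approximate the whole run by the chord."""
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    dmax, imax = 0.0, 0
    for i in range(1, len(points) - 1):
        # Perpendicular distance of points[i] to the chord.
        d = abs(dy * (points[i][0] - x1) - dx * (points[i][1] - y1)) / norm
        if d > dmax:
            dmax, imax = d, i
    if dmax > eps:
        return rdp(points[:imax + 1], eps)[:-1] + rdp(points[imax:], eps)
    return [points[0], points[-1]]

curve = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(rdp(curve, 1.0))
```

The vertices it keeps are a subset of the input points, so the output is directly comparable to dominant point selections.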
A discrete geometry approach for dominant point detection
We propose two fast methods for dominant point detection and polygonal representation of noisy and possibly disconnected curves, based on a study of the decomposition of the curve into a sequence of maximal blurred segments \cite{ND07}. Starting from results of discrete geometry \cite{FT99,Deb05}, the notion of maximal blurred segment of width \cite{ND07} has been proposed, which is well adapted to noisy curves. The first method uses a fixed parameter, namely the width of the considered maximal blurred segments. The second is based on a multi-width approach, yielding a non-parametric method that uses no threshold for working with noisy curves. Comparisons with other methods in the literature demonstrate the efficiency of our approach. Thanks to a recent result \cite{FF08} concerning the construction of the sequence of maximal blurred segments, the complexity of the proposed methods is . An application to vectorization is also given in this paper.
Contribuciones sobre métodos óptimos y subóptimos de aproximaciones poligonales de curvas 2-D
This thesis focuses on the analysis of the shape of 2D objects. In computer vision there are several sources from which information can be extracted. One of the most important is the shape, or contour, of objects. This visual characteristic allows us, with suitable processing, to extract information about the objects, analyze scenes, etc.
However, the contour or silhouette of an object contains redundant information. This excess data, which adds no new knowledge, should be removed in order to speed up subsequent processing and to minimize the size of the contour representation for storage or transmission. This data reduction must be performed without losing information that is important for representing the original contour. A reduced version of a contour can be obtained by deleting intermediate points and linking the remaining points with line segments. This reduced representation of a contour is known in the literature as a polygonal approximation.
Polygonal approximations of contours therefore represent a compressed version of the original information. Their main use is to reduce the amount of data needed to represent the contour of an object. In recent years, however, these approximations have also been used for object recognition, with polygonal approximation algorithms applied directly to extract the feature vectors used in the learning stage.
The contributions of this thesis address several aspects of polygonal approximation. The first contribution improves several polygonal approximation algorithms by adding a preprocessing stage that accelerates them, even improving the quality of the solutions in less time. The second contribution proposes a new algorithm that obtains optimal polygonal approximations in a shorter time than the other optimal methods found in the literature. The third contribution proposes a method that obtains the optimal solution after few iterations in most cases. Finally, an improved version of the optimal polygonal approximation algorithm has been proposed to solve an alternative optimization problem.
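The optimal problem the thesis targets can be illustrated in its min-# form (fewest vertices subject to an error bound) as a shortest-path search over admissible chords. This is a generic textbook-style sketch under that formulation, not the thesis's own algorithm, and the sample contour is hypothetical:

```python
import math
from collections import deque

def optimal_polygon(points, eps):
    """Min-# polygonal approximation as a shortest path: an edge i -> j is
    admissible if every point strictly between i and j lies within eps of
    the chord (i, j); BFS over indices then yields a fewest-vertex result."""
    n = len(points)

    def admissible(i, j):
        (x1, y1), (x2, y2) = points[i], points[j]
        dx, dy = x2 - x1, y2 - y1
        norm = math.hypot(dx, dy) or 1.0
        return all(
            abs(dy * (points[k][0] - x1) - dx * (points[k][1] - y1)) / norm <= eps
            for k in range(i + 1, j))

    parent = {0: None}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        if i == n - 1:
            break
        for j in range(n - 1, i, -1):  # prefer long admissible chords
            if j not in parent and admissible(i, j):
                parent[j] = i
                queue.append(j)
    path, k = [], n - 1
    while k is not None:
        path.append(points[k])
        k = parent[k]
    return path[::-1]

approx = optimal_polygon([(0, 0), (1, 0), (2, 0), (3, 1), (4, 2), (5, 3)], 0.1)
print(approx)  # → [(0, 0), (2, 0), (5, 3)]
```

Because BFS finds a path with the fewest edges, the returned vertex count is minimal for the given error bound, which is the optimality criterion of the min-# problem.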
Contribución al reconocimiento de objetos 2D mediante aproximaciones poligonales
This doctoral thesis introduces original contributions to the description and interpretation stages of the 2D object recognition process.
New unimodal thresholding techniques are proposed for generating polygonal approximations. These techniques have been compared with the classic thresholding strategies proposed by Rosin.
A new method is proposed that obtains polygonal approximations in an unsupervised, i.e. non-parametric, way. This method incorporates a unimodal thresholding stage.
An exhaustive analysis of the proposed method has been carried out in order to design new versions, according to the combination of the characteristics of some of its stages. Two main strategies have been considered: a point-splitting strategy and a point-merging strategy. The new versions have been compared with the original method, and several of them represent a considerable improvement, also outperforming all the classic strategies analyzed.
A final optimization stage, based on the method proposed by Masood, has been incorporated. A comparative study has then been carried out to select the most efficient version of each strategy, as well as the version that obtains the best result overall. The optimized versions outperform the original algorithm and all the other versions analyzed.
In summary, a new heuristic method has been developed that generates efficient polygonal approximations in an unsupervised way. This method can be used in real-time applications, overcoming the difficulties presented by optimal algorithms, which require a higher computational load.
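Rosin's unimodal thresholding, against which the proposed techniques are compared, can be sketched as follows (an illustrative version that assumes a decaying histogram tail with at least one bin strictly between the peak and the last non-empty bin; the histogram values are hypothetical):

```python
def rosin_threshold(hist):
    """Rosin's unimodal thresholding (sketch): draw a chord from the histogram
    peak to the last non-empty bin, and return the bin between them with the
    largest perpendicular distance to that chord (the 'corner' of the tail)."""
    peak = max(range(len(hist)), key=lambda i: hist[i])
    last = max(i for i, h in enumerate(hist) if h > 0)
    dx, dy = last - peak, hist[last] - hist[peak]
    norm = (dx * dx + dy * dy) ** 0.5 or 1.0
    return max(range(peak + 1, last),
               key=lambda i: abs(dy * (i - peak) - dx * (hist[i] - hist[peak])) / norm)

# A peaked histogram with a long decaying tail (hypothetical counts).
hist = [2, 10, 6, 4, 3, 2, 1, 1, 1, 1]
print(rosin_threshold(hist))  # → 3
```

In polygonal approximation the histogram would typically hold per-point significance measures, so the threshold separates salient points from the unimodal mass of non-salient ones.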
Compression of 3D models with NURBS
With recent progress in computing, algorithmics and telecommunications, 3D models are increasingly used in various multimedia applications. Examples include visualization, gaming, entertainment and virtual reality. In the multimedia domain 3D models have been traditionally represented as polygonal meshes. This piecewise planar representation can be thought of as the analogy of bitmap images for 3D surfaces. As bitmap images, they enjoy great flexibility and are particularly well suited to describing information captured from the real world, through, for instance, scanning processes. They suffer, however, from the same shortcomings, namely limited resolution and large storage size. The compression of polygonal meshes has been a very active field of research in the last decade and rather efficient compression algorithms have been proposed in the literature that greatly mitigate the high storage costs. However, such a low level description of a 3D shape has a bounded performance. More efficient compression should be reachable through the use of higher level primitives. This idea has been explored to a great extent in the context of model based coding of visual information. In such an approach, when compressing the visual information a higher level representation (e.g., 3D model of a talking head) is obtained through analysis methods. This can be seen as an inverse projection problem. Once this task is fulfilled, the resulting parameters of the model are coded instead of the original information. It is believed that if the analysis module is efficient enough, the total cost of coding (in a rate-distortion sense) will be greatly reduced. The relatively poor performance and high complexity of currently available analysis methods (except for specific cases where a priori knowledge about the nature of the objects is available) have prevented a wide deployment of coding techniques based on such an approach. Progress in computer graphics has however changed this situation.
In fact, nowadays, an increasing amount of picture, video and 3D content is generated by synthesis processing rather than coming from a capture device such as a camera or a scanner. This means that the underlying model in the synthesis stage can be used for its efficient coding without the need for a complex analysis module. In other words, it would be a mistake to attempt to compress a low level description (e.g., a polygonal mesh) when a higher level one is available from the synthesis process (e.g., a parametric surface). This is, however, what is usually done in the multimedia domain, where higher level 3D model descriptions are converted to polygonal meshes, if only for the lack of standard coding formats for the former. On a parallel but related path, the way we consume audio-visual information is changing. In contrast to the recent past and a large part of today's applications, interactivity is becoming a key element in the way we consume information. In the context of interest in this dissertation, this means that when coding visual information (an image or a video for instance), previously obvious decisions, such as the choice of sampling parameters, are not so obvious anymore. In fact, since in an interactive environment the effective display resolution can be controlled by the user through zooming, there is no clear optimal setting for the sampling period. This means that, because of interactivity, the representation used to code the scene should allow the display of objects in a variety of resolutions, ideally up to infinity. One way to resolve this problem would be extensive over-sampling. But this approach is unrealistic and too expensive to implement in many situations. The alternative is to use a resolution-independent representation. In the realm of 3D modeling, such representations are usually available when the models are created by an artist on a computer.
The scope of this dissertation is precisely the compression of 3D models in higher level forms. Direct coding in such a form should yield improved rate-distortion performance while providing a large degree of resolution independence. There has not been, so far, any major attempt to efficiently compress these representations, such as parametric surfaces. This thesis proposes a solution to overcome this gap. A variety of higher level 3D representations exist, of which parametric surfaces are a popular choice among designers. Within parametric surfaces, Non-Uniform Rational B-Splines (NURBS) enjoy great popularity, as a wide range of NURBS-based modeling tools are readily available. Recently, NURBS have been included in the Virtual Reality Modeling Language (VRML) and its next generation descendant eXtensible 3D (X3D). The nice properties of NURBS and their widespread use have led us to choose them as the form we use for the coded representation. The primary goal of this dissertation is the definition of a system for coding 3D NURBS models with guaranteed distortion. The basis of the system is entropy-coded differential pulse code modulation (DPCM). In the case of NURBS, guaranteeing the distortion is not trivial, as some of its parameters (e.g., knots) have a complicated influence on the overall surface distortion. To this end, a detailed distortion analysis is performed. In particular, previously unknown relations between the distortion of knots and the resulting surface distortion are demonstrated. Compression efficiency is pursued at every stage and simple yet efficient entropy coder realizations are defined. The special case of degenerate and closed surfaces with duplicate control points is addressed and an efficient yet simple coding is proposed to compress the duplicate relationships. Encoder aspects are also analyzed. Optimal predictors are found that perform well across a wide class of models.
Simplification techniques are also considered for improved compression efficiency at negligible distortion cost. Transmission over error-prone channels is also considered and an error-resilient extension is defined. The data stream is partitioned by independently coding small groups of surfaces and inserting the necessary resynchronization markers. Simple strategies for achieving the desired level of protection are proposed. The same extension also serves the purposes of random access and on-the-fly reordering of the data stream.
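The DPCM core of such a coder can be sketched in a few lines. This is a generic illustration of prediction-loop quantization with a bounded reconstruction error, not the dissertation's actual entropy-coded system, and the sample values are hypothetical:

```python
def dpcm_encode(values, step):
    """DPCM with a uniform quantizer: predict each sample by the previous
    *reconstructed* sample and code only the quantized residual index.
    Closing the loop on the reconstruction bounds the error by step / 2."""
    indices, prev = [], 0.0
    for v in values:
        q = round((v - prev) / step)  # quantized prediction residual
        indices.append(q)
        prev += q * step              # decoder-side reconstruction
    return indices

def dpcm_decode(indices, step):
    out, prev = [], 0.0
    for q in indices:
        prev += q * step
        out.append(prev)
    return out

# One coordinate of a row of control points (hypothetical values), step = 0.5.
ctrl = [0.0, 0.9, 2.1, 2.95, 4.2]
rec = dpcm_decode(dpcm_encode(ctrl, 0.5), 0.5)
print(max(abs(a - b) for a, b in zip(ctrl, rec)))  # ≤ 0.25 = step / 2
```

In a real coder the residual indices would then be entropy coded; the guaranteed-distortion property comes from choosing the quantizer step per parameter according to the distortion analysis.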
Object detection and activity recognition in digital image and video libraries
This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low level features to high level semantics by developing relational object and activity presentations. With the rapid growth of multimedia information in forms of digital image and video libraries, there is an increasing need for intelligent database management tools. The traditional text based query systems based on manual annotation process are impractical for today's large libraries requiring an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed where shape, color and motion characteristics of objects of interest are captured in compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels from low complexity to low false rates.
The thesis first examines extraction of low level features from images and videos using intensity, color and motion of pixels and blocks. Local consistency based on these features and geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object based knowledge in order to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses a feedback from relational representation of the object. The selected unary and binary attributes are further extended for application specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm can be summarized as improving the object extraction by reducing the dependence on the low level segmentation process and combining the boundary and region properties.
The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differs from previous compressed domain object detection techniques where the compression algorithms are governed by characteristics of the object of interest to be retrieved. An algorithm is developed using the principal component analysis of MPEG motion vectors to detect human activities; namely, walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I frames is achieved by using DC-DCT coefficients of the luminance and chrominance values in the graph based object detection algorithm. The thesis finally addresses the problem of object detection in lower resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients.
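The principal component analysis of 2-D motion vectors mentioned above can be illustrated with a closed-form 2x2 eigendecomposition (a generic sketch, not the thesis's algorithm; the motion vector values are hypothetical):

```python
import math

def principal_direction(vectors):
    """First principal component of 2-D motion vectors (dx, dy), using the
    closed-form eigendecomposition of the 2x2 covariance matrix: the leading
    eigenvector has orientation 0.5 * atan2(2*sxy, sxx - syy)."""
    n = len(vectors)
    mx = sum(v[0] for v in vectors) / n
    my = sum(v[1] for v in vectors) / n
    sxx = sum((v[0] - mx) ** 2 for v in vectors) / n
    syy = sum((v[1] - my) ** 2 for v in vectors) / n
    sxy = sum((v[0] - mx) * (v[1] - my) for v in vectors) / n
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)

# Toy macroblock motion vectors with dominant horizontal spread (hypothetical).
mvs = [(5.0, 0.0), (1.0, 0.0), (3.0, 1.0), (3.0, -1.0)]
ux, uy = principal_direction(mvs)
print(ux, uy)  # → 1.0 0.0
```

Projecting the motion vectors of successive frames onto such principal axes yields the kind of low-dimensional features that can separate activities like walking, running, and kicking.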