675 research outputs found

    Shape representation and coding of visual objects in multimedia applications: An overview

    Emerging multimedia applications have created the need for new functionalities in digital communications. Whereas existing compression standards only deal with the audio-visual scene at a frame level, it is now necessary to handle individual objects separately, thus allowing scalable transmission as well as interactive scene recomposition by the receiver. The future MPEG-4 standard aims at providing compression tools addressing these functionalities. Unlike existing frame-based standards, the corresponding coding schemes need to encode shape information explicitly. This paper reviews existing solutions to the problem of shape representation and coding. Region and contour coding techniques are presented and their performance is discussed, considering coding efficiency and rate-distortion control capability, as well as flexibility to application requirements such as progressive transmission, low-delay coding, and error robustness.

    Multiorder polygonal approximation of digital curves

    In this paper, we propose a fast threshold-free algorithm that computes the angular shape of a 2D object from the points of its contour. To this end, we have extended the method defined in [4, 5] to a multiorder analysis. It is based on the arithmetical definition of discrete lines [11] with variable thickness. We provide a framework to analyse a digital curve at different levels of thickness. The extremities of a segment provided at a high resolution are tracked at lower resolution in order to refine their location. The method is threshold-free and automatically provides a partitioning of a digital curve into its meaningful parts.

    On the Detection of Visual Features from Digital Curves using a Metaheuristic Approach

    In computational shape analysis, a crucial step consists in extracting meaningful features from digital curves. Dominant points are the points of extreme curvature on the curve, which can suitably describe it both for visual perception and for recognition. Many approaches have been developed for detecting dominant points. In this paper we present a novel method that combines dominant point detection with ant colony optimization search. The method is inspired by the ant colony search (ACS) suggested by Yin in [1], but it results in a much more efficient and effective approximation algorithm. The excellent results have been compared both to works using an optimal search approach and to works based on exact approximation strateg

    Dominant points detection for shape analysis

    Growing interest in multimedia in recent years, together with the large amount of information exchanged across networks, has driven various research fields toward the study of methods for automatic identification. One of the main objectives is to capture the information content of images by using techniques that identify the objects composing them. Among image descriptors, contours are very important because most of the information can be extracted from them, and contour analysis also has a lower computational complexity. Contour analysis can be restricted to the study of a few salient points of high curvature, from which the original contour can be reconstructed. The thesis focuses on the polygonal approximation of closed digital curves. It begins with an overview of the most common shape descriptors, divided into simple descriptors, external methods, which analyse the boundary points of objects, and internal methods, which also use the pixels inside the object. It then describes the major methods studied so far for extracting dominant points and the metrics typically used to evaluate the quality of the resulting polygonal approximation. Three novel approaches to the problem are then discussed in detail: a fast iterative method (DPIL), better suited to real-time processing, and two metaheuristic methods (GAPA, ACOPA) based on genetic algorithms and Ant Colony Optimization (ACO), which are computationally more complex but more precise. These techniques are compared with the other main methods cited in the literature, to assess their performance in terms of computational complexity and polygonal approximation error, and with one another, to evaluate their robustness to affine transformations and noise. Two new shape matching techniques, i.e. techniques for identifying objects belonging to the same class in a database of images, are then described. The first is based on shape alignment and the second on a correspondence found by ACO; both highlight the excellent results, in terms of computational time and recognition accuracy, obtained through the use of dominant points. In the first matching algorithm the results are compared with a selection of dominant points generated by a human operator, while in the second the dominant points replace the constant sampling of the outline typically used in this kind of approach.

    A discrete geometry approach for dominant point detection

    We propose two fast methods for dominant point detection and polygonal representation of noisy and possibly disconnected curves based on a study of the decomposition of the curve into the sequence of maximal blurred segments \cite{ND07}. Starting from results of discrete geometry \cite{FT99,Deb05}, the notion of maximal blurred segment of width ν \cite{ND07} has been proposed, well adapted to noisy curves. The first method uses a fixed parameter that is the width of considered maximal blurred segments. The second one is proposed based on a multi-width approach to obtain a non-parametric method that uses no threshold for working with noisy curves. Comparisons with other methods in the literature prove the efficiency of our approach. Thanks to a recent result \cite{FF08} concerning the construction of the sequence of maximal blurred segments, the complexity of the proposed methods is O(n log n). An application of vectorization is also given in this paper.

    Contribuciones sobre métodos óptimos y subóptimos de aproximaciones poligonales de curvas 2-D

    This thesis focuses on the analysis of the shape of 2D objects. In computer vision there are several sources from which information can be extracted. One of the most widely used is the shape or contour of objects. With suitable processing, this visual characteristic can be used to extract information about objects, analyze scenes, etc. However, the contour or silhouette of an object contains redundant information. This redundant data adds no new knowledge and must therefore be removed, either to speed up subsequent processing or to minimize the size of the contour representation for storage or transmission. This data reduction should be carried out without losing information that is important for representing the original contour. A reduced version of a contour can be obtained by deleting intermediate points and joining the remaining points with line segments. This reduced version is known as a polygonal approximation. Polygonal approximations therefore represent a compressed version of the original information. Their main use is to reduce the amount of information needed to represent the contour of an object. In recent years, however, polygonal approximations have also been used for object recognition: polygonal approximation algorithms have been used directly to extract the feature vectors employed in the learning stage. The contributions of this thesis focus on several aspects of polygonal approximations. The first contribution improves several polygonal approximation algorithms by adding a preprocessing stage that speeds them up and even improves the quality of the solutions in less time. The second contribution proposes a novel algorithm that obtains optimal polygonal approximations in less time than the other methods found in the literature. The third contribution proposes an algorithm that can obtain the optimal solution in a few iterations in most cases. Finally, an improved version of the optimal polygonal approximation algorithm is proposed that solves an alternative optimization problem.
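The idea of deleting intermediate contour points and joining the remaining ones with segments can be illustrated with a short sketch. The code below uses the classic Ramer-Douglas-Peucker split heuristic, which is a swapped-in technique chosen for brevity and is not one of the optimal or preprocessing-based algorithms the thesis proposes. The function names, the `tolerance` parameter and the sample contour are all assumptions made for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # Degenerate segment: a and b coincide.
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to [0, 1].
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def polygonal_approximation(points, tolerance):
    """Ramer-Douglas-Peucker sketch (hypothetical helper, open curve).

    Keeps the two end points and recursively keeps the point farthest
    from the current segment while its distance exceeds `tolerance`.
    """
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, worst = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst:
            index, worst = i, d
    if worst <= tolerance:
        # Every intermediate point is close enough: drop them all.
        return [first, last]
    left = polygonal_approximation(points[: index + 1], tolerance)
    right = polygonal_approximation(points[index:], tolerance)
    # The split point ends `left` and starts `right`; keep it once.
    return left[:-1] + right


# Illustrative L-shaped contour (made-up data).
contour = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (3, 3)]
approximation = polygonal_approximation(contour, tolerance=0.5)
```

For this L-shaped contour the sketch keeps only the two end points and the corner. A heuristic of this kind gives no optimality guarantee, which is the gap that optimal methods such as those discussed in the abstract aim to close.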

    Contribución al reconocimiento de objetos 2D mediante aproximaciones poligonales

    This doctoral thesis introduces original contributions to the description and interpretation stages of the two-dimensional object recognition process. New unimodal thresholding techniques are proposed and applied to the generation of polygonal approximations. These techniques have been compared with the classic thresholding strategies proposed by Rosin. A new method is proposed that obtains polygonal approximations in an unsupervised, i.e. non-parametric, manner; it incorporates a unimodal thresholding stage. An exhaustive analysis of the proposed method has been carried out in order to design new versions according to the combination of the characteristics of some of its stages. Two main strategies have been considered: point division (split) and point fusion (merge). The new versions have been compared with the original method, and some of them represent a considerable improvement, also outperforming all the classic strategies analyzed. A final optimization stage, based on the method proposed by Masood, has been incorporated. A comparative study was then carried out to select the most efficient version of each strategy, as well as the version that obtains the best result overall. The optimized versions improve on the original algorithm and on all the other versions analyzed. To summarize, a new heuristic method has been developed that generates efficient polygonal approximations in an unsupervised way. This method can be used in real-time applications, overcoming the difficulties of optimal algorithms, which require a higher computational load.

    Compression of 3D models with NURBS

    With recent progress in computing, algorithmics and telecommunications, 3D models are increasingly used in various multimedia applications. Examples include visualization, gaming, entertainment and virtual reality. In the multimedia domain 3D models have been traditionally represented as polygonal meshes. This piecewise planar representation can be thought of as the analogy of bitmap images for 3D surfaces. Like bitmap images, they enjoy great flexibility and are particularly well suited to describing information captured from the real world, through, for instance, scanning processes. They suffer, however, from the same shortcomings, namely limited resolution and large storage size. The compression of polygonal meshes has been a very active field of research in the last decade and rather efficient compression algorithms have been proposed in the literature that greatly mitigate the high storage costs. However, such a low level description of a 3D shape has a bounded performance. More efficient compression should be reachable through the use of higher level primitives. This idea has been explored to a great extent in the context of model based coding of visual information. In such an approach, when compressing the visual information a higher level representation (e.g., 3D model of a talking head) is obtained through analysis methods. This can be seen as an inverse projection problem. Once this task is fulfilled, the resulting parameters of the model are coded instead of the original information. It is believed that if the analysis module is efficient enough, the total cost of coding (in a rate distortion sense) will be greatly reduced. The relatively poor performance and high complexity of currently available analysis methods (except for specific cases where a priori knowledge about the nature of the objects is available) have held back the wide deployment of coding techniques based on such an approach. 
Progress in computer graphics has however changed this situation. In fact, nowadays, an increasing number of pictures, video and 3D content are generated by synthesis processing rather than coming from a capture device such as a camera or a scanner. This means that the underlying model in the synthesis stage can be used for their efficient coding without the need for a complex analysis module. In other words it would be a mistake to attempt to compress a low level description (e.g., a polygonal mesh) when a higher level one is available from the synthesis process (e.g., a parametric surface). This is, however, what is usually done in the multimedia domain, where higher level 3D model descriptions are converted to polygonal meshes, if only because of the lack of standard coded formats for the former. On a parallel but related path, the way we consume audio-visual information is changing. As opposed to the recent past and a large part of today's applications, interactivity is becoming a key element in the way we consume information. In the context of interest in this dissertation, this means that when coding visual information (an image or a video for instance), previously obvious considerations such as decision on sampling parameters are not so obvious anymore. In fact, as in an interactive environment the effective display resolution can be controlled by the user through zooming, there is no clear optimal setting for the sampling period. This means that because of interactivity, the representation used to code the scene should allow the display of objects in a variety of resolutions, and ideally up to infinity. One way to resolve this problem would be by extensive over-sampling. But this approach is unrealistic and too expensive to implement in many situations. The alternative would be to use a resolution independent representation. In the realm of 3D modeling, such representations are usually available when the models are created by an artist on a computer. 
The scope of this dissertation is precisely the compression of 3D models in higher level forms. The direct coding in such a form should yield improved rate-distortion performance while providing a large degree of resolution independence. There has not been, so far, any major attempt to efficiently compress these representations, such as parametric surfaces. This thesis proposes a solution to fill this gap. A variety of higher level 3D representations exist, of which parametric surfaces are a popular choice among designers. Within parametric surfaces, Non-Uniform Rational B-Splines (NURBS) enjoy great popularity as a wide range of NURBS based modeling tools are readily available. Recently, NURBS have been included in the Virtual Reality Modeling Language (VRML) and its next generation descendant eXtensible 3D (X3D). The nice properties of NURBS and their widespread use have led us to choose them as the form we use for the coded representation. The primary goal of this dissertation is the definition of a system for coding 3D NURBS models with guaranteed distortion. The basis of the system is entropy coded differential pulse code modulation (DPCM). In the case of NURBS, guaranteeing the distortion is not trivial, as some of its parameters (e.g., knots) have a complicated influence on the overall surface distortion. To this end, a detailed distortion analysis is performed. In particular, previously unknown relations between the distortion of knots and the resulting surface distortion are demonstrated. Compression efficiency is pursued at every stage and simple yet efficient entropy coder realizations are defined. The special case of degenerate and closed surfaces with duplicate control points is addressed and an efficient yet simple coding is proposed to compress the duplicate relationships. Encoder aspects are also analyzed. Optimal predictors are found that perform well across a wide class of models. 
Simplification techniques are also considered for improved compression efficiency at negligible distortion cost. Transmission over error prone channels is also considered and an error resilient extension defined. The data stream is partitioned by independently coding small groups of surfaces and inserting the necessary resynchronization markers. Simple strategies for achieving the desired level of protection are proposed. The same extension also serves the purpose of random access and on-the-fly reordering of the data stream
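The DPCM basis mentioned in the abstract can be sketched generically as follows. This is a minimal illustration assuming a previous-sample predictor and a uniform quantizer; the thesis's actual predictors, quantizers, entropy coder and surface-level distortion analysis are not reproduced here, and the names `dpcm_encode`, `dpcm_decode` and `step` are hypothetical. The encoder predicts from the decoder's reconstruction, so the per-sample error stays within half a quantization step.

```python
def dpcm_encode(samples, step):
    """Quantize prediction residuals (hypothetical helper).

    The predictor is the previously *reconstructed* value, so the
    quantization error does not accumulate from sample to sample.
    """
    symbols = []
    prediction = 0.0
    for x in samples:
        q = round((x - prediction) / step)  # integer residual symbol
        symbols.append(q)
        prediction += q * step  # mirror the decoder's reconstruction
    return symbols


def dpcm_decode(symbols, step):
    """Rebuild the samples by accumulating the dequantized residuals."""
    values = []
    prediction = 0.0
    for q in symbols:
        prediction += q * step
        values.append(prediction)
    return values


# Illustrative data (made up): a slowly varying parameter sequence.
samples = [0.0, 1.02, 2.05, 2.9, 3.1]
step = 0.1
symbols = dpcm_encode(samples, step)
decoded = dpcm_decode(symbols, step)
```

In a complete coder the integer symbols would then be passed to an entropy coder. Choosing `step` bounds the error on each coded parameter, but translating that into a bound on the resulting surface distortion requires the kind of analysis the abstract describes.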

    Object detection and activity recognition in digital image and video libraries

    This thesis is a comprehensive study of object-based image and video retrieval, specifically for car and human detection and activity recognition purposes. The thesis focuses on the problem of connecting low level features to high level semantics by developing relational object and activity presentations. With the rapid growth of multimedia information in forms of digital image and video libraries, there is an increasing need for intelligent database management tools. Traditional text-based query systems that rely on a manual annotation process are impractical for today's large libraries, which require an efficient information retrieval system. For this purpose, a hierarchical information retrieval system is proposed where shape, color and motion characteristics of objects of interest are captured in compressed and uncompressed domains. The proposed retrieval method provides object detection and activity recognition at different resolution levels from low complexity to low false rates. The thesis first examines extraction of low level features from images and videos using intensity, color and motion of pixels and blocks. Local consistency based on these features and geometrical characteristics of the regions is used to group object parts. The problem of managing the segmentation process is solved by a new approach that uses object based knowledge in order to group the regions according to a global consistency. A new model-based segmentation algorithm is introduced that uses a feedback from relational representation of the object. The selected unary and binary attributes are further extended for application specific algorithms. Object detection is achieved by matching the relational graphs of objects with the reference model. The major advantages of the algorithm can be summarized as improving the object extraction by reducing the dependence on the low level segmentation process and combining the boundary and region properties. 
The thesis then addresses the problem of object detection and activity recognition in the compressed domain in order to reduce computational complexity. New algorithms for object detection and activity recognition in JPEG images and MPEG videos are developed. It is shown that significant information can be obtained from the compressed domain in order to connect to high level semantics. Since our aim is to retrieve information from images and videos compressed using standard algorithms such as JPEG and MPEG, our approach differs from previous compressed domain object detection techniques, where the compression algorithms are governed by characteristics of the object of interest to be retrieved. An algorithm is developed using the principal component analysis of MPEG motion vectors to detect human activities, namely walking, running, and kicking. Object detection in JPEG compressed still images and MPEG I frames is achieved by using DC-DCT coefficients of the luminance and chrominance values in the graph based object detection algorithm. The thesis finally addresses the problem of object detection in lower resolution and monochrome images. Specifically, it is demonstrated that the structural information of human silhouettes can be captured from AC-DCT coefficients.