18 research outputs found

    Fuzzy metrics and fuzzy logic for colour image filtering

    Image filtering is a fundamental task for most computer vision systems when images are used for automatic analysis or even for human inspection. Indeed, noise in an image can seriously hinder subsequent image processing tasks such as edge detection or pattern and object recognition, and so it must be reduced. In recent years, interest in using colour images has grown significantly across a wide variety of applications, which has made colour image filtering an interesting research area. It has been widely observed that colour images should be processed taking into account the correlation between the colour channels of the image. In this respect, probably the best-known and most studied solution is the vector approach. The first vector filtering solutions, such as the vector median filter (VMF) and the vector directional filter (VDF), are based on the theory of robust statistics and are therefore able to filter robustly. Unfortunately, these techniques do not adapt to local image characteristics, so edges and details are usually blurred and lose quality. To solve this problem, several adaptive vector filters have been proposed recently. Two main tasks were carried out in this doctoral thesis: (i) a study of the applicability of fuzzy metrics to image processing tasks and (ii) the design of new colour image filters that take advantage of the properties of fuzzy metrics and fuzzy logic.
The experimental results presented in this thesis show that fuzzy metrics and fuzzy logic are useful tools for designing filtering techniques.
Morillas Gómez, S. (2007). Fuzzy metrics and fuzzy logic for colour image filtering [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1879 Palanci
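As an illustration of the vector median filter (VMF) named in this abstract, the following is a minimal sketch rather than the thesis's own filter. It assumes the standard VMF definition (the output is the window pixel with the smallest summed distance to all other pixels in the window), a 3x3 window, Euclidean distance, and unchanged border pixels. The function names are hypothetical, and the fuzzy-metric filters proposed in the thesis are not reproduced here.

```python
import math


def vector_median(window):
    """Return the pixel of `window` with the smallest summed distance to all pixels."""
    def total_distance(p):
        # Sum of Euclidean distances from p to every pixel in the window.
        return sum(math.dist(p, q) for q in window)
    return min(window, key=total_distance)


def vmf_filter(image):
    """Apply a 3x3 vector median filter; border pixels are copied unchanged (assumption)."""
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = vector_median(window)
    return out


# Usage: a single impulse in a flat grey region is replaced by the grey value.
grey, impulse = (10, 10, 10), (255, 0, 0)
noisy = [[grey, grey, grey],
         [grey, impulse, grey],
         [grey, grey, grey]]
filtered = vmf_filter(noisy)
```

Because the output is always one of the input vectors, the filter does not create new colours, but it is applied identically everywhere, which is the lack of local adaptivity the abstract points out.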

    Medidas difusas en procesamiento de imágenes

    Thesis (Licentiate in Mathematics), Universidad Nacional de Córdoba, Facultad de Matemática, Astronomía y Física, 2013. Affiliation: Marenchino, Matías Leandro. Universidad Nacional de Córdoba. Facultad de Matemática, Astronomía y Física; Argentina. This work formally defines the concept of a "fuzzy measure", which generalizes classical measures. We add conditions to these new measures to obtain λ-fuzzy measures, Sugeno measures and quasi-measures. We then introduce belief (credibility), plausibility, possibility and necessity measures. As in conventional measure theory, measurable functions and integration with respect to this new measure arise naturally. Finally, we analyse a couple of case studies applied to image and video processing.
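To make the λ-fuzzy (Sugeno) measures mentioned above concrete, here is a small sketch. It assumes the usual λ-rule for disjoint sets, g(A ∪ B) = g(A) + g(B) + λ·g(A)·g(B) with λ > −1, which gives the closed form g(A) = (∏(1 + λ·gᵢ) − 1)/λ over the singleton densities gᵢ of the elements of A. The function name and the example densities are hypothetical, and the sketch does not check that λ is chosen so that the whole set has measure 1.

```python
def lambda_measure(densities, subset, lam):
    """Measure of `subset` under a Sugeno lambda-measure with the given singleton densities."""
    if lam <= -1:
        raise ValueError("lambda must be greater than -1")
    if lam == 0:
        # With lambda = 0 the rule reduces to ordinary additivity.
        return sum(densities[x] for x in subset)
    product = 1.0
    for x in subset:
        product *= 1.0 + lam * densities[x]
    return (product - 1.0) / lam


# Usage with made-up densities: 0.2 + 0.3 + 1 * 0.2 * 0.3 = 0.56
densities = {"a": 0.2, "b": 0.3}
g_ab = lambda_measure(densities, {"a", "b"}, lam=1.0)
```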

    Framework for the detection and classification of colorectal polyps

    In this thesis we propose a framework for the detection and classification of colorectal polyps to assist endoscopists in bowel cancer screening. Such a system will help reduce not only the miss rate of possibly malignant polyps during screening but also the number of unnecessary polypectomies where histopathologic analysis could be spared. Our polyp detection scheme is based on a cascade filter that pre-processes the incoming video frames, selects a group of candidate polyp regions and then algorithmically isolates the most probable polyps based on their geometry. We also tested this system on a number of endoscopic and capsule endoscopy videos collected with the help of our clinical collaborators. Furthermore, we developed and tested a classification system for distinguishing cancerous colorectal polyps from non-cancerous ones. By analyzing the surface vasculature of high magnification polyp images from two endoscopic platforms we extracted a number of features based primarily on the vessel contrast, orientation and colour. The feature space was then filtered so as to leave only the most relevant subset, which was subsequently used to train our classifier. In addition, we examined the scenario of splitting up the polyp surface into patches and including only the most feature-rich areas in our classifier instead of the surface as a whole. The stability of our feature space relative to patch size was also examined to ensure reliable and robust classification. In addition, we devised a scale selection strategy to minimize the effect of inconsistencies in magnification and geometric polyp size between samples. Lastly, several techniques were also employed to ensure that our results will generalise well in real-world practice. We believe this to be a solid step in forming a toolbox designed to aid endoscopists not only in the detection but also in the optical biopsy of colorectal polyps during in vivo colonoscopy.

    Energy efficient enabling technologies for semantic video processing on mobile devices

    Semantic object-based processing will play an increasingly important role in future multimedia systems due to the ubiquity of digital multimedia capture/playback technologies and increasing storage capacity. Although the object-based paradigm has many undeniable benefits, numerous technical challenges remain before such applications become pervasive, particularly on computationally constrained mobile devices. A fundamental issue is the ill-posed problem of semantic object segmentation. Furthermore, on battery-powered mobile computing devices, the additional algorithmic complexity of semantic object-based processing compared to conventional video processing is highly undesirable from both a real-time operation and a battery life perspective. This thesis attempts to tackle these issues by firstly constraining the solution space and focusing on the human face as a primary semantic concept of use to users of mobile devices. A novel face detection algorithm is proposed, which from the outset was designed to be amenable to offloading from the host microprocessor to dedicated hardware, thereby providing real-time performance and reducing power consumption. The algorithm uses an Artificial Neural Network (ANN), whose topology and weights are evolved via a genetic algorithm (GA). The computational burden of the ANN evaluation is offloaded to a dedicated hardware accelerator, which is capable of processing any evolved network topology. Efficient arithmetic circuitry, which leverages modified Booth recoding, column compressors and carry save adders, is adopted throughout the design. To tackle the increased computational costs associated with object tracking or object-based shape encoding, a novel energy-efficient binary motion estimation architecture is proposed. Energy is reduced in the proposed motion estimation architecture by minimising the redundant operations inherent in the binary data. Both architectures are shown to compare favourably with the relevant prior art.
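The modified Booth recoding mentioned in this abstract can be illustrated in software. The sketch below assumes the standard radix-4 scheme, in which a signed two's-complement multiplier of even width is recoded into digits from {-2, -1, 0, 1, 2}, roughly halving the number of partial products. It is not the thesis's hardware design, and the function names are hypothetical.

```python
def booth_radix4_digits(value, width):
    """Recode a signed `width`-bit integer into radix-4 Booth digits, least significant first."""
    if width % 2 != 0:
        raise ValueError("width must be even")
    if not -(1 << (width - 1)) <= value < (1 << (width - 1)):
        raise ValueError("value does not fit in the given width")

    def bit(k):
        # Bit k of the two's-complement representation; bit -1 is defined as 0.
        return 0 if k < 0 else (value >> k) & 1

    # Each digit looks at three overlapping bits: y[2i-1] + y[2i] - 2*y[2i+1].
    return [bit(2 * i - 1) + bit(2 * i) - 2 * bit(2 * i + 1)
            for i in range(width // 2)]


def booth_value(digits):
    """Reassemble the integer represented by radix-4 Booth digits."""
    return sum(d * 4 ** i for i, d in enumerate(digits))


# Usage: 7 = -1 + 2 * 4, so a 4-bit multiplier needs only two partial products.
digits_of_seven = booth_radix4_digits(7, 4)
```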

    Recent Advances in Signal Processing

    Signal processing is a critical element in the majority of new technological inventions and challenges across a variety of applications in both science and engineering. Classical signal processing techniques have largely worked with mathematical models that are linear, local, stationary, and Gaussian. They have always favored closed-form tractability over real-world accuracy. These constraints were imposed by the lack of powerful computing tools. During the last few decades, signal processing theories, developments, and applications have matured rapidly and now include tools from many areas of mathematics, computer science, physics, and engineering. This book is targeted primarily toward students and researchers who want to be exposed to a wide variety of signal processing techniques and algorithms. It includes 27 chapters that can be categorized into five different areas depending on the application at hand. These five categories address, in order, image processing, speech processing, communication systems, time-series analysis, and educational packages. The book has the advantage of providing a collection of applications that are completely independent and self-contained; thus, the interested reader can choose any chapter and skip to another without losing continuity.

    Automatic video segmentation employing object/camera modeling techniques

    Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos, such as composing new scenes from pre-existing video objects or enabling user interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. If the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate user-defined knowledge about the objects to be extracted into the segmentation process. 
Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene-background. This approach is fully compatible with the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthesis of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains. 
The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, the algorithm exploits the fact that the times at which foreground objects appear in an image region are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is not even possible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multi-sprite approach is fully compatible with the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding cost for transmitting the sprite images is lower, and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration. 
Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building-blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two object-detection algorithms. The first one is specific to cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm. 
The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular-paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal-length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. 
For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus on their visualization. Apart from the planar background sprites discussed previously, a frequently used visualization technique for panoramic images is projection onto a cylinder surface, which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because they look in all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation enables virtual walk-throughs in the reconstructed room and therefore provides a better orientation for the user. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects. 
These definitions are either explicit, using object models as in Part II of this thesis, or they are implicitly defined as in the background synthesis in Part I. The results of this thesis show that implicit descriptions, which extract their definition from video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into the segmentation approaches that are based on implicit models. Instead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research.
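The change detection with risk-maps described in this abstract can be sketched in a few lines. The version below is a simplified illustration under stated assumptions: images are grayscale lists of lists, change is detected with a fixed absolute-difference threshold, and the risk-map is supplied as booleans. How the thesis actually builds the risk-map and thresholds differences is not given in the abstract, so `change_mask` and its `threshold` parameter are hypothetical.

```python
def change_mask(frame, background, risk_map, threshold=30):
    """Mark pixels that differ from the background, ignoring pixels flagged in the risk-map."""
    mask = []
    for f_row, b_row, r_row in zip(frame, background, risk_map):
        mask.append([
            # A pixel counts as foreground only if it is not risky and differs enough.
            (not risky) and abs(f - b) > threshold
            for f, b, risky in zip(f_row, b_row, r_row)
        ])
    return mask


# Usage: the third pixel differs from the background but is flagged as risky,
# so it is not reported as foreground.
frame = [[10, 200, 200]]
background = [[10, 10, 10]]
risk_map = [[False, False, True]]
mask = change_mask(frame, background, risk_map)
```

Suppressing differences at risky pixels trades a few missed detections for fewer false ones, which matches the reduction in false object detections in fine-textured areas reported in the abstract.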

    DCT-Based Image Feature Extraction and Its Application in Image Self-Recovery and Image Watermarking

    Feature extraction is a critical element in the design of image self-recovery and watermarking algorithms, and its quality can have a big influence on the performance of these processes. The objective of the work presented in this thesis is to develop an effective methodology for feature extraction in the discrete cosine transform (DCT) domain and apply it in the design of adaptive image self-recovery and image watermarking algorithms. The methodology uses the most significant DCT coefficients, which may lie in any frequency range, to detect and classify gray level patterns. In this way, gray level variations with a wider range of spatial frequencies can be examined without increasing computational complexity, and the methodology is able to distinguish gray level patterns rather than only the orientations of simple edges, as in many existing DCT-based methods. The proposed image self-recovery algorithm uses the developed feature extraction methodology to detect and classify blocks that contain significant gray level variations. According to the profile of each block, the critical frequency components representing the specific gray level pattern of the block are chosen for encoding. The code lengths are made variable depending on the importance of these components in defining the block’s features, which makes the encoding of critical frequency components more precise, while keeping the total length of the reference code short. The proposed image self-recovery algorithm has resulted in remarkably shorter reference codes that are only 1/5 to 3/5 of those produced by existing methods, and consequently a superior visual quality in the embedded images. As the shorter codes contain the critical image information, the proposed algorithm has also achieved above-average reconstruction quality for various tampering rates. The proposed image watermarking algorithm is computationally simple and designed for the blind extraction of the watermark. 
The principle of the algorithm is to embed the watermark in the locations where image data alterations are the least visible. To this end, the properties of the human visual system (HVS) are used to identify the gray level image features of such locations. The characteristics of the frequency components representing these features are identified by applying the DCT-based feature extraction methodology developed in this thesis. The strength with which the watermark is embedded is made adaptive to the local gray level characteristics. Simulation results have shown that the proposed watermarking algorithm results in significantly higher visual quality in the watermarked images than that of the reported methods, with a difference in PSNR of about 2.7 dB, while the embedded watermark is highly robust against JPEG compression, even at low quality factors, and against some other common image processes. The good performance of the proposed image self-recovery and watermarking algorithms is an indication of the effectiveness of the developed feature extraction methodology. This methodology can be applied in a wide range of applications and it is suitable for any process where the DCT data is available.
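The idea of picking the most significant DCT coefficients of a block, wherever they fall in frequency, can be sketched as follows. This is an assumption-laden illustration rather than the thesis's methodology: it uses a direct orthonormal 2-D DCT-II on a small square block and simply ranks the AC coefficients by magnitude. The classification rules applied to those coefficients are not given in the abstract and are not reproduced. The function names and the `count` parameter are hypothetical.

```python
import math


def dct2(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of lists."""
    n = len(block)

    def alpha(k):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    coeffs = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            coeffs[u][v] = alpha(u) * alpha(v) * total
    return coeffs


def most_significant_ac(coeffs, count=3):
    """Return the `count` largest-magnitude AC coefficients as ((u, v), value) pairs."""
    n = len(coeffs)
    ac = [((u, v), coeffs[u][v])
          for u in range(n) for v in range(n) if (u, v) != (0, 0)]
    ac.sort(key=lambda item: abs(item[1]), reverse=True)
    return ac[:count]


# Usage: a block whose gray level changes only along the column index
# concentrates its AC energy in the first row of coefficients (u = 0).
edge_block = [[0, 0, 100, 100] for _ in range(4)]
top = most_significant_ac(dct2(edge_block), count=2)
```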