4 research outputs found

    Optimum non linear binary image restoration through linear grey-scale operations

    Get PDF
    Non-linear image processing operators give excellent results in a number of image processing tasks such as restoration and object recognition. However they are frequently excluded from use in solutions because the system designer does not wish to introduce additional hardware or algorithms and because their design can appear to be ad hoc. In practice the median filter is often used though it is rarely optimal. This paper explains how various non-linear image processing operators may be implemented on a basic linear image processing system using only convolution and thresholding operations. The paper is aimed at image processing system developers wishing to include some non-linear processing operators without introducing additional system capabilities such as extra hardware components or software toolboxes. It may also be of benefit to the interested reader wishing to learn more about non-linear operators and alternative methods of design and implementation. The non-linear tools include various components of mathematical morphology, median and weighted median operators and various order statistic filters. As well as describing novel algorithms for implementation within a linear system the paper also explains how the optimum filter parameters may be estimated for a given image processing task. This novel approach is based on the weight monotonic property and is a direct rather than iterated method

    Fast recursive grayscale morphology operators: from the algorithm to the pipeline architecture

    Get PDF
    International audienceThis paper presents a new algorithm for an efficient computation of morphological operations for gray images and its specific hardware. The method is based on a new recursive morphological decomposition method of 8-convex structuring elements by only causal two-pixel structuring elements (2PSE). Whatever the element size, erosion or/and dilation can then be performed during a unique raster-like image scan involving a fixed reduced analysis neighborhood. The resulting process offers low computation complexity combined with easy description of the element form. The dedicated hardware is generic and fully regular, built from elementary interconnected stages. It has been synthesized into an FPGA and achieves high frequency performances for any shape and size of structuring element

    Conception de processeurs spécialisés pour le traitement vidéo en temps réel par filtre local

    Get PDF
    RÉSUMÉ Ce mémoire décrit les travaux visant à explorer les possibilités qu'offrent les processeurs à jeu d'instructions spécialisé pour des applications de vidéo numérique. Spécifiquement une classe particulière d'algorithmes de traitement vidéo est considérée: les filtres locaux. Pour cette classe d'algorithmes, une exploration architecturale a permis d'identifier un ensemble de techniques formant une approche cohérente et systématique pour la conception de processeurs spécialisés performants adaptés au traitement vidéo en temps réel. L'approche de conception proposée vise une utilisation efficace de la bande passante vers la mémoire, laquelle bande passante constitue le goulot d'étranglement de l'application du point de vue de la vitesse de traitement. Il est possible d'approcher la performance limite imposée par ce goulot par une stratégie appropriée de réutilisation des données et en exploitant le parallélisme des données inhérent à la classe d'algorithmes visée. L'approche comporte quatre étapes: tout d'abord, une instruction parallèle (SIMD) qui effectue le calcul de plusieurs pixels de sortie à la fois est créée. Puis, des registres à décalage permettant la réutilisation intra-ligne des pixels d'entrée sont ajoutés. Ensuite, un pipeline est créé par le découpage de l'instruction parallèle et l'ajout de registres pour les résultats intermédiaires. Finalement, les instructions spécialisées de chargement et de sauvegarde sont créées. Quelques-unes de ces étapes ouvrent la porte à des simplifications matérielles spécifiques pour certains algorithmes de la classe cible. La structure matérielle obtenue au final, alliée à la parallélisation des instructions par l'utilisation d'une architecture VLIW, se comporte d'une manière semblable à un réseau systolique pipeliné. Afin de démontrer expérimentalement la validité de l'approche de conception proposée, sept processeurs spécialisés pour des algorithmes de la classe visée ont été conçus par extension du jeu d'instructions d'un processeur configurable à jeu d'instructions extensible. Trois de ces processeurs spécialisés mettent en œuvre autant d'algorithmes de désentrelacement intra-trames, et quatre visent plutôt la convolution 2D, différant entre eux par la taille de la fenêtre de convolution. Les résultats de performance obtenus sont prometteurs. Pour les algorithmes de désentrelacement intra-trames, les facteurs d'accélération varient entre 95 et 1330, alors que les facteurs d'amélioration du produit temps-surface varient entre 29 et 243, tout ceci par rapport à un processeur d'usage général de référence roulant une implémentation purement logicielle de l'algorithme.----------ABSTRACT This master thesis explores the possibilities offered by Application-Specific Instruction-Set Processors (ASIP) for digital video applications, more specifically for a particular algorithm class used for video processing: local neighbourhood functions. For this algorithm class, an architectural exploration lead to the identification of a set of design techniques which, together, form a coherent and systematic approach for the design of high performance ASIPs usable for real-time video processing. The proposed design approach aims at an efficient utilization of available bandwidth to memory, which constitutes the main performance bottleneck of the application. It is possible to approach the processing speed limit imposed by this bottleneck through an appropriate data reuse strategy and by exploiting the data parallelism inherent to the target algorithm class. The design approach comprises four steps: first, a Single Instruction Multiple Data (SIMD) instruction which calculates more than one pixel in parallel is created. Then, shift registers, which are used for intra-line input pixel reuse, are added. Next, a processing pipeline is created by the addition of application-specific registers. Finally, the custom load/store instructions are created. Some of these steps lead to possible hardware simplifications for some algorithms of the target class. The hardware structure thus obtained, together with the instruction-level parallelism made possible through the use of a Very Long Instruction Word (VLIW) architecture, mimics a pipelined systolic array. In order to demonstrate the validity of the proposed design approach experimentally, seven ASIPs have been designed by extending the instruction-set of a configurable and extensible processor. Three of the ASIPs implement intra-field deinterlacing algorithms, and four implement the 2D convolution with different kernel sizes. The results show a significant improvement in performance. For the intra-field deinterlacing algorithms, speedup factors are between 95 and 1330, while the factors of improvement of the Area-Time (AT) product are between 29 and 243, all this compared to a pure software implementation running on a general-purpose processor. In the case of the two-dimensional convolution, speedup factors are between 36 and 80, while factors of improvement of the AT product are between 12 and 22. In all cases, real-time processing of high definition video in the 1080i (deinterlacing) or 1080p (convolution) format is possible given a 130 nm manufacturing process
    corecore