8 research outputs found

    Effective SAR image despeckling based on bandlet and SRAD

    Despeckling a SAR image without losing its features is a challenging task, as the image is intrinsically affected by multiplicative noise called speckle. This thesis proposes a novel technique to efficiently despeckle SAR images. Using an SRAD filter, a bandlet-transform-based filter, and a guided filter, the speckle noise in SAR images is removed without losing their features. The input SAR image is fed in parallel to the SRAD and bandlet-transform-based filters. The SRAD filter despeckles the SAR image, and the despeckled output is used as the reference image for the guided filter. In the bandlet-transform-based despeckling scheme, the input SAR image is first decomposed using the bandlet transform. The resulting coefficients are then thresholded using a soft-thresholding rule; all coefficients except the low-frequency ones are adjusted in this way. The generalized cross-validation (GCV) technique is employed to find the most favourable threshold for each subband. The bandlet transform is able to extract edges and fine features in the image because it finds the dominant geometric direction and builds elongated orthogonal vectors along that direction. Simple soft thresholding with an optimal threshold despeckles the input SAR image, and the guided filter, with the help of the reference image, removes the remaining speckle from the bandlet-transform output. In terms of numerical and visual quality, the proposed filtering scheme surpasses the available despeckling schemes.
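The soft-thresholding rule and GCV-based threshold selection described above can be sketched as follows. This is a minimal illustration on a generic coefficient array, using a Jansen-style GCV formulation; the bandlet decomposition itself is assumed to be provided by a separate library, and the candidate-threshold search is a simplification of the scheme in the thesis.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Soft-thresholding rule: shrink coefficients toward zero by t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def gcv_score(coeffs, t):
    """Generalized cross-validation score for a candidate threshold t:
    GCV(t) = (1/N)||y - y_t||^2 / (N0/N)^2, where N0 is the number of
    coefficients set to zero by the threshold."""
    n = coeffs.size
    shrunk = soft_threshold(coeffs, t)
    n_zero = np.count_nonzero(shrunk == 0)
    if n_zero == 0:
        return np.inf                      # no shrinkage, score undefined
    return (np.sum((coeffs - shrunk) ** 2) / n) / (n_zero / n) ** 2

def best_threshold(coeffs, candidates):
    """Pick the candidate threshold minimizing the GCV score for a subband."""
    return min(candidates, key=lambda t: gcv_score(coeffs, t))
```

In the full scheme this selection would be run per subband of the bandlet decomposition, leaving the low-frequency coefficients untouched.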

    Novel Video Completion Approaches and Their Applications

    Video completion refers to automatically restoring damaged or removed objects in a video sequence, with applications ranging from removal of undesired static or dynamic objects to correction of missing or corrupted frames in old movies and synthesis of new frames to add, modify, or generate a new visual story. The video completion problem can be solved using texture synthesis and/or data interpolation to fill in the holes of the sequence from the boundary inward. This thesis distinguishes video completion from still-image completion: the former requires visually pleasing consistency that takes temporal information into account. Based on the concepts they apply, video completion techniques are categorized as inpainting-based or texture-synthesis-based, and we present a bandlet-transform-based technique for each category. The proposed inpainting-based technique is a 3D volume regularization scheme that takes advantage of bandlet bases to exploit anisotropic regularities when reconstructing a damaged video. The proposed exemplar-based approach, on the other hand, performs video completion using precise patch fusion in the bandlet domain instead of patch replacement. The video completion task is extended to two important applications in video restoration. First, we develop an automatic video text detection and removal scheme that benefits from the proposed inpainting scheme and a novel video text detector. Second, we propose a novel video super-resolution technique that employs the inpainting algorithm spatially in conjunction with an effective structure tensor generated using bandlet geometry. The experimental results show good performance of the proposed video inpainting method and demonstrate the effectiveness of bandlets in video completion tasks. The proposed video text detector and video super-resolution scheme also perform well in comparison with existing methods.
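The "fill in the holes from the boundary inward" idea can be illustrated with the simplest possible baseline: isotropic diffusion inpainting on a single frame. This is a generic sketch for intuition only, not the bandlet-domain 3D regularization or exemplar-based fusion proposed in the thesis.

```python
import numpy as np

def diffusion_inpaint(frame, mask, iters=200):
    """Fill masked pixels by iterated 4-neighbour averaging (heat
    diffusion). `mask` is True where data is missing. Known pixels are
    kept fixed; only the hole is updated, so values propagate inward
    from the hole boundary."""
    out = frame.astype(float).copy()
    out[mask] = out[~mask].mean()          # crude initialization
    for _ in range(iters):
        p = np.pad(out, 1, mode="edge")    # replicate borders
        avg = (p[:-2, 1:-1] + p[2:, 1:-1] +
               p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        out[mask] = avg[mask]              # update only the hole
    return out
```

A full video method would additionally enforce temporal consistency across frames, which is exactly what the 3D volume regularization above addresses.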

    Text-detection and -recognition from natural images

    Text detection and recognition from images has numerous applications in document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts in industrial automation; content-based extraction; object recognition; address-block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images, using machine learning and deep learning to accomplish the task. We conducted an in-depth literature review of current detection and recognition methods to identify the existing challenges: differences in text alignment, style, size, and orientation, combined with low image contrast and complex backgrounds, make automatic text extraction a considerably challenging task. As a result, state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%), which has led to the development of new approaches. The aim of the study was to develop a robust text detection and recognition method for natural images with high accuracy and recall, which served as the target of the experiments. This method should detect all the text in scene images, regardless of the specific features of the text pattern.
Furthermore, we aimed to solve the two main problems of arbitrarily shaped text (horizontal, multi-oriented, and curved) detection and recognition in low-resolution scenes, at various scales and sizes. We propose a methodology that handles text detection by using a novel feature combination and selection approach for the classification of text/non-text regions. Text-region candidates were extracted from grey-scale images using the MSER technique, and a machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of features based on the aspect ratio and the GLCM, LBP, and HOG descriptors was investigated. Text-region classifiers based on MLP, SVM, and RF were trained using selections of these features and their combinations. The publicly available ICDAR 2003 and ICDAR 2011 datasets were used to evaluate the proposed method, which achieved state-of-the-art performance on both, with significant improvements in Precision, Recall, and F-measure; the F-measure for ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results showed that a suitable feature combination and selection approach can significantly increase the accuracy of the algorithms. A new dataset is proposed to fill the gap in character-level annotation and in the availability of multi-oriented and curved text. It was created particularly for deep learning methods, which require a large and varied range of training data, and includes 2,100 images annotated at the character and word levels, yielding 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool is proposed to support the dataset.
The lack of an augmentation tool for object detection motivated the proposed tool, which updates the positions of bounding boxes after transformations are applied to the images. This technique increases the number of samples in the dataset and reduces annotation time, since no re-annotation is required. The final part of the thesis presents a novel approach for text spotting: a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, in which layers are added to the SSD network and the aspect ratio of characters is considered because it differs from that of other objects. Compared with the other methods considered, the proposed method can detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was better on the proposed dataset, with an F-measure of 90.34%. Furthermore, its F-measure on ICDAR 2015, ICDAR 2013, and SVT was 84.5%, 91.9%, and 54.8%, respectively; on ICDAR 2013 the method achieved the second-best accuracy. The proposed method can spot arbitrarily shaped (horizontal, multi-oriented, and curved) scene text.
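The bounding-box update at the heart of such an augmentation tool can be sketched for two simple geometric transformations. This is a hypothetical minimal version for illustration; the actual tool described above would support a wider set of transformations.

```python
def hflip_box(box, img_w):
    """Update an (x_min, y_min, x_max, y_max) box after a horizontal
    flip of an image of width img_w: x coordinates mirror and swap."""
    x0, y0, x1, y1 = box
    return (img_w - x1, y0, img_w - x0, y1)

def rot90cw_box(box, img_h):
    """Update a box after rotating the image 90 degrees clockwise.
    The new image width equals the old image height; a point (x, y)
    maps to (img_h - y, x)."""
    x0, y0, x1, y1 = box
    return (img_h - y1, x0, img_h - y0, x1)
```

Because the new box is computed from the old one, the transformed image needs no manual re-annotation.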

    Curvelets and Ridgelets

    Despite the fact that wavelets have had a wide impact in image processing, they fail to efficiently represent objects with highly anisotropic elements such as lines or curvilinear structures (e.g. edges). The reason is that wavelets are non-geometrical and do not exploit the regularity of the edge curve. The ridgelet and curvelet transforms [3, 4] were developed as an answer to the weakness of the separable wavelet transform in sparsely representing what appear to be simple building atoms in an image, that is, lines, curves, and edges. Curvelets and ridgelets take the form of basis elements which exhibit high directional sensitivity and are highly anisotropic [5, 6, 7, 8]. These recent geometric image representations are built upon ideas of multiscale analysis and geometry. They have had important success in a wide range of image processing applications, including denoising [8, 9, 10], deconvolution [11, 12], contrast enhancement [13], texture analysis [14, 15], detection [16], watermarking [17], component separation [18], inpainting [19, 20], and blind source separation [21, 22]. Curvelets have also proven useful in diverse fields beyond traditional image processing applications, for example seismic imaging [10, 23, 24], astronomical imaging [25, 26, 27], and scientific computing and the analysis of partial differential equations [28, 29]. Another reason for the success of ridgelets and curvelets is the availability of fast transform algorithms in non-commercial software packages following the philosophy of reproducible research; see [30, 31].

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This enhances the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: 1. moving object detection and recognition; 2. correction of colours in video frames and recognition of the colours of moving objects; 3. make and model recognition of vehicles and identification of their type; 4. detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos, targeting the problem of object detection in the presence of complex backgrounds. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground. Further, a novel feature descriptor is devised for classifying moving objects into humans, vehicles, and background; it captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of the true colours of objects in videos is presented, with novel noise reduction, colour enhancement, and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects across multiple frames.
The proposed framework is specifically designed to perform robustly on videos of poor quality caused by surrounding illumination, camera sensor imperfections, and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As part of this work, a novel feature representation technique for the distinctive representation of vehicle images was developed. It uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of vehicles, and is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without retraining. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image to identify text regions; apart from detection, the colour information is also used to segment characters from words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique surpassed well-known baseline techniques, and the proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals.
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique in various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets reveal the potential of the proposed scheme for accurate detection and recognition of text in the wild.
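The background-modelling stage that the moving-object detector builds on can be sketched with a simple per-pixel running average. This is a generic baseline for intuition (the thesis adds contour refinement and edge-segment classification on top); the class name, learning rate, and threshold values are illustrative assumptions.

```python
import numpy as np

class RunningAverageBackground:
    """Per-pixel running-average background model with a thresholded
    absolute-difference foreground mask."""
    def __init__(self, first_frame, alpha=0.05, thresh=25.0):
        self.bg = first_frame.astype(float)
        self.alpha = alpha      # learning rate for the background update
        self.thresh = thresh    # absolute-difference threshold

    def apply(self, frame):
        """Return a boolean foreground mask and update the model."""
        frame = frame.astype(float)
        fg = np.abs(frame - self.bg) > self.thresh   # foreground mask
        # update the model only where the pixel looks like background,
        # so foreground objects do not get absorbed immediately
        self.bg = np.where(fg, self.bg,
                           (1 - self.alpha) * self.bg + self.alpha * frame)
        return fg
```

The mask produced here is exactly the kind of rough foreground region whose contours the thesis then refines.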

    Pattern detection and recognition using over-complete and sparse representations

    Recent research in harmonic analysis and mammalian vision systems has revealed that over-complete and sparse representations play an important role in visual information processing, and applying such representations to pattern recognition and detection problems has become an interesting field of study. The main contribution of this thesis is to propose two feature extraction strategies - the global strategy and the local strategy - to make use of these representations. In the global strategy, over-complete and sparse transformations are applied to the input pattern as a whole and features are extracted in the transformed domain. This strategy has been applied to rotation-invariant texture classification and script identification using the ridgelet transform, achieving better performance than the Gabor multi-channel filtering method and wavelet-based methods. The local strategy is divided into two stages. The first is to analyze the local over-complete and sparse structure: the input 2-D patterns are divided into patches, and the local over-complete and sparse structure is learned from these patches using sparse approximation techniques. The second stage concerns the application of the learned structure. For an object detection problem, we propose a sparsity testing technique, where a local over-complete and sparse structure is built to give sparse representations to text patterns and non-sparse representations to other patterns; object detection is achieved by identifying patterns that can be sparsely represented by the learned structure. This technique has been applied to detect text in scene images, with a recall rate of 75.23% (about a 6% improvement over other works) and a precision rate of 67.64% (about a 12% improvement).
For applications like character or shape recognition, the learned over-complete and sparse structure is combined with a Convolutional Neural Network (CNN). A second text detection method is proposed based on such a combination to further improve the accuracy of text detection in scene images (about 11% higher than our first method based on sparsity testing). Finally, this method has been applied to handwritten Farsi numeral recognition, obtaining a 99.22% recognition rate on the CENPARMI database and a 99.5% recognition rate on the HODA database; for comparison, an SVM with gradient features achieves recognition rates of 98.98% and 99.22% on these databases, respectively.
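The sparsity-testing idea can be sketched with greedy matching pursuit: a patch is accepted as "text" if a dictionary learned on text patterns reconstructs it with a small residual using only a few atoms. The function names, atom budget `k`, and tolerance `tol` are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def sparse_residual(D, x, k):
    """Greedy matching pursuit: approximate x with at most k atoms of
    the dictionary D (columns assumed unit-norm) and return the
    relative residual norm ||x - approx|| / ||x||."""
    r = x.astype(float).copy()
    for _ in range(k):
        scores = D.T @ r                      # correlations with atoms
        j = int(np.argmax(np.abs(scores)))    # best-matching atom
        r = r - scores[j] * D[:, j]           # subtract its contribution
    return np.linalg.norm(r) / max(np.linalg.norm(x), 1e-12)

def looks_text_like(D_text, x, k=3, tol=0.3):
    """Sparsity test: a pattern is 'text' if the text dictionary
    represents it sparsely, i.e. with a small residual after k atoms."""
    return sparse_residual(D_text, x, k) < tol
```

Patterns the learned dictionary cannot represent sparsely keep a large residual and are rejected, which is the detection criterion described above.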

    Image Restoration Methods for Retinal Images: Denoising and Interpolation

    Retinal imaging provides an opportunity to detect pathological and natural age-related physiological changes in the interior of the eye. Diagnosis of retinal abnormality requires an image that is sharp, clear, and free of noise and artifacts. However, to prevent tissue damage, retinal imaging instruments use low-illumination radiation; hence, the signal-to-noise ratio (SNR) is reduced, meaning the noise power is high relative to the signal. Furthermore, noise is inherent in some imaging techniques. For example, in Optical Coherence Tomography (OCT), speckle noise is produced by the coherence of unwanted backscattered light. Improving OCT image quality by reducing speckle noise increases the accuracy of analyses and hence the diagnostic sensitivity, but the challenge is to preserve image features while reducing the noise: there is a clear trade-off between feature preservation and speckle reduction in OCT. Averaging multiple OCT images taken from a single position provides a high-SNR image, but it drastically increases the scanning time. In this thesis, we develop a multi-frame image denoising method for Spectral Domain OCT (SD-OCT) images extracted from very close locations in an SD-OCT volume. The proposed denoising method was tested using two dictionaries: a nonlinear (NL) dictionary and a KSVD-based adaptive dictionary. The NL dictionary was constructed by adding phase, polynomial, exponential, and boxcar functions to the conventional Discrete Cosine Transform (DCT) dictionary. The proposed method denoises nearby frames of the SD-OCT volume using a sparse representation method and combines them by selecting median-intensity pixels from the denoised nearby frames. The results showed that both dictionaries reduced the speckle noise in the OCT images; however, the adaptive dictionary gave slightly better results at the cost of higher computational complexity. The NL dictionary was also used for fundus and OCT image reconstruction.
The performance of the NL dictionary was always better than that of other analytical dictionaries, such as DCT and Haar. The adaptive dictionary involves a lengthy dictionary learning process and therefore cannot be used in practical situations. We dealt with this problem by utilizing a low-rank approximation: SD-OCT frames were divided into groups of noisy matrices consisting of non-local similar patches, and a noise-free patch matrix was obtained from each noisy patch matrix using a low-rank approximation. The noise-free patches from nearby frames were averaged to enhance the denoising. The denoised image obtained with the proposed approach was better than those obtained by several state-of-the-art methods. The proposed approach was extended to jointly denoise and interpolate SD-OCT images; the results show that the joint denoising and interpolation method outperforms several existing state-of-the-art denoising methods combined with bicubic interpolation.
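The low-rank approximation step can be sketched as a truncated SVD of a matrix of grouped similar patches. This is a minimal illustration, assuming the grouping of non-local similar patches across frames has already been done; the rank is a free parameter here, whereas a practical method would estimate it from the noise level.

```python
import numpy as np

def low_rank_denoise(patch_matrix, rank):
    """Denoise a matrix of vectorized similar patches (one patch per
    column) by keeping only its `rank` largest singular values. Because
    similar patches are highly correlated, the clean component is
    (approximately) low-rank while the noise spreads over all
    singular values."""
    U, s, Vt = np.linalg.svd(patch_matrix, full_matrices=False)
    s[rank:] = 0.0                 # discard small singular values (noise)
    return (U * s) @ Vt            # best rank-`rank` approximation
```

Averaging the denoised patches from nearby frames, as described above, would then further suppress the residual noise.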