
    A Vision-Based Quality Inspection System for Fabric Defect Detection and Classification

    Get PDF
    Published Thesis
    Quality inspection of textile products is an important issue for fabric manufacturers. It is desirable to produce the highest quality goods in the shortest amount of time possible. Fabric faults or defects are responsible for nearly 85% of the defects found by the garment industry. Manufacturers recover only 45 to 65% of their profits from second or off-quality goods. There is a need for reliable automated woven fabric inspection methods in the textile industry. Numerous methods have been proposed for detecting defects in textiles. The methods are generally grouped into three main categories according to the techniques they use for texture feature extraction, namely statistical approaches, spectral approaches and model-based approaches. In this thesis, we study one method from each category and propose their combinations in order to obtain improved fabric defect detection and classification accuracy. The three chosen methods are the grey level co-occurrence matrix (GLCM) from the statistical category, the wavelet transform from the spectral category and the Markov random field (MRF) from the model-based category. We identify the most effective texture features for each of those methods and for different fabric types in order to combine them. Using the GLCM, we identify the optimal number of features, the optimal quantisation level of the original image and the optimal inter-sample distance to use. We identify the optimal GLCM features for different types of fabrics and for three different classifiers. Using the wavelet transform, we compare the defect detection and classification performance of features derived from the undecimated discrete wavelet transform and those derived from the dual-tree complex wavelet transform. We identify the best features for different types of fabrics. Using the Markov random field, we study the fabric defect detection and classification performance of features derived from Gaussian Markov random field models of orders 1 through 9. For each fabric type we identify the best model order. Finally, we propose three combination schemes of the best features identified from the three methods and study their fabric defect detection and classification performance. They generally lead to improved performance as compared to the individual methods, but two of them need further improvement.
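    As a rough illustration of the statistical branch of this pipeline, the sketch below extracts GLCM texture features with scikit-image. The quantisation level, inter-sample distances and angles are illustrative assumptions, not the tuned values identified in the thesis.

    # Minimal GLCM texture-feature sketch (scikit-image). The quantisation
    # level, distances and angles are illustrative assumptions.
    import numpy as np
    from skimage.feature import graycomatrix, graycoprops

    def glcm_features(image, levels=32, distances=(1, 2), angles=(0, np.pi / 2)):
        """Quantise an 8-bit grey image and return common GLCM statistics."""
        # Re-quantise to `levels` grey levels (a choice the thesis optimises).
        q = (image.astype(np.float64) / 256.0 * levels).astype(np.uint8)
        glcm = graycomatrix(q, distances=distances, angles=angles,
                            levels=levels, symmetric=True, normed=True)
        # Haralick-style statistics commonly used for fabric inspection.
        props = ("contrast", "homogeneity", "energy", "correlation")
        return np.hstack([graycoprops(glcm, p).ravel() for p in props])

    # Example: a feature vector for a random 8-bit "fabric" patch.
    patch = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    print(glcm_features(patch).shape)  # (16,) = 4 props x 2 distances x 2 angles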

    Scattering Vision Transformer: Spectral Mixing Matters

    Full text link
    Vision transformers have gained significant attention and achieved state-of-the-art performance in various computer vision tasks, including image classification, instance segmentation, and object detection. However, challenges remain in addressing attention complexity and effectively capturing fine-grained information within images. Existing solutions often resort to down-sampling operations, such as pooling, to reduce computational cost. Unfortunately, such operations are non-invertible and can result in information loss. In this paper, we present a novel approach called the Scattering Vision Transformer (SVT) to tackle these challenges. SVT incorporates a spectral scattering network that enables the capture of intricate image details. SVT overcomes the invertibility issue associated with down-sampling operations by separating low-frequency and high-frequency components. Furthermore, SVT introduces a unique spectral gating network utilizing Einstein multiplication for token and channel mixing, effectively reducing complexity. We show that SVT achieves state-of-the-art performance on the ImageNet dataset with a significant reduction in the number of parameters and FLOPs. SVT shows a 2% improvement over LiTv2 and iFormer. SVT-H-S reaches 84.2% top-1 accuracy, while SVT-H-B reaches 85.2% (state-of-the-art for base versions) and SVT-H-L reaches 85.7% (again state-of-the-art for large versions). SVT also shows comparable results in other vision tasks such as instance segmentation. SVT also outperforms other transformers in transfer learning on standard datasets such as the CIFAR10, CIFAR100, Oxford Flower, and Stanford Car datasets. The project page is available at https://badripatro.github.io/svt/.
    Comment: Accepted @NeurIPS 202
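    The spectral gating idea can be sketched in a few lines: tokens are moved to the frequency domain and mixed with a learned complex gate via an Einstein (elementwise) multiplication. The shapes, names and initialisation below are assumptions for illustration, not the authors' implementation; the project page above links to the real code.

    # Sketch of a spectral gating layer in the spirit of SVT; all
    # parameter names and shapes here are illustrative assumptions.
    import torch
    import torch.nn as nn

    class SpectralGate(nn.Module):
        def __init__(self, num_tokens: int, dim: int):
            super().__init__()
            freq = num_tokens // 2 + 1  # length of the rfft output
            # Learned complex gate: one weight per (frequency, channel).
            self.weight = nn.Parameter(torch.randn(freq, dim, 2) * 0.02)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, tokens, dim)
            xf = torch.fft.rfft(x, dim=1)                    # to the frequency domain
            gate = torch.view_as_complex(self.weight)        # (freq, dim)
            xf = torch.einsum("bfd,fd->bfd", xf, gate)       # Einstein (elementwise) mixing
            return torch.fft.irfft(xf, n=x.shape[1], dim=1)  # back to the token domain

    x = torch.randn(2, 196, 64)            # e.g. 14x14 patch tokens, 64 channels
    print(SpectralGate(196, 64)(x).shape)  # torch.Size([2, 196, 64])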

    Computer lipreading via hybrid deep neural network hidden Markov models

    Get PDF
    Constructing a viable lipreading system is a challenge because it is claimed that only 30% of the information of speech production is visible on the lips. Nevertheless, in small vocabulary tasks, there have been several reports of high accuracies. However, investigation of larger vocabulary tasks is rare. This work examines constructing a large vocabulary lipreading system using an approach based on Deep Neural Network Hidden Markov Models (DNN-HMMs). We present the historical development of computer lipreading technology and the state-of-the-art results in small and large vocabulary tasks. In preliminary experiments, we evaluate the performance of lipreading and audiovisual speech recognition on small vocabulary data sets. We then concentrate on the improvement of lipreading systems at a more substantial vocabulary size with a multi-speaker data set. We tackle the problem of lipreading an unseen speaker. We investigate the effect of employing several steps to pre-process visual features. Moreover, we examine the contribution of language modelling in a lipreading system, where we use longer n-grams to recognise visual speech. Our lipreading system is constructed on the 6000-word vocabulary TCD-TIMIT audiovisual speech corpus. The results show that visual-only speech recognition can definitely reach about 60% word accuracy on large vocabularies. We actually achieved a mean of 59.42% measured via three-fold cross-validation on the speaker-independent setting of the TCD-TIMIT corpus using deep autoencoder features and DNN-HMM models. This is the best word accuracy of a lipreading system in a large vocabulary task reported on the TCD-TIMIT corpus. In the final part of the thesis, we examine how the DNN-HMM model improves lipreading performance. We also give an insight into lipreading by providing a feature visualisation. Finally, we present an analysis of lipreading results and suggestions for future development.
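    The core "hybrid" step such systems rely on is small enough to sketch: the network's per-frame state posteriors P(s|o) are divided by the state priors P(s) to obtain the scaled likelihoods that HMM decoding consumes. The toy posteriors and priors below are assumptions, not TCD-TIMIT statistics.

    # Minimal sketch of the hybrid DNN-HMM step: convert posteriors to
    # scaled likelihoods, log P(o|s) = log P(s|o) - log P(s) + const.
    import numpy as np

    def scaled_log_likelihoods(log_posteriors, state_priors, eps=1e-10):
        """Per-frame scaled log-likelihoods for HMM decoding."""
        return log_posteriors - np.log(state_priors + eps)

    frames, states = 5, 3
    posteriors = np.random.dirichlet(np.ones(states), size=frames)  # stand-in for DNN softmax outputs
    priors = np.array([0.5, 0.3, 0.2])  # state frequencies counted from training alignments
    loglik = scaled_log_likelihoods(np.log(posteriors), priors)
    print(loglik.shape)  # (5, 3): one scaled log-likelihood per frame and HMM state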

    Perceptual Image Fusion Using Wavelets

    Get PDF

    Classification of Gastric Lesions Using Gabor Block Local Binary Patterns

    Get PDF
    The identification of cancer tissues in gastroenterology imaging poses novel challenges to the computer vision community in designing generic decision support systems. This generic nature demands the image descriptors to be invariant to illumination gradients, scaling, homogeneous illumination, and rotation. In this article, we devise a novel feature extraction methodology, which explores the effectiveness of Gabor filters coupled with Block Local Binary Patterns in designing such descriptors. We effectively exploit the illumination invariance properties of Block Local Binary Patterns and the inherent capability of convolutional neural networks to construct novel rotation, scale and illumination invariant features. The invariance characteristics of the proposed Gabor Block Local Binary Patterns (GBLBP) are demonstrated using a publicly available texture dataset. We use the proposed feature extraction methodology to extract texture features from Chromoendoscopy (CH) images for the classification of cancer lesions. The proposed feature set is later used in conjunction with a convolutional neural network to classify the CH images. The proposed convolutional neural network is a shallow network comprising fewer parameters, in contrast to other state-of-the-art networks exhibiting millions of parameters required for effective training. The obtained results reveal that the proposed GBLBP compares favorably with several other state-of-the-art methods, including both hand-crafted and convolutional neural network-based features.
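    A minimal sketch of the Gabor-plus-Block-LBP feature step, assuming a two-orientation Gabor bank, uniform LBP codes and 32x32 blocks (all illustrative choices, not the parameters used in the article):

    # Gabor filtering followed by block-wise uniform-LBP histograms.
    import numpy as np
    from scipy.ndimage import convolve
    from skimage.feature import local_binary_pattern
    from skimage.filters import gabor_kernel

    def gblbp_features(image, thetas=(0, np.pi / 4), block=32, P=8, R=1):
        feats = []
        for theta in thetas:
            kernel = np.real(gabor_kernel(frequency=0.25, theta=theta))
            response = convolve(image.astype(float), kernel)
            lbp = local_binary_pattern(response, P, R, method="uniform")
            n_bins = P + 2  # uniform patterns plus one non-uniform bin
            # Histogram the LBP codes block by block.
            for i in range(0, image.shape[0] - block + 1, block):
                for j in range(0, image.shape[1] - block + 1, block):
                    h, _ = np.histogram(lbp[i:i + block, j:j + block],
                                        bins=n_bins, range=(0, n_bins), density=True)
                    feats.append(h)
        return np.concatenate(feats)

    img = np.random.rand(64, 64)
    print(gblbp_features(img).shape)  # 2 orientations x 4 blocks x 10 bins = (80,)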

    Colorization of Multispectral Image Fusion using Convolutional Neural Network approach

    Get PDF
    The proposed technique offers a significant advantage in enhancing multiband nighttime imagery for surveillance and navigation purposes. The multi-band image data set comprises visual and infrared motion sequences covering various military and civilian surveillance scenarios, which include people that are stationary, walking or running, vehicles, and buildings or other man-made structures. The colorization method provides superior discrimination, identification of objects, faster reaction times and an increased scene understanding compared with a monochrome fused image. A guided filtering approach is used to decompose the source images into two parts, an approximation part and a detail content part; the weighted-averaging method is then used to fuse the approximation part. Multi-layer features are extracted from the detail content part using the VGG-19 network. Finally, the approximation part and detail content part are combined to reconstruct the fused image. The proposed approach offers better outcomes compared to prevailing state-of-the-art techniques in terms of quantitative and qualitative parameters. In future, the proposed technique will help in battlefield monitoring, defence situation awareness, surveillance, target tracking and person authentication.
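    The two-scale fusion step can be sketched as follows, with a plain box filter standing in for the guided filter and a simple max rule replacing the VGG-19-based detail fusion so that the sketch stays self-contained:

    # Two-scale decomposition and weighted-average fusion of base layers.
    import numpy as np
    from scipy.ndimage import uniform_filter

    def two_scale_fuse(visible, infrared, size=15, w_vis=0.5):
        # Decompose: base (approximation) = smoothed image, detail = residual.
        base_v, base_i = uniform_filter(visible, size), uniform_filter(infrared, size)
        det_v, det_i = visible - base_v, infrared - base_i
        fused_base = w_vis * base_v + (1.0 - w_vis) * base_i  # weighted averaging
        fused_det = np.where(np.abs(det_v) >= np.abs(det_i), det_v, det_i)  # max rule
        return fused_base + fused_det  # reconstruct the fused image

    vis = np.random.rand(128, 128)  # stand-ins for registered visual/IR frames
    ir = np.random.rand(128, 128)
    print(two_scale_fuse(vis, ir).shape)  # (128, 128)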

    An Approach for Object Tracking in Video Sequences

    Get PDF
    In the recent past there has been a significant increase in the number of applications effectively utilizing digital videos because of less costly but superior devices. This upsurge in video acquisition has led to a huge augmentation of data, which is quite impossible to handle manually. Therefore, an automated means of processing these videos is indispensable. In this thesis one such attempt has been made to track objects in videos. Object tracking comprises two closely related processes: object detection followed by tracking of the detected objects. Algorithms for these two processes are proposed in this thesis. Simple object detection algorithms compare a static background frame at pixel level with the current frame of a video. Existing methods in this domain first try to detect objects and then remove shadows associated with them, which is a two-stage process. The proposed approach combines both stages into a single stage. Two different algorithms are proposed for object detection: the first to model the background and the second to extract the objects and remove shadows from them. Initially, from the first few frames the nature of each pixel is determined as stationary or non-stationary, and considering only the stationary pixels a background model is developed. Subsequently, a local thresholding technique is used to extract objects and discard shadows. After successfully detecting all the foreground objects, two different algorithms are proposed for tracking the objects and updating the background model. The first algorithm suggests a centroid searching technique, where a centroid in the current frame is estimated from the previous frame. Its accuracy is verified by comparing the entropy of dual-tree complex wavelet coefficients in the bounding boxes of both frames. If the estimation becomes inaccurate, a dynamic window is utilized to search for the accurate centroid. The second algorithm updates the background using a randomized updating scheme. Both stages of the proposed tracking model are simulated with various recorded videos. Simulation results are compared with recent schemes to show the superiority of the model.
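    The stationary-pixel background model can be sketched as below; the variance threshold and the number of initial frames are illustrative assumptions, not the values used in the thesis.

    # Label pixels stationary by low temporal variance over the first
    # N frames, then build the background from those frames.
    import numpy as np

    def build_background(frames, var_thresh=25.0):
        stack = np.stack(frames).astype(np.float64)  # (N, H, W) grey frames
        stationary = stack.var(axis=0) < var_thresh  # per-pixel stationarity mask
        background = np.median(stack, axis=0)
        background[~stationary] = np.nan             # non-stationary: left undefined here
        return background, stationary

    frames = [np.random.randint(0, 10, (120, 160)) for _ in range(30)]
    bg, mask = build_background(frames)
    print(mask.mean())  # fraction of pixels classified as stationary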