111 research outputs found

    A novel lip geometry approach for audio-visual speech recognition

    By identifying lip movements and characterizing their associations with speech sounds, the performance of speech recognition systems can be improved, particularly when operating in noisy environments. Various methods have been studied by research groups around the world in recent years to incorporate lip movements into speech recognition; however, exactly how best to incorporate the additional visual information is still not known. This study aims to extend the knowledge of relationships between visual and speech information, specifically using lip geometry information because of its robustness to head rotation and the smaller number of features required to represent movement. A new method has been developed to extract lip geometry information, to perform classification and to integrate visual and speech modalities. This thesis makes several contributions. First, this work presents a new method to extract lip geometry features using a combination of a skin colour filter, a border following algorithm and a convex hull approach. The proposed method was found to improve lip shape extraction performance compared to existing approaches. Lip geometry features including height, width, ratio, area, perimeter and various combinations of these features were evaluated to determine which performs best when representing speech in the visual domain. Second, a novel template matching technique able to adapt to dynamic differences in the way words are uttered by speakers has been developed, which determines the best fit of an unseen feature signal to those stored in a database template. Third, following an evaluation of integration strategies, a novel method has been developed based on an alternative decision fusion strategy, in which the outcome from the visual and speech modality is chosen by measuring the quality of audio based on kurtosis and skewness analysis and driven by white noise confusion.
Finally, the performance of the new methods introduced in this work is evaluated using the CUAVE and LUNA-V data corpora under a range of different signal-to-noise ratio conditions using the NOISEX-92 dataset
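
As a rough illustration of the geometry features named in this abstract (height, width, ratio, area, perimeter), the sketch below computes them from the convex hull of a set of lip boundary points. It is a minimal sketch under stated assumptions, not the thesis's method: the skin colour filter and border following steps are not reproduced, the input points are hypothetical, and defining the ratio as width divided by height is an assumption, since the abstract does not say which way it is taken.

```python
import math


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # Drop the last point of each half because it repeats the start of the other
    return lower[:-1] + upper[:-1]


def lip_geometry_features(contour):
    """Height, width, ratio, area and perimeter of the contour's convex hull."""
    hull = convex_hull(contour)
    xs = [x for x, _ in hull]
    ys = [y for _, y in hull]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    edges = list(zip(hull, hull[1:] + hull[:1]))
    # Shoelace formula for the polygon area
    area = abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2
    perimeter = sum(math.dist(a, b) for a, b in edges)
    # Assumption: ratio is width / height
    ratio = width / height if height else float("inf")
    return {"height": height, "width": width, "ratio": ratio,
            "area": area, "perimeter": perimeter}


# Hypothetical boundary points (pixel coordinates), including one interior point
example_contour = [(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]
features = lip_geometry_features(example_contour)
```

Computing the features on the hull rather than on the raw boundary makes them less sensitive to small concavities left by segmentation, which is one plausible reason for using a convex hull step.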

    Image segmentation of women’s salivary ferning patterns using harmony frangi filter

    Medical research shows that on entering the fertile period, especially during ovulation, all female body fluids contain ferning patterns in the form of salt crystallization shaped like a fern tree. Until now, little research has addressed the segmentation of salivary ferning patterns, owing to several problems: first, no database of salivary ferning pattern images is available online; second, the salivary ferning pattern has several hidden layers and uneven intensity. The purpose of this study was to detect and determine the line shape of the salivary ferning crystal pattern using the Harmony Frangi Filter method, which is based on the Hessian matrix operation. The segmentation results from this study are a crucial basis for determining the level of accuracy and precision at the next stage of research, namely the prediction of a woman’s ovulation in each menstrual cycle. The segmentation results have average values of MSE 2.25, PSNR 44.86 dB, FSIM 0.954, accuracy 99.88%, sensitivity 99.98% and specificity 99.88%
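
The metrics reported in this abstract follow standard definitions, which the sketch below illustrates on flat lists of pixel values and on confusion-matrix counts. It is only a sketch: the 8-bit peak value of 255 is an assumption, the sample pixel values and counts are hypothetical, and FSIM is omitted because it needs phase congruency and gradient maps that do not fit a short example.

```python
import math


def mse(reference, result):
    """Mean squared error between two equally sized pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(reference, result)) / len(reference)


def psnr(reference, result, peak=255.0):
    """Peak signal-to-noise ratio in dB; peak=255 assumes 8-bit images."""
    error = mse(reference, result)
    if error == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / error)


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate)."""
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical pixel values and confusion counts, for illustration only
ground_truth = [10, 20, 30, 40]
segmented = [10, 22, 30, 36]
error_value = mse(ground_truth, segmented)
psnr_value = psnr(ground_truth, segmented)
scores = classification_metrics(tp=8, tn=90, fp=1, fn=1)
```

Note that an average of per-image PSNR values is generally not the same as the PSNR computed from an average MSE, so the two averages in the abstract need not satisfy the formula exactly.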

    A novel fern-like lines detection using a hybrid of pre-trained convolutional neural network model and Frangi filter

    Full ferning is the peak of the formation of a salt crystallization line pattern shaped like a fern tree in a woman’s saliva at the time of ovulation. The main problem in this study is how to detect the shape of salivary ferning line patterns that are transparent and irregular and whose surface lighting is uneven. This study aims to detect transparent and irregular lines on the salivary ferning surface using a comparison of 15 pre-trained convolutional neural network models. To detect fern-like lines on transparent and irregular layers, a pre-processing stage using the Frangi filter is required. The pre-trained convolutional neural network model is a promising framework with high precision and accuracy for detecting fern-like lines in salivary ferning. With a fixed learning rate, the ResNet50 model showed the best performance, with an error rate of 4.37% and an accuracy of 95.63%. Meanwhile, with an automatic learning rate, ResNet18 achieved the best results, with an error rate of 1.99% and an accuracy of 98.01%. The results of visual detection of fern-like lines in salivary ferning using a patch size of 34×34 pixels indicate that the ResNet34 model gave the best appearance

    In-The-Wild deepfake detection using adaptable CNN models with visual class activation mapping for improved accuracy

    Deepfake technology has become increasingly sophisticated in recent years, making it challenging to detect fake images and videos. This paper investigates the performance of adaptable convolutional neural network (CNN) models for detecting Deepfakes. The in-the-wild OpenForensics dataset was used to evaluate four different CNN models (DenseNet121, ResNet18, SqueezeNet, and VGG11) at different batch sizes and with various performance metrics. Results show that the adapted VGG11 model with a batch size of 32 achieved the highest accuracy of 94.46% in detecting Deepfakes, outperforming the other models; DenseNet121 was the second-best performer, achieving an accuracy of 93.89% with the same batch size. Grad-CAM techniques are used to visualize the decision-making process within the models, aiding in understanding the Deepfake classification process. These findings provide valuable insights into the performance of different deep learning models and can guide the selection of an appropriate model for a specific application

    An attention-augmented convolutional neural network with focal loss for mixed-type wafer defect classification

    Silicon wafer defect classification is crucial for improving fabrication and chip production. Although deep learning methods have been successful in single-defect wafer classification, the increasing complexity of the fabrication process has introduced the challenge of multiple defects on wafers, which requires more robust feature learning and classification techniques. Attention mechanisms have been used to enhance feature learning for multiple wafer defects. However, their use has been limited to a few mixed-type defect categories, and their performance declines as the number of mixed patterns increases. This work proposes an attention-augmented convolutional neural network (A2CNN) model for enhanced discriminative feature learning of complex defects. The A2CNN model emphasizes the features in the channel and spatial dimensions. Additionally, the model adopts the focal loss function to reduce misclassification and a global average pooling layer to enhance the network's generalization by reducing overfitting. The A2CNN model is evaluated on the MixedWM38 wafer defect dataset using 10-fold cross-validation. It achieves accuracy, precision, recall, and F1-score of 98.66%, 99.0%, 98.55%, and 98.82% respectively. Compared to existing works, the A2CNN model performs better by effectively learning valuable information for complex mixed-type wafer defects
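
For readers unfamiliar with the focal loss mentioned in this abstract, the sketch below shows its commonly used binary form, FL(p_t) = -alpha_t (1 - p_t)^gamma log(p_t), averaged over labels. This is a minimal sketch, not the A2CNN implementation: the values gamma = 2.0 and alpha = 0.25 are common defaults assumed here, the abstract does not state the settings used, and whether the authors apply a per-label or a softmax variant is not stated either.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Focal loss for one predicted probability p and one binary label y.

    gamma and alpha are assumed defaults, not values taken from the abstract.
    """
    p_t = p if y == 1 else 1.0 - p              # probability assigned to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, eps), 1.0)               # guard against log(0)
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) examples
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def multilabel_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Mean binary focal loss over the labels of one sample."""
    losses = [binary_focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# Hypothetical predictions for a wafer with two of three defect types present
sample_loss = multilabel_focal_loss([0.9, 0.3, 0.1], [1, 1, 0])
easy_loss = binary_focal_loss(0.9, 1)
hard_loss = binary_focal_loss(0.3, 1)
```

With gamma set to 0 the modulating factor disappears and the expression reduces to alpha-weighted cross-entropy, which is a convenient sanity check when experimenting with the parameters.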

    A Survey on Building Safety after Completing the Construction Process in Malaysia Using Statistical Approach

    Building condition is an important issue all over the world for enhancing the safety, health and sustainability of the built environment. The objective of this study is to determine the most frequent causes of building failures in order to prevent buildings from collapsing, cracking and so on. A questionnaire was distributed among engineers, contractors and the public, with 100 respondents. The survey focuses on two main parts of safety: building design and building management. Building design is divided into four main criteria: building structure, service design, building fitting and hazard environment. Meanwhile, the building management item focuses on the management criteria. Results are analysed using a statistical approach. Structural equation modeling (SEM) is used to evaluate the models’ goodness of fit. The survey shows that all criteria are important for maintaining the safety of a building after completion of the construction process

    Implementation of artificial neural network to recognize numbers from voice

    Speech recognition is a subjective phenomenon and an important part of human–machine interaction that still faces many problems. The purpose of this work is to investigate and apply an artificial neural network (ANN) to recognise numbers from voice. In this work, the MATLAB neural network toolbox is used to create, train and simulate the ANN. The dataset consists of voice recordings of the numbers ‘one’ to ‘five’, which undergo a windowing process to view a short time segment of a longer signal and analyse its frequency content, are then filtered using a band-pass filter to remove unwanted noise, and are converted into histograms as input for the network. From the experiments, the highest accuracy obtained is 72.5%, using histograms for feature extraction

    Estimation of volume and weight of apple by using 2D contactless computer vision measuring method

    Volume and weight are key parameters that have been used as a benchmark to identify the quality of apples. These two parameters can be easily measured individually by using a weighing balance to measure weight and the water displacement method (WDM) to measure volume. However, these two methods are not suitable for industrial use since both require a lot of time to obtain the final output. Therefore, a new approach is needed. The main objective of this work is to develop a contactless system based on computer vision that can estimate the volume and weight of apples using the width and height obtained from a captured 2D image. The camera needs to be calibrated to obtain the pixel/cm ratio using the checkerboard point detection technique. A Mask region-based convolutional neural network (Mask R-CNN) was used to detect and segment apple images while providing the height and width of the apples. The system was tested with four different settings: distances of 20 cm and 30 cm, and two different camera models. The best estimates of the volume and weight of apples had errors of 11.97% and 11.49% respectively. Overall, the findings showed that height and width from a 2D calibrated perspective can be used as an alternative method for the contactless assessment of apple volume and weight
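
The abstract does not say how width and height are mapped to volume and weight, so the sketch below only illustrates one plausible route: convert pixel measurements to centimetres with the calibrated pixel/cm ratio, approximate the apple as a spheroid, and multiply by a density. The spheroid model, the function names and every number in the example are hypothetical assumptions, not the authors' method or data.

```python
import math


def pixels_to_cm(length_px, px_per_cm):
    """Convert a pixel length to centimetres using the calibrated pixel/cm ratio."""
    return length_px / px_per_cm


def estimate_volume_cm3(width_px, height_px, px_per_cm):
    """Assumed model: spheroid with two semi-axes of width/2 and one of height/2."""
    width_cm = pixels_to_cm(width_px, px_per_cm)
    height_cm = pixels_to_cm(height_px, px_per_cm)
    return (4.0 / 3.0) * math.pi * (width_cm / 2.0) ** 2 * (height_cm / 2.0)


def estimate_weight_g(volume_cm3, density_g_per_cm3):
    """Weight from volume; the density must be supplied and is not from the abstract."""
    return volume_cm3 * density_g_per_cm3


def percent_error(estimate, reference):
    """Absolute percentage error against a reference measurement."""
    return abs(estimate - reference) / reference * 100.0


# Hypothetical measurements: 50 px/cm calibration, apple mask 400 px wide and 350 px tall
volume = estimate_volume_cm3(width_px=400, height_px=350, px_per_cm=50)
weight = estimate_weight_g(volume, density_g_per_cm3=0.8)  # density is a placeholder
```

Because the pixel/cm ratio changes with camera distance, a calibration value is only valid for the distance at which the checkerboard was imaged, which is consistent with the study testing 20 cm and 30 cm settings separately.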