
    An IoT System for Converting Handwritten Text to Editable Format via Gesture Recognition

    The evolution of the traditional classroom has led to the electronic classroom, i.e. e-learning, and it does not stop at e-learning or distance learning: the next step is the smart classroom. Among the most popular features of the electronic classroom are capturing video/photos of lecture content and extracting handwriting for note-taking. Numerous techniques have been implemented to extract handwriting from lecture videos/photos, but some of their deficiencies remain to be resolved, and resolving them can turn the electronic classroom into a smart classroom. In this thesis, we present a real-time IoT system that converts handwritten text into an editable format by implementing hand gesture recognition (HGR) with a Raspberry Pi and a camera. HGR is built using an edge detection algorithm and is used in this system to reduce the computational complexity of previous systems, i.e. removing redundant images and the lecturer's body from the image, and recollecting text from previous images to fill the area from which the lecturer's body has been removed. The Raspberry Pi is used to capture frames, perform HGR, and build an IoT-based smart classroom. Handwritten images are converted into an editable format using OpenCV and machine learning algorithms. The text conversion covers recognition of uppercase and lowercase letters, numbers, special characters, mathematical symbols, equations, graphs and figures, as well as recognition of words, lines, blocks and paragraphs. With the help of the Raspberry Pi and IoT, the editable lecture notes are delivered to students via a desktop application that lets them edit notes and images according to their needs.
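
    As a hedged illustration only, the sketch below shows one plausible way to implement the edge-detection-based HGR step with OpenCV on a Raspberry Pi camera feed, estimating a finger count from convexity defects of the largest edge contour. The Canny thresholds, the depth cutoff and the finger-count heuristic are assumptions, not the thesis's actual pipeline.

        # Hypothetical HGR sketch: Canny edge detection + convexity defects.
        import cv2

        def count_fingers(frame):
            """Estimate raised fingers from the largest edge contour."""
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            blurred = cv2.GaussianBlur(gray, (5, 5), 0)
            edges = cv2.Canny(blurred, 50, 150)          # edge detection step
            contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            if not contours:
                return 0
            hand = max(contours, key=cv2.contourArea)    # assume largest contour is the hand
            hull = cv2.convexHull(hand, returnPoints=False)
            defects = cv2.convexityDefects(hand, hull)
            if defects is None:
                return 0
            # Deep convexity defects mark the gaps between extended fingers.
            deep = sum(1 for i in range(defects.shape[0])
                       if defects[i, 0, 3] / 256.0 > 20)
            return deep + 1

        cap = cv2.VideoCapture(0)   # Raspberry Pi camera as /dev/video0 (assumption)
        ok, frame = cap.read()
        if ok:
            print("fingers:", count_fingers(frame))
        cap.release()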

    Modeling of Performance Creative Evaluation Driven by Multimodal Affective Data

    Creative evaluation of performances can be achieved through affective data, and the use of affective features for this purpose is a new research trend. This paper proposes a "Performance Creative—Multimodal Affective (PC-MulAff)" model based on multimodal affective features for evaluating the creative quality of performances. Multimedia data acquisition equipment is used to collect physiological data from the audience, including multimodal affective data such as facial expression, heart rate and eye movement. Affective features are calculated from the multimodal data and combined with director annotations, and a "Performance Creative—Affective Acceptance (PC-Acc)" measure is defined on these features to evaluate creative quality. The PC-MulAff model is verified on different performance data sets. The experimental results show that the PC-MulAff model achieves high evaluation quality across different performance forms. In the creative evaluation of dance performances, the accuracy of the model is 7.44% and 13.95% higher than that of single-text and single-video evaluation, respectively.
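
    As a purely illustrative sketch (the paper's actual feature extractors and model are not reproduced here), feature-level fusion of the three modalities followed by a regressor trained on director annotations could look as follows; every function, feature dimension and parameter below is an assumption.

        # Hypothetical multimodal fusion sketch for an acceptance-score regressor.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        def fuse_features(face_feats, heart_rate, eye_feats):
            """Concatenate per-viewer modality features into one vector."""
            return np.concatenate([face_feats,
                                   [heart_rate.mean(), heart_rate.std()],
                                   eye_feats])

        # Toy data: 50 viewers, director annotations as the target score.
        rng = np.random.default_rng(0)
        X = np.stack([fuse_features(rng.random(8),             # facial-expression features
                                    60 + 20 * rng.random(30),  # heart-rate trace (bpm)
                                    rng.random(4))             # eye-movement features
                      for _ in range(50)])
        y = rng.random(50)                                     # director annotation scores

        model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
        print("predicted acceptance:", model.predict(X[:1])[0])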

    Image Enhancement for Scanned Historical Documents in the Presence of Multiple Degradations

    Historical documents are treasured sources of information but typically suffer from quality problems and degradation. Scanned images of historical documents are affected by paper quality and poor image capture, producing images with low contrast, smeared ink, bleed-through and uneven illumination. This PhD thesis proposes a novel adaptive histogram matching method to remove these artefacts from scanned images of historical documents. The adaptive histogram matching creates an ideal histogram by dividing the image histogram at its Otsu level and fitting a Gaussian distribution to each segment, with iterative output refinement applied to individual images. The pre-processing techniques of contrast stretching, Wiener filtering and bilateral filtering are applied before the proposed adaptive histogram matching to maximise the dynamic range and reduce noise. The goal is to better represent document images, improving both readability and the source images for Optical Character Recognition (OCR). Unlike other enhancement methods designed for a single artefact, the proposed method enhances multiple artefacts (low contrast, smeared ink, bleed-through and uneven illumination). In addition to the enhancement algorithm, the research contributes a new dataset of scanned historical newspapers (an annotated subset of the Europeana Newspaper (ENP) dataset) on which the enhancement technique is tested and which can also be used for further research. Experimental results show that the proposed method significantly reduces background noise and improves image quality across multiple artefacts compared with other enhancement methods. Several performance criteria are used to evaluate the method's efficiency, including Signal-to-Noise Ratio (SNR), Mean Opinion Score (MOS) and a visual document image quality assessment metric (VDQAM). Additional criteria measuring post-processing binarisation quality are also discussed, with improved results in terms of Peak Signal-to-Noise Ratio (PSNR), Negative Rate Metric (NRM) and F-measure. Keywords: Image Enhancement, Historical Documents, OCR, Digitisation, Adaptive Histogram Matching.
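
    A minimal sketch of the core idea, under stated assumptions: the grey-level histogram is split at the Otsu level, each segment is modelled with a Gaussian to form the "ideal" histogram, and the image is matched to it via a CDF lookup table. The Gaussian means and widths below are illustrative (the thesis derives them per image with iterative refinement), and the file names are hypothetical.

        # Hedged sketch of Otsu-split adaptive histogram matching.
        import cv2
        import numpy as np

        def adaptive_histogram_match(img):
            t, _ = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
            levels = np.arange(256, dtype=np.float64)
            # Ideal histogram: one Gaussian per Otsu segment (ink vs. background).
            ink = np.exp(-0.5 * ((levels - t / 2) / 20.0) ** 2)
            bg = np.exp(-0.5 * ((levels - (t + 255) / 2) / 20.0) ** 2)
            target = (ink + bg) / (ink + bg).sum()
            src_hist = np.bincount(img.ravel(), minlength=256).astype(np.float64)
            src_cdf = np.cumsum(src_hist / src_hist.sum())
            tgt_cdf = np.cumsum(target)
            # Map each grey level to the target level with the closest CDF value.
            lut = np.searchsorted(tgt_cdf, src_cdf).clip(0, 255).astype(np.uint8)
            return lut[img]

        img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)  # hypothetical input path
        img = cv2.bilateralFilter(img, 9, 75, 75)           # one of the pre-processing steps
        cv2.imwrite("enhanced.png", adaptive_histogram_match(img))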

    Face Liveness Detection under Processed Image Attacks

    Face recognition is a mature and reliable technology for identifying people. Thanks to high-definition cameras and supporting devices, it is considered the fastest and least intrusive biometric recognition modality. Nevertheless, effective spoofing attempts on face recognition systems have been shown to be possible. As a result, various anti-spoofing algorithms have been developed to counteract these attacks; they are commonly referred to in the literature as liveness detection tests. In this research we highlight the effectiveness of some simple, direct spoofing attacks and test one of the current robust liveness detection algorithms, namely the logistic-regression-based face liveness detection from a single image proposed by Tan et al. in 2010, against malicious attacks using processed imposter images. In particular, we study experimentally the effect of common image processing operations, such as sharpening and smoothing, as well as corruption with salt-and-pepper noise, on the face liveness detection algorithm, and we find that it is especially vulnerable to spoofing attempts using processed imposter images. We design and present a new facial database, the Durham Face Database, which is the first, to the best of our knowledge, to contain client, imposter and processed imposter images. Finally, we evaluate our claim about the effectiveness of processed imposter image attacks using transfer learning on Convolutional Neural Networks, and verify that such attacks are more difficult to detect even when using high-end, expensive machine learning techniques.
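
    For illustration, the three image-processing attacks studied here (sharpening, smoothing and salt-and-pepper corruption) can be generated with a few lines of OpenCV/NumPy; the kernel sizes, noise rate and file names below are assumptions, not the paper's exact settings.

        # Hypothetical sketch of the processed-imposter-image attacks.
        import cv2
        import numpy as np

        def sharpen(img):
            kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
            return cv2.filter2D(img, -1, kernel)

        def smooth(img, k=5):
            return cv2.GaussianBlur(img, (k, k), 0)

        def salt_and_pepper(img, rate=0.02):
            noisy = img.copy()
            mask = np.random.default_rng(0).random(img.shape[:2])
            noisy[mask < rate / 2] = 0        # pepper
            noisy[mask > 1 - rate / 2] = 255  # salt
            return noisy

        imposter = cv2.imread("imposter.png")  # hypothetical imposter photo
        for name, attack in [("sharpened", sharpen), ("smoothed", smooth),
                             ("noisy", salt_and_pepper)]:
            cv2.imwrite(f"imposter_{name}.png", attack(imposter))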

    Extraction of Text from Images and Videos

    Get PDF
    Ph.D. thesis (Doctor of Philosophy).

    A Review of Deep Learning Techniques for Speech Processing

    The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled models capable of extracting intricate features from speech data, paving the way for unparalleled advances in automatic speech recognition, text-to-speech synthesis and emotion recognition, and propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches such as MFCCs and HMMs to more recent deep learning architectures such as CNNs, RNNs, transformers, conformers and diffusion models. We categorise these approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover the speech-processing tasks, datasets and benchmarks used in the literature and describe how different deep learning networks have been used to tackle them. Finally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field.
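
    As a small illustrative sketch of the classical front-end the review starts from, MFCC features can be extracted with librosa as below; the file name and frame parameters are assumptions chosen to match the common 25 ms / 10 ms framing.

        # Hypothetical MFCC front-end sketch.
        import librosa

        y, sr = librosa.load("utterance.wav", sr=16000)        # hypothetical audio file
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                                    n_fft=400, hop_length=160) # 25 ms windows, 10 ms hop
        print(mfcc.shape)  # (13, n_frames): one 13-dim feature vector per frame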