125 research outputs found

An IoT System for Converting Handwritten Text to Editable Format via Gesture Recognition

Author: Patel Nidhi
Publication venue: DigitalCommons@Kennesaw State University
Publication date: 03/08/2018

The evolution of the traditional classroom has led to the electronic classroom, i.e. e-learning. That growth does not stop at e-learning or distance learning: the next step after the electronic classroom is the smart classroom. Among the most popular features of an electronic classroom are capturing video or photos of lecture content and extracting the handwriting for note-taking. Numerous techniques have been implemented to extract handwriting from video or photos of a lecture, but several of them have deficiencies which, if resolved, could turn the electronic classroom into a smart classroom. In this thesis, we present a real-time IoT system that converts handwritten text into an editable format by implementing hand gesture recognition (HGR) with a Raspberry Pi and a camera. HGR is built using an edge detection algorithm and is used to reduce the computational complexity of previous systems, i.e. removing redundant images, removing the lecturer's body from the image, and recovering text from previous images to fill the area from which the lecturer's body has been removed. The Raspberry Pi is used to capture and recognise hand gestures and to build an IoT-based smart classroom. Handwritten images are converted into an editable format using OpenCV and machine learning algorithms. Text conversion covers recognition of uppercase and lowercase letters, numbers, special characters, mathematical symbols, equations, graphs and figures, along with recognition of words, lines, blocks and paragraphs. With the help of the Raspberry Pi and IoT, the editable lecture notes are delivered to students via a desktop application, which lets them edit notes and images as needed.
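
The abstract does not say which edge detection algorithm the HGR stage uses. As a minimal sketch only, assuming a Sobel gradient operator and a hypothetical `sobel_edges` helper (neither is taken from the thesis), edge detection on a small grayscale frame might look like this:

```python
# Sobel kernels for horizontal and vertical intensity gradients.
# Assumption: the thesis does not name its edge detector; Sobel is illustrative.
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_edges(image, threshold):
    """Return a binary edge map (1 = edge) for a 2D list of grayscale values.

    Border pixels are left as 0 because the 3x3 kernel does not fit there.
    """
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for ky in range(3):
                for kx in range(3):
                    pixel = image[y + ky - 1][x + kx - 1]
                    gx += SOBEL_X[ky][kx] * pixel
                    gy += SOBEL_Y[ky][kx] * pixel
            # Gradient magnitude; strong gradients mark likely edges.
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 1
    return edges


# Hypothetical 5x6 frame: dark left half, bright right half.
frame = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edges(frame, threshold=200)
```

In this toy frame the only edge pixels are the two interior columns on either side of the dark-to-bright boundary. A real pipeline would more likely call an OpenCV routine on camera frames.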

DigitalCommons@Kennesaw State University

Modeling of Performance Creative Evaluation Driven by Multimodal Affective Data

Author: Ding Gangyi
Wu Yufeng
Xue Tong
Zhang Fuquan
Zhang Longfei
Publication venue: Universidad Internacional de La Rioja
Publication date: 01/09/2021

Performance creative evaluation can be achieved through affective data, and the use of affective features to evaluate performance creative is a new research trend. This paper proposes a “Performance Creative—Multimodal Affective (PC-MulAff)” model based on multimodal affective features for performance creative evaluation. Multimedia data acquisition equipment is used to collect physiological data from the audience, including multimodal affective data such as facial expression, heart rate and eye movement. Affective features of the multimodal data are calculated in combination with director annotation, and “Performance Creative—Affective Acceptance (PC-Acc)” is defined on the basis of these features to evaluate the quality of performance creative. The PC-MulAff model is verified on different performance data sets. The experimental results show that the model achieves high evaluation quality across different performance forms. In the creative evaluation of dance performance, the accuracy of the model is 7.44% and 13.95% higher than that of the single textual and single video evaluation.

Directory of Open Access Journals

Re-UNIR

Image Enhancement for Scanned Historical Documents in the Presence of Multiple Degradations

Author: Suleiman Farouk
Publication date: 26/03/2024

Historical documents are treasured sources of information but typically suffer from problems with quality and degradation. Scanned images of historical documents are affected by paper quality and poor image capture, resulting in low contrast, smeared ink, bleed-through and uneven illumination. This PhD thesis proposes a novel adaptive histogram matching method to remove these artefacts from scanned images of historical documents. The method creates an ideal histogram by dividing the image histogram at its Otsu level and applying Gaussian distributions to each segment, with iterative output refinement applied to individual images. The pre-processing techniques of contrast stretching, Wiener filtering and bilateral filtering are applied before the proposed adaptive histogram matching to maximise the dynamic range and reduce noise. The goal is to better represent document images, improve readability and provide better source images for Optical Character Recognition (OCR). Unlike other enhancement methods designed for single artefacts, the proposed method addresses multiple artefacts (low contrast, smeared ink, bleed-through and uneven illumination). In addition to developing an algorithm for historical document enhancement, the research contributes a new dataset of scanned historical newspapers (an annotated subset of the Europeana Newspaper (ENP) dataset) on which the enhancement technique is tested and which can also be used for further research. Experimental results show that the proposed method significantly reduces background noise and improves image quality on multiple artefacts compared to other enhancement methods. Several performance criteria are used to evaluate the proposed method's efficiency: Signal to Noise Ratio (SNR), Mean Opinion Score (MOS), and a visual document image quality assessment (VDIQA) metric called the Visual Document Image Quality Assessment Metric (VDQAM). Additional assessment criteria for post-processing binarization quality are also discussed, with enhanced results based on the Peak Signal-to-Noise Ratio (PSNR), Negative Rate Metric (NRM) and F-measure.

Keywords: Image Enhancement, Historical Documents, OCR, Digitisation, Adaptive histogram matching
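
The abstract mentions splitting the histogram at its Otsu level. As a rough illustration of that one step only, the standard Otsu threshold can be computed as below. The `otsu_level` helper and the sample pixel values are hypothetical, and the Gaussian modelling and iterative refinement described in the thesis are not reproduced here.

```python
def otsu_level(pixels, levels=256):
    """Return the threshold t that maximises between-class variance.

    Pixels with value <= t form the lower class, the rest the upper class.
    """
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # pixel count in the lower class
    sum0 = 0  # intensity sum of the lower class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # lower class still empty
        w1 = total - w0
        if w1 == 0:
            break  # upper class empty from here on
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


# Hypothetical bimodal data: dark ink near 20-30, bright paper near 200-220.
pixels = [20] * 60 + [30] * 40 + [200] * 50 + [220] * 50
level = otsu_level(pixels)
```

For this sample every threshold from 30 to 199 separates the two modes equally well, and the function returns the lowest of them, 30.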

University of Salford Institutional Repository

Facial Emotion Recognition for Citizens with Traumatic Brain Injury for Therapeutic Robot Interaction

Author: Ilyas Chaudhary Muhammad
Publication venue: Aalborg University
Publication date: 01/01/2021

VBN

Face Liveness Detection under Processed Image Attacks

Author: OMAR LUMA,QASSAM,ABEDALQADER
Publication date: 01/01/2018

Face recognition is a mature and reliable technology for identifying people. Due to high-definition cameras and supporting devices, it is considered the fastest and the least intrusive biometric recognition modality. Nevertheless, effective spoofing attempts on face recognition systems were found to be possible. As a result, various anti-spoofing algorithms were developed to counteract these attacks. They are commonly referred to in the literature as liveness detection tests. In this research we highlight the effectiveness of some simple, direct spoofing attacks, and test one of the current robust liveness detection algorithms, i.e. the logistic regression based face liveness detection from a single image, proposed by Tan et al. in 2010, against malicious attacks using processed imposter images. In particular, we study experimentally the effect of common image processing operations such as sharpening and smoothing, as well as corruption with salt and pepper noise, on the face liveness detection algorithm, and we find that it is especially vulnerable to spoofing attempts using processed imposter images. We design and present a new facial database, the Durham Face Database, which is the first, to the best of our knowledge, to have client, imposter as well as processed imposter images. Finally, we evaluate our claim on the effectiveness of proposed imposter image attacks using transfer learning on Convolutional Neural Networks. We verify that such attacks are more difficult to detect even when using high-end, expensive machine learning techniques.
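
One of the processing operations named in the abstract, corruption with salt and pepper noise, can be sketched as follows. This is an illustrative example under stated assumptions: the `add_salt_and_pepper` helper, the noise fraction and the uniform test image are hypothetical and do not come from the thesis or the Durham Face Database.

```python
import random


def add_salt_and_pepper(image, amount, seed=0):
    """Return a copy of a 2D grayscale image with a fraction of pixels corrupted.

    Each selected pixel becomes 0 (pepper) or 255 (salt) with equal probability.
    """
    rng = random.Random(seed)
    height, width = len(image), len(image[0])
    noisy = [row[:] for row in image]
    count = round(amount * height * width)
    # Sample distinct pixel positions so exactly `count` pixels are replaced.
    for index in rng.sample(range(height * width), count):
        y, x = divmod(index, width)
        noisy[y][x] = rng.choice((0, 255))
    return noisy


# Hypothetical uniform mid-grey 10x10 image with 20% of pixels corrupted.
clean = [[128] * 10 for _ in range(10)]
noisy = add_salt_and_pepper(clean, amount=0.2)
changed = sum(a != b for ra, rb in zip(clean, noisy) for a, b in zip(ra, rb))
```

Sampling positions without replacement keeps the corrupted fraction exact, which makes it easier to compare attack strengths across images.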

Durham e-Theses

Automatic Emotion Identification: Analysis and Detection of Facial Expressions in Movies

Author: João Carlos Miranda de Almeida
Publication date: 19/10/2020

Repositório Aberto da Universidade do Porto

Extraction of Text from Images and Videos

Author: PHAN QUY TRUNG
Publication date: 23/01/2014

Ph.D, DOCTOR OF PHILOSOPHY

ScholarBank@NUS

A Review of Deep Learning Techniques for Speech Processing

Author: Bhardwaj Rishabh
Majumder Navonil
Mehrish Ambuj
Mihalcea Rada
Poria Soujanya
Publication date: 01/05/2023

The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field

arXiv.org e-Print Archive