An IoT System for Converting Handwritten Text to Editable Format via Gesture Recognition
The evolution of the traditional classroom has led to the electronic classroom, i.e., e-learning. This growth does not stop at e-learning or distance learning: the next step beyond the electronic classroom is the smart classroom. Among the most popular features of the electronic classroom are capturing video/photos of lecture content and extracting handwriting for note-taking. Numerous techniques have been implemented to extract handwriting from videos or photos of lectures, but several of their deficiencies remain to be resolved, and resolving them can turn the electronic classroom into a smart classroom.
In this thesis, we present a real-time IoT system that converts handwritten text into an editable format by applying hand gesture recognition (HGR) with a Raspberry Pi and a camera. The HGR module is built on an edge detection algorithm and is used to reduce the computational complexity of previous systems, i.e., removing redundant images and the lecturer's body from the image, and recollecting text from previous images to fill the area from which the lecturer's body has been removed. The Raspberry Pi retrieves the images, performs HGR, and serves as the basis of an IoT-based smart classroom. Handwritten images are converted into an editable format using OpenCV and machine learning algorithms. The text conversion covers uppercase and lowercase letters, numbers, special characters, mathematical symbols, equations, graphs and figures, as well as the recognition of words, lines, blocks, and paragraphs. Through the Raspberry Pi and IoT, the editable lecture notes are delivered to students via a desktop application, allowing them to edit the notes and images according to their needs.
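The edge-detection step underlying an HGR module of this kind can be sketched with a plain Sobel operator. The kernels, the relative threshold, and the synthetic test image below are illustrative assumptions, not the thesis's actual implementation:

```python
import numpy as np

def sobel_edges(img, thresh=0.5):
    """Mark edge pixels in a grayscale image via Sobel gradient magnitude."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient kernel
    ky = kx.T                                                    # vertical gradient kernel
    h, w = img.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = (patch * kx).sum()
            gy[i, j] = (patch * ky).sum()
    mag = np.hypot(gx, gy)
    # keep pixels whose gradient exceeds a fraction of the strongest edge
    return mag > thresh * mag.max() if mag.max() > 0 else mag.astype(bool)

img = np.zeros((10, 10))
img[:, 5:] = 1.0              # synthetic vertical step edge
edges = sobel_edges(img)
```

A gesture recognizer would then operate on the resulting binary edge map (e.g., on hand contours extracted from it) rather than on raw pixels, which is what keeps the computation cheap.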
Enhancing Rice Plant Disease Recognition and Classification Using Modified Sand Cat Swarm Optimization with Deep Learning
Rice plant diseases pose a critical challenge to agricultural productivity and food safety. Timely and accurate recognition and classification of these diseases are vital for efficient disease management. Deep Learning (DL) has emerged as a powerful approach to automated disease diagnosis in rice crops. DL, a subfield of artificial intelligence, trains neural networks with several layers to automatically learn complex patterns and representations from data. In the context of rice plant diseases, DL methods can effectively extract meaningful features from images and accurately classify them into different disease categories. This study therefore introduces a new Modified Sand Cat Swarm Optimization with Deep Learning based Rice Plant Disease Detection and Classification (MSCSO-DLRPDC) technique. The main objective of the MSCSO-DLRPDC technique is the automated classification and recognition of rice plant diseases. To achieve this, the methodology involves two pre-processing steps: median filter-based noise removal and CLAHE-based contrast enhancement. A Multi-Layer ShuffleNet with Depthwise Separable Convolution (MLS-DSC) model is then used for feature extraction, and a Multi-Head Attention-based Long Short-Term Memory (MHA-LSTM) model performs the disease detection. Finally, the MSCSO algorithm, inspired by the collective behaviour of sand cats and augmented with a mutation operator, tunes the parameters of the MHA-LSTM network. To demonstrate the enhanced performance of the MSCSO-DLRPDC method, a broad set of simulations was carried out; the results show that it outperforms other methods.
The proposed approach can assist farmers and agricultural stakeholders in effectively managing rice plant diseases, contributing to improved crop yield and sustainable agricultural practices.
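The first pre-processing stage, median-filter noise removal, can be illustrated in a few lines; the 3×3 window and the synthetic salt-noise example are assumptions for the sketch, not details from the paper:

```python
import numpy as np

def median_filter3(img):
    """Apply a 3x3 median filter, which suppresses salt-and-pepper noise."""
    h, w = img.shape
    out = img.astype(float).copy()
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            # replace each interior pixel with the median of its 3x3 neighbourhood
            out[i, j] = np.median(img[i - 1:i + 2, j - 1:j + 2])
    return out

noisy = np.full((5, 5), 10.0)
noisy[2, 2] = 255.0           # one "salt" outlier pixel
clean = median_filter3(noisy)
```

The median is robust to isolated outliers, so the single 255 pixel is replaced by the neighbourhood value while uniform regions are left untouched; CLAHE would then be applied to the denoised image before feature extraction.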
Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)
The book offers a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and a multidisciplinary topic to analyze, so it requires familiarity with other topics of Artificial Intelligence (AI) such as machine learning, digital image processing, psychology and more. It is therefore a great opportunity to write a book that covers all of these topics for beginner to professional readers in the field of AI, even those without an AI background. Our goal is to provide a standalone introduction to FMER analysis in the form of theoretical descriptions for readers with no background in image processing, together with reproducible MATLAB practical examples. We also describe the basic definitions of FMER analysis and the MATLAB libraries used in the text, which helps the reader apply the experiments in real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills along with a basic understanding of the field. We expect that, after reading this book, the reader will feel comfortable with the key stages involved: color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expressions recognition, feature extraction and dimensionality reduction.
Registration and statistical analysis of the tongue shape during speech production
This thesis analyzes the human tongue shape during speech production. First, a semi-supervised approach is derived for estimating the tongue shape from volumetric magnetic resonance imaging data of the human vocal tract. Results of this extraction are used to derive parametric tongue models. Next, a framework is presented for registering sparse motion capture data of the tongue by means of such a model. This method makes it possible to generate full three-dimensional animations of the tongue. Finally, a multimodal and statistical text-to-speech system is developed that is able to synthesize audio and synchronized tongue motion from text.
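A parametric shape model of the kind derived here is commonly built by applying PCA to vectorized training shapes: the mean shape plus a few principal deformation modes. The sketch below uses SVD on synthetic data and illustrates the general PCA-shape-model idea, not the thesis's specific tongue model:

```python
import numpy as np

def fit_shape_model(shapes, n_modes=2):
    """Fit a PCA shape model: mean shape plus principal deformation modes."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # right singular vectors of the centered data are the deformation modes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_modes]

def synthesize(mean, modes, weights):
    """Generate a new shape as the mean plus a weighted sum of modes."""
    return mean + weights @ modes

rng = np.random.default_rng(0)
shapes = rng.normal(size=(20, 6))   # 20 synthetic training shapes, 3 2D points each
mean, modes = fit_shape_model(shapes)
new_shape = synthesize(mean, modes, np.array([0.5, -0.2]))
```

Registering sparse motion-capture markers against such a model then amounts to finding the mode weights whose synthesized shape best fits the observed marker positions.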
Estimation and validation of temporal gait features using a markerless 2D video system
Background and Objective: Estimation of temporal gait features, such as stance time, swing time and gait cycle time, can be used for clinical evaluation of various patient groups with gait pathologies, such as Parkinson's disease, neuropathy, hemiplegia and diplegia. Most clinical laboratories employ an optoelectronic motion capture system to acquire such features. However, the operation of these systems requires specially trained operators, a controlled environment and reflective markers attached to the patient's body. To allow the estimation of the same features in a daily-life setting, this paper presents a novel vision-based system that uses a single 2D camera and whose operation requires neither skilled technicians nor markers.
Method: The proposed system takes as input a 2D video, computes the silhouettes of the walking person, and then estimates key biomedical gait indicators, such as the initial foot contact with the ground and the toe off instants, from which several other temporal gait features can be derived.
Results: The proposed system is tested on two datasets: (i) a public gait dataset made available by CASIA, which contains 20 users with 4 sequences per user; and (ii) a dataset acquired simultaneously by a marker-based optoelectronic motion capture system and a simple 2D video camera, containing 10 users with 5 sequences per user. For CASIA gait dataset A, the relevant temporal biomedical gait indicators were manually annotated, and the proposed automated video analysis system identified them with 99% accuracy. It obtained accurate estimations even on segmented silhouettes where state-of-the-art markerless 2D video-based systems fail. For the second database, the temporal features obtained by the proposed system achieved an average intra-class correlation coefficient of 0.86 when compared to the "gold standard" optoelectronic motion capture system.
Conclusions: The proposed markerless 2D video-based system can be used to evaluate patients' gait without complex laboratory settings and without physically attaching sensors or markers to the patients. The good accuracy of the results suggests that the proposed system can be used as an alternative to the optoelectronic motion capture system in non-laboratory environments, which can enable more regular clinical evaluations.
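Once the initial-contact (IC) and toe-off (TO) instants are detected, the temporal features follow from simple time differences. The definitions below (stance = IC to TO of the same foot, swing = TO to the next IC, cycle = IC to the next IC) are standard gait conventions assumed for illustration:

```python
def gait_features(ic_times, to_times):
    """Derive temporal gait features (seconds) from detected event instants.

    ic_times: sorted initial-contact instants of one foot
    to_times: toe-off instants of the same foot; to_times[i] follows ic_times[i]
    """
    stance = [to - ic for ic, to in zip(ic_times, to_times)]        # foot on ground
    swing = [nxt - to for to, nxt in zip(to_times, ic_times[1:])]   # foot in the air
    cycle = [b - a for a, b in zip(ic_times, ic_times[1:])]         # full stride
    return stance, swing, cycle

# hypothetical event instants for three strides
stance, swing, cycle = gait_features([0.0, 1.0, 2.0], [0.6, 1.6, 2.6])
```

Each gait cycle decomposes exactly into one stance phase plus one swing phase, which is a useful sanity check on the detected events.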
Optical Character Recognition of Printed Persian/Arabic Documents
Texts are an important representation of language. Due to the volume of texts generated and the historical value of some documents, it is imperative to use computers to read generated texts and make them editable and searchable. This task, however, is not trivial. Recreating human perception capabilities, such as reading documents, in artificial systems is one of the major goals of pattern recognition research. After decades of research and improvements in computing capabilities, humans' ability to read typed or handwritten text is hardly matched by machine intelligence. Although classical applications of Optical Character Recognition (OCR), like reading machine-printed addresses in a mail sorting machine, are considered solved, more complex scripts and handwritten texts push the limits of the existing technology. Moreover, many of the existing OCR systems are language dependent, so improvements in OCR technologies have been uneven across different languages. For Persian in particular, there has been limited research. Despite the need to process many Persian historical documents and the use of OCR in a variety of applications, few Persian OCR systems achieve a good recognition rate. Consequently, automatically reading Persian typed documents with close-to-human performance is still an open problem and the main focus of this dissertation. In this dissertation, after a literature survey of the existing technology, we propose new techniques for two important preprocessing steps in any OCR system: skew detection and page segmentation. Then, rather than the usual practice of character segmentation, we propose segmenting Persian documents into sub-words, a choice made to avoid the challenges of segmenting highly cursive Persian text into isolated characters. For feature extraction, we propose a hybrid scheme combining three commonly used methods, and finally we use a nonparametric classification method.
A large number of papers and patents advertise recognition rates near 100%. Such claims give the impression that the automation problem has been solved. Although OCR is widely used, its accuracy today is still far from a child's reading skills. The failure of some real applications shows that performance problems still exist on composite and degraded documents and that there is still room for progress.
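Sub-word segmentation of a cursive script line can be approximated by scanning the vertical projection profile for ink-free column runs: within a sub-word the connected strokes leave no empty columns, while sub-words are separated by wider gaps. The gap threshold and toy binary line below are illustrative assumptions, not the dissertation's actual method:

```python
import numpy as np

def subword_segments(binary_line, min_gap=2):
    """Split a binarized text line (1 = ink) into sub-word column ranges."""
    profile = binary_line.sum(axis=0)          # ink pixels per column
    segments, start, gap = [], None, 0
    for j, ink in enumerate(profile):
        if ink > 0:
            if start is None:
                start = j                      # a new sub-word begins
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:                 # a wide enough gap ends the sub-word
                segments.append((start, j - gap + 1))
                start = None
    if start is not None:
        segments.append((start, len(profile)))
    return segments

line = np.zeros((5, 10), int)
line[:, 0:4] = 1                               # first ink run
line[:, 7:10] = 1                              # second ink run
segments = subword_segments(line)
```

Each returned range can then be cropped and passed to feature extraction, avoiding the much harder problem of cutting cursive strokes into isolated characters.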
Pedestrian detection for underground mine vehicles using thermal imaging
Vehicle accidents are one of the major causes of deaths in South African underground mines. A computer vision-based pedestrian detection and tracking system is presented in this research that will assist locomotive drivers in operating their vehicles more safely. The detection and tracking system uses a combination of thermal and three-dimensional (3D) imagery for the detection and tracking of people. The developed system uses a segment-classify-track methodology which eliminates computationally expensive multi-scale classification. A minimum error thresholding algorithm for segmentation is shown to be effective in a wide range of environments with temperatures up to 26 °C and in a 1000 m deep mine. The classifier uses principal component analysis and a support vector classifier to achieve 95% accuracy and 97% specificity in classifying the segmented images. It is shown that each detection is not independent of the previous one, but the probability of missing two detections in a row is 0.6%, which is considered acceptably low. The tracker uses the Kinect's structured-light 3D sensor for tracking the identified people. It is shown that the useful range of the Kinect is insufficient to provide timeous warning of a collision. The error in the Kinect depth measurements increases quadratically with depth, resulting in very noisy velocity estimates at longer ranges. The use of the Kinect demonstrates the principle of the tracker, but due to budgetary constraints the replacement of the Kinect with a long-range sensor remains future work.
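The minimum error thresholding step can be sketched with the classic Kittler-Illingworth criterion, which fits a two-Gaussian mixture to the image histogram and picks the threshold minimizing the expected classification error. The implementation and toy bimodal histogram below are an illustrative sketch, not the thesis's exact code:

```python
import numpy as np

def min_error_threshold(hist):
    """Kittler-Illingworth minimum error threshold for a grayscale histogram."""
    p = hist.astype(float) / hist.sum()
    levels = np.arange(len(p))
    best_t, best_j = 0, np.inf
    for t in range(1, len(p) - 1):
        p1, p2 = p[:t + 1].sum(), p[t + 1:].sum()        # class priors
        if p1 <= 0 or p2 <= 0:
            continue
        m1 = (levels[:t + 1] * p[:t + 1]).sum() / p1     # class means
        m2 = (levels[t + 1:] * p[t + 1:]).sum() / p2
        v1 = (((levels[:t + 1] - m1) ** 2) * p[:t + 1]).sum() / p1   # class variances
        v2 = (((levels[t + 1:] - m2) ** 2) * p[t + 1:]).sum() / p2
        if v1 <= 0 or v2 <= 0:
            continue
        # criterion J(t): lower means a better two-Gaussian fit at this threshold
        j = 1 + p1 * np.log(v1) + p2 * np.log(v2) \
              - 2 * (p1 * np.log(p1) + p2 * np.log(p2))
        if j < best_j:
            best_j, best_t = j, t
    return best_t

x = np.arange(256)
# toy bimodal histogram: warm pedestrian pixels vs. cooler background
hist = np.exp(-(x - 60) ** 2 / 200.0) + np.exp(-(x - 190) ** 2 / 288.0)
t = min_error_threshold(hist)
```

In a thermal image, pixels above the threshold form candidate person blobs, which are then passed to the PCA + SVM classifier instead of running a sliding-window classifier over the whole frame.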
Image segmentation and pattern classification using support vector machines
Image segmentation and pattern classification have long been important topics in computer science research. Image segmentation is one of the basic and challenging lower-level image processing tasks. Feature extraction, feature reduction, and classifier design based on selected features are the three essential issues for the pattern classification problem.
In this dissertation, an automatic Seeded Region Growing (SRG) algorithm for color image segmentation is developed. In the SRG algorithm, the initial seeds are automatically determined. An adaptive morphological edge-linking algorithm to fill in the gaps between edge segments is designed. Broken edges are extended along their slope directions by using the adaptive dilation operation with suitably sized elliptical structuring elements. The size and orientation of the structuring element are adjusted according to local properties.
For feature reduction, an improved feature reduction method in input and feature spaces using Support Vector Machines (SVMs) is developed. In the input space, a subset of input features is selected by the ranking of their contributions to the decision function. In the feature space, features are ranked according to the weighted support vectors in each dimension.
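Input-space feature selection by contribution to the decision function can be sketched as ranking features by the magnitude of their weight in the linear decision function f(x) = w·x + b. The weight vector below is a hypothetical stand-in for one obtained from a trained linear SVM:

```python
import numpy as np

# hypothetical weights of a trained linear SVM decision function f(x) = w.x + b
w = np.array([0.05, -1.8, 0.6, 2.4, -0.1])

def rank_features(w, keep=3):
    """Rank features by |w_j| (their pull on the decision) and keep the top ones."""
    order = np.argsort(-np.abs(w))   # indices sorted by descending contribution
    return order[:keep]

selected = rank_features(w)
```

Features with near-zero weight barely move f(x) across the dataset, so dropping them shrinks the input dimension at little cost to the decision boundary.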
For object detection, a fast face detection system using SVMs is designed. Two-eye patterns are first detected using a linear SVM, so that most of the background can be eliminated quickly. Two-layer 2nd-degree polynomial SVMs are trained for further face verification. The detection process is implemented directly in feature space, which leads to a faster SVM. By training a two-layer SVM, higher classification rates can be achieved.
For active learning, an improved incremental training algorithm for SVMs is developed. Instead of selecting training samples randomly, the k-means clustering algorithm is applied to collect the initial set of training samples. In the active query step, a weight is assigned to each sample according to its distance to the current separating hyperplane and a confidence factor. The confidence factor, calculated from the upper bounds of SVM errors, indicates how close the current separating hyperplane is to the optimal solution.
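The active-query weighting can be sketched as scoring unlabeled samples by their distance to the current separating hyperplane, so that samples near the margin (the most informative ones) get the highest weight. Folding the confidence factor in as a simple multiplier is an assumption of this sketch, not necessarily the dissertation's exact formula:

```python
import numpy as np

def query_weights(X, w, b, confidence=1.0):
    """Weight candidate samples: closer to the hyperplane means more informative."""
    dist = np.abs(X @ w + b) / np.linalg.norm(w)   # distance to hyperplane w.x + b = 0
    return confidence / (1.0 + dist)               # highest weight near the margin

# hypothetical current hyperplane and two unlabeled samples
X = np.array([[0.1, 0.0],    # near the boundary -> informative
              [3.0, 3.0]])   # far from the boundary -> already well classified
w = np.array([1.0, 1.0])
b = 0.0
weights = query_weights(X, w, b)
```

At each increment, the highest-weighted samples would be labeled and added to the training set before retraining the SVM.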
Emotion Recognition for Affective Computing: Computer Vision and Machine Learning Approach
The purpose of affective computing is to develop reliable and intelligent models that computers can use to interact more naturally with humans. The critical requirements for such models are that they enable computers to recognise, understand and interpret the emotional states expressed by humans. Emotion recognition has been a research topic of interest for decades, not only in relation to developments in the affective computing field but also due to its other potential applications.
A particularly challenging problem that has emerged from this body of work, however, is the task of recognising facial expressions and emotions from still images or videos in real-time. This thesis aimed to solve this challenging problem by developing new techniques involving computer vision, machine learning and different levels of information fusion.
Firstly, an efficient and effective algorithm was developed to improve the performance of the Viola-Jones algorithm. The proposed method achieved significantly higher detection accuracy (95%) than the standard Viola-Jones method (90%) in face detection from thermal images, while also doubling the detection speed. Secondly, an automatic subsystem for detecting eyeglasses, Shallow-GlassNet, was proposed to address the facial occlusion problem by designing a shallow convolutional neural network capable of detecting eyeglasses rapidly and accurately. Thirdly, a novel neural network model for decision fusion was proposed in order to make use of multiple classifier systems, which can increase the classification accuracy by up to 10%. Finally, a high-speed approach to emotion recognition from videos, called One-Shot Only (OSO), was developed based on a novel spatio-temporal data fusion method for representing video frames. The OSO method tackles video classification as a single-image classification problem, which not only makes it extremely fast but also reduces the overfitting problem.
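The spatio-temporal fusion behind the OSO idea can be sketched as tiling sampled video frames into one composite image, which a single-image classifier then consumes in one pass. The 2×2 grid and constant-valued frames below are illustrative choices, not the thesis's actual layout:

```python
import numpy as np

def frames_to_grid(frames, rows=2, cols=2):
    """Tile sampled video frames into one composite image (fusion sketch)."""
    assert len(frames) == rows * cols
    # build each grid row by stacking frames side by side, then stack the rows
    row_imgs = [np.hstack(frames[r * cols:(r + 1) * cols]) for r in range(rows)]
    return np.vstack(row_imgs)

# four hypothetical 4x4 frames, each filled with its temporal index
frames = [np.full((4, 4), i, dtype=float) for i in range(4)]
composite = frames_to_grid(frames)
```

Because the classifier sees all sampled time steps at once in one image, the video task inherits the speed of single-image inference, which is the source of the method's speed advantage.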