159 research outputs found

    Novel Deep Convolutional Neural Network-Based Contextual Recognition of Arabic Handwritten Scripts

    Offline Arabic Handwriting Recognition (OAHR) has recently become instrumental in pattern recognition and image processing because of its applications in fields such as office automation and document processing. However, OAHR continues to face several challenges, including the high variability of the Arabic script and its intrinsic characteristics such as cursiveness, ligatures, and diacritics, the unlimited variation in human handwriting, and the lack of large public databases. In this paper, we introduce a novel context-aware model based on deep neural networks to address the challenges of recognizing offline handwritten Arabic text, including isolated digits, characters, and words. Specifically, we propose a supervised Convolutional Neural Network (CNN) model that contextually extracts optimal features and employs batch normalization and dropout regularization to prevent overfitting and further enhance its generalization performance compared with conventional deep learning models. The proposed Deep CNN (DCNN) architecture is built from numerous stacked convolutional layers. The proposed model was extensively evaluated and achieved excellent classification accuracy compared with existing state-of-the-art OAHR approaches on a diverse set of six benchmark databases: MADBase (Digits), CMATERDB (Digits), HACDB (Characters), SUST-ALT (Digits), SUST-ALT (Characters), and SUST-ALT (Names). Further comparative experiments were conducted on the respective databases using the pre-trained VGGNet-19 and Mobile-Net models; additionally, generalization experiments on a database in another language (i.e., MNIST English Digits) showed the superiority of the proposed DCNN model.

    Pattern detection and recognition using over-complete and sparse representations

    Recent research in harmonic analysis and mammalian vision systems has revealed that over-complete and sparse representations play an important role in visual information processing. Applying such representations to pattern recognition and detection problems has become an interesting field of study. The main contribution of this thesis is to propose two feature extraction strategies - the global strategy and the local strategy - to make use of these representations. In the global strategy, over-complete and sparse transformations are applied to the input pattern as a whole and features are extracted in the transformed domain. This strategy has been applied to the problems of rotation-invariant texture classification and script identification, using the Ridgelet transform. Experimental results show better performance than the Gabor multi-channel filtering method and Wavelet-based methods. The local strategy is divided into two stages. The first stage analyzes the local over-complete and sparse structure: the input 2-D patterns are divided into patches, and the local over-complete and sparse structure is learned from these patches using sparse approximation techniques. The second stage concerns the application of the learned structure. For an object detection problem, we propose a sparsity testing technique, where a local over-complete and sparse structure is built to give sparse representations to the text patterns and non-sparse representations to other patterns. Object detection is achieved by identifying patterns that can be sparsely represented by the learned structure. This technique has been applied to detect text in scene images with a recall rate of 75.23% (about 6% improvement compared with other works) and a precision rate of 67.64% (about 12% improvement).
For applications like character or shape recognition, the learned over-complete and sparse structure is combined with a Convolutional Neural Network (CNN). A second text detection method based on this combination is proposed to further improve the accuracy of text detection in scene images (about 11% higher than our first method based on sparsity testing). Finally, this method has been applied to handwritten Farsi numeral recognition, where it obtained a 99.22% recognition rate on the CENPARMI Database and a 99.5% recognition rate on the HODA Database. By comparison, an SVM with gradient features achieves recognition rates of 98.98% and 99.22% on these databases, respectively.
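
The sparsity testing idea described above can be illustrated with a minimal sketch. It is not the thesis's method: the greedy matching pursuit, the toy dictionary `ATOMS`, the function names, and the threshold `max_residual_ratio` are all assumptions made for illustration. A pattern is accepted when a few dictionary atoms explain most of its energy.

```python
def matching_pursuit_residual(signal, atoms, k):
    """Greedily subtract the k best-matching unit-norm atoms; return the residual."""
    residual = list(signal)
    for _ in range(k):
        # Correlate the residual with every atom and keep the strongest match.
        coeffs = [sum(r * a for r, a in zip(residual, atom)) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(coeffs[i]))
        residual = [r - coeffs[best] * a for r, a in zip(residual, atoms[best])]
    return residual


def is_sparsely_representable(signal, atoms, k=2, max_residual_ratio=0.1):
    """Accept the pattern if k atoms leave at most the given fraction of its energy."""
    energy = sum(s * s for s in signal)
    if energy == 0.0:
        return True
    residual = matching_pursuit_residual(signal, atoms, k)
    return sum(r * r for r in residual) / energy <= max_residual_ratio


# Hypothetical over-complete dictionary: 5 unit-norm atoms in a 4-dimensional space.
ATOMS = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]

text_like = [2.0, 0.0, 0.0, 3.0]    # explained exactly by two atoms
other = [1.0, -1.0, 1.0, -1.0]      # needs all four axis atoms
print(is_sparsely_representable(text_like, ATOMS))
print(is_sparsely_representable(other, ATOMS))
```

In a real detector the dictionary would be learned from text patches rather than fixed by hand, and the threshold would be tuned on validation data.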

    Robust regression on clustered data and signature based online Arabic handwriting recognition

    In this thesis, we present two different methodologies: one for processing time series data for use in machine learning, and the other a robust linear regression for clustered data. The foundation of both methodologies is an attempt to utilise time series data in ways that have traditionally been prohibitive, owing to the ragged nature of such data. In order to use standard machine learning tools to classify online Arabic handwritten characters, we develop a dyadic iterated integral path signature approach to processing the underlying time series data. The process transforms raw online Arabic handwritten character data, in the form of multiple time series, into a single set of features that can be used for machine learning. When applied to the Online KHATT segmented character data set, the methodology combined with both random forests and long short-term memory (LSTM) neural networks demonstrates a dramatic improvement in recognition performance over the previously published best (using hidden Markov models). Furthermore, this processing methodology can be applied to any number of similar scenarios, including other online handwritten scripts and even drawings on tablets. Secondly, with the aim of carrying out polynomial regression using the iterated integral path log-signature, we present a robust eigenvalue polynomial regression. This new form of regression is designed to significantly reduce the impact of clustered data on the fitting of a polynomial approximation to the data. Using knowledge of the location of the clusters of data in space, combined with the region over which we wish to obtain a robust estimate, this eigenvalue-based method shows vast improvements over standard least squares linear regression. The methodology is demonstrated to result in a large decrease in the L2 error of polynomial approximations to a number of functions.
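
As a rough illustration of how iterated integrals turn ragged pen trajectories into fixed-length features, the sketch below computes the first two signature levels of a piecewise-linear path and concatenates them over dyadic sub-intervals. The truncation at level 2, the way the path is split, and the function names are assumptions; the abstract does not specify the thesis's exact construction.

```python
def signature_level2(path):
    """Return (S1, S2): the first two signature levels of a piecewise-linear path."""
    d = len(path[0])
    s1 = [0.0] * d
    s2 = [[0.0] * d for _ in range(d)]
    for p, q in zip(path, path[1:]):
        inc = [b - a for a, b in zip(p, q)]
        # Chen's identity for appending one straight segment to the path so far.
        for i in range(d):
            for j in range(d):
                s2[i][j] += s1[i] * inc[j] + 0.5 * inc[i] * inc[j]
        for i in range(d):
            s1[i] += inc[i]
    return s1, s2


def dyadic_signature_features(path, depth):
    """Concatenate S1 and S2 over 2**level equal pieces for level = 0..depth."""
    features = []
    n_seg = len(path) - 1
    for level in range(depth + 1):
        pieces = 2 ** level
        for k in range(pieces):
            start = k * n_seg // pieces
            stop = (k + 1) * n_seg // pieces
            s1, s2 = signature_level2(path[start:stop + 1])
            features.extend(s1)
            features.extend(v for row in s2 for v in row)
    return features


# Two strokes with different numbers of samples map to vectors of equal length.
short_stroke = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]
long_stroke = [(0.0, 0.0), (0.5, 0.1), (1.0, 0.3), (1.2, 0.9), (1.5, 1.0)]
print(len(dyadic_signature_features(short_stroke, 1)))
print(len(dyadic_signature_features(long_stroke, 1)))
```

The antisymmetric part of the level-2 term, `0.5 * (s2[0][1] - s2[1][0])`, is the signed area enclosed by the stroke and its chord, which is one reason these features capture shape as well as displacement.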

    Handwritten Character Recognition of a Vernacular Language: The Odia Script

    Optical Character Recognition (OCR) is the electronic or mechanical translation of images from printed, handwritten, or typewritten sources into an editable version. OCR technology is now used in most industries for better management of documents. It makes text editable and searchable for a word or phrase, allows it to be stored more compactly in computer memory for future use, and lets it be processed by other applications. In India, a few organizations have designed OCR for some mainstream Indic languages and scripts, for example Devanagari, Hindi, Bangla and, to some extent, Telugu, Tamil, Gurmukhi, and Odia. However, progress in Odia script recognition has been limited compared with the others. Any recognition process relies on standard databases, yet no such standard database for the Odia script is available in the literature. In addition to using the existing standard databases for other Indic languages, in this thesis we have designed databases of handwritten Odia digits and characters for the simulation of the proposed schemes. Four schemes are suggested: one for the recognition of Odia digits and three for atomic Odia characters. Various issues of handwritten character recognition have been examined, including feature extraction, the grouping of samples based on shared characteristics, and classifier design. Both statistical and structural features of a character have been studied. A character written by the same person will not always have the same shape and strokes, and the variability between the handwriting of different individuals makes character recognition quite challenging. Standard classifiers have been utilized for the recognition of the Odia character set. An array of Gabor filters has been employed for recognition of Odia digits.
In this scheme, each image is divided into four blocks of equal size. Gabor filters with various scales and orientations are applied to these sub-images while the other filter parameters are kept constant. The average energy is computed for each transformed image to obtain a feature vector for each digit. A Back Propagation Neural Network (BPNN) is then employed to classify the samples, taking the feature vector as input. The proposed scheme has also been tested on standard digit databases such as MNIST and USPS. At the end of this part, an application is designed to evaluate simple arithmetic equations. A multi-resolution scheme has been suggested to extract features from Odia atomic characters and recognize them using the back propagation neural network. It has been observed that a few Odia characters have a vertical line at the end. This helps in dividing the whole dataset into two subgroups, Group I and Group II, such that all characters in Group I have a vertical line and the rest are in Group II. This two-class classification problem is handled by a single-layer perceptron. The two-dimensional Discrete Orthogonal S-Transform (DOST) coefficients are then extracted from the images of each group, and Principal Component Analysis (PCA) is applied to find the significant features. For each group, a separate BPNN classifier is used to recognize the character set.
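
The block-wise Gabor energy features described above can be sketched as follows. This is an illustrative reading, not the thesis's implementation: the kernel size (`half`), `sigma`, `wavelength`, the aspect ratio `gamma`, the use of only the real part of the filter, and all function names are assumptions, and only orientation is varied here.

```python
import math


def gabor_kernel(half, sigma, theta, wavelength, gamma=0.5):
    """Real part of a Gabor filter sampled on a (2*half+1) x (2*half+1) grid."""
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            envelope = math.exp(-(xr * xr + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            row.append(envelope * math.cos(2.0 * math.pi * xr / wavelength))
        kernel.append(row)
    return kernel


def mean_filter_energy(block, kernel):
    """Average squared filter response over all positions where the kernel fits."""
    k = len(kernel)
    rows, cols = len(block), len(block[0])
    total, count = 0.0, 0
    for r in range(rows - k + 1):
        for c in range(cols - k + 1):
            resp = sum(kernel[i][j] * block[r + i][c + j]
                       for i in range(k) for j in range(k))
            total += resp * resp
            count += 1
    return total / count


def gabor_block_features(image, thetas, half=3, sigma=2.0, wavelength=4.0):
    """Split the image into four equal blocks; one energy per block and orientation."""
    h2, w2 = len(image) // 2, len(image[0]) // 2
    blocks = [[row[c:c + w2] for row in image[r:r + h2]]
              for r in (0, h2) for c in (0, w2)]
    kernels = [gabor_kernel(half, sigma, t, wavelength) for t in thetas]
    return [mean_filter_energy(b, k) for b in blocks for k in kernels]


# Synthetic 16x16 image of vertical stripes with a period of 4 pixels.
stripes = [[math.cos(2.0 * math.pi * c / 4.0) for c in range(16)] for _ in range(16)]
features = gabor_block_features(stripes, [0.0, math.pi / 2.0])
print(len(features))
```

On the striped test image, the filter whose carrier varies along the horizontal axis (`theta = 0`) yields far more energy than the one rotated by 90 degrees, which is the kind of orientation cue a classifier such as the BPNN would receive as input.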

    Adaptive Fusion Techniques for Effective Multimodal Deep Learning

    Effective fusion of data from multiple modalities, such as video, speech, and text, is a challenging task due to the heterogeneous nature of multimodal data. In this work, we propose fusion techniques that aim to model context from different modalities effectively. Instead of defining a deterministic fusion operation, such as concatenation, for the network, we let the network decide “how” to combine the given multimodal features more effectively. We propose two networks: 1) Auto-Fusion, which aims to compress information from different modalities while preserving the context, and 2) GAN-Fusion, which regularizes the learned latent space given context from complementing modalities. A quantitative evaluation on the tasks of multimodal machine translation and emotion recognition suggests that our adaptive networks can better model context from other modalities than all existing methods, many of which employ massive transformer-based networks.

    Deep Multi Temporal Scale Networks for Human Motion Analysis

    The movement of human beings appears to respond to a complex motor system that contains signals at different hierarchical levels. For example, "grasping a glass on a table" is a high-level action, but to perform it the body needs several motor inputs that activate different joints of the body (shoulder, arm, hand, fingers, etc.). Each of these joints and muscles has a different size, responsiveness, and precision, with a complex, non-linearly stratified temporal dimension in which every muscle has its own temporal scale. Parts such as the fingers respond much faster to brain input than more voluminous body parts such as the shoulder. Their cooperation when we perform an action produces smooth, effective, and expressive movement in a complex cognitive task with multiple temporal scales. Following this layered structure, the human body can be described as a kinematic tree consisting of connected joints. Although it is nowadays well known that human movement and its perception are characterised by multiple temporal scales, very few works in the literature focus on studying this particular property. In this thesis, we focus on the analysis of human movement using data-driven techniques. In particular, we focus on the non-verbal aspects of human movement, with an emphasis on full-body movements. Data-driven methods can interpret the information in the data by searching for rules, associations, or patterns that represent the relationships between input (e.g. the human action acquired with sensors) and output (e.g. the type of action performed). Furthermore, these models may represent a new research frontier, as they can analyse large masses of data and focus on aspects that even an expert user might miss. The literature on data-driven models proposes two families of methods that can process time series and human movement.
The first family, called shallow models, extracts features from the time series that help the learning algorithm find associations in the data. These features are identified and designed by domain experts, who can select the best ones for the problem at hand. The second family avoids this phase of extraction by a human expert, since the models themselves can identify the best set of features to optimise learning. In this thesis, we provide a method that applies the multi-temporal-scale property of the human motion domain to deep learning models, the only data-driven models that can be extended to handle this property. We ask two questions: what happens if we apply knowledge about how human movements are performed to deep learning models? Can this knowledge improve current automatic recognition standards? To test the validity of our study, we collected data and tested our hypothesis in specially designed experiments. The results support both the proposal and the need for deep multi-scale models as a tool to better understand human movement and its multiple time-scale nature.

    Dictogloss in the Primary School EFL classroom: Investigating the process, product and perceptions of collaborative writing

    544 p. Given the rise of English as a foreign language in Primary Education, it is of great importance to investigate pedagogical proposals suited to learners of this age group. Collaborative writing tasks have proven effective with both adults and children in fostering peer interaction and attention to form, two fundamental processes in second language learning, especially in contexts where contact with the foreign language outside the classroom is scarce. This doctoral thesis explores the potential of one type of collaborative writing task, the dictogloss, among young learners of English as a foreign language (aged 11 and 12). The study analysed the quantity and quality of attention to form during the dictogloss, as well as the complexity and accuracy of the written production in English. In parallel, it quantified the attention devoted to two target linguistic forms (the third person singular verbal inflection -s and the third person possessive determiners his/her), which were embedded in the original texts that the learners had to reconstruct as part of the dictogloss. Both attention to form and written production were analysed as a function of task repetition, form-focused instruction on the target forms (IFF), and a series of individual factors, such as attitudes and the quality of writing in the first language (L1) (Spanish). The participants were assigned to two experimental groups: one carried out the two experimental dictogloss tasks in pairs (Colab), while the other received instruction on the target forms before completing the tasks collaboratively (IFF+Colab). The comparison group carried out the tasks individually.
Using a pretest (T1), posttest (T2), and delayed posttest (T3) design, information was obtained on the complexity and accuracy of the participants' individual narrative writing in English. In addition, a questionnaire on attitudes towards writing and collaborative work was administered before the tasks, and once the dictogloss tasks had been completed, the participants gave their opinion of them through another questionnaire and focus interviews. The data indicated that the young learners who carried out the task collaboratively focused mainly on mechanical and grammatical aspects of the language, although discussions about vocabulary were longer and more elaborate. Most of the linguistic issues were resolved correctly, and the use of English was greater in discussions about grammar, while the L1 predominated in lexical episodes. Regarding task repetition, the IFF+Colab group spent significantly less time on the second day than on the first, while the characteristics of the discussions were similar on both days for both collaborative groups. Moreover, the instruction on -s provided before the task resulted in significantly greater attention to this form compared with the Colab group. The analysis of the written product of the task showed no advantage for the collaborative groups over the individual group in either complexity or accuracy. Nevertheless, the IFF+Colab group obtained the highest accuracy rate on the target forms. The individual production in the tests revealed an advantage for the Colab group over the other two groups, while certain positive trends between T1 and T2, including accuracy on the target forms, were detected among the IFF+Colab participants.
The young learners expressed a positive disposition towards collaborative work and writing, although writing in English aroused more misgivings than writing in their L1. The participants viewed the dictogloss task favourably, and this assessment was more marked among those who carried it out in pairs, especially in the Colab group. By contrast, the conflicts that arose while solving the task in some pairs in the IFF+Colab condition negatively influenced this group's perception of it. Finally, when predicting the level of achievement in English writing, attitudes towards writing proved more relevant than writing competence in the L1. In sum, this thesis contributes to broadening knowledge about the potential of the dictogloss task in Primary Education and to informing teaching practice at this educational stage.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) objects may be stored in and retrieved from a pre-attentional store during this task.

    Aerospace medicine and biology: A continuing bibliography with indexes (supplement 368)

    This bibliography lists 305 reports, articles, and other documents introduced into the NASA Scientific and Technical Information System during Sep. 1992. The subject coverage concentrates on the biological, physiological, psychological, and environmental effects to which humans are subjected during and following simulated or actual flight in the Earth's atmosphere or in interplanetary space. References describing similar effects on biological organisms of lower order are also included. Such related topics as sanitary problems, pharmacology, toxicology, safety and survival, life support systems, exobiology, and personnel factors receive appropriate attention. Applied research receives the most emphasis, but references to fundamental studies and theoretical principles related to experimental development also qualify for inclusion.