58 research outputs found

    Arabic cursive text recognition from natural scene images

    © 2019 by the authors. This paper presents a comprehensive survey of Arabic cursive scene text recognition. Publications of recent years show that the interest of document image analysis researchers has shifted from recognizing optical characters to recognizing characters appearing in natural images. Scene text recognition is challenging because the text varies in font style, size, alignment, orientation, reflection, illumination, and blurriness, and appears against complex backgrounds. Among cursive scripts, Arabic scene text recognition is considered even more challenging owing to joined writing, variant forms of the same character, a large number of ligatures, multiple baselines, and similar factors. Surveys of Latin- and Chinese-script scene text recognition exist, but Arabic-like scene text recognition has yet to be addressed in detail. This manuscript highlights some of the latest techniques presented for text classification; those built on deep learning architectures are equally suitable for developing Arabic cursive scene text recognition systems. Issues pertaining to text localization and feature extraction are also discussed. Moreover, the article emphasizes the importance of a benchmark cursive scene text dataset. Based on this discussion, future directions are outlined, some of which may offer researchers insight into cursive scene text.

    Evaluation of handwritten Urdu text by integration of MNIST dataset learning experience

    © 2019 IEEE. Learning on similar patterns can be enhanced when the experience gained during training on one task is reused to maximize accuracy on another. This paper presents a novel way to exploit transfer learning between similar patterns for handwritten Urdu text analysis. A network pre-trained on MNIST is employed by transferring its learning experience to samples from the Urdu Nastaliq Handwritten Dataset (UNHD). A convolutional neural network is used for feature extraction, and the experiments are performed with deep multidimensional long short-term memory (MDLSTM) networks. The results show strong performance across a number of experiments distinguished by handwriting complexity, and demonstrate that pre-training benefits the subsequent target networks by enabling them to focus on task-specific feature learning. The conducted experiments yield remarkably good accuracy on the UNHD dataset.
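    The transfer scheme this abstract describes can be illustrated in miniature: freeze the features learned on the source task (MNIST), then train only a new classification head on the target task. In the sketch below the frozen random projection stands in for pretrained CNN features and the data is synthetic; the paper itself uses CNN feature extraction with MDLSTM networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for an MNIST-pretrained feature extractor:
# a frozen projection followed by ReLU. Its weights are never updated.
W_pretrained = rng.normal(size=(64, 28 * 28))

def extract_features(x):
    """Frozen feature extractor (weights are NOT updated during transfer)."""
    return np.maximum(W_pretrained @ x, 0.0)

# Toy target task: 10 classes, synthetic 28x28 "images".
X = rng.normal(size=(200, 28 * 28))
y = rng.integers(0, 10, size=200)

# Transfer learning: only the new softmax head is trained.
W_head = np.zeros((10, 64))
for _ in range(200):
    F = extract_features(X.T).T                      # (200, 64) frozen features
    logits = F @ W_head.T
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(len(y)), y] -= 1.0                   # cross-entropy gradient
    W_head -= 0.01 * (p.T @ F) / len(y)

train_acc = ((extract_features(X.T).T @ W_head.T).argmax(axis=1) == y).mean()
```

    Freezing the extractor is what lets the target network "focus on a particular feature learning": only the small head adapts to the new script.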

    Deep learning for universal emotion recognition in still images

    This work proposes a methodology for still-image facial expression recognition. The proposed method comprises a face detection and alignment module followed by a deep convolutional neural network (CNN) that outputs a seven-emotion probability vector.
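    The final stage described here, a CNN whose output is a seven-emotion probability vector, amounts to a softmax over seven logits. A minimal numpy sketch follows; the emotion label set and the logit values are illustrative, not taken from the paper.

```python
import numpy as np

# Hypothetical seven-emotion label set (order is an assumption).
EMOTIONS = ["angry", "disgust", "fear", "happy", "sad", "surprise", "neutral"]

def softmax(logits):
    """Convert raw network outputs into a probability vector."""
    z = np.exp(logits - np.max(logits))   # shift for numerical stability
    return z / z.sum()

logits = np.array([0.2, -1.3, 0.1, 3.5, -0.4, 0.9, 1.1])  # illustrative CNN output
probs = softmax(logits)                                    # sums to 1.0
predicted = EMOTIONS[int(np.argmax(probs))]                # "happy" for these logits
```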

    Recognition of Japanese handwritten characters with Machine learning techniques

    The recognition of Japanese handwritten characters has always been a challenge for researchers. A large number of classes, their graphic complexity, and the existence of three different writing systems make this problem particularly difficult compared to Western writing. For decades, attempts have been made to address the problem using traditional OCR (Optical Character Recognition) techniques, with mixed results. With the recent popularization of machine learning techniques through neural networks, this research has been revitalized, bringing new approaches to the problem. These new results achieve performance levels comparable to human recognition. Furthermore, these new techniques have enabled collaboration with very different disciplines, such as the Humanities or East Asian studies, achieving advances that would not have been possible without this interdisciplinary work. In this thesis, these techniques are explored to the depth needed to carry out our own experiments, training neural network models with public datasets of Japanese characters. However, the scarcity of public datasets makes the task of researchers remarkably difficult. Our proposal to minimize this problem is the development of a web application that allows researchers to easily collect samples of Japanese characters through the collaboration of any user. Once the application is fully operational, the examples collected up to that point will be used to create a new dataset in a specific format. Finally, we can use the new data to carry out comparative experiments with the earlier neural network models.

    Character-Level Deep Learning for Text Classification and Transfer Learning

    In natural language processing, applications of one-dimensional convolutional neural networks (Temporal Convolutional Neural Networks; Temporal CNN, ConvNet) have been reported. A ConvNet takes either a word sequence or a character sequence as input; in the character-level case, language-dependent processing such as morphological analysis becomes entirely unnecessary. Prior work demonstrated the effectiveness of character-level ConvNets on news-category classification and sentiment-analysis tasks over English and romanized Chinese datasets. In this study, we apply character-level ConvNets to Japanese and verify their effectiveness. By introducing an embedding layer, the romanization step applied to Chinese in prior work can be omitted, making the approach straightforward to apply. We also analyze what features a network trained on a large dataset extracts, with a view to reusing its useful intermediate representations. Furthermore, although transfer learning is actively studied in image recognition, it has received little attention in natural language processing, so we attempt transfer learning with character-level ConvNets. Two kinds of transfer are considered: pre-training the embedding (input) layer with the Skip-gram language model, and transferring the whole network, chiefly its convolutional layers, from a trained model. The experiments show the following. Applied to Japanese news-category classification and sentiment analysis, character-level ConvNets outperform conventional methods such as Bag-of-Words when the dataset is large. A single convolutional filter can represent multiple N-grams, demonstrating the advantage of the features learned by the filters. Embedding-layer transfer improves accuracy even when the pre-training dataset's task differs from the target task, and pre-training the embedding layer on a Chinese dataset and transferring it to Japanese tasks also improves accuracy. Whole-network transfer improves accuracy between similar tasks. The University of Electro-Communications, 201
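    The character-level pipeline this abstract describes (character ids → embedding layer → temporal convolution → max-over-time pooling) can be sketched in a few lines of numpy. The toy vocabulary, embedding size, and filter count below are illustrative, not the thesis's actual configuration; note that no morphological analysis of the input text is required.

```python
import numpy as np

rng = np.random.default_rng(1)

vocab = {ch: i for i, ch in enumerate("abcdefghij")}  # toy character vocabulary
embed_dim, filter_width, n_filters = 8, 3, 4

E = rng.normal(size=(len(vocab), embed_dim))            # embedding layer
filters = rng.normal(size=(n_filters, filter_width, embed_dim))

def char_convnet_features(text):
    """Character ids -> embeddings -> 1-D convolution -> max-over-time pool."""
    ids = [vocab[c] for c in text]
    X = E[ids]                                          # (seq_len, embed_dim)
    n_positions = len(ids) - filter_width + 1
    conv = np.array([
        [np.sum(X[t:t + filter_width] * f) for t in range(n_positions)]
        for f in filters
    ])                                                  # (n_filters, n_positions)
    return np.maximum(conv, 0.0).max(axis=1)            # one value per filter

feats = char_convnet_features("abcdeabc")               # fixed-size feature vector
```

    Each filter spans `filter_width` consecutive character embeddings, which is how a single convolutional filter can respond to several different N-grams.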

    Real-Time Facial Emotion Recognition Using Fast R-CNN

    In computer vision and image processing, object detection algorithms are used to detect semantic objects of certain classes in images and videos. Object detectors use deep learning networks to classify detected regions: regions are first proposed, and each proposed region is then classified with a convolutional neural network (CNN). Unprecedented advances in CNNs have opened new possibilities and implementations for object detectors. Unlike a typical CNN applied to whole images, which is computationally complex and expensive, object detectors are computationally efficient, and they are widely used for face detection, recognition, and object tracking. In this thesis, deep-learning-based object detection algorithms are implemented to classify facially expressed emotions in real time, captured through a webcam. A typical CNN classifies images without specifying regions within an image, which limits insight into how network performance depends on different training options; it is also harder to verify whether such a network has converged and can generalize, that is, classify unseen data that was not part of the training set. Fast Region-based Convolutional Neural Network (Fast R-CNN), an object detection algorithm, is used to detect facially expressed emotion in real time by classifying proposed regions. The Fast R-CNN is trained on a high-quality video database of 24 actors facially expressing eight different emotions, using images processed from 60 videos per actor. An object detector's performance is measured with various metrics; however, doing well on average precision or miss rate does not necessarily mean that the network is correctly classifying regions, since the model may have been over-trained. In our work we show that an object detector such as Fast R-CNN performs surprisingly well in classifying facially expressed emotions in real time, outperforming a plain CNN.
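    The region-classification step at the core of Fast R-CNN pools each proposed region of a shared feature map to a fixed size before it is classified. A minimal sketch of that RoI max-pooling step follows; the feature map and the region coordinates are synthetic, and real Fast R-CNN pools multi-channel CNN feature maps rather than a raw 2-D array.

```python
import numpy as np

def roi_max_pool(feature_map, box, out_size=2):
    """Max-pool one region (x0, y0, x1, y1) of a 2-D map to out_size x out_size."""
    x0, y0, x1, y1 = box
    region = feature_map[y0:y1, x0:x1]
    h_edges = np.linspace(0, region.shape[0], out_size + 1).astype(int)
    w_edges = np.linspace(0, region.shape[1], out_size + 1).astype(int)
    return np.array([
        [region[h_edges[i]:h_edges[i + 1], w_edges[j]:w_edges[j + 1]].max()
         for j in range(out_size)]
        for i in range(out_size)
    ])

fmap = np.arange(36, dtype=float).reshape(6, 6)  # synthetic shared feature map
pooled = roi_max_pool(fmap, box=(1, 1, 5, 5))    # one proposed (face) region
```

    Because every proposal is pooled from the same shared feature map, the expensive convolutional work is done once per frame rather than once per region, which is what makes real-time operation feasible.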

    Machine Learning Approaches to Human Body Shape Analysis

    Soft biometrics, biomedical sciences, and many other fields pay particular attention to the geometric description of the human body and its variations. Despite numerous contributions, interest remains high given the non-rigid nature of the human body, which can assume different poses and numerous shapes due to variable body composition. Unfortunately, a well-known and costly requirement in data-driven machine learning, and particularly in human-based analysis, is the availability of data in the form of geometric information (body measurements) paired with vision information (natural images, 3D meshes, etc.). We introduce a computer graphics framework able to generate thousands of synthetic human body meshes, representing a population of individuals with stratified information: gender, Body Fat Percentage (BFP), anthropometric measurements, and pose. This contribution permits an extensive analysis of different bodies in different poses while avoiding a demanding and expensive acquisition process. We design a virtual environment that takes advantage of the generated bodies to infer body surface area (BSA) from a single view. The framework can simulate the acquisition process of newly introduced RGB-D devices while disentangling different noise components (sensor noise, optical distortion, body part occlusions). Common geometric descriptors in soft biometrics, as well as in biomedical sciences, are based on body measurements. Unfortunately, as we prove, these descriptors are not pose invariant, which constrains their use to controlled scenarios. We introduce a differential geometry approach that treats body pose variations as isometric transformations of the body surface and body composition changes as covariant with the body surface area. This setting permits the use of the Laplace-Beltrami operator on the 2D body manifold, describing the body with a compact, efficient, and pose-invariant representation.
    We design a neural network architecture able to infer important body semantics from spectral descriptors, closing the gap between abstract spectral features and traditional measurement-based indices. Studying the manifold of body shapes, we propose an innovative generative adversarial model able to learn body shapes. The method can generate new bodies with unseen geometries as a walk on the latent space, a significant advantage over traditional generative methods.
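    The pose-invariant representation described above rests on the spectrum of the Laplace-Beltrami operator, which is unchanged under isometries of the surface. A crude discrete analogue of that idea is the eigenvalue spectrum of a mesh's graph Laplacian, which is invariant to how the vertices are labelled; the tiny 4-vertex "mesh" below is synthetic, and the thesis's actual discretization of the Laplace-Beltrami operator is more refined than this sketch.

```python
import numpy as np

def laplacian_spectrum(n_vertices, edges):
    """Sorted eigenvalues of the graph Laplacian L = D - A of a vertex graph."""
    A = np.zeros((n_vertices, n_vertices))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    L = np.diag(A.sum(axis=1)) - A
    return np.sort(np.linalg.eigvalsh(L))

# A tiny synthetic "mesh" graph, and the same graph with every vertex id
# relabelled -- a stand-in for a change of parametrisation that leaves the
# intrinsic geometry untouched.
mesh = [(0, 1), (1, 2), (2, 3), (1, 3)]
permuted = [(1, 2), (2, 3), (3, 0), (2, 0)]   # vertex i relabelled to (i + 1) % 4

spec_a = laplacian_spectrum(4, mesh)
spec_b = laplacian_spectrum(4, permuted)      # identical spectrum
```

    The two spectra coincide, and the smallest eigenvalue of a connected graph is zero; it is this label- and pose-insensitivity that makes spectral descriptors a compact signature of body shape.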