29,186 research outputs found

    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Full text link
    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform previous state-of-the-art on two publicly available video text datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantity of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the model introduced here [37] which is proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence to sequence transcription approach. The network transcribes a sequence of convolutional features from the input image to a sequence of target labels. This does away with the need for segmenting input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.Comment: 5 page

    MSAT-X: A technical introduction and status report

    Get PDF
    A technical introduction and status report for the Mobile Satellite Experiment (MSAT-X) program is presented. The concepts of a Mobile Satellite System (MSS) and its unique challenges are introduced. MSAT-X's role and objectives are delineated with focus on its achievements. An outline of MSS design philosophy is followed by a presentation and analysis of the MSAT-X results, which are cast in a broader context of an MSS. The current phase of MSAT-X has focused notably on the ground segment of MSS. The accomplishments in the four critical technology areas of vehicle antennas, modem and mobile terminal design, speech coding, and networking are presented. A concise evolutionary trace is incorporated in each area to elucidate the rationale leading to the current design choices. The findings in the area of propagation channel modeling are also summarized and their impact on system design discussed. To facilitate the assessment of the MSAT-X results, technology and subsystem recommendations are also included and integrated with a quantitative first-generation MSS design

    Time-Efficient Hybrid Approach for Facial Expression Recognition

    Get PDF
    Facial expression recognition is an emerging research area for improving human and computer interaction. This research plays a significant role in the field of social communication, commercial enterprise, law enforcement, and other computer interactions. In this paper, we propose a time-efficient hybrid design for facial expression recognition, combining image pre-processing steps and different Convolutional Neural Network (CNN) structures providing better accuracy and greatly improved training time. We are predicting seven basic emotions of human faces: sadness, happiness, disgust, anger, fear, surprise and neutral. The model performs well regarding challenging facial expression recognition where the emotion expressed could be one of several due to their quite similar facial characteristics such as anger, disgust, and sadness. The experiment to test the model was conducted across multiple databases and different facial orientations, and to the best of our knowledge, the model provided an accuracy of about 89.58% for KDEF dataset, 100% accuracy for JAFFE dataset and 71.975% accuracy for combined (KDEF + JAFFE + SFEW) dataset across these different scenarios. Performance evaluation was done by cross-validation techniques to avoid bias towards a specific set of images from a database

    Challenging Neural Dialogue Models with Natural Data: Memory Networks Fail on Incremental Phenomena

    Full text link
    Natural, spontaneous dialogue proceeds incrementally on a word-by-word basis; and it contains many sorts of disfluency such as mid-utterance/sentence hesitations, interruptions, and self-corrections. But training data for machine learning approaches to dialogue processing is often either cleaned-up or wholly synthetic in order to avoid such phenomena. The question then arises of how well systems trained on such clean data generalise to real spontaneous dialogue, or indeed whether they are trainable at all on naturally occurring dialogue data. To answer this question, we created a new corpus called bAbI+ by systematically adding natural spontaneous incremental dialogue phenomena such as restarts and self-corrections to the Facebook AI Research's bAbI dialogues dataset. We then explore the performance of a state-of-the-art retrieval model, MemN2N, on this more natural dataset. Results show that the semantic accuracy of the MemN2N model drops drastically; and that although it is in principle able to learn to process the constructions in bAbI+, it needs an impractical amount of training data to do so. Finally, we go on to show that an incremental, semantic parser -- DyLan -- shows 100% semantic accuracy on both bAbI and bAbI+, highlighting the generalisation properties of linguistically informed dialogue models.Comment: 9 pages, 3 figures, 2 tables. Accepted as a full paper for SemDial 201
    corecore