61 research outputs found

    Unconstrained Scene Text and Video Text Recognition for Arabic Script

    Full text link
    Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform previous state-of-the-art on two publicly available video text datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of large quantity of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the model introduced here [37] which is proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence to sequence transcription approach. The network transcribes a sequence of convolutional features from the input image to a sequence of target labels. This does away with the need for segmenting input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results.Comment: 5 page

    Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

    Get PDF
    International audienceText embedded in multimedia documents represents an important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that allows to jointly localize and recognize characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results showing that the proposed approaches outperform the state-of-the-art methods

    UNIMAS Today : Education for the Future , November 1995

    Get PDF

    Sayısal çoğulortam verisinin anlamsal çokkipli analizi

    Get PDF
    TÜBİTAK EEEAG Proje30.09.200

    Vision 21: Interdisciplinary Science and Engineering in the Era of Cyberspace

    Get PDF
    The symposium Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace was held at the NASA Lewis Research Center on March 30-31, 1993. The purpose of the symposium was to simulate interdisciplinary thinking in the sciences and technologies which will be required for exploration and development of space over the next thousand years. The keynote speakers were Hans Moravec, Vernor Vinge, Carol Stoker, and Myron Krueger. The proceedings consist of transcripts of the invited talks and the panel discussion by the invited speakers, summaries of workshop sessions, and contributed papers by the attendees

    Applications of satellite technology to broadband ISDN networks

    Get PDF
    Two satellite architectures for delivering broadband integrated services digital network (B-ISDN) service are evaluated. The first is assumed integral to an existing terrestrial network, and provides complementary services such as interconnects to remote nodes as well as high-rate multicast and broadcast service. The interconnects are at a 155 Mbs rate and are shown as being met with a nonregenerative multibeam satellite having 10-1.5 degree spots. The second satellite architecture focuses on providing private B-ISDN networks as well as acting as a gateway to the public network. This is conceived as being provided by a regenerative multibeam satellite with on-board ATM (asynchronous transfer mode) processing payload. With up to 800 Mbs offered, higher satellite EIRP is required. This is accomplished with 12-0.4 degree hopping beams, covering a total of 110 dwell positions. It is estimated the space segment capital cost for architecture one would be about 190Mwhereasthesecondarchitecturewouldbeabout190M whereas the second architecture would be about 250M. The net user cost is given for a variety of scenarios, but the cost for 155 Mbs services is shown to be about $15-22/minute for 25 percent system utilization

    Video coding for compression and content-based functionality

    Get PDF
    The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group. This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction. These both form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed. Object-based coding ideas, with those from other model and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in this newly adopted content based video coding field defining normative techniques for arbitrary shape and texture coding. The means to generate this information, the analysis problem, for the content to be coded was intentionally not specified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content based coding for generic frame based video. The work reported here introduces the use of a clustering algorithm on raw data features for providing initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 21 l qual and MPEG-4, combining results from many other segmentation schemes, are also introduced

    System for caption text extraction on a hierarchical region-based image representation

    Get PDF
    English: This work presents a technique for detecting caption text for indexing purposes. This technique is to be included in a generic indexing system dealing with other semantic concepts. The various object detection algorithms are required to share a common image description which is a hierarchical region-based image model. Caption text objects are detected combining texture and geometric features, which are estimated using wavelet analysis and taking advantage of the region-based image model, respectively. Analysis of the region hierarchy provides the final caption text objects
    corecore