
61 research outputs found

Unconstrained Scene Text and Video Text Recognition for Arabic Script

Author: Jain Mohit
Jawahar C. V.
Mathew Minesh
Publication date: 07/11/2017

Building robust recognizers for Arabic has always been challenging. We demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid architecture in recognizing Arabic text in videos and natural scenes. We outperform the previous state of the art on two publicly available video text datasets, ALIF and ACTIV. For the scene text recognition task, we introduce a new Arabic scene text dataset and establish baseline results. For scripts like Arabic, a major challenge in developing robust recognizers is the lack of a large quantity of annotated data. We overcome this by synthesising millions of Arabic text images from a large vocabulary of Arabic words and phrases. Our implementation is built on top of the model introduced in [37], which has proven quite effective for English scene text recognition. The model follows a segmentation-free, sequence-to-sequence transcription approach. The network transcribes a sequence of convolutional features from the input image to a sequence of target labels. This does away with the need for segmenting the input image into constituent characters/glyphs, which is often difficult for Arabic script. Further, the ability of RNNs to model contextual dependencies yields superior recognition results. Comment: 5 page
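The segmentation-free transcription described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and the CTC-style collapse step (with a blank label at index 0) is an assumption, since the abstract does not name the decoding rule. The sketch only shows how a convolutional feature map might be read column by column as a sequence, and how per-frame labels might be merged into a transcription without cutting the image into characters.

```python
from typing import List, Optional

BLANK = 0  # assumed index of a "no character" label (CTC-style convention)


def map_to_sequence(feature_map: List[List[List[float]]]) -> List[List[float]]:
    """Turn feature_map[channel][row][column] into one vector per column.

    Each image column becomes one time step, so no character
    segmentation is needed before recognition.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]


def greedy_collapse(frame_labels: List[int], blank: int = BLANK) -> List[int]:
    """Merge repeated per-frame labels, then drop blanks (assumed CTC-style rule)."""
    decoded: List[int] = []
    previous: Optional[int] = None
    for label in frame_labels:
        if label != blank and label != previous:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    # Toy feature map: 2 channels, 1 row, 3 columns.
    toy_map = [[[1.0, 2.0, 3.0]], [[4.0, 5.0, 6.0]]]
    print(map_to_sequence(toy_map))
    # Toy per-frame predictions, as a recurrent layer might emit them.
    print(greedy_collapse([1, 1, 0, 1, 2, 2, 0]))
```

In a real recognizer the per-frame labels would come from a recurrent layer run over the column vectors; here they are hard-coded to keep the sketch self-contained.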

arXiv.org e-Print Archive

Crossref

Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation

Author: Christophe Garcia
Franck Mamalet
Khaoula Elagouni
Pascale Sébillot
Publication venue: Springer Science and Business Media LLC
Publication date: 01/03/2014

Text embedded in multimedia documents represents important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second avoids the segmentation step by integrating a multi-scale scanning scheme that jointly localizes and recognizes characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results, showing that the proposed approaches outperform the state-of-the-art methods.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

HAL-Rennes 1

Integrated analysis of audiovisual signals and external information sources for event detection in team sports video

Author: XU HUAXIN
Publication date: 28/04/2008

Ph.D (Doctor of Philosophy)

ScholarBank@NUS

UNIMAS Today: Education for the Future, November 1995

Author: Universiti Malaysia Sarawak. UNIMAS Global
Publication venue: Universiti Malaysia Sarawak (UNIMAS)
Publication date: 01/01/1995

Unimas Institutional Repository

Sayısal çoğulortam verisinin anlamsal çokkipli analizi (Semantic multimodal analysis of digital multimedia data)

Author: Alatan Aydın A.
Publication date: 01/01/2008

TÜBİTAK EEEAG Project, 30.09.200

OpenMETU (Middle East Technical University)

Vision 21: Interdisciplinary Science and Engineering in the Era of Cyberspace


The symposium Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace was held at the NASA Lewis Research Center on March 30-31, 1993. The purpose of the symposium was to stimulate interdisciplinary thinking in the sciences and technologies which will be required for exploration and development of space over the next thousand years. The keynote speakers were Hans Moravec, Vernor Vinge, Carol Stoker, and Myron Krueger. The proceedings consist of transcripts of the invited talks and the panel discussion by the invited speakers, summaries of workshop sessions, and contributed papers by the attendees.

NASA Technical Reports Server

Applications of satellite technology to broadband ISDN networks

Author: Chitre D. M.
Henderson T. R.
Kwan Robert K.
Morgan W. L.
Price Kent M.
White L. W.

Two satellite architectures for delivering broadband integrated services digital network (B-ISDN) service are evaluated. The first is assumed integral to an existing terrestrial network, and provides complementary services such as interconnects to remote nodes as well as high-rate multicast and broadcast service. The interconnects are at a 155 Mb/s rate and are shown as being met with a nonregenerative multibeam satellite having ten 1.5-degree spots. The second satellite architecture focuses on providing private B-ISDN networks as well as acting as a gateway to the public network. This is conceived as being provided by a regenerative multibeam satellite with an on-board ATM (asynchronous transfer mode) processing payload. With up to 800 Mb/s offered, higher satellite EIRP is required. This is accomplished with twelve 0.4-degree hopping beams, covering a total of 110 dwell positions. It is estimated that the space segment capital cost for architecture one would be about $190M, whereas the second architecture would be about $250M. The net user cost is given for a variety of scenarios, but the cost for 155 Mb/s services is shown to be about $15-22/minute for 25 percent system utilization.

NASA Technical Reports Server

Video coding for compression and content-based functionality

Author: Mulroy Patrick Joseph
Publication venue: Dublin City University. School of Electronic Engineering
Publication date: 01/01/1999

The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group. This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction. These both form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed. Object-based coding ideas, with those from other model and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in this newly adopted content based video coding field defining normative techniques for arbitrary shape and texture coding. The means to generate this information, the analysis problem, for the content to be coded was intentionally not specified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content based coding for generic frame based video. 
The work reported here introduces the use of a clustering algorithm on raw data features for providing initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 211quat and MPEG-4, combining results from many other segmentation schemes, are also introduced.

Irish Universities

DCU Online Research Access Service

System for caption text extraction on a hierarchical region-based image representation

Author: Zaytseva Ekaterina
Publication venue: Universitat Politècnica de Catalunya
Publication date: 01/01/2011

This work presents a technique for detecting caption text for indexing purposes. This technique is to be included in a generic indexing system dealing with other semantic concepts. The various object detection algorithms are required to share a common image description which is a hierarchical region-based image model. Caption text objects are detected combining texture and geometric features, which are estimated using wavelet analysis and taking advantage of the region-based image model, respectively. Analysis of the region hierarchy provides the final caption text objects.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC