61 research outputs found
Unconstrained Scene Text and Video Text Recognition for Arabic Script
Building robust recognizers for Arabic has always been challenging. We
demonstrate the effectiveness of an end-to-end trainable CNN-RNN hybrid
architecture in recognizing Arabic text in videos and natural scenes. We
outperform previous state-of-the-art on two publicly available video text
datasets - ALIF and ACTIV. For the scene text recognition task, we introduce a
new Arabic scene text dataset and establish baseline results. For scripts like
Arabic, a major challenge in developing robust recognizers is the lack of large
quantity of annotated data. We overcome this by synthesising millions of Arabic
text images from a large vocabulary of Arabic words and phrases. Our
implementation is built on top of the model introduced here [37] which is
proven quite effective for English scene text recognition. The model follows a
segmentation-free, sequence to sequence transcription approach. The network
transcribes a sequence of convolutional features from the input image to a
sequence of target labels. This does away with the need for segmenting input
image into constituent characters/glyphs, which is often difficult for Arabic
script. Further, the ability of RNNs to model contextual dependencies yields
superior recognition results.Comment: 5 page
Text Recognition in Multimedia Documents: A Study of two Neural-based OCRs Using and Avoiding Character Segmentation
International audienceText embedded in multimedia documents represents an important semantic information that helps to automatically access the content. This paper proposes two neural-based OCRs that handle the text recognition problem in different ways. The first approach segments a text image into individual characters before recognizing them, while the second one avoids the segmentation step by integrating a multi-scale scanning scheme that allows to jointly localize and recognize characters at each position and scale. Some linguistic knowledge is also incorporated into the proposed schemes to remove errors due to recognition confusions. Both OCR systems are applied to caption texts embedded in videos and in natural scene images and provide outstanding results showing that the proposed approaches outperform the state-of-the-art methods
Integrated analysis of audiovisual signals and external information sources for event detection in team sports video
Ph.DDOCTOR OF PHILOSOPH
Sayısal çoğulortam verisinin anlamsal çokkipli analizi
TÜBİTAK EEEAG Proje30.09.200
Vision 21: Interdisciplinary Science and Engineering in the Era of Cyberspace
The symposium Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace was held at the NASA Lewis Research Center on March 30-31, 1993. The purpose of the symposium was to simulate interdisciplinary thinking in the sciences and technologies which will be required for exploration and development of space over the next thousand years. The keynote speakers were Hans Moravec, Vernor Vinge, Carol Stoker, and Myron Krueger. The proceedings consist of transcripts of the invited talks and the panel discussion by the invited speakers, summaries of workshop sessions, and contributed papers by the attendees
Applications of satellite technology to broadband ISDN networks
Two satellite architectures for delivering broadband integrated services digital network (B-ISDN) service are evaluated. The first is assumed integral to an existing terrestrial network, and provides complementary services such as interconnects to remote nodes as well as high-rate multicast and broadcast service. The interconnects are at a 155 Mbs rate and are shown as being met with a nonregenerative multibeam satellite having 10-1.5 degree spots. The second satellite architecture focuses on providing private B-ISDN networks as well as acting as a gateway to the public network. This is conceived as being provided by a regenerative multibeam satellite with on-board ATM (asynchronous transfer mode) processing payload. With up to 800 Mbs offered, higher satellite EIRP is required. This is accomplished with 12-0.4 degree hopping beams, covering a total of 110 dwell positions. It is estimated the space segment capital cost for architecture one would be about 250M. The net user cost is given for a variety of scenarios, but the cost for 155 Mbs services is shown to be about $15-22/minute for 25 percent system utilization
Video coding for compression and content-based functionality
The lifetime of this research project has seen two dramatic developments in the area of digital video coding. The first has been the progress of compression research leading to a factor of two improvement over existing standards, much wider deployment possibilities and the development of the new international ITU-T Recommendation H.263. The second has been a radical change in the approach to video content production with the introduction of the content-based coding concept and the addition of scene composition information to the encoded bit-stream. Content-based coding is central to the latest international standards efforts from the ISO/IEC MPEG working group.
This thesis reports on extensions to existing compression techniques exploiting a priori knowledge about scene content. Existing, standardised, block-based compression coding techniques were extended with work on arithmetic entropy coding and intra-block prediction. These both form part of the H.263 and MPEG-4 specifications respectively. Object-based coding techniques were developed within a collaborative simulation model, known as SIMOC, then extended with ideas on grid motion vector modelling and vector accuracy confidence estimation. An improved confidence measure for encouraging motion smoothness is proposed.
Object-based coding ideas, with those from other model and layer-based coding approaches, influenced the development of content-based coding within MPEG-4. This standard made considerable progress in this newly adopted content based video coding field defining normative techniques for arbitrary shape and texture coding. The means to generate this information, the analysis problem, for the content to be coded was intentionally not specified. Further research work in this area concentrated on video segmentation and analysis techniques to exploit the benefits of content based coding for generic frame based video. The work reported here introduces the use of a clustering algorithm on raw data features for providing initial segmentation of video data and subsequent tracking of those image regions through video sequences. Collaborative video analysis frameworks from COST 21 l qual and MPEG-4, combining results from many other segmentation schemes, are also introduced
System for caption text extraction on a hierarchical region-based image representation
English: This work presents a technique for detecting caption text for indexing purposes. This technique is to be included in a generic indexing system dealing with other semantic concepts. The various object detection algorithms are required to share a common image description which is a hierarchical region-based image model. Caption text objects are detected combining texture and geometric features, which are estimated using wavelet analysis and taking advantage of the region-based image model, respectively. Analysis of the region hierarchy provides the final caption text objects
- …