Text Line Segmentation of Historical Documents: a Survey
There are a huge number of historical documents in libraries and in various
National Archives that have not been exploited electronically. Although
automatic reading of complete pages remains, in most cases, a long-term
objective, tasks such as word spotting, text/image alignment, authentication
and extraction of specific fields are in use today. For all these tasks, a
major step is document segmentation into text lines. Because of the low quality
and the complexity of these documents (background noise, artifacts due to
aging, interfering lines), automatic text line segmentation remains an open
research field. The objective of this paper is to present a survey of existing
methods, developed during the last decade, and dedicated to documents of
historical interest.
Comment: 25 pages, submitted version. To appear in International Journal on
Document Analysis and Recognition. Online version available at
http://www.springerlink.com/content/k2813176280456k3
Non-Visual Representation of Complex Documents for Use in Digital Talking Books
Essential written information such as textbooks, bills, and catalogues needs to be accessible to everyone. However, it is not always accessible to vision-impaired people, as they require electronic documents to be available in specific formats. To address the accessibility issues of electronic documents, this research aims to design an affordable, portable, standalone, and simple-to-use complete reading system that will convert and describe complex components in electronic documents for print-disabled users.
Simultaneous mesoscopic and two-photon imaging of neuronal activity in cortical circuits.
Spontaneous and sensory-evoked activity propagates across varying spatial scales in the mammalian cortex, but technical challenges have limited conceptual links between the function of local neuronal circuits and brain-wide network dynamics. We present a method for simultaneous cellular-resolution two-photon calcium imaging of a local microcircuit and mesoscopic widefield calcium imaging of the entire cortical mantle in awake mice. Our multi-scale approach involves a microscope with an orthogonal axis design where the mesoscopic objective is oriented above the brain and the two-photon objective is oriented horizontally, with imaging performed through a microprism. We also introduce a viral transduction method for robust and widespread gene delivery in the mouse brain. These approaches allow us to identify the behavioral state-dependent functional connectivity of pyramidal neurons and vasoactive intestinal peptide-expressing interneurons with long-range cortical networks. Our imaging system provides a powerful strategy for investigating cortical architecture across a wide range of spatial scales.
Information Preserving Processing of Noisy Handwritten Document Images
Many pre-processing techniques that normalize artifacts and clean noise induce anomalies due to discretization of the document image, so important information that could be used at later stages may be lost. A proposed composite-model framework takes into account pre-printed information, user-added data, and digitization characteristics. Its benefits are demonstrated by experiments with statistically significant results. Separating pre-printed ruling lines from user-added handwriting shows how ruling lines impact people's handwriting and how they can be exploited for identifying writers. Ruling line detection based on multi-line linear regression reduces the mean error in counting ruling lines from 0.10 to 0.03, 6.70 to 0.06, and 0.13 to 0.02, compared to an HMM-based approach on three standard test datasets, thereby reducing human correction time by 50%, 83%, and 72% on average. On 61 page images from 16 rule-form templates, the precision and recall of form cell recognition are increased by 2.7% and 3.7%, compared to a cross-matrix approach. Compensating for and exploiting ruling lines during feature extraction rather than pre-processing raises the writer identification accuracy from 61.2% to 67.7% on a 61-writer noisy Arabic dataset. Similarly, counteracting page-wise skew by subtracting it or transforming contours in a continuous coordinate system during feature extraction improves the writer identification accuracy. An implementation study of contour-hinge features reveals that utilizing the full probability distribution function matrix improves the writer identification accuracy from 74.9% to 79.5%.
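The abstract does not spell out the regression model. One plausible reading of "multi-line linear regression" for ruling lines, assumed here, is a family of parallel lines y = m·x + b_k that share a single slope m (the page skew) and differ only in their intercepts b_k. The minimal sketch below fits that model by ordinary least squares, assuming the ruling-line pixels have already been grouped by line; the function name `fit_parallel_lines` and its input format are hypothetical and are not the author's implementation.

```python
from statistics import fmean


def fit_parallel_lines(groups):
    """Least-squares fit of y = m*x + b_k with one slope m shared by all lines.

    Assumption (hypothetical input format): `groups` is a list with one entry
    per ruling line, each entry a list of (x, y) pixel coordinates already
    assigned to that line. Returns (m, [b_0, b_1, ...]).
    """
    num = 0.0  # pooled within-line covariance of x and y
    den = 0.0  # pooled within-line variance of x
    means = []
    for pts in groups:
        xbar = fmean(x for x, _ in pts)
        ybar = fmean(y for _, y in pts)
        means.append((xbar, ybar))
        for x, y in pts:
            num += (x - xbar) * (y - ybar)
            den += (x - xbar) ** 2
    if den == 0:
        raise ValueError("need horizontal spread of points within at least one line")
    m = num / den  # shared slope (skew) estimated from all lines at once
    # Each intercept places its line through that line's centroid.
    intercepts = [ybar - m * xbar for xbar, ybar in means]
    return m, intercepts
```

For example, `fit_parallel_lines([[(0, 10), (100, 11), (200, 12)], [(0, 40), (100, 41), (200, 42)]])` yields a shared slope of about 0.01 and intercepts of about 10 and 40. Pooling the within-line sums lets every ruling line contribute to one skew estimate instead of fitting each line's slope independently.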
Web-Based Visualization of Very Large Scientific Astronomy Imagery
Visualizing and navigating through large astronomy images from a remote
location with current astronomy display tools can be a frustrating experience
in terms of speed and ergonomics, especially on mobile devices. In this paper,
we present a high performance, versatile and robust client-server system for
remote visualization and analysis of extremely large scientific images.
Applications of this work include survey image quality control, interactive
data query and exploration, citizen science, as well as public outreach. The
proposed software is entirely open source and is designed to be generic and
applicable to a variety of datasets. It provides access to floating point data
at terabyte scales, with the ability to precisely adjust image settings in
real-time. The proposed clients are light-weight, platform-independent web
applications built on standard HTML5 web technologies and compatible with both
touch and mouse-based devices. We test the system, assess its performance,
and show that a single server can comfortably handle more than a hundred
simultaneous users accessing full-precision 32-bit astronomy data.
Comment: Published in Astronomy & Computing. IIPImage server available from
http://iipimage.sourceforge.net. Visiomatic code and demos available from
http://www.visiomatic.org
Investigation of techniques for inventorying forested regions. Volume 2: Forestry information system requirements and joint use of remotely sensed and ancillary data
The author has identified the following significant results. Effects of terrain topography in mountainous forested regions on LANDSAT signals and classifier training were found to be significant. The aspect of sloping terrain relative to the sun's azimuth was the major cause of variability. A relative insolation factor could be defined which, in a single variable, represents the joint effects of slope, aspect, and solar geometry on irradiance. Forest canopy reflectances were found, both through simulation and empirically, to have nondiffuse reflectance characteristics. Training procedures could be improved by stratifying in the space of ancillary variables and training in each stratum. Application of the Tasselled-Cap transformation to LANDSAT data acquired over forested terrain could provide a viable technique for data compression and convenient physical interpretation.
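The abstract does not give the formula for the relative insolation factor. As an assumption, the sketch below uses the standard cosine-of-incidence model for direct-beam irradiance on a tilted surface, cos i = cos θz · cos s + sin θz · sin s · cos(φ − a), where θz is the solar zenith angle, s the terrain slope, φ the solar azimuth, and a the slope aspect. The result is divided by cos θz so that flat terrain scores 1. The report's own definition may differ, and the function name `relative_insolation` is hypothetical.

```python
import math


def relative_insolation(slope_deg, aspect_deg, sun_zenith_deg, sun_azimuth_deg):
    """Direct-beam irradiance on a slope relative to flat ground (assumed model).

    Combines slope, aspect, and solar geometry into a single factor:
    cos(i) / cos(zenith), with i the local solar incidence angle.
    """
    if not 0 <= sun_zenith_deg < 90:
        raise ValueError("sun must be above the horizon")
    s = math.radians(slope_deg)
    a = math.radians(aspect_deg)
    z = math.radians(sun_zenith_deg)
    phi = math.radians(sun_azimuth_deg)
    # Cosine of the angle between the sun direction and the surface normal.
    cos_i = math.cos(z) * math.cos(s) + math.sin(z) * math.sin(s) * math.cos(phi - a)
    cos_i = max(cos_i, 0.0)  # self-shadowed slopes receive no direct beam
    return cos_i / math.cos(z)
```

Under this model, a 30° slope facing a sun at 30° zenith receives about 1.15 times the flat-ground irradiance, while the same slope facing away receives about 0.58 times, which illustrates why aspect relative to solar azimuth dominates the variability.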
Structure Diagram Recognition in Financial Announcements
Accurately extracting structured data from structure diagrams in financial
announcements is of great practical importance for building financial knowledge
graphs and further improving the efficiency of various financial applications.
First, we proposed a new method for recognizing structure diagrams in financial
announcements, which can better detect and extract different types of
connecting lines, including straight lines, curves, and polylines of different
orientations and angles. Second, we developed a two-stage method to efficiently
generate the industry's first benchmark of structure diagrams from Chinese
financial announcements: a large number of diagrams were synthesized and
annotated using an automated tool to train a preliminary recognition model with
fairly good performance, and a high-quality benchmark was then obtained by
automatically annotating real-world structure diagrams with this preliminary
model and making a few manual corrections. Finally, we experimentally verified
the significant performance advantage of our structure diagram recognition
method over previous methods.
Modern Information Systems
The development of modern information systems is a demanding task. New technologies and tools are designed, implemented, and brought to market on a daily basis. User needs change very quickly, and the IT industry strives to reach the level of efficiency and adaptability its systems need in order to remain competitive and up to date. The realization of modern information systems with rich characteristics and functionality for specific areas of interest is therefore a fact of our modern, demanding digital society, and it is the main scope of this book. The book, titled "Modern Information Systems", presents a number of innovative and recently developed information systems in 8 chapters. It may assist researchers in studying the innovative functions of modern systems in areas such as health, telematics, and knowledge management. It can also help students grasp the new research trends in information systems development.