1 research outputs found
Improving Accessibility of Archived Raster Dictionaries of Complex Script Languages
We propose an approach to index raster images of dictionary pages which in
turn would require very little manual effort to enable direct access to the
appropriate pages of the dictionary for lookup. Accessibility is further
improved by feedback and crowdsourcing that enables highlighting of the
specific location on the page where the lookup word is found, annotation,
digitization, and fielded searching. This approach is equally applicable on
simple scripts as well as complex writing systems. Using our proposed approach,
we have built a Web application called "Dictionary Explorer" which supports
word indexes in various languages and every language can have multiple
dictionaries associated with it. Word lookup gives direct access to appropriate
pages of all the dictionaries of that language simultaneously. The application
has exploration features like searching, pagination, and navigating the word
index through a tree-like interface. The application also supports feedback,
annotation, and digitization features. Apart from the scanned images,
"Dictionary Explorer" aggregates results from various sources and user
contributions in Unicode. We have evaluated the time required for indexing
dictionaries of different sizes and complexities in the Urdu language and
examined various trade-offs in our implementation. Using our approach, a single
person can make a dictionary of 1,000 pages searchable in less than an hour.Comment: 11 pages, 5 images, 2 codes, 1 tabl