Automatic Palaeographic Exploration of Genizah Manuscripts
The Cairo Genizah is a collection of hand-written documents containing approximately
350,000 fragments of mainly Jewish texts discovered in the late 19th century. The
fragments are today spread out in some 75 libraries and private collections worldwide,
but there is an ongoing effort to document and catalogue all extant fragments.
Palaeographic information plays a key role in the study of the Genizah collection.
Script style, and more specifically handwriting, can be used to identify fragments that
might originate from the same original work. Such matched fragments, commonly
referred to as “joins”, are currently identified manually by experts, and presumably only
a small fraction of existing joins have been discovered to date. In this work, we show
that automatic handwriting matching functions, obtained from non-specific features
using a corpus of writing samples, can perform this task quite reliably. In addition, we
explore the problem of grouping various Genizah documents by script style, without
being provided any prior information about the relevant styles. The automatically
obtained grouping agrees, for the most part, with the palaeographic taxonomy. In cases
where the method fails, it is due to apparent similarities between related scripts.
Computer Analysis of Architecture Using Automatic Image Understanding
In the past few years, computer vision and pattern recognition systems have
become increasingly powerful, expanding the range of automatic
tasks enabled by machine vision. Here we show that computer analysis of
building images can perform quantitative analysis of architecture and quantify
similarities between city architectural styles.
Images of buildings from 18 cities and three countries were acquired using
Google StreetView, and were used to train a machine vision system to
automatically identify the location of the imaged building based on the image
visual content. Experimental results show that the computer analysis
can automatically identify the geographical location of the StreetView image.
More importantly, the algorithm was able to group the cities and countries and
provide a phylogeny of the similarities between architectural styles as
captured by StreetView images. These results demonstrate that computer vision
and pattern recognition algorithms can perform the complex cognitive task of
analyzing images of buildings, and can be used to measure and quantify visual
similarities and differences between architectural styles. This
experiment provides a new paradigm for studying architecture, based on a
quantitative approach that can enhance the traditional manual observation and
analysis. The source code used for the analysis is open and publicly available.
Unsupervised Adaptation for Synthetic-to-Real Handwritten Word Recognition
Handwritten Text Recognition (HTR) is still a challenging problem because it
must deal with two important difficulties: the variability among writing
styles, and the scarcity of labelled data. To alleviate such problems,
synthetic data generation and data augmentation are typically used to train HTR
systems. However, training with such data produces encouraging but still
inaccurate transcriptions of real words. In this paper, we propose an
unsupervised writer adaptation approach that is able to automatically adjust a
generic handwritten word recognizer, fully trained with synthetic fonts,
towards a new incoming writer. We have experimentally validated our proposal
using five different datasets, covering several challenges: (i) the document
source: modern and historic samples, which may involve paper degradation
problems; (ii) different handwriting styles: single and multiple writer
collections; and (iii) language, which involves different character
combinations. Across these challenging collections, we show that our system is
able to maintain its performance; it thus provides a practical and generic
approach to deal with new document collections without requiring any expensive
and tedious manual annotation step.
Comment: Accepted to WACV 202