Detecting Authorship, Hands, and Corrections in Historical Manuscripts. A Mixed-methods Approach towards the Unpublished Writings of an 18th Century Czech Emigré Community in Berlin (Handwriting)
When one starts working philologically with historical manuscripts, one faces important first questions involving authorship, writers’ hands and the history of document transmission. These issues are especially thorny with documents remaining outside the established canon, such as private manuscripts, about which we have very restricted text-external information. In this area, so we argue, it is especially fruitful to employ a mixed-methods approach, combining tailored automatic methods from image recognition/analysis with philological and linguistic knowledge. While image analysis captures writers’ hands, linguistic/philological research mainly addresses textual authorship; the two cross-fertilize and obtain a coherent interpretation which may then be evaluated against the available text-external historical evidence. Departing from our “lab case”, which is a corpus of unedited Czech manuscripts from the archive of a small 18th century migrant community, the Herrnhuter Brüdergemeine (Brethren parish) in Berlin-Neukölln, our project has developed an assistance system which aids philologists in working with digitized (scanned) hand-written historical sources. We present its application and discuss its general potential and methodological implications.
READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents
Text line detection is crucial for any application associated with Automatic
Text Recognition or Keyword Spotting. Modern algorithms perform well on
well-established datasets since they either comprise clean data or
simple/homogeneous page layouts. We have collected and annotated 2036 archival
document images from different locations and time periods. The dataset contains
varying page layouts and degradations that challenge text line segmentation
methods. Well-established text line segmentation evaluation schemes such as the
Detection Rate or Recognition Accuracy demand binarized data that is
annotated on a pixel level. Producing ground truth by these means is laborious
and not needed to determine a method's quality. In this paper we propose a new
evaluation scheme that is based on baselines. The proposed scheme has no need
for binarization and it can handle skewed as well as rotated text lines. The
ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on
Layout Analysis for Challenging Medieval Manuscripts used this evaluation
scheme. Finally, we present results achieved by a recently published text line
detection algorithm. Comment: Submitted to DAS201
Exploration of audiovisual heritage using audio indexing technology
This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, and the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the requirements for successful tuning and improvement of available tools for indexing the heterogeneous A/V collections from the cultural heritage domain are reviewed. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.
Finding What You Need, and Knowing What You Can Find: Digital Tools for Palaeographers in Musicology and Beyond
This chapter examines three projects that provide musicologists with a range of
resources for managing and exploring their materials: DIAMM (Digital Image Archive
of Medieval Music), CMME (Computerized Mensural Music Editing) and the software
Gamera. Since 1998, DIAMM has been enhancing research of scholars worldwide
by providing them with the best possible quality of digital images. In some cases
these images are now the only access that scholars are permitted, since the original
documents are lost or considered too fragile for further handling. For many sources,
however, simply creating a very high-resolution image is not enough: sources are often
damaged by age, misuse (usually Medieval “vandalism”), or poor conservation. To deal
with damaged materials the project has developed methods of digital restoration using
mainstream commercial software, which has revealed lost data in a wide variety of
sources. The project also uses light sources ranging from ultraviolet to infrared in
order to obtain better readings of erasures or material lost by heat or water damage.
The ethics of digital restoration are discussed, as well as the concerns of the document
holders. CMME, together with a database of musical sources and editions, provides scholars with
a tool for making fluid editions and diplomatic transcriptions: without the need for a
single fixed visual form on a printed page, a computerized edition system can utilize
one editor’s transcription to create any number of visual forms and variant versions.
Gamera, a toolkit for building document image recognition systems created by Ichiro
Fujinaga, is a broad recognition engine that grew out of music recognition and can
be adapted and developed to perform a number of tasks on both musical and non-musical
materials. Its application to several projects is discussed.
The SADC Groundwater Data and Information Archive, Knowledge Sharing and Co-operation Project. Final report
The Southern African Development Community (SADC) Groundwater Data and Information
Archive, Knowledge Sharing and Co-operation Project, funded by the German Development
Cooperation (GIZ) and Department for International Development, UK (DFID), was initiated in
September 2009 to identify, catalogue and subsequently promote access to the large collection of
reports held in the UK by the British Geological Survey (BGS). The work has focused on a
wealth of unpublished so-called “grey” data and information which describes groundwater
occurrence and development in Southern Africa and was gathered by the BGS over its many
decades of involvement in the region.
The project has four main aims:
- To catalogue and describe the "grey data" documents on SADC groundwater held by the
  BGS within a digital metadatabase.
- To identify a sub-set of scanned documents to be made freely available to groundwater
  practitioners and managers in the SADC region by electronic distribution.
- To link the metadatabase and digital sub-set of documents via a web portal hosted by the
  BGS, to enable download of documents by SADC groundwater workers.
- To strengthen links between BGS hydrogeologists and their counterparts in SADC, and
  provide an example of groundwater data sharing which could be emulated by other
  European Geological Surveys with substantial data holdings on SADC groundwater.
The project has successfully met these aims. The assessment of BGS archived material produced
an electronic meta-database describing 1735 items held in hard copy. Of these, 1041 have been
scanned digitally to searchable Portable Document Format (PDF). A subset of 655 PDFs,
including partial documents related to groundwater development from the colonial and
post-independence period as well as BGS internal project reports and reports approved for web
dissemination by host countries, is now available to download (free of charge) at
http://www.SADCgroundwaterarchive.com . Initial results indicate a good deal of interest both
from within SADC and elsewhere, with the site accessed both directly and via search
engines such as Google. The information presented has already been used by in-region projects
such as the SADC Hydrogeological Mapping project and the Malawi Water Assessment Project.
This is essentially a pilot project providing an example of how Web delivery of the archive is an
important step forward for the well-being of the SADC region. It permits access to documents
few even knew existed and will, it is hoped, provide a valuable dataset that should inhibit the
temptation to waste scarce resources by “re-inventing the wheel”.
Setting a Bishopric / Arranging an Archive: Traces of Archival Activity in the Bishopric of Alexandria and Antioch
Early Christianity was heir to the archival practice and discourse of Greek and Roman societies, in which public and private archives enjoyed a great deal of consideration. Even before creating their own archives, Christian congregations, when becoming a structured society, adhered to the archival discourse of their times, and the mention of archives in their writings served apologetic and theological aims. The article argues that the main impulse to undertake archival activity came from the new form of leadership, the bishop: alone, or in connection with other colleagues, in particular within the meetings (synods), the bishop produced a huge number of written records, which were to be arranged in archival form. After a brief presentation of the papyrological evidence, the article discusses the traces of ancient episcopal archives detectable in the historiographical and apologetic writings compiled in the main episcopal sees, such as Rome, Alexandria, and Antioch.
Access to recorded interviews: A research agenda
Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.