
    Detecting Authorship, Hands, and Corrections in Historical Manuscripts. A Mixed-methods Approach towards the Unpublished Writings of an 18th Century Czech Émigré Community in Berlin (Handwriting)

    When one starts working philologically with historical manuscripts, one faces important first questions involving authorship, writers’ hands and the history of document transmission. These issues are especially thorny with documents remaining outside the established canon, such as private manuscripts, about which we have very restricted text-external information. In this area, so we argue, it is especially fruitful to employ a mixed-methods approach, combining tailored automatic methods from image recognition/analysis with philological and linguistic knowledge. While image analysis captures writers’ hands, linguistic/philological research mainly addresses textual authorship; the two cross-fertilize and obtain a coherent interpretation which may then be evaluated against the available text-external historical evidence. Departing from our ‘lab case’, which is a corpus of unedited Czech manuscripts from the archive of a small 18th century migrant community, the Herrnhuter Brüdergemeine (Brethren parish) in Berlin-Neukölln, our project has developed an assistance system which aids philologists in working with digitized (scanned) hand-written historical sources. We present its application and discuss its general potential and methodological implications.

    READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

    Text line detection is crucial for any application associated with Automatic Text Recognition or Keyword Spotting. Modern algorithms perform well on well-established datasets since they either comprise clean data or simple/homogeneous page layouts. We have collected and annotated 2036 archival document images from different locations and time periods. The dataset contains varying page layouts and degradations that challenge text line segmentation methods. Well-established text line segmentation evaluation schemes such as the Detection Rate or Recognition Accuracy demand binarized data that is annotated on a pixel level. Producing ground truth by these means is laborious and not needed to determine a method's quality. In this paper we propose a new evaluation scheme that is based on baselines. The proposed scheme has no need for binarization and it can handle skewed as well as rotated text lines. The ICDAR 2017 Competition on Baseline Detection and the ICDAR 2017 Competition on Layout Analysis for Challenging Medieval Manuscripts used this evaluation scheme. Finally, we present results achieved by a recently published text line detection algorithm. Comment: Submitted to DAS201

    Exploration of audiovisual heritage using audio indexing technology

    This paper discusses audio indexing tools that have been implemented for the disclosure of Dutch audiovisual cultural heritage collections. It explains the role of language models and their adaptation to historical settings, and the adaptation of acoustic models for homogeneous audio collections. In addition to the benefits of cross-media linking, the requirements for successful tuning and improvement of available tools for indexing the heterogeneous A/V collections from the cultural heritage domain are reviewed. Finally, the paper argues that research is needed to cope with the varying information needs of different types of users.

    Finding What You Need, and Knowing What You Can Find: Digital Tools for Palaeographers in Musicology and Beyond

    This chapter examines three projects that provide musicologists with a range of resources for managing and exploring their materials: DIAMM (Digital Image Archive of Medieval Music), CMME (Computerized Mensural Music Editing) and the software Gamera. Since 1998, DIAMM has been enhancing the research of scholars worldwide by providing them with the best possible quality of digital images. In some cases these images are now the only access that scholars are permitted, since the original documents are lost or considered too fragile for further handling. For many sources, however, simply creating a very high-resolution image is not enough: sources are often damaged by age, misuse (usually Medieval ‘vandalism’), or poor conservation. To deal with damaged materials the project has developed methods of digital restoration using mainstream commercial software, which has revealed lost data in a wide variety of sources. The project also uses light sources ranging from ultraviolet to infrared in order to obtain better readings of erasures or material lost by heat or water damage. The ethics of digital restoration are discussed, as well as the concerns of the document holders. CMME, together with a database of musical sources and editions, provides scholars with a tool for making fluid editions and diplomatic transcriptions: without the need for a single fixed visual form on a printed page, a computerized edition system can utilize one editor’s transcription to create any number of visual forms and variant versions. Gamera, a toolkit for building document image recognition systems created by Ichiro Fujinaga, is a broad recognition engine that grew out of music recognition and can be adapted and developed to perform a number of tasks on both music and non-musical materials. Its application to several projects is discussed.

    The SADC Groundwater Data and Information Archive, Knowledge Sharing and Co-operation Project. Final report

    The Southern African Development Community (SADC) Groundwater Data and Information Archive, Knowledge Sharing and Co-operation Project, funded by the German Development Cooperation (GIZ) and Department for International Development, UK (DFID), was initiated in September 2009 to identify, catalogue and subsequently promote access to the large collection of reports held in the UK by the British Geological Survey (BGS). The work has focused on a wealth of unpublished so-called “grey” data and information which describes groundwater occurrence and development in Southern Africa and was gathered by the BGS over its many decades of involvement in the region. The project has four main aims: (1) to catalogue and describe the “grey data” documents on SADC groundwater held by the BGS within a digital metadatabase; (2) to identify a sub-set of scanned documents to be made freely available to groundwater practitioners and managers in the SADC region by electronic distribution; (3) to link the metadatabase and digital sub-set of documents via a web portal hosted by the BGS, to enable download of documents by SADC groundwater workers; and (4) to strengthen links between BGS hydrogeologists and their counterparts in SADC, and provide an example of groundwater data sharing which could be emulated by other European Geological Surveys with substantial data holdings on SADC groundwater. The project has successfully met these aims. The assessment of BGS archived material produced an electronic metadatabase describing 1735 items held in hard copy. Of these, 1041 have been scanned to searchable Portable Document Format (PDF). A subset of 655 PDFs, including partial documents related to groundwater development from the colonial and post-independence period as well as BGS internal project reports and reports approved for web dissemination by host countries, are now available to download (free of charge) at http://www.SADCgroundwaterarchive.com .
Initial results indicate a good deal of interest both from within SADC and elsewhere, with the website reached both by direct address and via search engines such as Google. The information presented has already been used by in-region projects such as the SADC Hydrogeological Mapping project and the Malawi Water Assessment Project. This is essentially a pilot project providing an example of how Web delivery of the archive is an important step forward for the well-being of the SADC region. It permits access to documents few even knew existed and will, it is hoped, provide a valuable dataset that should inhibit the temptation to waste scarce resources by ‘re-inventing the wheel’.

    Setting a Bishopric / Arranging an Archive: Traces of Archival Activity in the Bishopric of Alexandria and Antioch

    Early Christianity was heir to the archival practice and discourse of Greek and Roman societies, in which public and private archives enjoyed a great deal of consideration. Even before creating their own archives, Christian congregations, when becoming a structured society, adhered to the archival discourse of their times, and the mention of archives in their writings served apologetic and theological aims. The article argues that the main impulse to undertake archival activity came from the new form of leadership, the bishop: alone, or in connection with other colleagues, in particular within the meetings (synods), the bishop produced a huge number of written records, which had to be arranged in archival form. After a brief presentation of the papyrological evidence, the article discusses the traces of ancient episcopal archives detectable in the historiographical and apologetic writings compiled in the main episcopal sees, such as Rome, Alexandria, and Antioch.

    Access to recorded interviews: A research agenda

    Recorded interviews form a rich basis for scholarly inquiry. Examples include oral histories, community memory projects, and interviews conducted for broadcast media. Emerging technologies offer the potential to radically transform the way in which recorded interviews are made accessible, but this vision will demand substantial investments from a broad range of research communities. This article reviews the present state of practice for making recorded interviews available and the state of the art for key component technologies. A large number of important research issues are identified, and from that set of issues, a coherent research agenda is proposed.