2,407 research outputs found

    Multimodality, interactivity, and crowdsourcing for document transcription

    This is the peer-reviewed version of the following article: Granell, E., Romero, V., & Martínez-Hinarejos, C.-D. (2018). Multimodality, interactivity, and crowdsourcing for document transcription. Computational Intelligence, 34(2), 398-419. DOI: 10.1111/coin.12169, which has been published in final form at http://doi.org/10.1111/coin.12169. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

    Knowledge mining from documents usually relies on document engineering techniques that allow the user to access the information contained in documents of interest. In this framework, transcription may provide efficient access to the contents of handwritten documents. Manual transcription is a time-consuming task that can be sped up through different mechanisms. A first possibility is employing state-of-the-art handwritten text recognition systems to obtain an initial draft transcription that can be manually amended. A second option is employing crowdsourcing to obtain a massive but not error-free draft transcription. In this case, when collaborators employ mobile devices, speech dictation can be used as a transcription source, and speech and handwritten text recognition can be fused to provide a better draft transcription, which can be amended with even less effort. A final option is using interactive assistive frameworks, where the automatic system that provides the draft transcription and the transcriber cooperate to generate the final transcription. The novel contributions presented in this work include the study of data fusion in a multimodal crowdsourcing framework and its integration with an interactive system. The use of the proposed solutions reduces the required transcription effort and optimizes the overall performance and usability, allowing for a better transcription process.

    Funding: READ, grant 674943 (European Union's H2020); Smart Ways, grant RTC-2014-1466-4 (MINECO); CoMUN-HaT, grant TIN2015-70924-C2-1-R (MINECO/FEDER).

    Granell, E.; Romero, V.; Martínez-Hinarejos, C.-D. (2018). Multimodality, interactivity, and crowdsourcing for document transcription. Computational Intelligence, 34(2), 398-419. https://doi.org/10.1111/coin.12169
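
    As a rough illustration of the data fusion idea above (not the authors' method, which works within an interactive crowdsourcing framework over full recognition outputs), the following sketch combines an HTR hypothesis and a speech-dictation hypothesis word by word, keeping at each aligned position the word with the higher recogniser confidence. All identifiers, words, and confidence values are invented for the example.

        # Hedged sketch: ROVER-style word-level combination of an HTR hypothesis
        # and an ASR (speech dictation) hypothesis.  Each hypothesis is a list of
        # (word, confidence) pairs; the sequences are aligned by edit distance and
        # the higher-confidence word is kept at each aligned slot.

        def align(a, b):
            """Levenshtein alignment of two word sequences; returns index pairs."""
            n, m = len(a), len(b)
            dp = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1):
                dp[i][0] = i
            for j in range(m + 1):
                dp[0][j] = j
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = 0 if a[i - 1][0] == b[j - 1][0] else 1
                    dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                                   dp[i][j - 1] + 1,          # insertion
                                   dp[i - 1][j - 1] + cost)   # match/substitution
            pairs, i, j = [], n, m
            while i > 0 or j > 0:
                if (i > 0 and j > 0 and
                        dp[i][j] == dp[i - 1][j - 1] + (0 if a[i - 1][0] == b[j - 1][0] else 1)):
                    pairs.append((i - 1, j - 1)); i, j = i - 1, j - 1
                elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                    pairs.append((i - 1, None)); i -= 1
                else:
                    pairs.append((None, j - 1)); j -= 1
            return list(reversed(pairs))

        def fuse(htr, asr):
            """Pick, per aligned slot, the word with the higher recogniser confidence."""
            fused = []
            for i, j in align(htr, asr):
                if i is None:
                    fused.append(asr[j][0])
                elif j is None:
                    fused.append(htr[i][0])
                else:
                    fused.append(htr[i][0] if htr[i][1] >= asr[j][1] else asr[j][0])
            return fused

        if __name__ == "__main__":
            htr = [("the", 0.9), ("cat", 0.4), ("sat", 0.8)]   # toy HTR output
            asr = [("the", 0.8), ("cap", 0.7), ("sat", 0.9)]   # toy ASR output
            print(" ".join(fuse(htr, asr)))                    # -> "the cap sat"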

    Multimodal Crowdsourcing for Transcribing Handwritten Documents

    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Transcription of handwritten documents is an important research topic for multiple applications, such as document classification or information extraction. In the case of historical documents, transcription helps to preserve cultural heritage because of the amount of historical data contained in those documents. The transcription process can employ state-of-the-art handwritten text recognition systems in order to obtain an initial transcription. This transcription is usually not good enough for the required quality standards, but it may speed up the final transcription by the expert. In this framework, the use of collaborative transcription applications (crowdsourcing) has risen in recent years, but these platforms are mainly limited by the use of non-mobile devices, which reduces recruiting initiatives to a smaller set of potential volunteers. In this paper, an alternative that allows the use of mobile devices is presented. The proposal consists of using speech dictation of handwritten text lines. Then, by using multimodal combination of speech and handwritten text images, a draft transcription can be obtained with higher quality than that obtained by only using handwritten text recognition. The speech dictation platform is implemented as a mobile device application, which allows volunteers to be recruited from a wider range of the population. A real acquisition of the contents of a Spanish historical handwritten book was obtained with the platform, and these data were used to perform experiments on the behaviour of the proposed framework. Some experiments were performed to study how to optimise the collaborators' effort in terms of number of collaborations, including how many lines and which lines should be selected for speech dictation.

    This work was supported in part by projects READ-674943 (European Union's H2020), SmartWays-RTC-2014-1466-4 (MINECO), CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER), and ALMAMATER-PROMETEOII/2014/030 (Generalitat Valenciana).

    Granell Romero, E.; Martínez Hinarejos, C.-D. (2017). Multimodal Crowdsourcing for Transcribing Handwritten Documents. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(2), 409-419. https://doi.org/10.1109/TASLP.2016.2634123
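
    The question of which lines to send for speech dictation can be illustrated with a minimal sketch: assuming per-line HTR confidence scores are available, the lines the recogniser is least sure about are natural candidates for dictation under a fixed collaboration budget. This is an assumption-laden illustration, not the selection strategy evaluated in the paper; line identifiers and scores are made up.

        # Hedged sketch: pick the k least-confident lines as the ones worth
        # sending to volunteers for speech dictation.

        from typing import Dict, List

        def select_lines_for_dictation(line_confidence: Dict[str, float], k: int) -> List[str]:
            """Return the ids of the k lines with the lowest HTR confidence."""
            ranked = sorted(line_confidence, key=line_confidence.get)
            return ranked[:k]

        if __name__ == "__main__":
            scores = {"page3_line01": 0.91, "page3_line02": 0.42,
                      "page3_line03": 0.77, "page3_line04": 0.35}
            # With a budget of 2 collaborations per page, dictate the two worst lines.
            print(select_lines_for_dictation(scores, k=2))   # ['page3_line04', 'page3_line02']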

    Late multimodal fusion for image and audio music transcription

    Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem in Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research depending on the input: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of these inputs has conditioned the two fields to develop modality-specific frameworks. However, their recent formulation as sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities. In this work, we explore this question at the late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses of end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios, in which the corresponding single-modality models yield different error rates, show interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve over the corresponding unimodal standard recognition frameworks.

    This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project, funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding programme (IDIFEDER/2020/003). The first and second authors are supported by grants FPU19/04957 from the Spanish Ministerio de Universidades and APOSTD/2020/256 from the Generalitat Valenciana, respectively.
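
    A minimal sketch of late fusion, simplified from lattices to n-best lists (the paper itself searches a lattice-based space): each modality scores its hypotheses with log-probabilities, and a log-linear interpolation with weight alpha re-ranks the union of the two lists. Every token, score, and weight below is an assumption made for illustration.

        # Hedged sketch of late fusion on n-best lists; hypotheses missing from one
        # modality receive a floor log-probability before interpolation.

        import math
        from typing import Dict, List, Tuple

        def late_fuse(omr_nbest: Dict[str, float],
                      amt_nbest: Dict[str, float],
                      alpha: float = 0.5,
                      floor: float = -20.0) -> List[Tuple[str, float]]:
            """Re-rank hypotheses by alpha*logP_omr + (1-alpha)*logP_amt."""
            hyps = set(omr_nbest) | set(amt_nbest)
            scored = []
            for h in hyps:
                s = alpha * omr_nbest.get(h, floor) + (1 - alpha) * amt_nbest.get(h, floor)
                scored.append((h, s))
            return sorted(scored, key=lambda x: x[1], reverse=True)

        if __name__ == "__main__":
            # Toy symbol sequences (e.g. note tokens); log-probabilities are invented.
            omr = {"C4q D4q E4h": math.log(0.6), "C4q D4q E4q": math.log(0.3)}
            amt = {"C4q D4q E4h": math.log(0.4), "C4q D4e E4h": math.log(0.5)}
            best, score = late_fuse(omr, amt, alpha=0.5)[0]
            print(best)   # the hypothesis supported by both modalities wins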

    Automatic Identification of Addresses: A Systematic Literature Review

    Cruz, P., Vanneschi, L., Painho, M., & Rita, P. (2022). Automatic Identification of Addresses: A Systematic Literature Review. ISPRS International Journal of Geo-Information, 11(1), 1-27. https://doi.org/10.3390/ijgi11010011

    Address matching continues to play a central role at various levels, through geocoding and data integration from different sources, with a view to supporting activities such as urban planning, location-based services, and the construction of databases like those used in census operations. However, the task of address matching continues to face several challenges, such as non-standard or incomplete address records or addresses written in more complex languages. In order to better understand how current limitations can be overcome, this paper presents a systematic literature review focused on automated approaches to address matching and their evolution over time. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed, resulting in a final set of 41 papers published between 2002 and 2021, the great majority of them after 2017, with Chinese authors leading the way. The main findings reveal a consistent move from more traditional approaches to deep learning methods based on semantics, encoder-decoder architectures, and attention mechanisms, as well as the very recent adoption of hybrid approaches making increased use of spatial constraints and entities. The adoption of evolutionary-based approaches and privacy-preserving methods stands out among the research gaps to address in future studies.

    The work by Leonardo Vanneschi, Marco Painho and Paulo Rita was supported by Fundação para a Ciência e a Tecnologia (FCT) within the project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC). The work by Leonardo Vanneschi was also partially supported by FCT, Portugal, through funding of project AICE (DSAIPA/DS/0113/2019).
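
    For concreteness, the "traditional" end of the spectrum surveyed in the review can be sketched as a token-overlap matcher with a small abbreviation table, which also hints at why non-standard records are problematic. The normalisation rules, abbreviation list, and 0.8 threshold are arbitrary illustrative choices, not recommendations from the paper.

        # Hedged sketch: a Jaccard token-overlap baseline for address matching.

        import re

        ABBREV = {"st": "street", "rd": "road", "ave": "avenue"}   # toy expansion table

        def normalise(address: str) -> set:
            """Lower-case, strip punctuation, expand common abbreviations, tokenise."""
            tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
            return {ABBREV.get(t, t) for t in tokens}

        def jaccard(a: str, b: str) -> float:
            ta, tb = normalise(a), normalise(b)
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        def match(query: str, candidates: list, threshold: float = 0.8):
            """Return the best-scoring candidate above the threshold, if any."""
            scored = [(jaccard(query, c), c) for c in candidates]
            best = max(scored, default=(0.0, None))
            return best[1] if best[0] >= threshold else None

        if __name__ == "__main__":
            reference = ["221B Baker Street, London", "10 Downing Street, London"]
            print(match("221b baker st., london", reference))   # -> "221B Baker Street, London"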

    GUIDELINES FOR THE DESIGN OF ENHANCED, COST EFFECTIVE NETWORKS IN A MANUFACTURING ENVIRONMENT

    This thesis investigates the transmission of real-time interactive speech over local area networks (LANs) in an industrial/commercial environment, with the eventual aim of obviating the need for a private automatic branch exchange and ultimately preparing the way for a single interactive integrated information system (PS) that provides workstations, networked via a LAN, with a fully interactive speech and graphics facility commensurate with the future requirements of computer integrated manufacturing (CIM).

    The reason for conducting this programme of research was that existing LANs do not offer a real-time interactive speech facility: any verbal communication between workstation users on the LAN has to be carried out over a telephone network (PABX), which necessitates the provision of a second, completely separate network with its associated costs. Initial investigations indicate that there is sufficient capacity on existing LANs to support both data and real-time speech, provided certain data packet delay criteria can be met. Earlier research work (in the late 1980s) was conducted at Bell Labs and MIT [Ref 25, 27 & 28], the University of Strathclyde [Ref 24] and BTRL [Ref 22 and 37]; in all of these cases the real-time implementation issues were not fully addressed.

    The research work reported in this thesis provides the main criteria for the implementation of real-time interactive speech on both existing and newly installed networks. With such enhanced communication facilities, designers and engineers on the shop floor can be projected into their suppliers, providing much greater integration between manufacturer and supplier, which will be beneficial as Concurrent and Simultaneous Engineering methodologies are further developed. Accordingly, various LANs have been evaluated as to their suitability for the transmission of real-time interactive speech. As LANs can, in general, be separated into those with deterministic or stochastic access mechanisms, investigations were carried out into the ability of both (i) Token Passing Bus LANs supporting the Manufacturing and Automation Protocol (MAP), which are deterministic, and (ii) Carrier Sense Multiple Access/Collision Detection (CSMA/CD) LANs supporting the Technical Office Protocol (TOP), which are stochastic, to support real-time interactive speech, as both are used extensively in commerce and manufacturing.

    The thesis that real-time interactive speech can be transmitted over LANs employed in a computer integrated manufacturing environment has to be moderated following the tests carried out in this work, as follows: the Token Passing LAN presents no serious problems under normal traffic conditions; however, the CSMA/CD LAN can only be used in relatively light traffic conditions, i.e. below 30% of its designed maximum capacity, provided special arrangements are made to minimise the access, transmission and processing delays of speech packets. Given that a certain amount of delay is inevitable in packet-switched systems (LANs), investigations have been carried out into techniques for reducing the subjective effect of speech packet loss on real-time interactive systems due to the unacceptable delays caused by the conditions mentioned above.
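
    The 30% loading ceiling reported for the CSMA/CD LAN lends itself to a back-of-the-envelope capacity check of the following kind. The codec rate, packet interval, and per-packet overhead used below are illustrative assumptions, not figures taken from the thesis.

        # Hedged sketch: estimate how many one-way speech streams fit within the
        # usable (e.g. 30%) share of a LAN's nominal bit rate.

        def speech_calls_supported(lan_bit_rate: float,
                                   usable_fraction: float = 0.30,
                                   codec_bit_rate: float = 64_000.0,   # e.g. 64 kbit/s PCM
                                   packet_interval_s: float = 0.020,   # 20 ms of speech per packet
                                   overhead_bytes: int = 64) -> int:
            """Estimate the number of one-way speech streams the usable share can carry."""
            payload_bits = codec_bit_rate * packet_interval_s        # bits of speech per packet
            packet_bits = payload_bits + overhead_bytes * 8          # add header/framing overhead
            per_stream_bit_rate = packet_bits / packet_interval_s    # effective rate on the wire
            return int((lan_bit_rate * usable_fraction) // per_stream_bit_rate)

        if __name__ == "__main__":
            # 10 Mbit/s Ethernet kept under the 30% loading ceiling noted above.
            print(speech_calls_supported(10_000_000))   # ~33 streams under these assumptions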