2,407 research outputs found

    Multimodality, interactivity, and crowdsourcing for document transcription

    This is the peer-reviewed version of the following article: Granell, E., Romero, V., & Martínez-Hinarejos, C.-D. (2018). Multimodality, interactivity, and crowdsourcing for document transcription. Computational Intelligence, 34(2), 398-419. DOI: 10.1111/coin.12169, which has been published in final form at http://doi.org/10.1111/coin.12169. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.

    Knowledge mining from documents usually relies on document engineering techniques that allow the user to access the information contained in documents of interest. In this framework, transcription may provide efficient access to the contents of handwritten documents. Manual transcription is a time-consuming task that can be sped up through different mechanisms. A first possibility is employing state-of-the-art handwritten text recognition systems to obtain an initial draft transcription that can be manually amended. A second option is employing crowdsourcing to obtain a massive but not error-free draft transcription. In this case, when collaborators employ mobile devices, speech dictation can be used as a transcription source, and speech and handwritten text recognition can be fused to provide a better draft transcription, which can be amended with even less effort. A final option is using interactive assistive frameworks, where the automatic system that provides the draft transcription and the transcriber cooperate to generate the final transcription. The novel contributions presented in this work include the study of data fusion in a multimodal crowdsourcing framework and its integration with an interactive system. The use of the proposed solutions reduces the required transcription effort and optimizes the overall performance and usability, allowing for a better transcription process.

    Funding: READ, grant 674943 (European Union's H2020); Smart Ways, grant RTC-2014-1466-4 (MINECO); CoMUN-HaT, grant TIN2015-70924-C2-1-R (MINECO/FEDER).

    Granell, E.; Romero, V.; Martínez-Hinarejos, C.-D. (2018). Multimodality, interactivity, and crowdsourcing for document transcription. Computational Intelligence, 34(2), 398-419. https://doi.org/10.1111/coin.12169
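
    As a rough illustration of the data fusion idea above (not the authors' method, which works within an interactive crowdsourcing framework over full recognition outputs), the following sketch combines an HTR hypothesis and a speech-dictation hypothesis word by word, keeping at each aligned position the word with the higher recogniser confidence. All identifiers, words, and confidence values are invented for the example.

        # Hedged sketch: ROVER-style word-level combination of an HTR hypothesis
        # and an ASR (speech dictation) hypothesis.  Each hypothesis is a list of
        # (word, confidence) pairs; the sequences are aligned by edit distance and
        # the higher-confidence word is kept at each aligned slot.

        def align(a, b):
            """Levenshtein alignment of two word sequences; returns index pairs."""
            n, m = len(a), len(b)
            dp = [[0] * (m + 1) for _ in range(n + 1)]
            for i in range(n + 1):
                dp[i][0] = i
            for j in range(m + 1):
                dp[0][j] = j
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = 0 if a[i - 1][0] == b[j - 1][0] else 1
                    dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                                   dp[i][j - 1] + 1,          # insertion
                                   dp[i - 1][j - 1] + cost)   # match/substitution
            pairs, i, j = [], n, m
            while i > 0 or j > 0:
                if (i > 0 and j > 0 and
                        dp[i][j] == dp[i - 1][j - 1] + (0 if a[i - 1][0] == b[j - 1][0] else 1)):
                    pairs.append((i - 1, j - 1)); i, j = i - 1, j - 1
                elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
                    pairs.append((i - 1, None)); i -= 1
                else:
                    pairs.append((None, j - 1)); j -= 1
            return list(reversed(pairs))

        def fuse(htr, asr):
            """Pick, per aligned slot, the word with the higher recogniser confidence."""
            fused = []
            for i, j in align(htr, asr):
                if i is None:
                    fused.append(asr[j][0])
                elif j is None:
                    fused.append(htr[i][0])
                else:
                    fused.append(htr[i][0] if htr[i][1] >= asr[j][1] else asr[j][0])
            return fused

        if __name__ == "__main__":
            htr = [("the", 0.9), ("cat", 0.4), ("sat", 0.8)]   # toy HTR output
            asr = [("the", 0.8), ("cap", 0.7), ("sat", 0.9)]   # toy ASR output
            print(" ".join(fuse(htr, asr)))                    # -> "the cap sat"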

    Multimodal Crowdsourcing for Transcribing Handwritten Documents

    © 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

    Transcription of handwritten documents is an important research topic for multiple applications, such as document classification or information extraction. In the case of historical documents, transcription helps to preserve cultural heritage because of the amount of historical data contained in those documents. The transcription process can employ state-of-the-art handwritten text recognition systems in order to obtain an initial transcription. This transcription is usually not good enough for the required quality standards, but it may speed up the final transcription by the expert. In this framework, the use of collaborative transcription applications (crowdsourcing) has risen in recent years, but these platforms are mainly limited by the use of non-mobile devices, which reduces recruiting initiatives to a smaller set of potential volunteers. In this paper, an alternative that allows the use of mobile devices is presented. The proposal consists of using speech dictation of handwritten text lines. Then, by using multimodal combination of speech and handwritten text images, a draft transcription can be obtained with higher quality than that obtained by only using handwritten text recognition. The speech dictation platform is implemented as a mobile device application, which allows volunteers to be recruited from a wider range of the population. A real acquisition of the contents of a Spanish historical handwritten book was obtained with the platform, and these data were used to perform experiments on the behaviour of the proposed framework. Some experiments were performed to study how to optimise the collaborators' effort in terms of number of collaborations, including how many lines and which lines should be selected for speech dictation.

    This work was supported in part by projects READ-674943 (European Union's H2020), SmartWays-RTC-2014-1466-4 (MINECO), CoMUN-HaT-TIN2015-70924-C2-1-R (MINECO/FEDER), and ALMAMATER-PROMETEOII/2014/030 (Generalitat Valenciana).

    Granell Romero, E.; Martínez Hinarejos, C.-D. (2017). Multimodal Crowdsourcing for Transcribing Handwritten Documents. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25(2), 409-419. https://doi.org/10.1109/TASLP.2016.2634123
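
    The question of which lines to send for speech dictation can be illustrated with a minimal sketch: assuming per-line HTR confidence scores are available, the lines the recogniser is least sure about are natural candidates for dictation under a fixed collaboration budget. This is an assumption-laden illustration, not the selection strategy evaluated in the paper; line identifiers and scores are made up.

        # Hedged sketch: pick the k least-confident lines as the ones worth
        # sending to volunteers for speech dictation.

        from typing import Dict, List

        def select_lines_for_dictation(line_confidence: Dict[str, float], k: int) -> List[str]:
            """Return the ids of the k lines with the lowest HTR confidence."""
            ranked = sorted(line_confidence, key=line_confidence.get)
            return ranked[:k]

        if __name__ == "__main__":
            scores = {"page3_line01": 0.91, "page3_line02": 0.42,
                      "page3_line03": 0.77, "page3_line04": 0.35}
            # With a budget of 2 collaborations per page, dictate the two worst lines.
            print(select_lines_for_dictation(scores, k=2))   # ['page3_line04', 'page3_line02']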

    Late multimodal fusion for image and audio music transcription

    Music transcription, which deals with the conversion of music sources into a structured digital format, is a key problem in Music Information Retrieval (MIR). When addressing this challenge in computational terms, the MIR community follows two lines of research depending on the input: music documents, which is the case of Optical Music Recognition (OMR), or audio recordings, which is the case of Automatic Music Transcription (AMT). The different nature of these inputs has conditioned the two fields to develop modality-specific frameworks. However, their recent formulation as sequence labeling tasks leads to a common output representation, which enables research on a combined paradigm. In this respect, multimodal image and audio music transcription comprises the challenge of effectively combining the information conveyed by the image and audio modalities. In this work, we explore this question at the late-fusion level: we study four combination approaches in order to merge, for the first time, the hypotheses of end-to-end OMR and AMT systems in a lattice-based search space. The results obtained for a series of performance scenarios, in which the corresponding single-modality models yield different error rates, show interesting benefits of these approaches. In addition, two of the four strategies considered significantly improve over the corresponding unimodal standard recognition frameworks.

    This paper is part of the I+D+i PID2020-118447RA-I00 (MultiScore) project, funded by MCIN/AEI/10.13039/501100011033. Some of the computing resources were provided by the Generalitat Valenciana and the European Union through the FEDER funding programme (IDIFEDER/2020/003). The first and second authors are supported by grants FPU19/04957 from the Spanish Ministerio de Universidades and APOSTD/2020/256 from the Generalitat Valenciana, respectively.
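
    A minimal sketch of late fusion, simplified from lattices to n-best lists (the paper itself searches a lattice-based space): each modality scores its hypotheses with log-probabilities, and a log-linear interpolation with weight alpha re-ranks the union of the two lists. Every token, score, and weight below is an assumption made for illustration.

        # Hedged sketch of late fusion on n-best lists; hypotheses missing from one
        # modality receive a floor log-probability before interpolation.

        import math
        from typing import Dict, List, Tuple

        def late_fuse(omr_nbest: Dict[str, float],
                      amt_nbest: Dict[str, float],
                      alpha: float = 0.5,
                      floor: float = -20.0) -> List[Tuple[str, float]]:
            """Re-rank hypotheses by alpha*logP_omr + (1-alpha)*logP_amt."""
            hyps = set(omr_nbest) | set(amt_nbest)
            scored = []
            for h in hyps:
                s = alpha * omr_nbest.get(h, floor) + (1 - alpha) * amt_nbest.get(h, floor)
                scored.append((h, s))
            return sorted(scored, key=lambda x: x[1], reverse=True)

        if __name__ == "__main__":
            # Toy symbol sequences (e.g. note tokens); log-probabilities are invented.
            omr = {"C4q D4q E4h": math.log(0.6), "C4q D4q E4q": math.log(0.3)}
            amt = {"C4q D4q E4h": math.log(0.4), "C4q D4e E4h": math.log(0.5)}
            best, score = late_fuse(omr, amt, alpha=0.5)[0]
            print(best)   # the hypothesis supported by both modalities wins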

    Automatic Identification of Addresses: A Systematic Literature Review

    Cruz, P., Vanneschi, L., Painho, M., & Rita, P. (2022). Automatic Identification of Addresses: A Systematic Literature Review. ISPRS International Journal of Geo-Information, 11(1), 1-27. https://doi.org/10.3390/ijgi11010011

    Address matching continues to play a central role at various levels, through geocoding and data integration from different sources, with a view to supporting activities such as urban planning, location-based services, and the construction of databases like those used in census operations. However, the task of address matching continues to face several challenges, such as non-standard or incomplete address records or addresses written in more complex languages. In order to better understand how current limitations can be overcome, this paper presents a systematic literature review focused on automated approaches to address matching and their evolution over time. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were followed, resulting in a final set of 41 papers published between 2002 and 2021, the great majority of them after 2017, with Chinese authors leading the way. The main findings reveal a consistent move from more traditional approaches to deep learning methods based on semantics, encoder-decoder architectures, and attention mechanisms, as well as the very recent adoption of hybrid approaches making increased use of spatial constraints and entities. The adoption of evolutionary-based approaches and privacy-preserving methods stands out among the research gaps to address in future studies.

    The work by Leonardo Vanneschi, Marco Painho and Paulo Rita was supported by Fundação para a Ciência e a Tecnologia (FCT) within the project UIDB/04152/2020, Centro de Investigação em Gestão de Informação (MagIC). The work by Leonardo Vanneschi was also partially supported by FCT, Portugal, through funding of project AICE (DSAIPA/DS/0113/2019).
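
    For concreteness, the "traditional" end of the spectrum surveyed in the review can be sketched as a token-overlap matcher with a small abbreviation table, which also hints at why non-standard records are problematic. The normalisation rules, abbreviation list, and 0.8 threshold are arbitrary illustrative choices, not recommendations from the paper.

        # Hedged sketch: a Jaccard token-overlap baseline for address matching.

        import re

        ABBREV = {"st": "street", "rd": "road", "ave": "avenue"}   # toy expansion table

        def normalise(address: str) -> set:
            """Lower-case, strip punctuation, expand common abbreviations, tokenise."""
            tokens = re.sub(r"[^\w\s]", " ", address.lower()).split()
            return {ABBREV.get(t, t) for t in tokens}

        def jaccard(a: str, b: str) -> float:
            ta, tb = normalise(a), normalise(b)
            return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

        def match(query: str, candidates: list, threshold: float = 0.8):
            """Return the best-scoring candidate above the threshold, if any."""
            scored = [(jaccard(query, c), c) for c in candidates]
            best = max(scored, default=(0.0, None))
            return best[1] if best[0] >= threshold else None

        if __name__ == "__main__":
            reference = ["221B Baker Street, London", "10 Downing Street, London"]
            print(match("221b baker st., london", reference))   # -> "221B Baker Street, London"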

    GUIDELINES FOR THE DESIGN OF ENHANCED, COST EFFECTIVE NETWORKS IN A MANUFACTURING ENVIRONMENT

    This thesis investigates the transmission of real-time interactive speech over local area networks (LANs) in an industrial/commercial environment, with the eventual aim of obviating the need for a private automatic branch exchange and ultimately preparing the way for a single interactive integrated information system (PS) that provides workstations, networked via a LAN, with a fully interactive speech and graphics facility commensurate with the future requirements of computer integrated manufacturing (CIM).

    The reason for conducting this programme of research was that existing LANs do not offer a real-time interactive speech facility: any verbal communication between workstation users on the LAN has to be carried out over a telephone network (PABX), which necessitates the provision of a second, completely separate network with its associated costs. Initial investigations indicate that there is sufficient capacity on existing LANs to support both data and real-time speech, provided certain data packet delay criteria can be met. Earlier research work (in the late 1980s) was conducted at Bell Labs and MIT [Ref 25, 27 & 28], the University of Strathclyde [Ref 24] and BTRL [Ref 22 and 37]; in all of these cases the real-time implementation issues were not fully addressed.

    The research work reported in this thesis provides the main criteria for the implementation of real-time interactive speech on both existing and newly installed networks. With such enhanced communication facilities, designers and engineers on the shop floor can be projected into their suppliers, providing much greater integration between manufacturer and supplier, which will be beneficial as Concurrent and Simultaneous Engineering methodologies are further developed. Accordingly, various LANs have been evaluated as to their suitability for the transmission of real-time interactive speech. As LANs can, in general, be separated into those with deterministic or stochastic access mechanisms, investigations were carried out into the ability of both (i) Token Passing Bus LANs supporting the Manufacturing and Automation Protocol (MAP), which are deterministic, and (ii) Carrier Sense Multiple Access/Collision Detection (CSMA/CD) LANs supporting the Technical Office Protocol (TOP), which are stochastic, to support real-time interactive speech, as both are used extensively in commerce and manufacturing.

    The thesis that real-time interactive speech can be transmitted over LANs employed in a computer integrated manufacturing environment has to be moderated following the tests carried out in this work, as follows: the Token Passing LAN presents no serious problems under normal traffic conditions; however, the CSMA/CD LAN can only be used in relatively light traffic conditions, i.e. below 30% of its designed maximum capacity, provided special arrangements are made to minimise the access, transmission and processing delays of speech packets. Given that a certain amount of delay is inevitable in packet-switched systems (LANs), investigations have been carried out into techniques for reducing the subjective effect of speech packet loss on real-time interactive systems due to the unacceptable delays caused by the conditions mentioned above.
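
    The 30% loading ceiling reported for the CSMA/CD LAN lends itself to a back-of-the-envelope capacity check of the following kind. The codec rate, packet interval, and per-packet overhead used below are illustrative assumptions, not figures taken from the thesis.

        # Hedged sketch: estimate how many one-way speech streams fit within the
        # usable (e.g. 30%) share of a LAN's nominal bit rate.

        def speech_calls_supported(lan_bit_rate: float,
                                   usable_fraction: float = 0.30,
                                   codec_bit_rate: float = 64_000.0,   # e.g. 64 kbit/s PCM
                                   packet_interval_s: float = 0.020,   # 20 ms of speech per packet
                                   overhead_bytes: int = 64) -> int:
            """Estimate the number of one-way speech streams the usable share can carry."""
            payload_bits = codec_bit_rate * packet_interval_s        # bits of speech per packet
            packet_bits = payload_bits + overhead_bytes * 8          # add header/framing overhead
            per_stream_bit_rate = packet_bits / packet_interval_s    # effective rate on the wire
            return int((lan_bit_rate * usable_fraction) // per_stream_bit_rate)

        if __name__ == "__main__":
            # 10 Mbit/s Ethernet kept under the 30% loading ceiling noted above.
            print(speech_calls_supported(10_000_000))   # ~33 streams under these assumptions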