
    Cross-language transfer of semantic annotation via targeted crowdsourcing: task design and evaluation

    [EN] Modern data-driven spoken language systems (SLS) require manual semantic annotation for training spoken language understanding parsers. Multilingual porting of SLS demands significant manual effort and language resources, as this manual annotation has to be replicated. Crowdsourcing is an accessible and cost-effective alternative to traditional methods of collecting and annotating data. The application of crowdsourcing to simple tasks has been well investigated. However, complex tasks, like cross-language semantic annotation transfer, may generate low judgment agreement and/or poor performance. The most serious issue in cross-language porting is the absence of reference annotations in the target language; thus, crowd quality control and the evaluation of the collected annotations are difficult. In this paper we investigate targeted crowdsourcing for semantic annotation transfer, which delegates to the crowd the complex task of segmenting utterances and labeling them with concepts taken from a domain ontology, and which evaluates the collected annotations using the source language annotation. To test the applicability and effectiveness of the crowdsourced annotation transfer we consider the case of close and distant language pairs: Italian-Spanish and Italian-Greek. The corpora annotated via crowdsourcing are evaluated against both source and target language expert annotations. We demonstrate that the two evaluation references (source and target) correlate highly with each other, thus drastically reducing the need for target language reference annotations.

    This research is partially funded by the EU FP7 PortDial Project No. 296170, FP7 SpeDial Project No. 611396, and Spanish contract TIN2014-54288-C4-3-R. The work presented in this paper was carried out while the author was affiliated with Universitat Politecnica de Valencia.

    Stepanov, EA.; Chowdhury, SA.; Bayer, AO.; Ghosh, A.; Klasinas, I.; Calvo Lance, M.; Sanchís Arnal, E.... (2018). Cross-language transfer of semantic annotation via targeted crowdsourcing: task design and evaluation. Language Resources and Evaluation. 52(1):341-364. https://doi.org/10.1007/s10579-017-9396-5
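The evaluation step the abstract describes, comparing crowd-collected concept annotations against an expert reference, can be sketched as a span-level F1 computation. This is a minimal illustration, not the authors' code; the IOB tags, concept labels, and utterances are hypothetical.

```python
# Evaluate a crowdsourced concept annotation against a reference
# annotation by comparing (label, start, end) concept spans.

def spans(tags):
    """Extract (concept, start, end) spans from an IOB tag sequence."""
    out, start, label = [], None, None
    for i, t in enumerate(tags + ["O"]):  # sentinel "O" closes a final span
        if t.startswith("B-") or t == "O":
            if label is not None:
                out.append((label, start, i))
                label = None
            if t.startswith("B-"):
                start, label = i, t[2:]
    return out

def f1(hyp_tags, ref_tags):
    """Span-level F1 between a hypothesis and a reference annotation."""
    hyp, ref = set(spans(hyp_tags)), set(spans(ref_tags))
    tp = len(hyp & ref)
    p = tp / len(hyp) if hyp else 0.0
    r = tp / len(ref) if ref else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0

# Hypothetical crowd annotation vs. an expert reference for one utterance.
crowd = ["O", "B-food", "I-food", "O", "B-city"]
expert = ["O", "B-food", "I-food", "O", "B-date"]
print(f1(crowd, expert))  # 0.5: one of the two concept spans matches
```

The same function can score crowd annotations against either the source-projected or the target-language reference, which is how the correlation between the two references can be measured.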

    Speech understanding for spoken dialogue systems: from corpus harvesting to grammar rule induction

    We investigate algorithms and tools for the semi-automatic authoring of grammars for spoken dialogue systems (SDS), proposing a framework that spans from corpus creation to grammar induction algorithms. A realistic human-in-the-loop approach is followed, balancing automation and human intervention to optimize the cost-to-performance ratio of grammar development. Web harvesting is the main approach investigated for eliciting spoken dialogue textual data, while crowdsourcing is also proposed as an alternative method. Several techniques are presented for constructing web queries and filtering the acquired corpora. We also investigate how the harvested corpora can be used for the automatic and semi-automatic (human-in-the-loop) induction of grammar rules. SDS grammar rules and induction algorithms are grouped into two types, namely low- and high-level. Two families of algorithms are investigated for rule induction: one based on semantic similarity and distributional semantic models, and the other using more traditional statistical modeling approaches (e.g., slot-filling algorithms using Conditional Random Fields). Evaluation results are presented for two domains and languages. High-level induction precision scores of up to 60% are obtained. The results advocate the portability of the proposed features and algorithms across languages and domains.
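The semantic-similarity family of induction algorithms mentioned above can be sketched as follows: a harvested candidate phrase is assigned to the existing grammar rule whose example phrases are closest in a distributional vector space. This is a simplified illustration, not the paper's implementation; the bag-of-words vectors, rule names, and phrases are hypothetical stand-ins for a trained distributional semantic model.

```python
# Assign a harvested phrase to the nearest grammar rule by cosine
# similarity between bag-of-words vectors.

from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def induce(candidate, rules):
    """Return the rule whose example phrases are most similar to candidate."""
    cvec = Counter(candidate.split())
    best, best_sim = None, -1.0
    for name, examples in rules.items():
        rvec = Counter(w for ex in examples for w in ex.split())
        sim = cosine(cvec, rvec)
        if sim > best_sim:
            best, best_sim = name, sim
    return best

# Hypothetical low-level rules with a few seed example phrases each.
rules = {
    "<departure_city>": ["leaving from boston", "depart from denver"],
    "<arrival_time>": ["arrive by noon", "arrive before ten"],
}
print(induce("flights leaving from atlanta", rules))  # <departure_city>
```

In practice the count vectors would be replaced by embeddings from a distributional semantic model trained on the harvested corpora, and a similarity threshold would route low-confidence candidates to the human in the loop.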