51 research outputs found

    Evaluating cross-language annotation transfer in the MultiSemCor corpus

    In this paper we illustrate and evaluate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus created on the basis of the English SemCor corpus. In MultiSemCor texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution allowing for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.
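The transfer step described in this abstract can be illustrated with a minimal sketch. It assumes the word alignment is already available as pairs of (source index, target index); the function name, the sense identifiers, and the rule of keeping the first label on conflict are hypothetical choices for illustration, not the procedure used to build MultiSemCor.

```python
from typing import List, Optional, Tuple


def transfer_annotations(
    source_senses: List[Optional[str]],
    alignment: List[Tuple[int, int]],
    target_length: int,
) -> List[Optional[str]]:
    """Project sense labels from source tokens onto aligned target tokens."""
    target_senses: List[Optional[str]] = [None] * target_length
    for src_idx, tgt_idx in alignment:
        sense = source_senses[src_idx]
        # Only annotated source tokens contribute a label.
        # Assumption: if a target token already has a label, keep the first one.
        if sense is not None and target_senses[tgt_idx] is None:
            target_senses[tgt_idx] = sense
    return target_senses


# Toy English/Italian sentence pair with hypothetical sense identifiers.
english = ["the", "dog", "barks"]
italian = ["il", "cane", "abbaia"]
senses = [None, "dog.n.01", "bark.v.01"]
links = [(0, 0), (1, 1), (2, 2)]  # word alignment used as the bridge

projected = transfer_annotations(senses, links, len(italian))
for token, sense in zip(italian, projected):
    print(token, sense)
```

Target tokens with no aligned annotated source token stay unlabelled, which is where manual correction would still be needed.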

    Cross-Language Question Re-Ranking

    We study how to find relevant questions in community forums when the language of the new questions is different from that of the existing questions in the forum. In particular, we explore the Arabic-English language pair. We compare a kernel-based system with a feed-forward neural network in a scenario where a large parallel corpus is available for training a machine translation system, bilingual dictionaries, and cross-language word embeddings. We observe that both approaches degrade the performance of the system when working on the translated text, especially the kernel-based system, which depends heavily on a syntactic kernel. We address this issue using a cross-language tree kernel, which compares the original Arabic tree to the English trees of the related questions. We show that this kernel almost closes the performance gap with respect to the monolingual system. On the neural network side, we use the parallel corpus to train cross-language embeddings, which we then use to represent the Arabic input and the English related questions in the same space. The results also improve to close to those of the monolingual neural network. Overall, the kernel system shows a better performance compared to the neural network in all cases. Comment: SIGIR-2017; Community Question Answering; Cross-language Approaches; Question Retrieval; Kernel-based Methods; Neural Networks; Distributed Representation
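The idea of representing the Arabic input and the English questions in one shared space can be sketched as follows. This is not the feed-forward network or the tree kernel from the abstract: it swaps in plain averaged word vectors and cosine similarity, and the embedding table, its values, and all function names are made up for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def average_embedding(tokens: List[str], table: Dict[str, Vector], dim: int) -> Vector:
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_questions(
    query: List[str], candidates: List[List[str]], table: Dict[str, Vector], dim: int
) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    q = average_embedding(query, table, dim)
    scores = [cosine(q, average_embedding(c, table, dim)) for c in candidates]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


# Hypothetical shared Arabic/English embedding table (2 dimensions, invented values).
table = {
    "سفر": [1.0, 0.0],  # Arabic: "travel"
    "travel": [0.9, 0.1],
    "visa": [0.8, 0.3],
    "food": [0.0, 1.0],
    "restaurant": [0.1, 0.9],
}
ranking = rank_questions(["سفر"], [["food", "restaurant"], ["travel", "visa"]], table, 2)
print(ranking)
```

Because both languages index into the same table, no translation step is needed at query time; the quality of the ranking then depends entirely on how well the embeddings were aligned across languages.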

    H2020 692819 SIMPATICO - D1.5: Ethics compliance report

    This document is the deliverable “D1.5 – SIMPATICO Ethical compliance report” of the European project “SIMPATICO - SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies” (hereinafter also referred to as “SIMPATICO”, project reference: 692819). SIMPATICO addresses a strategic challenge towards the innovation and modernization of the public sector: the need to offer a more efficient and more effective experience to companies and citizens in their daily interaction with Public Administration (PA) by: (i) offering a personalized delivery of PA online services; (ii) enabling a better comprehension of the complex processes and documents (forms, regulations, etc.) behind these services; (iii) engaging citizens to improve the administration processes and services. Several ethical and data protection aspects should be taken into account due to the involvement of public/private stakeholders and citizens and due to the necessity to collect, store and process personal data. Starting from a brief illustration of the SIMPATICO project, and of the ethical concerns raised by the project activities, this report describes the procedures adopted to ensure compliance with ethical requirements. In particular, the project has appointed an Ethics Advisory Board that is responsible for providing advice and coordinating activities concerning the fulfilment of the ethical obligations of SIMPATICO. Moreover, the relevant ethical and legal regulations have been identified and analysed, both at the national and European level, and recommendations and guidelines have been derived. This report does not cover the concerns related to data management, as they are the focus of a dedicated deliverable – namely report “D1.3 Data Management Plan v.1” (M6).
    This version is an update of D1.5 produced at project month M14, after the detailed definition of the project use-cases and the revision of the ethics-related aspects of the project by the Ethics Advisory Board.

    H2020 692819 SIMPATICO - D1.1: Project Management Plan

    This document is the deliverable “D1.1 – Project Management Plan” of the European project “SIMPATICO - SIMplifying the interaction with Public Administration Through Information technology for Citizens and cOmpanies” (hereinafter also referred to as “SIMPATICO”, project reference: 692819). The SIMPATICO Project Management Plan (PMP) is the main planning document and describes how major aspects of the project are managed, monitored and controlled. It is intended to provide guidance and direction for specific management, planning, and control activities such as schedule, cost, risk, communication, quality, etc. The focus of this document is to describe the approaches being taken in the project to manage the various work packages, share and store documents, communicate among consortium members, control the quality of project deliverables, and identify and mitigate risks associated with the project. Benefits of creating a Project Management Plan include: · clearly defining roles, responsibilities, processes and activities; · increasing the probability that projects will complete on time, within budget, and with a high degree of quality; · ensuring understanding of what was agreed upon; · helping project teams identify and plan for how project activities will be managed (budget, quality, schedule, etc.). The PMP is a living document and should be updated continuously throughout the project. The main updates will concern project KPIs, project risks and ethics concerns, which will be regularly updated at the meetings of the Project Management Board (PMB). After approval by the PMB, the updated PMP will be uploaded to the SIMPATICO website. The final update of the Project Management Plan will be released at M36.

    Evaluating Cross-Language Annotation Transfer

    In this paper we illustrate and evaluate an approach to the creation of high quality linguistically annotated resources based on the exploitation of aligned parallel corpora. This approach is based on the assumption that if a text in one language has been annotated and its translation has not, annotations can be transferred from the source text to the target using word alignment as a bridge. The transfer approach has been tested in the creation of the MultiSemCor corpus, an English/Italian parallel corpus created on the basis of the English SemCor corpus. In MultiSemCor texts are aligned at the word level and semantically annotated with a shared inventory of senses. We present some experiments carried out to evaluate the different steps involved in the methodology. The results of the evaluation suggest that the cross-language annotation transfer methodology is a promising solution allowing for the exploitation of existing (mostly English) annotated resources to bootstrap the creation of annotated corpora in new (resource-poor) languages with greatly reduced human effort.

    Manual word alignment guidelines for the MultiSemCor project

    These guidelines are used for the manual word-level alignment of English-Italian parallel texts. The task is carried out in the context of the development of a full-text word alignment system which is used in the MultiSemCor project.