6 research outputs found

    RIGOTRIO at SemEval-2017 Task 9: Combining Machine Learning and Grammar Engineering for AMR Parsing and Generation

    By addressing both text-to-AMR parsing and AMR-to-text generation, SemEval-2017 Task 9 established AMR as a powerful semantic interlingua. We strengthen the interlingual aspect of AMR by applying the multilingual Grammatical Framework (GF) for AMR-to-text generation. Our current rule-based GF approach completely covered only 12.3% of the test AMRs, so we combined it with the state-of-the-art JAMR Generator to see whether the combination increases or decreases overall performance. The combined system achieved an automatic BLEU score of 18.82 and a human TrueSkill score of 107.2, to be compared with the results of the plain JAMR Generator. As for AMR parsing, we added NER extensions to our SemEval-2016 general-domain AMR parser to handle the biomedical genre, which is rich in organic compound names, achieving Smatch F1=54.0%.

    Character-level neural translation for multilingual media monitoring in the SUMMA project

    The paper steps outside the comfort zone of traditional NLP tasks such as automatic speech recognition (ASR) and machine translation (MT) to address two novel problems arising in automated multilingual news monitoring: segmentation of TV and radio program ASR transcripts into individual stories, and clustering of individual stories coming from various sources and languages into storylines. Storyline clustering of stories covering the same events is an essential task for inquisitorial media monitoring. We address these two problems jointly by engaging the low-dimensional semantic representation capabilities of sequence-to-sequence neural translation models. To enable joint multi-task learning for multilingual neural translation of morphologically rich languages, we replace the attention mechanism with a sliding-window mechanism and operate the sequence-to-sequence neural translation model on the character level rather than on the word level. The story segmentation and storyline clustering problem is tackled by examining the low-dimensional vectors produced as a side product of the neural translation process. The results of this paper describe a novel approach to the automatic story segmentation and storyline clustering problem. Comment: LREC-2016 submission

    dBaby: Grounded Language Teaching through Games and Efficient Reinforcement Learning

    This paper outlines a project proposal to be submitted to EC H2020 call ICT-29-2018. The purpose of the project is to create a digital Baby (dBaby): an agent that perceives and interacts with the 3D world and communicates with its Teacher via natural language phrases to achieve the goals set by the Teacher. The novelty of the approach is that neither language nor visual capabilities are hard-coded in dBaby. Instead, the Teacher defines a language learning Game grounded in the 3D world, and dBaby learns the language as a byproduct of reinforcement learning from raw pixels and character strings while maximizing the rewards in the Game. So far, such an approach has been successfully demonstrated only in a virtual 3D world with pre-programmed Games, where it requires millions of episodes to learn a dozen words. Moving to a human Teacher and a real 3D environment requires an order-of-magnitude improvement in the data efficiency of the reinforcement learning. A novel Episodic Control based pre-training is demonstrated as a promising approach for bootstrapping data-efficient reinforcement learning.

    SUMMA: Integrating Multiple NLP Technologies into an Open-source Platform for Multilingual Media Monitoring

    The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. It assembles numerous state-of-the-art NLP technologies into a fully automated media ingestion pipeline that can record live broadcasts, detect and transcribe spoken content, translate from several languages (original text or transcribed speech) into English, recognize Named Entities, detect topics, cluster and summarize documents across language barriers, and extract and store factual claims in these news items. This paper describes the intended use cases and discusses the system design decisions that allowed us to integrate state-of-the-art NLP modules into an effective workflow with comparatively little effort.

    The SUMMA Platform: A Scalable Infrastructure for Multi-lingual Multi-media Monitoring

    The open-source SUMMA Platform is a highly scalable distributed architecture for monitoring a large number of media broadcasts in parallel, with a lag behind actual broadcast time of at most a few minutes. The Platform offers a fully automated media ingestion pipeline capable of recording live broadcasts, detecting and transcribing spoken content, translating all text (original or transcribed) into English, recognizing and linking Named Entities, detecting topics, clustering related media items and summarizing them across languages and documents, and, last but not least, extracting and storing factual claims in these news items. Browser-based graphical user interfaces provide humans with aggregated information as well as structured access to individual news items stored in the Platform’s database. This paper describes the intended use cases and provides an overview of the system’s implementation.