    Photo2Video: Semantic-Aware Deep Learning-Based Video Generation from Still Content

    Applying machine learning (ML), and especially deep learning, to understand visual content is becoming common practice in many application areas. However, little attention has been given to its use within the multimedia creative domain. ML is already popular for content creation, but progress so far has essentially addressed textual content or the identification and selection of specific types of content. A wealth of possibilities is yet to be explored by bringing ML into the multimedia creative process, allowing the knowledge it infers to automatically influence how new multimedia content is created. The work presented in this article contributes towards this goal in three distinct ways: firstly, it proposes a methodology to re-train popular neural network models to identify new thematic concepts in static visual content and attach meaningful annotations to the detected regions of interest; secondly, it presents varied visual digital effects and corresponding tools that can be called upon automatically to apply such effects to a previously analyzed photo; thirdly, it defines a complete automated creative workflow, from the acquisition of a photograph and corresponding contextual data, through ML region-based annotation, to the automatic application of digital effects and the generation of a semantically aware multimedia story driven by the previously derived situational and visual contextual data. Additionally, it presents a variant of this workflow that lets the user manipulate the automatic annotations in an assisted manner. The final aim is to transform a static digital photo into a short video clip that takes the acquired information into account.
The final result contrasts strongly with current standard approaches, which create random movements, by producing an intelligent, content- and context-aware video. The work presented in this paper has been supported by the European Commission under contract number H2020-ICT-20-2017-1-RIA-780612 and by National Funds through the Portuguese funding agency FCT (Fundação para a Ciência e a Tecnologia), within project LA/P/0063/2020.
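As a rough illustration of the last workflow step, the sketch below turns region annotations into a timed sequence of effects. It is not the authors' implementation: the `Region` type, the `EFFECTS` table, the concept labels, the confidence threshold and the fixed per-effect duration are all hypothetical. Ordering regions by detector confidence is just one simple way for the analysis to drive the story; the assisted variant would correspond to letting the user edit the list of regions before planning.

```python
from dataclasses import dataclass


@dataclass
class Region:
    label: str                           # concept predicted by the (re-trained) detector
    score: float                         # detector confidence in [0, 1]
    box: tuple[int, int, int, int]       # (x, y, width, height) in pixels


# Hypothetical mapping from a detected concept to a digital effect name.
EFFECTS = {
    "person": "zoom_in",
    "sky": "pan_left",
    "sea": "ripple",
}


def plan_clip(regions, min_score=0.5, seconds_per_effect=2.0):
    """Turn region annotations into an ordered list of timed effect steps."""
    # Keep only confident regions whose concept has an associated effect.
    kept = [r for r in regions if r.score >= min_score and r.label in EFFECTS]
    # Most confident regions are animated first.
    kept.sort(key=lambda r: r.score, reverse=True)
    timeline = []
    t = 0.0
    for r in kept:
        timeline.append({
            "start": t,
            "end": t + seconds_per_effect,
            "effect": EFFECTS[r.label],
            "target": r.box,
        })
        t += seconds_per_effect
    return timeline
```

With a sky region (score 0.9) and a person region (score 0.8), the plan pans across the sky first and then zooms in on the person; a concept with no entry in `EFFECTS`, or one below the threshold, is simply skipped.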

    Evaluating the State-of-the-Art of End-to-End Natural Language Generation: The E2E NLG Challenge

    This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results. This shared task aimed to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets containing higher lexical richness, syntactic complexity and diverse discourse phenomena. Introducing novel automatic and human metrics, we compare 62 systems submitted by 17 institutions, covering a wide range of approaches, including machine learning architectures (with the majority implementing sequence-to-sequence models, seq2seq) as well as systems based on grammatical rules and templates. Seq2seq-based systems demonstrated great potential for NLG in the challenge. We find that seq2seq systems generally score high in terms of word-overlap metrics and human evaluations of naturalness, with the winning SLUG system (Juraska et al., 2018) being seq2seq-based. However, vanilla seq2seq models often fail to correctly express a given meaning representation if they lack a strong semantic control mechanism applied during decoding. Moreover, seq2seq models can be outperformed by hand-engineered systems in terms of overall quality, as well as complexity, length and diversity of outputs. This research has influenced, inspired and motivated a number of recent studies outside the original competition, which we also summarise as part of this paper. Comment: Computer Speech and Language, final accepted manuscript (in press).
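The abstract notes that vanilla seq2seq models often fail to express every part of the meaning representation (MR). A minimal sketch of one way to check for this and exploit it at decoding time, assuming E2E-style MRs such as `name[The Eagle], eatType[coffee shop]`: count slots whose value does not appear verbatim in the output, and rerank decoder candidates by that count. The helper names are hypothetical, and verbatim matching is a simplification; slots such as `familyFriendly[yes]` are usually realized by paraphrase and would need dedicated rules.

```python
import re

# Matches "slot[value]" pairs in an E2E-style meaning representation.
SLOT_PATTERN = re.compile(r"(\w+)\[([^\]]*)\]")


def parse_mr(mr: str) -> dict[str, str]:
    """Parse an MR string into a {slot: value} dictionary."""
    return {slot: value for slot, value in SLOT_PATTERN.findall(mr)}


def missing_slots(mr: str, text: str) -> list[str]:
    """Slots whose value does not appear verbatim (case-insensitive) in the text."""
    lowered = text.lower()
    return [slot for slot, value in parse_mr(mr).items()
            if value.lower() not in lowered]


def slot_error_rate(mr: str, text: str) -> float:
    """Fraction of MR slots that the text fails to realize."""
    slots = parse_mr(mr)
    return len(missing_slots(mr, text)) / len(slots) if slots else 0.0


def rerank(mr: str, candidates: list[str]) -> list[str]:
    """Prefer candidates that realize more slots; ties keep the decoder's order."""
    # sorted() is stable, so equally good candidates stay in beam order.
    return sorted(candidates, key=lambda c: len(missing_slots(mr, c)))
```

For the MR `name[The Eagle], eatType[coffee shop], food[French]`, the candidate "The Eagle is a coffee shop." misses `food`, so a candidate that also mentions French food is moved ahead of it.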

    Flavor text generation for role-playing video games

    Self-supervised learning in natural language processing

    Most natural language processing (NLP) learning algorithms require labeled data. While such data are available for a select number of (mostly English) tasks, labeled data are sparse or non-existent for the vast majority of use cases. To alleviate this, unsupervised learning and a wide array of data augmentation techniques have been developed (Hedderich et al., 2021a). However, unsupervised learning often requires massive amounts of unlabeled data and also fails in difficult (low-resource) data settings, i.e., when there is an increased distance between the source and target data distributions (Kim et al., 2020). Such a distributional distance arises, for example, when there is a domain drift or a large linguistic distance between the source and target data. Unsupervised learning by itself does not exploit the highly informative (labeled) supervisory signals hidden in unlabeled data. In this dissertation, we show that by combining the right unsupervised auxiliary task (e.g., sentence pair extraction) with an appropriate primary task (e.g., machine translation), self-supervised learning can exploit these hidden supervisory signals more efficiently than purely unsupervised approaches, while requiring less labeled data than supervised approaches. Our self-supervised learning approach can learn NLP tasks efficiently even when training data are sparse or differ strongly in their underlying distribution, e.g., when they stem from unrelated languages. In general, we applied unsupervised learning as an auxiliary task to learn a supervised primary task. Concretely, we focused on sentence pair extraction as the auxiliary task for sequence-to-sequence primary tasks (i.e., machine translation and style transfer), and on language modeling, clustering, subspace learning and knowledge integration as auxiliary tasks for primary classification tasks (i.e., hate speech detection and sentiment analysis).
For sequence-to-sequence tasks, we show that self-supervised neural machine translation (NMT) achieves results competitive with unsupervised NMT on high-resource language pairs while requiring less data. Further combining self-supervised NMT with augmentation techniques inspired by unsupervised NMT makes it possible to learn low-resource (similar, distant and unrelated) language pairs. Using our self-supervised approach, we also show how style transfer can be learned without parallel data, generating stylistic rephrasings with the highest overall performance on all tested tasks. For sequence-to-label tasks, we underline the benefit of auxiliary-task-based augmentation over primary-task augmentation. An auxiliary task that proved especially beneficial to primary-task performance was subspace learning, which led to substantial gains in (cross-lingual) zero-shot classification on similar or distant target tasks, including for similar, distant and unrelated languages.
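To make the sentence pair extraction auxiliary task more concrete, the sketch below selects candidate pairs from two sets of sentence vectors by keeping mutual nearest neighbours under cosine similarity. It is a generic illustration, not the dissertation's exact selection criterion: the toy vectors, the mutual-nearest-neighbour rule and the `threshold` value are assumptions. In a self-supervised NMT setting one would assume the vectors come from the translation model's own representations, so that pair selection and the primary task are coupled.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def extract_pairs(src_vecs, tgt_vecs, threshold=0.5):
    """Return (src_index, tgt_index, score) for mutual nearest-neighbour pairs."""
    if not src_vecs or not tgt_vecs:
        return []
    sims = [[cosine(s, t) for t in tgt_vecs] for s in src_vecs]
    # Best target for each source sentence, and best source for each target sentence.
    best_tgt = [max(range(len(tgt_vecs)), key=lambda j: row[j]) for row in sims]
    best_src = [max(range(len(src_vecs)), key=lambda i: sims[i][j])
                for j in range(len(tgt_vecs))]
    pairs = []
    for i, j in enumerate(best_tgt):
        # Keep the pair only if the choice is mutual and similar enough.
        if best_src[j] == i and sims[i][j] >= threshold:
            pairs.append((i, j, sims[i][j]))
    return pairs
```

The mutual-neighbour check discards sentences that are merely close to some popular target, which is a common way to trade recall for precision when mining pairs from comparable (non-parallel) text.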

    Towards Reliable and Inclusive Natural Language Generation

    Natural language generation (NLG) is an important subfield of natural language processing (NLP) that produces natural language output. Despite notable advancements made by large-scale pre-trained language models in NLG, there remain several unresolved challenges. This thesis aims to enhance NLG from two significant aspects: reliability and inclusiveness. For reliability, on the one hand, we introduce novel training objectives that improve the alignment of language generation models with desired model behaviors. To improve the answerability of model-generated questions, we use a question answering model to provide additional rewards to a question generation model, encouraging the production of more answerable questions. In addition, we propose to train language models with a mixture of forward and reverse cross-entropies, demonstrating that the resulting models yield better generated text without complex decoding strategies. On the other hand, we propose novel evaluation methods to assess the performance of NLG models accurately and comprehensively. By combining human and automatic evaluations, we strike a balance between reliability and reproducibility. We delve into the unexplored issue of unfaithfulness in extractive summaries and conclude that extractive summarization does not guarantee faithfulness. For inclusiveness, we extend the coverage of NLG techniques to low-resource or endangered languages. We develop the first machine translation system for supporting translation between Cherokee, an endangered Native American language, and English, and we propose a roadmap for utilizing NLP to support language revitalization efforts. Additionally, we investigate the underrepresentation of low-resource languages during multilingual tokenization, a crucial data preprocessing step in training multilingual NLG models, and we present best practices for training multilingual tokenizers. 
Overall, this thesis works towards enhancing the trustworthiness of NLG models in practice and supporting a more diverse range of languages worldwide. Doctor of Philosophy
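The mixture of forward and reverse cross-entropies mentioned in this abstract can be illustrated on explicit discrete distributions. The sketch computes eta * H(P, Q) + (1 - eta) * H(Q, P), where P is the data distribution, Q the model distribution, and H(A, B) = -sum_x A(x) log B(x); the weight name `eta` and its default are hypothetical. This toy assumes both distributions are fully known and have full support, whereas in real training the data distribution is only available through samples, so the reverse term must be approximated; the thesis's approximation is not reproduced here.

```python
import math


def cross_entropy(p, q):
    """H(p, q) = -sum_x p(x) log q(x) over a shared discrete support.

    Requires q(x) > 0 wherever p(x) > 0; terms with p(x) == 0 contribute nothing.
    """
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)


def mixed_cross_entropy(p_data, q_model, eta=0.5):
    """eta * forward CE + (1 - eta) * reverse CE."""
    # Forward CE: penalizes the model for assigning low probability to data outcomes.
    forward = cross_entropy(p_data, q_model)
    # Reverse CE: penalizes the model for putting mass on outcomes the data rarely produces.
    reverse = cross_entropy(q_model, p_data)
    return eta * forward + (1 - eta) * reverse
```

With `eta=1.0` the objective reduces to ordinary forward cross-entropy (maximum-likelihood training); lowering `eta` shifts weight to the reverse term.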

    K + K = 120 : Papers dedicated to László Kálmán and András Kornai on the occasion of their 60th birthdays
