Integrating lexical and prosodic features for automatic paragraph segmentation
Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically
identify their discourse structure is an important step to understanding what a spoken document is about. Moreover,
finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little
work has been done on discourse-based speech segmentation below the level of broad topics. In order to examine how
discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical
and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models
using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical
cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak
lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural
networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that
integrate representations generated by separate lexical and prosodic models while allowing interactions between these
feature streams rather than treating them as independent information sources. Application to ASR outputs shows that
adding prosodic features, particularly using late fusion, can significantly ameliorate decreases in performance due to
transcription errors.

The second author was funded by the EU's Horizon 2020 Research and Innovation Programme under the GA H2020-RIA-645012 and the Spanish Ministry of Economy and Competitiveness Juan de la Cierva programme. The other authors were funded by the University of Edinburgh.
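The late-fusion approach described above, in which separate lexical and prosodic models produce representations that a combiner scores jointly, can be sketched as follows. All feature names, weights, and the decision threshold below are invented for illustration and are not taken from the paper:

```python
# Hypothetical sketch of late fusion for paragraph-boundary detection:
# two separate models each map a candidate sentence boundary to a small
# representation, and a combiner scores their concatenation so the two
# feature streams can interact rather than vote independently.

def lexical_repr(cue_word_count, cohesion_score):
    # e.g. induced cue words ("so", "now") and a lexical-cohesion measure
    return [cue_word_count * 0.5, 1.0 - cohesion_score]

def prosodic_repr(pause_sec, pitch_reset):
    # supra-sentential prosody: pause duration and pitch reset at the boundary
    return [min(pause_sec, 2.0), pitch_reset]

def late_fusion_score(lex, pro, weights):
    # the combiner sees both streams jointly (concatenation) instead of
    # averaging two independent decisions
    fused = lex + pro
    return sum(w * f for w, f in zip(weights, fused))

weights = [0.8, 0.6, 1.2, 0.9]          # toy combiner weights
lex = lexical_repr(cue_word_count=1, cohesion_score=0.2)
pro = prosodic_repr(pause_sec=1.5, pitch_reset=0.7)
score = late_fusion_score(lex, pro, weights)
is_paragraph_break = score > 1.0        # toy decision threshold
```

In the paper the combiner is a learned model (e.g. a recurrent network over context); the toy linear scorer here only illustrates the fusion point, not the architecture.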
TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog
Syntactic and pragmatic completeness is known to be important for turn-taking
prediction, but so far machine learning models of turn-taking have used such
linguistic information in a limited way. In this paper, we introduce TurnGPT, a
transformer-based language model for predicting turn-shifts in spoken dialog.
The model has been trained and evaluated on a variety of written and spoken
dialog datasets. We show that the model outperforms two baselines used in prior
work. We also report on an ablation study, as well as attention and gradient
analyses, which show that the model is able to utilize the dialog context and
pragmatic completeness for turn-taking prediction. Finally, we explore the
model's potential in not only detecting, but also projecting, turn-completions.

Comment: Accepted to Findings of ACL: EMNLP 2020
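The core idea, framing turn-taking as language modelling where the model assigns each position a probability that a turn-shift follows, can be sketched as below. The per-word probabilities are made up for illustration; in TurnGPT they come from a trained Transformer predicting a special turn-shift token:

```python
# Toy sketch of turn-shift prediction as language modelling: the model
# assigns each token a probability that a turn-shift token <ts> follows.
# A real model conditions on the full dialog context; this lookup table
# is a hypothetical stand-in.

TS_PROB = {                     # invented P(<ts> | last word)
    "yes": 0.85, "you": 0.05, "think": 0.10, "so": 0.60, "the": 0.01,
}

def turn_shift_probs(tokens, default=0.02):
    return [TS_PROB.get(t, default) for t in tokens]

def predict_shift_points(tokens, threshold=0.5):
    # indices where the model predicts the speaker is likely done;
    # scanning ahead like this is what allows *projecting* completions
    probs = turn_shift_probs(tokens)
    return [i for i, p in enumerate(probs) if p >= threshold]

utterance = "i think so yes".split()
shifts = predict_shift_points(utterance)
```

Because the probabilities are produced incrementally, a dialog system could act on an upcoming shift ("so", "yes") before the utterance actually ends.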
Deep Learning Techniques for Music Generation -- A Survey
This paper is a survey and an analysis of different ways of using deep
learning (deep artificial neural networks) to generate musical content. We
propose a methodology based on five dimensions for our analysis:
Objective - What musical content is to be generated? Examples are: melody,
polyphony, accompaniment or counterpoint. - For what destination and for what
use? To be performed by human(s) (in the case of a musical score), or by a
machine (in the case of an audio file).
Representation - What are the concepts to be manipulated? Examples are:
waveform, spectrogram, note, chord, meter and beat. - What format is to be
used? Examples are: MIDI, piano roll or text. - How will the representation be
encoded? Examples are: scalar, one-hot or many-hot.
Architecture - What type(s) of deep neural network is (are) to be used?
Examples are: feedforward network, recurrent network, autoencoder or generative
adversarial network.
Challenge - What are the limitations and open challenges? Examples are:
variability, interactivity and creativity.
Strategy - How do we model and control the process of generation? Examples
are: single-step feedforward, iterative feedforward, sampling or input
manipulation.
For each dimension, we conduct a comparative analysis of various models and
techniques and we propose a tentative multidimensional typology. This
typology is bottom-up, based on the analysis of many existing deep-learning
based systems for music generation selected from the relevant literature. These
systems are described and are used to exemplify the various choices of
objective, representation, architecture, challenge and strategy. The last
section includes some discussion and some prospects.

Comment: 209 pages. This paper is a simplified version of the book: J.-P. Briot, G. Hadjeres and F.-D. Pachet, Deep Learning Techniques for Music Generation, Computational Synthesis and Creative Systems, Springer, 2020
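Two of the representation choices the survey lists, one-hot encoding of a single note and the many-hot piano-roll format, can be illustrated briefly. Pitch numbers follow the MIDI convention (60 = middle C); the note tuples are an assumed toy format, not the survey's:

```python
# Illustrative sketch of two encodings discussed in the survey:
# a one-hot vector for a single pitch, and a piano roll
# (time steps x pitches, many-hot) for polyphonic content.

N_PITCHES = 128  # MIDI pitch range

def one_hot(pitch):
    vec = [0] * N_PITCHES
    vec[pitch] = 1
    return vec

def piano_roll(notes, n_steps):
    # notes: list of (start_step, end_step, pitch); each time step is a
    # many-hot vector, so simultaneous notes (polyphony) are representable
    roll = [[0] * N_PITCHES for _ in range(n_steps)]
    for start, end, pitch in notes:
        for t in range(start, end):
            roll[t][pitch] = 1
    return roll

# a C-major triad (C4, E4, G4) held for 4 time steps
roll = piano_roll([(0, 4, 60), (0, 4, 64), (0, 4, 67)], n_steps=4)
```

The choice matters downstream: one-hot per step suits monophonic melody models, while the many-hot piano roll is the natural input for polyphony.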
Adverse drug reaction extraction on electronic health records written in Spanish
This work focuses on the automatic extraction of Adverse Drug Reactions (ADRs) in Electronic Health Records (EHRs), that is, extracting responses to a medicine which are noxious and unintended and which occur at doses normally used. From a Natural Language Processing (NLP) perspective, this was approached as a relation extraction task in which the drug is the causative agent of a disease, sign or symptom, i.e. the adverse reaction.

ADR extraction from EHRs involves major challenges. First, ADRs are rare events: relations between drugs and diseases found in an EHR are seldom ADRs (they are often unrelated or, instead, related as treatment). This implies inference from samples with a skewed class distribution. Second, EHRs are written by experts, often under time pressure, employing rich medical jargon together with colloquial expressions (not always grammatical), and it is not infrequent to find misspellings and both standard and non-standard abbreviations. All this leads to high lexical variability.

We explored several ADR detection algorithms and representations to characterize the ADR candidates. In addition, we assessed the tolerance of the ADR detection model to external noise, such as the incorrect detection of the medical entities implied in the ADR extraction, i.e. drugs and diseases. We set the first steps on ADR extraction in Spanish using a corpus of real EHRs.
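The relation-extraction framing above, where every drug-disease co-occurrence is a candidate and most candidates are not ADRs, can be sketched as follows. The features and the decision rule are invented stand-ins for a learned classifier, which in practice must also handle the skewed class distribution:

```python
# Minimal sketch of ADR extraction as relation classification over
# drug-disease candidate pairs. The toy rule below stands in for a
# trained model; feature names are hypothetical.

def candidate_pairs(drugs, diseases):
    # every drug-disease co-occurrence in a record is an ADR candidate;
    # most will turn out to be unrelated or a treatment relation
    return [(drug, disease) for drug in drugs for disease in diseases]

def classify(pair, token_distance, causal_cue_present):
    # invented decision rule: predict ADR only when entities are close
    # and a causal cue (e.g. "secundario a") links them
    if causal_cue_present and token_distance < 10:
        return "ADR"
    return "not-ADR"

pairs = candidate_pairs(["amoxicillin"], ["rash", "otitis"])
label = classify(pairs[0], token_distance=4, causal_cue_present=True)
```

A real pipeline would first run named-entity recognition to find the drugs and diseases, which is exactly the external-noise source whose errors the thesis measures tolerance to.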
Transformer Reinforcement Learning for Procedural Level Generation
This paper examines how recent advances in sequence modeling translate for machine learning assisted procedural level generation. We explore the use of Transformer based models like DistilGPT-2 to generate platformer levels, specifically for the game Super Mario Bros., and explore how we can use reinforcement learning to push the model towards a task like generating levels that are actually beatable. We found that large language models (LLMs) can be used without any major modifications from the original NLP focused models to instead generate levels for the aforementioned game.
However, the main focus of the research is connected to how advancement in the area of NLP by the use of reinforcement learning (RL) algorithms, specifically PPO, translates to the arena of procedural level generation in cases where the levels can be treated as token sequences.
We did, however, not find any combination of hyperparameters that allowed PPO to reach better results than our baseline model trained for next-token prediction. Despite its success in the area of NLP, applying a reward for the whole level failed to improve level generation. However, there are methods we have not yet tried, such as finding specific parts of the level to reward and penalize.
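The whole-level reward discussed above is sparse: a generated level, treated as a token sequence, earns reward only if it is judged beatable as a whole. The tile alphabet and the "beatable" check below are toy stand-ins, not the paper's actual solver:

```python
# Sketch of a sparse whole-level reward for RL fine-tuning (PPO in the
# paper): the level is a token sequence, and reward is 1.0 only when a
# solver judges it beatable. "-" denotes a pit tile in this toy alphabet.

def is_beatable(level_tokens, max_gap=2):
    # toy proxy for a solver: beatable iff no run of pit tokens is
    # longer than the player's assumed jump distance
    run = 0
    for tok in level_tokens:
        run = run + 1 if tok == "-" else 0
        if run > max_gap:
            return False
    return True

def level_reward(level_tokens):
    # sparse, whole-sequence reward; the paper suggests denser rewards
    # on specific level parts as untried future work
    return 1.0 if is_beatable(level_tokens) else 0.0

good = list("XX--XX-XX")      # gaps of length 2 and 1: jumpable
bad = list("XX---XX")         # gap of length 3: too wide
```

The sparsity of this signal, one scalar for hundreds of generated tokens, is one plausible reason PPO struggled to beat the next-token-prediction baseline here.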