2,894 research outputs found
Recommended from our members
Minimally supervised induction of morphology through bitexts
textA knowledge of morphology can be useful for many natural language processing systems. Thus, much effort has been expended in developing accurate computational tools for morphology that lemmatize, segment and generate new forms. The most powerful and accurate of these have been manually encoded, such endeavors being without exception expensive and time-consuming. There have been consequently many attempts to reduce this cost in the development of morphological systems through the development of unsupervised or minimally supervised algorithms and learning methods for acquisition of morphology. These efforts have yet to produce a tool that approaches the performance of manually encoded systems.
Here, I present a strategy for dealing with morphological clustering and segmentation in a minimally supervised manner but one that will be more linguistically informed than previous unsupervised approaches. That is, this study will attempt to induce clusters of words from an unannotated text that are inflectional variants of each other. Then a set of inflectional suffixes by part-of-speech will be induced from these clusters. This level of detail is made possible by a method known as alignment and transfer (AT), among other names, an approach that uses aligned bitexts to transfer linguistic resources developed for one language–the source language–to another language–the target. This approach has a further advantage in that it allows a reduction in the amount of training data without a significant degradation in performance making it useful in applications targeted at data collected from endangered languages. In the current study, however, I use English as the source and German as the target for ease of evaluation and for certain typlogical properties of German. The two main tasks, that of clustering and segmentation, are approached as sequential tasks with the clustering informing the segmentation to allow for greater accuracy in morphological analysis.
While the performance of these methods does not exceed the current roster of unsupervised or minimally supervised approaches to morphology acquisition, it attempts to integrate more learning methods than previous studies. Furthermore, it attempts to learn inflectional morphology as opposed to derivational morphology, which is a crucial distinction in linguistics.Linguistic
Text-based Editing of Talking-head Video
Editing talking-head video to change the speech content or to remove filler words is challenging. We propose a novel method to edit talking-head video based on its transcript to produce a realistic output video in which the dialogue of the speaker has been modified, while maintaining a seamless audio-visual flow (i.e. no jump cuts). Our method automatically annotates an input talking-head video with phonemes, visemes, 3D face pose and geometry, reflectance, expression and scene illumination per frame. To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material. The annotated parameters corresponding to the selected segments are seamlessly stitched together and used to produce an intermediate video representation in which the lower half of the face is rendered with a parametric face model. Finally, a recurrent video generation network transforms this representation to a photorealistic video that matches the edited transcript. We demonstrate a large variety of edits, such as the addition, removal, and alteration of words, as well as convincing language translation and full sentence synthesis
Adapting Sequence to Sequence models for Text Normalization in Social Media
Social media offer an abundant source of valuable raw data, however informal
writing can quickly become a bottleneck for many natural language processing
(NLP) tasks. Off-the-shelf tools are usually trained on formal text and cannot
explicitly handle noise found in short online posts. Moreover, the variety of
frequently occurring linguistic variations presents several challenges, even
for humans who might not be able to comprehend the meaning of such posts,
especially when they contain slang and abbreviations. Text Normalization aims
to transform online user-generated text to a canonical form. Current text
normalization systems rely on string or phonetic similarity and classification
models that work on a local fashion. We argue that processing contextual
information is crucial for this task and introduce a social media text
normalization hybrid word-character attention-based encoder-decoder model that
can serve as a pre-processing step for NLP applications to adapt to noisy text
in social media. Our character-based component is trained on synthetic
adversarial examples that are designed to capture errors commonly found in
online user-generated text. Experiments show that our model surpasses neural
architectures designed for text normalization and achieves comparable
performance with state-of-the-art related work.Comment: Accepted at the 13th International AAAI Conference on Web and Social
Media (ICWSM 2019
- …