Improving Audio Caption Fluency with Automatic Error Correction
Automated audio captioning (AAC) is an important cross-modality translation
task, aiming at generating descriptions for audio clips. However, captions
generated by previous AAC models suffer from "false-repetition" errors due to
the training objective. To address this, we propose a new task of AAC error
correction and aim to reduce such errors by post-processing AAC outputs. To
tackle this problem, we use observation-based rules to corrupt error-free
captions, generating pseudo grammatically-erroneous sentences. Each pair of
corrupted and clean sentences can thus be used for training. We train a neural
network-based model on the synthetic error dataset and apply the model to
correct real errors in AAC outputs. Results on two benchmark datasets indicate
that our approach significantly improves fluency while maintaining semantic
information.
Comment: Accepted by NCMMSC 202
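The rule-based corruption step described in the abstract could be sketched roughly as follows. This is a minimal illustration, not the paper's actual rules: `corrupt_caption` and the span-duplication rule are assumptions meant only to mimic the "false-repetition" error pattern.

```python
import random

def corrupt_caption(caption: str, seed: int = 0) -> str:
    """Corrupt a clean caption by duplicating a short phrase in place,
    mimicking the 'false-repetition' errors seen in AAC model outputs."""
    rng = random.Random(seed)
    words = caption.split()
    if len(words) < 3:
        return caption  # too short to corrupt meaningfully
    # Pick a short span and repeat it immediately after itself.
    start = rng.randrange(len(words) - 1)
    span_len = rng.randint(1, min(3, len(words) - start))
    span = words[start:start + span_len]
    corrupted = words[:start + span_len] + span + words[start + span_len:]
    return " ".join(corrupted)

# A (corrupted, clean) pair usable as a training example.
clean = "a dog barks while birds chirp in the background"
pair = (corrupt_caption(clean, seed=0), clean)
```

A corpus of such pairs could then train a sequence-to-sequence correction model that maps the corrupted side back to the clean side.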
Supporting Stylized Language Models Using Multi-Modality Features
As AI and machine learning systems become more common in our everyday lives, there is an increased desire to construct systems that are able to seamlessly interact and communicate with humans. This typically means creating systems that are able to communicate with humans via natural language. Given the variance of natural language, this can be a very challenging task. In this thesis, I explored the topic of humanlike language generation in the context of stylized language generation. Stylized language generation involves producing text that exhibits a specific, desired style. In this dissertation, I specifically explored the use of multi-modality features as a means to provide sufficient information to produce high-quality stylized text output. I also explored how these multi-modality features can be used to identify and explain errors in the generated output. Finally, I constructed an automated language evaluation metric that can evaluate stylized language models.
A Mixed-Methods Study Examining the Difference Between Closed Captioning and Lexile Levels
This experimental mixed-methods study explores what happens to student Lexile scores when they use closed captioning. Since the emergence of closed captioning tools in the 1980s, closed captioning has become more mainstream and easier to access today than at any other time in history (Rickelman et al., 1991). Thus, it is through harnessing this technology and bringing it into the classroom setting that the researcher of this study hopes to provide new approaches for educators who want to improve their students' Lexile levels, while also incorporating the SAMR model within our increasingly technologically-focused classrooms (Crompton & Burke, 2018).
The quantitative data analysis procedures in this experimental study consisted of utilizing two-sample t-tests to compare the iReady Lexile scores of the participants [n=38] to those of the researched district students [n=810] who were not using closed captioning in this study. The researcher required participants to complete a baseline iReady test to determine their preexisting Lexile levels. After the study, participants both in the researched district and in the study itself were required to complete an iReady post-test to determine their respective Lexile growth in the four areas of reading: overall growth, vocabulary, comprehension of literary text, and comprehension of informational text. The independent variable in this study was the use of the enabled closed captioning tool found on the participants' devices. The dependent variable was the Lexile scores that were computed using the iReady Lexile exam.
The researcher collected the qualitative data using a variety of observational logs, personal interviews, and pre- and post-surveys that the researcher disseminated to students using the Qualtrics system. Once these data were collected, theming and phenomenology analysis were used to identify themes and student emotions/reactions that emerged throughout this study. The themes that emerged from participants in the study included the belief in increasing Lexile levels, no effect on vocabulary, and enjoyment of using closed captioning.
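The two-sample comparison described above can be sketched with a stdlib-only Welch t statistic. This is an illustrative sketch under assumed data: the Lexile values below are placeholders, not the study's actual scores, and the p-value lookup against a t distribution is omitted.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's two-sample t statistic (does not assume equal variances),
    as would be used to compare a small treatment group against a much
    larger comparison group (here, n=38 participants vs. n=810 students)."""
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    n_a, n_b = len(sample_a), len(sample_b)
    return (mean_a - mean_b) / math.sqrt(var_a / n_a + var_b / n_b)

# Hypothetical Lexile growth values, purely for illustration.
treatment = [62, 55, 70, 48, 66, 59]
district = [50, 53, 47, 49, 55, 52]
t = welch_t(treatment, district)
```

Welch's variant is a reasonable default here because the two groups differ greatly in size, making the equal-variance assumption of the pooled t-test hard to justify.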
Improving fairness in machine learning systems: What do industry practitioners need?
The potential for machine learning (ML) systems to amplify social inequities
and unfairness is receiving increasing popular and academic attention. A surge
of recent work has focused on the development of algorithmic tools to assess
and mitigate such unfairness. If these tools are to have a positive impact on
industry practice, however, it is crucial that their design be informed by an
understanding of real-world needs. Through 35 semi-structured interviews and an
anonymous survey of 267 ML practitioners, we conduct the first systematic
investigation of commercial product teams' challenges and needs for support in
developing fairer ML systems. We identify areas of alignment and disconnect
between the challenges faced by industry practitioners and solutions proposed
in the fair ML research literature. Based on these findings, we highlight
directions for future ML and HCI research that will better address industry
practitioners' needs.
Comment: To appear in the 2019 ACM CHI Conference on Human Factors in
Computing Systems (CHI 2019)
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
We propose LENS, a modular approach for tackling computer vision problems by
leveraging the power of large language models (LLMs). Our system uses a
language model to reason over outputs from a set of independent and highly
descriptive vision modules that provide exhaustive information about an image.
We evaluate the approach on pure computer vision settings such as zero- and
few-shot object recognition, as well as on vision and language problems. LENS
can be applied to any off-the-shelf LLM, and we find that LLMs equipped with
LENS perform competitively with much larger and more sophisticated systems,
without any multimodal training whatsoever. We open-source our code at
https://github.com/ContextualAI/lens and provide an interactive demo.
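The modular design might look roughly like the following prompt-assembly step. This is a sketch, not the released implementation: the tag, attribute, and caption strings are placeholders rather than outputs of the actual vision modules, and `build_lens_prompt` is an assumed helper name.

```python
def build_lens_prompt(tags, attributes, captions, question):
    """Assemble a text prompt for a frozen, off-the-shelf LLM from the
    textual outputs of independent vision modules, in the spirit of LENS."""
    return "\n".join([
        "Tags: " + ", ".join(tags),
        "Attributes: " + ", ".join(attributes),
        "Captions: " + " | ".join(captions),
        "Question: " + question,
        "Answer:",
    ])

prompt = build_lens_prompt(
    tags=["dog", "frisbee", "park"],
    attributes=["brown dog", "green grass"],
    captions=["a dog catches a frisbee in a park"],
    question="What animal is shown?",
)
```

Because the visual evidence is serialized into plain text, any language model can consume the prompt unchanged, which is what lets the approach work without multimodal training.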
Knowledge Graph Extraction from Videos
Nearly all existing techniques for automated video annotation (or captioning)
describe videos using natural language sentences. However, this has several
shortcomings: (i) it is very hard to then further use the generated natural
language annotations in automated data processing, (ii) generating natural
language annotations requires solving the hard subtask of generating
semantically precise and syntactically correct natural language sentences,
which is actually unrelated to the task of video annotation, (iii) it is
difficult to quantitatively measure performance, as standard metrics (e.g.,
accuracy and F1-score) are inapplicable, and (iv) annotations are
language-specific. In this paper, we propose the new task of knowledge graph
extraction from videos, i.e., producing a description in the form of a
knowledge graph of the contents of a given video. Since no datasets exist for
this task, we also include a method to automatically generate them, starting
from datasets where videos are annotated with natural language. We then
describe an initial deep-learning model for knowledge graph extraction from
videos, and report results on MSVD* and MSR-VTT*, two datasets obtained from
MSVD and MSR-VTT using our method.
Comment: 10 pages, 4 figure
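The target representation can be illustrated with a toy subject-verb-object extraction. This is a heuristic stand-in: the paper proposes a learned deep model, whereas the closed verb list and word-order assumption below are purely illustrative.

```python
# Tiny closed verb set, for illustration only.
VERBS = {"plays", "rides", "cuts", "sings", "catches"}

def caption_to_triple(caption):
    """Map a simple 'subject verb object' caption to one knowledge-graph
    triple; returns None when no known verb is found."""
    words = caption.lower().split()
    for i, word in enumerate(words):
        if word in VERBS:
            return (" ".join(words[:i]), word, " ".join(words[i + 1:]))
    return None

triple = caption_to_triple("a man rides a horse")
# triple is ("a man", "rides", "a horse")
```

Representing the output as triples rather than free-form sentences is what makes the annotations language-independent and directly comparable with standard metrics such as F1-score.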