The Validation of Speech Corpora
A Deep Dive into the Disparity of Word Error Rates Across Thousands of NPTEL MOOC Videos
Automatic speech recognition (ASR) systems are designed to transcribe spoken
language into written text and find utility in a variety of applications
including voice assistants and transcription services. However, it has been
observed that state-of-the-art ASR systems, which deliver impressive benchmark
results, struggle with speakers of certain regions or demographics due to
variation in their speech properties. In this work, we describe the curation of
a massive speech dataset of 8740 hours consisting of K technical
lectures in the English language along with their transcripts delivered by
instructors representing various parts of Indian demography. The dataset is
sourced from the very popular NPTEL MOOC platform. We use the curated dataset
to measure the existing disparity in YouTube Automatic Captions and OpenAI
Whisper model performance across the diverse demographic traits of speakers in
India. While there exists disparity due to gender, native region, age and
speech rate of speakers, disparity based on caste is non-existent. We also
observe statistically significant disparity across the disciplines of the
lectures. These results indicate the need for more inclusive and robust ASR
systems and more representative datasets for evaluating disparity in them.
Crowdsourcing in Computer Vision
Computer vision systems require large amounts of manually annotated data to
properly learn challenging visual concepts. Crowdsourcing platforms offer an
inexpensive method to capture human knowledge and understanding, for a vast
number of visual perception tasks. In this survey, we describe the types of
annotations computer vision researchers have collected using crowdsourcing, and
how they have ensured that this data is of high quality while annotation effort
is minimized. We begin by discussing data collection on both classic (e.g.,
object recognition) and recent (e.g., visual story-telling) vision tasks. We
then summarize key design decisions for creating effective data collection
interfaces and workflows, and present strategies for intelligently selecting
the most important data instances to annotate. Finally, we conclude with some
thoughts on the future of crowdsourcing in computer vision. Comment: A 69-page meta review of the field, Foundations and Trends in
Computer Graphics and Vision, 201
Review of F4transkript, a simple interface for efficient annotation
National Foreign Language Resource Cente
Computational Sociolinguistics: A Survey
Language is a social phenomenon and variation is inherent to its social
nature. Recently, there has been a surge of interest within the computational
linguistics (CL) community in the social dimension of language. In this article
we present a survey of the emerging field of "Computational Sociolinguistics"
that reflects this increased interest. We aim to provide a comprehensive
overview of CL research on sociolinguistic themes, featuring topics such as the
relation between language and social identity, language use in social
interaction and multilingual communication. Moreover, we demonstrate the
potential for synergy between the research communities involved, by showing how
the large-scale data-driven methods that are widely used in CL can complement
existing sociolinguistic studies, and how sociolinguistics can inform and
challenge the methods and assumptions employed in CL studies. We hope to convey
the possible benefits of a closer collaboration between the two communities and
conclude with a discussion of open challenges. Comment: To appear in Computational Linguistics. Accepted for publication:
18th February, 201
Crowdsourcing for Language Resource Development: Criticisms About Amazon Mechanical Turk Overpowering Use
This article is a position paper about Amazon Mechanical Turk, the use of which has been steadily growing in language processing in the past few years. According to the mainstream opinion expressed in articles in the field, this type of online work platform makes it possible to quickly develop all sorts of quality language resources, at a very low price, with people who do the work as a hobby. We shall demonstrate here that the situation is far from that ideal. Our goal here is manifold: (1) to inform researchers, so that they can make their own choices; (2) to develop alternatives with the help of funding agencies and scientific associations; (3) to propose practical and organizational solutions to improve language resource development, while limiting the risks of ethical and legal issues without sacrificing price or quality; (4) to introduce an Ethics and Big Data Charter for the documentation of language resourc
CLAD: A Complex and Long Activities Dataset with Rich Crowdsourced Annotations
This paper introduces a novel activity dataset which exhibits real-life and
diverse scenarios of complex, temporally-extended human activities and actions.
The dataset presents a set of videos of actors performing everyday activities
in a natural and unscripted manner. The dataset was recorded using a static
Kinect 2 sensor which is commonly used on many robotic platforms. The dataset
comprises RGB-D images, point cloud data, and automatically generated skeleton
tracks, in addition to crowdsourced annotations. Furthermore, we also describe
the methodology used to acquire annotations through crowdsourcing. Finally, some
activity recognition benchmarks are presented using current state-of-the-art
techniques. We believe that this dataset is particularly suitable as a testbed
for activity recognition research, but it is also applicable to other
common tasks in robotics/computer vision research such as object detection and
human skeleton tracking.