Spartan Daily January 25, 2012
Volume 138, Issue 1
Automatic Pronunciation Assessment -- A Review
Pronunciation assessment and its application in computer-aided pronunciation
training (CAPT) have seen impressive progress in recent years. With the rapid
growth in language processing and deep learning over the past few years, there
is a need for an updated review. In this paper, we review methods employed in
pronunciation assessment at both the phonemic and prosodic levels. We categorize
the main challenges observed in prominent research trends and highlight existing
limitations and available resources. This is followed by a discussion of the
remaining challenges and possible directions for future work.
Comment: 9 pages, accepted to EMNLP Finding
The elastic use of 'some': a comparative study between L1 and L2 speakers in educational settings
This study explored 'some' from a fresh angle, focusing on its elasticity. It compared L1 (American) and L2 (Chinese and Vietnamese) speakers and found that L2 speakers are vaguer than L1 speakers, and that the elasticity of 'some' is manifested through the fluid, stretchable and strategic features of its pragmatic meanings and functions. The implication is that an understanding of its elastic nature may be integrated into the curriculum of English language teaching.
VNHSGE: VietNamese High School Graduation Examination Dataset for Large Language Models
The VNHSGE (VietNamese High School Graduation Examination) dataset, developed
exclusively for evaluating large language models (LLMs), is introduced in this
article. The dataset, which covers nine subjects, was generated from the
Vietnamese National High School Graduation Examination and comparable tests.
300 literary essays have been included, and there are over 19,000
multiple-choice questions on a range of topics. The dataset assesses LLMs in
multitasking situations such as question answering, text generation, reading
comprehension, visual question answering, and more by including both textual
data and accompanying images. We evaluated ChatGPT and BingChat on the VNHSGE
dataset and compared their performance with that of Vietnamese students. The
results show that ChatGPT and
BingChat both perform at a human level in a number of areas, including
literature, English, history, geography, and civics education. They still have
room to improve, though, especially in mathematics, physics,
chemistry, and biology. The VNHSGE dataset seeks to provide an adequate
benchmark for assessing the abilities of LLMs with its wide-ranging coverage
and variety of activities. We intend to promote future developments in the
creation of LLMs by making this dataset available to the scientific community,
especially in addressing LLMs' limitations in disciplines involving mathematics
and the natural sciences.
Comment: 74 pages, 44 figure
Automatic Speech Recognition for Low-resource Languages and Accents Using Multilingual and Crosslingual Information
This thesis explores methods to rapidly bootstrap automatic speech recognition systems for languages that lack resources for speech and language processing. We focus on approaches that use data from multiple languages to improve performance for those languages at different levels, such as feature extraction, acoustic modeling and language modeling. On the application side, this thesis also includes research on non-native and code-switching speech.
Cross-generational linguistic variation in the Canberra Vietnamese heritage language community: A corpus-centred investigation
This dissertation investigates cross-generational linguistic differences in the Canberra Vietnamese bilingual community, with a particular focus on Vietnamese as the heritage language. Specifically, it documents the vernacular and considers key aspects of this data from different theoretical perspectives. Its main contribution is an insight into a rarely studied heritage language variety in a contact community that has never been examined.
The dissertation consists of five core chapters, organised into two parts. In the first part (Chapters 2–3), I describe how I documented the vernacular and created the Canberra Vietnamese English Corpus (CanVEC), an original corpus compiled specifically for this study that is also the first to be freely available for research purposes. The corpus consists of over ten hours of spontaneous speech produced by 45 Vietnamese-English bilingual speakers across two generations living in Canberra. In the second part of the study (Chapters 4–6), I put the corpus to use and investigate aspects of the cross-generational differences in Vietnamese as the heritage language in this community.
In particular, I first probe the Vietnamese heritage language via its participation in the code-switching discourse (Chapter 4). In doing so, I focus on the applicability of the Matrix Language Framework (MLF) (Myers-Scotton, 1993, 2002) and its associated Matrix Language (ML) Turnover Hypothesis (Myers-Scotton, 1998) to the code-switching data in CanVEC. Since support for this prominent model has mainly come from language pairs that have different clausal word order or vastly different inventories of inflectional morphology, Vietnamese-English, as a pair in which both languages are SVO and essentially isolating, offers a tantalising testing ground for its application. Results show that the universal claims of this model do not hold so straightforwardly. CanVEC data challenges several assumptions of the MLF, with the model ultimately only being able to account for around half of the CanVEC code-switching data. I further demonstrate that even when the ML is putatively identifiable and a cross-generational ML ‘turnover’ is quantitatively observed, the predictions do not reflect the direction of structural influence that we see in CanVEC. The MLF approach therefore sheds only limited light on cross-generational language shift and variation in this community.
Given that null elements emerge as a distinct area of difficulty in Chapter 4, I take this aspect as the focal point for the next part of the investigation (Chapter 5), where I use the variationist approach (Labov, 1972 et seq.) to explore three cases where null and overt realisation alternates in Vietnamese: subjects, objects, and copulas. In doing so, I move away from the bilingual portion of CanVEC to examine the monolingual heritage Vietnamese subset directly. Results show that Vietnamese null subjects vary significantly across generations, while null objects and copulas remain stable in terms of use. As speakers also overwhelmingly prefer overt forms over null forms (∼70:30) across all three variables of interest, I next appeal to the generative interface-oriented approach (Sorace & Filiaci, 2006 et seq.) to examine the distribution of overt subjects, objects, and copulas (Chapter 6). These results converge with what was found for null forms: cross-generational effects were observed for pronominal subjects, but not for pronominal objects and copulas. This finding also supports the importance of a distinction drawn in previous works between internal (syntax-semantics) and external (syntax-discourse/pragmatics) interface phenomena, with the latter being seemingly more susceptible to change.
Ultimately, this dissertation highlights the empirical and theoretical value of studying rarely considered contact varieties, while deploying an integrated approach that acknowledges the multi-faceted complexity of the contact communities where these varieties are spoken.
Cambridge Trust International Scholarshi