Investigating the build-up of precedence effect using reflection masking
The auditory processing level involved in the build-up of precedence [Freyman et al., J. Acoust. Soc. Am. 90, 874–884 (1991)] has been investigated here by employing reflection masked threshold (RMT) techniques. Given that RMT techniques are generally assumed to address lower levels of the auditory signal processing, such an approach represents a bottom-up approach to the buildup of precedence. Three conditioner configurations measuring a possible buildup of reflection suppression were compared to the baseline RMT for four reflection delays ranging from 2.5 to 15 ms. No buildup of reflection suppression was observed for any of the conditioner configurations. Buildup of template (decrease in RMT for two of the conditioners), on the other hand, was found to be delay dependent. For five of six listeners, with reflection delay = 2.5 and 15 ms, RMT decreased relative to the baseline. For 5- and 10-ms delay, no change in threshold was observed. It is concluded that the low-level auditory processing involved in RMT is not sufficient to realize a buildup of reflection suppression. This confirms suggestions that higher level processing is involved in PE buildup. The observed enhancement of reflection detection (RMT) may contribute to active suppression at higher processing levels.
Deep Learning for Automatic Assessment and Feedback of Spoken English
Growing global demand for learning a second language (L2), particularly English, has led to
considerable interest in automatic spoken language assessment, whether for use in computer-assisted language learning (CALL) tools or for grading candidates for formal qualifications.
This thesis presents research conducted into the automatic assessment of spontaneous non-native English speech, with a view to providing meaningful feedback to learners. One
of the challenges in automatic spoken language assessment is giving candidates feedback on
particular aspects, or views, of their spoken language proficiency, in addition to the overall
holistic score normally provided. Another is detecting pronunciation and other types of errors
at the word or utterance level and feeding them back to the learner in a useful way.
It is usually difficult to obtain accurate training data with separate scores for different
views and, as examiners are often trained to give holistic grades, single-view scores can
suffer issues of consistency. Conversely, holistic scores are available for various standard
assessment tasks such as Linguaskill. An investigation is thus conducted into whether
assessment scores linked to particular views of the speaker's ability can be obtained from
systems trained using only holistic scores.
End-to-end neural systems are designed with structures and forms of input tuned to single
views, specifically each of pronunciation, rhythm, intonation and text. By training each
system on large quantities of candidate data, it should be possible to extract
individual-view information. The relationships between the predictions of each system are evaluated to examine
whether they are, in fact, extracting different information about the speaker. Three methods
of combining the systems to predict holistic score are investigated, namely averaging their
predictions and concatenating and attending over their intermediate representations. The
combined graders are compared to each other and to baseline approaches.
The tasks of error detection and error tendency diagnosis become particularly challenging
when the speech in question is spontaneous, especially given
the inconsistency of human annotation of pronunciation errors. An approach to these tasks is
presented by distinguishing between lexical errors, wherein the speaker does not know how a
particular word is pronounced, and accent errors, wherein the candidate's speech exhibits
consistent patterns of phone substitution, deletion and insertion. Three annotated corpora
of non-native English speech by speakers of multiple L1s are analysed, the consistency of
human annotation investigated and a method presented for detecting individual accent and
lexical errors and diagnosing accent error tendencies at the speaker level.
Dealing with linguistic mismatches for automatic speech recognition
Recent breakthroughs in automatic speech recognition (ASR) have resulted in a word error rate (WER) on par with human transcribers on the English Switchboard benchmark. However, dealing with linguistic mismatches between the training and testing data remains a significant unsolved challenge. In a monolingual environment, it is well known that the performance of ASR systems degrades significantly when they are presented with speech from speakers whose accents, dialects, and speaking styles differ from those encountered during system training. In a multilingual environment, ASR systems trained on a source language perform even worse when tested on another target language because of mismatches in the number of phonemes, lexical ambiguity, and the power of phonotactic constraints provided by phone-level n-grams.
To address the issues of linguistic mismatch in current ASR systems, my dissertation investigates both knowledge-gnostic and knowledge-agnostic solutions. In the first part, classic theories of acoustics and articulatory phonetics that can be transferred across a dialect continuum, from local dialects to a standardized language, are revisited. Experiments demonstrate that acoustic correlates in the vicinity of landmarks have the potential to bridge mismatches across different local or global varieties in a dialect continuum. In the second part, we design an end-to-end acoustic modeling approach based on connectionist temporal classification loss and propose to link the training of acoustics and accent jointly, in a manner similar to the learning process in human speech perception. This joint model not only performed well on ASR with multiple accents but also improved accuracy on the accent identification task in comparison to separately trained models.
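The word error rate (WER) cited in the abstract above is conventionally defined as the word-level edit distance (substitutions, deletions and insertions) between a reference transcript and a recognizer hypothesis, divided by the number of reference words. The following is a minimal sketch of that standard definition, not the scoring tool used in the dissertation; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by reference length.

    Assumption: words are separated by whitespace and compared exactly
    (no case folding or text normalization is applied).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

Because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.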
A computational model for studying L1's effect on L2 speech learning
Much evidence has shown that the first language (L1) plays an important role in the formation of the L2 phonological system during the second language (L2) learning process. Combined with the fact that different L1s have distinct phonological patterns, this suggests diverse L2 speech learning outcomes for speakers from different L1 backgrounds. This dissertation hypothesizes that phonological distances between accented speech and speakers' L1 speech are also correlated with perceived accentedness, and that the correlations are negative for some phonological properties. Moreover, contrastive phonological distinctions between L1s and L2 will manifest themselves in the accented speech produced by speakers from these L1s. To test the hypotheses, this study develops a computational model to analyze the properties of accented speech in both segmental (short-term speech measurements at the short-segment or phoneme level) and suprasegmental (long-term speech measurements at the word, long-segment, or sentence level) feature spaces. The benefit of using a computational model is that it enables quantitative analysis of L1's effect on accent in terms of different phonological properties. The core parts of this computational model are feature extraction schemes that extract pronunciation and prosody representations of accented speech based on existing techniques in the speech processing field. Correlation analysis on both segmental and suprasegmental feature spaces is conducted to examine the relationship between acoustic measurements related to L1s and perceived accentedness across several L1s. Multiple regression analysis is employed to investigate how the L1's effect impacts the perception of foreign accent, and how accented speech produced by speakers from different L1s behaves distinctly in segmental and suprasegmental feature spaces.
The results show that the methodology in this study can provide quantitative analysis of accented speech and extend current studies in L2 speech learning theory to a large scale. Practically, this study further shows that the proposed computational model can benefit automatic accentedness evaluation systems by adding features related to speakers' L1s.
Dissertation/Thesis: Doctoral Dissertation, Speech and Hearing Science 201
Text Extraction and Web Searching in a Non-Latin Language
Recent studies of queries submitted to Internet Search Engines have shown that
non-English queries and unclassifiable queries have nearly tripled during the
last decade. Most search engines were originally engineered for English. They
do not take full account of inflectional semantics nor, for example, of diacritics or
the use of capitals, which are common features in languages other than English.
The literature concludes that searching using non-English and non-Latin based
queries results in lower success and requires additional user effort to achieve
acceptable precision.
The primary aim of this research study is to develop an evaluation methodology
for identifying the shortcomings and measuring the effectiveness of
search engines with non-English queries. It also proposes a number of solutions
for the existing situation. A Greek query log is analyzed considering the morphological
features of the Greek language. In addition, a text extraction experiment
revealed some problems related to the encoding and the morphological and
grammatical differences among semantically equivalent Greek terms. A first
stopword list for Greek based on a domain independent collection has been
produced and its application in Web searching has been studied. The effect of
lemmatization of query terms and the factors influencing text based image retrieval
in Greek are also studied. Finally, an instructional strategy is presented
for teaching non-English students how to effectively utilize search engines.
The evaluation of the capabilities of the search engines showed that international
and nationwide search engines ignore most of the linguistic idiosyncrasies
of Greek and other complex European languages. There is a lack of
freely available non-English resources to work with (test corpus, linguistic resources,
etc.). The research showed that the application of standard IR techniques,
such as stopword removal, stemming, lemmatization and query expansion,
in Greek Web searching increases precision.
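Two of the issues named in this abstract, diacritics and capitalization in Greek queries and stopword removal, can be illustrated with a short sketch. This is an illustration under stated assumptions, not the thesis's method: the function names are hypothetical, and the three-word `ILLUSTRATIVE_STOPWORDS` set is a placeholder rather than the stopword list produced in the study.

```python
import unicodedata

# Assumption: a tiny placeholder set of Greek function words, already in
# normalized form. It is NOT the stopword list described in the abstract.
ILLUSTRATIVE_STOPWORDS = {"και", "το", "στην"}


def normalize_term(term: str) -> str:
    """Strip diacritics and fold case so that spelling variants match."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() lowercases and also maps final sigma to regular sigma.
    return without_marks.casefold()


def preprocess_query(query: str) -> list[str]:
    """Normalize each whitespace-separated term and drop stopwords."""
    terms = [normalize_term(t) for t in query.split()]
    return [t for t in terms if t not in ILLUSTRATIVE_STOPWORDS]
```

With this normalization, "Ελλάδα" and "ΕΛΛΑΔΑ" map to the same key, so a query typed in capitals without accents can match text written with them. Stemming and lemmatization, which the abstract also mentions, need language-specific resources and are not covered here.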
Predicting and auralizing acoustics in classrooms
Although classrooms have fairly simple geometries, rooms of this type are known to cause problems when their acoustics are predicted using room acoustics computer modeling. Some typical features from a room acoustics point of view are: parallel walls, low ceilings (the rooms are flat), uneven distribution of absorption, and most of the floor being covered with furniture, which at long distances acts as scattering elements and at short distances provides strong specular components. The importance of diffraction and scattering is illustrated in numbers and by means of auralization, using ODEON 8 Beta.