611 research outputs found
Spartan Daily, December 4, 1970
Volume 58, Issue 47
About Voice: A Longitudinal Study of Speaker Recognition Dataset Dynamics
Like face recognition, speaker recognition is widely used for voice-based
biometric identification in a broad range of industries, including banking,
education, recruitment, immigration, law enforcement, healthcare, and
well-being. However, while dataset evaluations and audits have improved data
practices in computer vision and face recognition, the data practices in
speaker recognition have gone largely unquestioned. Our research aims to
address this gap by exploring how dataset usage has evolved over time and what
implications this has on bias and fairness in speaker recognition systems.
Previous studies have demonstrated the presence of historical, representation,
and measurement biases in popular speaker recognition benchmarks. In this
paper, we present a longitudinal study of speaker recognition datasets used for
training and evaluation from 2012 to 2021. We survey close to 700 papers to
investigate community adoption of datasets and changes in usage over a crucial
time period where speaker recognition approaches transitioned to the widespread
adoption of deep neural networks. Our study identifies the most commonly used
datasets in the field, examines their usage patterns, and assesses their
attributes that affect bias, fairness, and other ethical concerns. Our findings
suggest areas for further research on the ethics and fairness of speaker
recognition technology. Comment: 14 pages (23 with References and Appendix)
Deep Transfer Learning for Automatic Speech Recognition: Towards Better Generalization
Automatic speech recognition (ASR) has recently become an important challenge
when using deep learning (DL). It requires large-scale training datasets and
high computational and storage resources. Moreover, DL techniques, and machine
learning (ML) approaches in general, assume that training and testing data
come from the same domain, with the same input feature space and data
distribution characteristics. This assumption, however, is not applicable in
some real-world artificial intelligence (AI) applications. Moreover, there are
situations where real data is challenging or expensive to gather, or occurs
only rarely, so the data requirements of DL models cannot be met. Deep transfer
learning (DTL) has been introduced to overcome these issues; it helps
develop high-performing models using real datasets that are small or slightly
different but related to the training data. This paper presents a comprehensive
survey of DTL-based ASR frameworks to shed light on the latest developments and
help academics and professionals understand current challenges. Specifically,
after presenting the DTL background, a well-designed taxonomy is adopted to
inform the state-of-the-art. A critical analysis is then conducted to identify
the limitations and advantages of each framework. Moving on, a comparative
study is introduced to highlight the current challenges before deriving
opportunities for future research.
The effect of digital apps on Vietnamese EFL learners’ receptive vocabulary acquisition: a case study of Quizlet and paper flashcards
The thesis aims to investigate the efficacy of a digital vocabulary learning application called Quizlet compared with that of a more traditional method, such as paper flashcards, among English as a Foreign Language (EFL) learners in Vietnam where the teaching and learning of English has been an object of concern for the government. Reports so far have recorded slow progress and official policies attempt to encourage improvement in the area including the use of digital media in teaching and learning. So it is legitimate to ask whether reliance on digital media in EFL education may be justified. This is the practical motivation of this project, which compares a digital tool and a more traditional tool used for the same purpose: the learning of the L2 lexicon.
The theoretical framework of the study is the Cognitive-Affective Theory of Learning with Media (CATLM) (Moreno & Mayer, 2007), and the evaluation framework used follows Miyamoto (2001) according to whom multimodal second language learning activities should be evaluated from three different perspectives:
(1) the linguistic development in the learner, (2) the linguistic environment provided by the learning tool and (3) the learner’s perception of the learning tool. Consequently, this study examines two vocabulary learning tools, Quizlet and paper flashcards, in terms of (a) actual learning outcomes; (b) input, output, interaction and feedback; and (c) learners’ attitudes.
This study follows a design including pre-test, training (two one-hour reading and vocabulary learning sessions per week for four weeks) and immediate post-test as well as delayed post-test. Participants in the study were an intact class of 39 high school students in Vietnam. They were divided into two groups. Approximately twenty new words selected from a reading passage were introduced to the students each week. As for the vocabulary learning tools, group A used Quizlet while group B paper flashcards for the first two weeks. Then, group A switched to paper flashcards, and group B Quizlet in the following two weeks. This method was used to counterbalance the order effect of using two different tools.
Data analysis included screen captures (Quizlet) and video recordings (paper flashcards) of six randomly selected participants’ learning activities during training sessions; improvements from vocabulary pre-tests to post-tests; and participants’ responses to a questionnaire. Results suggest that both tools have a positive influence on vocabulary learning. However, Quizlet appears to be more effective than paper flashcards in fostering vocabulary development. Additionally, Quizlet has various advantages over paper flashcards in terms of the linguistic environment provided for learning, and it better matches students’ preferences. However, paper flashcards do have some specific merits, such as encouraging students to practise pronouncing words, which was not observed with Quizlet.
The research proposes that there is some justification for the belief that digital apps may elicit better results overall than some of the more traditional methods for L2 vocabulary learning in English as a second language, because they provide a greater variety of linguistic environments and because they can help meet the need for exposure to native English in the Vietnamese school system.
Graphonomics and your Brain on Art, Creativity and Innovation : Proceedings of the 19th International Graphonomics Conference (IGS 2019 – Your Brain on Art)
“Graphonomics and your brain on art, creativity and innovation”.
A single track, international forum for discussion on recent advances at the intersection of the creative arts, neuroscience, engineering, media, technology, industry, education, design, forensics, and medicine.
The contributions reviewed the state of the art, identified challenges and opportunities and created a roadmap for the field of graphonomics and your brain on art.
The topics addressed include: integrative strategies for understanding neural, affective and cognitive systems in realistic, complex environments; neural and behavioral individuality and variation; neuroaesthetics (the use of neuroscience to explain and understand aesthetic experiences at the neurological level); creativity and innovation; neuroengineering and brain-inspired art, creative concepts and wearable mobile brain-body imaging (MoBI) designs; creative art therapy; informal learning; education; forensics.
The development of self-identification in Chinese-Vietnamese children in Australia : the influence of family language practices and changing social environments
This thesis investigates the development of children’s self-identification in minority bi-ethnic migrant families in relation to their multilingual and multicultural practices, within the context of exogamous families in Australia. While these bi-ethnic partnerships implicitly or explicitly implement policies and strategies to encourage the use of home languages, there is scant understanding of the dynamic interrelation between the development of identity in multi-ethnic children and their language development in changing social environments. Bi- and multilingual children’s language acquisition, family language policy and identity issues have been extensively studied internationally. However, these studies do not systematically investigate the connections between identity development in multilingual children, their respective family’s linguistic and cultural input, and their social environments. This thesis examines family language practices and socio-environmental factors impacting young children’s identity construction, to complement previous research on Australian bilingual children. It seeks to contribute to the current debate between essentialist (psychological) versus non-essentialist (socio-linguistic) identity issues by examining children’s expression of self in response to the three languages in their environment, including their families’ referential practices. It also observes the effects of different social contexts and changing circumstances on children’s self-identification. The design of this research is longitudinal, as it aims to gather data from two Australian Cantonese-Vietnamese families over three years. The key finding of this study is that children construct their identity in a dynamic and context-bound way. 
Results identify three major influencing factors as playing a role in the children’s self-identification: 1) family language input and practices; 2) family ideologies, cultural practices, and family networks, as well as the migrant community; and 3) peers and the childcare/school environments. This thesis contributes new empirical data to existing research on family language policy and adds new language pairs to the field of heritage language maintenance and child identity in the Australian context. The data suggest that self-identification develops in a context-bound way, parallel to the context-bound language development proposed in Qi and Di Biase (2020). They reveal that children’s self-identification not only grows under the influence of their family’s linguistic and cultural practices, but also adjusts to changing circumstances and pressures from peers and adult role models in the dominant environment. These findings may play a role in the preservation of heritage languages and family wellbeing.
Robust text independent closed set speaker identification systems and their evaluation
PhD Thesis. This thesis focuses upon text independent closed set speaker
identification. The contributions relate to evaluation studies in the
presence of various types of noise and handset effects. Extensive
evaluations are performed on four databases.
The first contribution is in the context of the use of the Gaussian
Mixture Model-Universal Background Model (GMM-UBM) with
original speech recordings from only the TIMIT database. Four main
simulations for Speaker Identification Accuracy (SIA) are presented
including different fusion strategies: late fusion (score based), early
fusion (feature based) and early-late fusion (combination of feature and
score based), late fusion using concatenated static and dynamic
features (features with temporal derivatives such as first order
derivative delta and second order derivative delta-delta features,
namely acceleration features), and finally fusion of statistically
independent normalized scores.
The second contribution is again based on the GMM-UBM
approach. Comprehensive evaluations of the effect of Additive White
Gaussian Noise (AWGN), and Non-Stationary Noise (NSN) (with and
without a G.712 type handset) upon identification performance are
undertaken. In particular, three NSN types with varying Signal to
Noise Ratios (SNRs) were tested corresponding to: street traffic, a bus
interior and a crowded talking environment. The performance
evaluation also considered the effect of late fusion techniques based on
score fusion, namely mean, maximum, and linear weighted sum fusion.
The databases employed were: TIMIT, SITW, and NIST 2008; and 120
speakers were selected from each database to yield 3,600 speech
utterances.
The third contribution is based on the use of the I-vector; four
combinations of I-vectors with 100 and 200 dimensions were employed.
Then, various fusion techniques using maximum, mean, weighted sum
and cumulative fusion with the same I-vector dimension were used to
improve the SIA. Similarly, both interleaving and concatenated I-vector
fusion were exploited to produce 200 and 400 I-vector dimensions. The
system was evaluated with four different databases using 120 speakers
from each database. TIMIT, SITW and NIST 2008 databases were
evaluated for various types of NSN namely, street-traffic NSN,
bus-interior NSN and crowd talking NSN; and the G.712 type handset
at 16 kHz was also applied.
As recommendations from the study in terms of the GMM-UBM
approach, mean fusion is found to yield overall best performance in terms
of the SIA with noisy speech, whereas linear weighted sum fusion is
overall best for original database recordings. However, in the I-vector
approach the best SIA was obtained from the weighted sum and the
concatenated fusion.
Ministry of Higher Education
and Scientific Research (MoHESR), and the Iraqi Cultural Attaché,
Al-Mustansiriya University, Al-Mustansiriya University College of
Engineering in Iraq for supporting my PhD scholarship
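The score-level (late) fusion rules named in the abstract above (mean, maximum, and linear weighted sum) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the two subsystems, the speaker labels, the scores, and the weights are all hypothetical. It assumes each subsystem has already produced one normalized score per enrolled speaker.

```python
from typing import Dict, Sequence

Scores = Dict[str, float]  # speaker id -> normalized score from one subsystem


def fuse_scores(systems: Sequence[Scores], rule: str = "mean",
                weights: Sequence[float] = ()) -> Scores:
    """Combine per-speaker scores from several subsystems (late fusion)."""
    fused: Scores = {}
    for spk in systems[0]:
        column = [s[spk] for s in systems]  # this speaker's score in each subsystem
        if rule == "mean":
            fused[spk] = sum(column) / len(column)
        elif rule == "max":
            fused[spk] = max(column)
        elif rule == "weighted_sum":
            if len(weights) != len(column):
                raise ValueError("need one weight per subsystem")
            fused[spk] = sum(w * x for w, x in zip(weights, column))
        else:
            raise ValueError(f"unknown fusion rule: {rule}")
    return fused


def identify(fused: Scores) -> str:
    """Closed-set decision: pick the enrolled speaker with the highest fused score."""
    return max(fused, key=fused.get)


# Hypothetical normalized scores for three enrolled speakers from two subsystems
static_scores = {"spk_a": 0.62, "spk_b": 0.70, "spk_c": 0.10}
dynamic_scores = {"spk_a": 0.80, "spk_b": 0.55, "spk_c": 0.20}

for rule in ("mean", "max"):
    print(rule, identify(fuse_scores([static_scores, dynamic_scores], rule)))
print("weighted_sum", identify(
    fuse_scores([static_scores, dynamic_scores], "weighted_sum", weights=[0.9, 0.1])))
```

In this toy example the fusion rule changes the decision: mean and maximum fusion pick one speaker, while a weighted sum that leans heavily on the first subsystem picks another. That sensitivity is why a comparison of fusion rules under different noise conditions, as the abstract describes, is informative.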
He's Dark, Dark; Colorism Among African American Men
This study expands the literature on colorism, which has focused almost exclusively on the experiences of women, by investigating black men’s experience with skin tone discrimination. The investigator seeks to interrogate how black males experience colorism by exploring how family, peer associations, and media shape black males’ understanding of their skin tone, and by asking what messages, if any, enforcing colorism ideals they receive, as well as the frequency of and adherence to such messages. The investigator utilized focus groups to gather data. The sample was limited to 10 self-identifying African-American black men aged 18 and older. Focus group data were analyzed through an intersectional perspective, using thematic coding. Findings suggest that light skinned and dark skinned men experience colorism differently. Light skinned men noted blatant colorism and often felt they had to authenticate their blackness. Darker skinned men reported more indirect colorism and negative stereotypes as prominent challenges with colorism.