65 research outputs found
Opening the Black Box of wav2vec Feature Encoder
Self-supervised models, namely, wav2vec and its variants, have shown
promising results in various downstream tasks in the speech domain. However,
their inner workings are poorly understood, calling for in-depth analyses on
what the model learns. In this paper, we concentrate on the convolutional
feature encoder, whose latent space is often speculated to represent
discrete acoustic units. To analyze the embedding space in a reductive manner,
we feed it synthesized audio signals, each of which is a summation of simple sine
waves. Through extensive experiments, we conclude that various information is
embedded inside the feature encoder representations: (1) fundamental frequency,
(2) formants, and (3) amplitude, packed with (4) sufficient temporal detail.
Further, the information incorporated inside the latent representations is
analogous to spectrograms but with a fundamental difference: latent
representations construct a metric space so that closer representations imply
acoustic similarity.
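The probing input described in this abstract, a summation of simple sine waves, can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `synthesize_sines`, the 16 kHz sample rate, and the example frequencies and amplitudes are all assumptions chosen for the sketch.

```python
import numpy as np

def synthesize_sines(freqs_hz, amps, duration_s=1.0, sample_rate=16000):
    """Return a waveform that is the sum of sine waves.

    freqs_hz and amps are parallel sequences (hypothetical parameters);
    the 16 kHz default sample rate is an assumption of this sketch.
    """
    # Time axis in seconds, one entry per sample.
    t = np.arange(int(duration_s * sample_rate)) / sample_rate
    wave = np.zeros_like(t)
    # Add one sinusoid per (frequency, amplitude) pair.
    for f, a in zip(freqs_hz, amps):
        wave += a * np.sin(2 * np.pi * f * t)
    return wave

# Example: a 100 Hz fundamental with two weaker harmonics.
signal = synthesize_sines([100.0, 200.0, 300.0], [0.5, 0.25, 0.125])
```

Varying one factor at a time (for example only the fundamental frequency, or only the amplitude) yields controlled inputs of the kind the abstract describes for inspecting an encoder's representations.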
Understanding Probe Behaviors through Variational Bounds of Mutual Information
With the success of self-supervised representations, researchers seek a
better understanding of the information encapsulated within a representation.
Among various interpretability methods, we focus on classification-based linear
probing. We aim to foster a solid understanding and provide guidelines for
linear probing by constructing a novel mathematical framework leveraging
information theory. First, we connect probing with the variational bounds of
mutual information (MI) to relax the probe design, equating linear probing with
fine-tuning. Then, we investigate empirical behaviors and practices of probing
through our mathematical framework. We analyze the layer-wise performance curve
being convex, which seemingly violates the data processing inequality. However,
we show that the intermediate representations can have the biggest MI estimate
because of the tradeoff between better separability and decreasing MI. We
further suggest that the margin of linearly separable representations can be a
criterion for measuring the "goodness of representation." We also compare
accuracy with MI as the measuring criteria. Finally, we empirically validate
our claims by observing how well self-supervised speech models retain word and
phoneme information.
Comment: Accepted to ICASSP 2024; implementation available at
https://github.com/juice500ml/information_probin
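For orientation, a standard variational lower bound on mutual information that connects a classification probe to MI can be written as below. This is a sketch under assumptions, not necessarily the exact bound used in the paper: $Z$ denotes the representation, $Y$ the label, and $q(y \mid z)$ the probe's predictive distribution, all of which are notation introduced here.

```latex
\begin{aligned}
I(Z;Y) &= H(Y) - H(Y \mid Z) \\
       &\ge H(Y) + \mathbb{E}_{p(z,y)}\!\left[\log q(y \mid z)\right]
\end{aligned}
```

The inequality holds because the probe's cross-entropy, $-\mathbb{E}_{p(z,y)}[\log q(y \mid z)]$, is never smaller than the conditional entropy $H(Y \mid Z)$. Lowering the probe's cross-entropy loss therefore tightens the bound.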
Teachers’ Perceptions of Competency-Based Curriculum Implementation, and Government Support: A Mixed Methods Study on Grade 1-5 Teachers in Homabay County, Kenya
Education reform is necessary as it allows a country to periodically review, revise, and evaluate its education systems and programs. Kenya recently adopted a competency-based education system, known as the Competency-Based Curriculum (CBC). This approach allows students to work at their own pace to demonstrate mastery of the competencies required for their chosen field of study. However, previous studies on the implementation stages of CBC, particularly in elementary teacher preparedness, have indicated that teachers' knowledge of CBC is inadequate, they are ill-prepared, and thus they are unable to effectively teach and evaluate the new curriculum. Therefore, this study aims to investigate teacher perceptions, self-efficacy on digital technology use, and government support in the implementation of CBC, to identify the challenges teachers are facing and the support needed to effectively implement the curriculum. The study used a mixed-method convergent research design to answer the research questions. The participants were grade 1-5 teachers drawn from Homa Bay county. The study findings revealed that CBC teachers have conflicting views about CBC. Among all survey constructs, the government resource support had the highest mean, while the need for training on information and communication technology and the provision of digital technology materials to schools were mostly unfavorable. Survey respondents indicated moderate agreement with the relevant assertions. The study recommends the use of perception theory instead of self-efficacy theory to investigate teachers' opinions on the implementation of CBC. This approach can help create links between the identified issues and outcomes. Further research is necessary to examine how parents perceive the implementation of CBC and how their involvement can aid in learners' acquisition of the necessary competencies and skills.
Keywords: Teachers’ perceptions, Competency-based curriculum, Government support, Digital technology
DOI: 10.7176/JEP/14-9-09
Publication date: March 31st 202
Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning
Automatic assessment of dysarthric speech is essential for sustained
treatments and rehabilitation. However, obtaining atypical speech is
challenging, often leading to data scarcity issues. To tackle the problem, we
propose a novel automatic severity assessment method for dysarthric speech,
using the self-supervised model in conjunction with multi-task learning.
Wav2vec 2.0 XLS-R is jointly trained for two different tasks: severity level
classification and auxiliary automatic speech recognition (ASR). For the
baseline experiments, we employ hand-crafted features such as eGeMaps and
linguistic features, and SVM, MLP, and XGBoost classifiers. Explored on the
Korean dysarthric speech QoLT database, our model outperforms the traditional
baseline methods, with a relative percentage increase of 4.79% for
classification accuracy. In addition, the proposed model surpasses the model
trained without the ASR head, achieving a 10.09% relative improvement.
Furthermore, we present how multi-task learning affects the severity
classification performance by analyzing the latent representations and
regularization effect.
Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification
This paper proposes an improved Goodness of Pronunciation (GoP) that utilizes
Uncertainty Quantification (UQ) for automatic speech intelligibility assessment
for dysarthric speech. Current GoP methods rely heavily on overconfident
neural network predictions, which are unsuitable for assessing
dysarthric speech due to its significant acoustic differences from healthy
speech. To alleviate the problem, UQ techniques were used on GoP by 1)
normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin)
and 2) modifying the scoring function (scaling, prior normalization). As a
result, prior-normalized maxlogit GoP achieves the best performance, with a
relative increase of 5.66%, 3.91%, and 23.65% compared to the baseline GoP for
English, Korean, and Tamil, respectively. Furthermore, phoneme analysis is
conducted to identify which phoneme scores significantly correlate with
intelligibility scores in each language.
Comment: Accepted to Interspeech 202
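The four uncertainty measures named in this abstract (entropy, margin, maxlogit, logit-margin) can be illustrated with their generic definitions over one frame of phoneme logits. This is only a sketch: the function name `uncertainty_scores` is hypothetical, and these textbook definitions are assumptions that may differ from how the paper applies them to GoP scoring.

```python
import numpy as np

def uncertainty_scores(logits):
    """Compute generic uncertainty measures from one vector of class logits.

    The definitions below are common textbook forms (an assumption of this
    sketch), not necessarily the paper's exact formulation.
    """
    logits = np.asarray(logits, dtype=float)
    # Numerically stable softmax.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()
    # Sort in descending order to access the top two candidates.
    sorted_probs = np.sort(probs)[::-1]
    sorted_logits = np.sort(logits)[::-1]
    return {
        # Shannon entropy of the posterior (small epsilon avoids log(0)).
        "entropy": float(-(probs * np.log(probs + 1e-12)).sum()),
        # Gap between the two most probable classes.
        "margin": float(sorted_probs[0] - sorted_probs[1]),
        # Largest raw logit, without softmax normalization.
        "maxlogit": float(sorted_logits[0]),
        # Gap between the two largest raw logits.
        "logit_margin": float(sorted_logits[0] - sorted_logits[1]),
    }

scores = uncertainty_scores([5.0, 1.0, 0.0])
```

Logit-based measures avoid the softmax normalization, which is one reason they are often considered when posterior probabilities are overconfident.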
Entrepreneurship education for women through project-based flipped learning: The impact of innovativeness and risk-taking on course satisfaction
PURPOSE: The primary aim of this research is to explore the correlation between learners’ characteristics and the perceived value and satisfaction associated with Project-Based Flipped Learning (PBFL) methodologies. A secondary objective involves investigating how these PBFL methodologies can be employed to enhance the quality of entrepreneurship education for women. METHODOLOGY: During the first semester of 2018, a total of 80 students enrolled in the Communication Society class were engaged in a longitudinal study, involving bi-weekly online surveys prior to the semester’s conclusion. The survey instruments utilized Likert-scale measurements, with a 5-point scoring system. The data acquired was subsequently analyzed using structural equation modeling, which facilitated the examination of both the pre- and post-change scores and the structural properties of their relationships with overall course satisfaction. In terms of statistical evaluation, the study employed Generalized Structured Component Analysis (GSCA), a powerful component-based SEM technique, thus ensuring a robust and academically rigorous interpretation of the data. FINDINGS: Our research sought to understand the effects of learners’ characteristics, specifically innovativeness and risk-taking, on course satisfaction in Project-Based Flipped Learning (PBFL). We found that female learners’ innovativeness positively influenced their perception of the project’s entertainment and educational value, which in turn increased preference for PBFL and course satisfaction. Interestingly, risk-taking did not significantly influence perceived project value, which provides insights into the role of personality traits in learning outcomes. IMPLICATIONS: Our study invigorates entrepreneurship education theory by highlighting the key role of learner innovativeness in PBFL course satisfaction, urging a nuanced examination of personality traits in educational contexts. 
Further, we question the established importance of risk-taking, necessitating a critical reassessment in this domain. These pivotal theoretical contributions challenge prevailing assumptions, enrich scholarly discourse, and open new avenues for research. On the practical side, our findings emphasize the imperative of fostering innovativeness in women’s entrepreneurship education. These insights underscore the need for a strategically tailored, creative learning environment, with the potential to enhance learner engagement and satisfaction significantly. In sum, our research generates transformative theoretical insights and provides actionable strategies for improving the practice of entrepreneurship education. ORIGINALITY AND VALUE: Our research presents a novel approach to fostering women entrepreneurs in the media sector through PBFL. This unique focus on the intersection of gender, media entrepreneurship, and PBFL distinguishes our study from existing literature. Furthermore, our findings offer educators invaluable guidance for enhancing female entrepreneurship education, thereby enriching the pedagogical landscape of this domain
A Comparative Study on the Performance of GSCA and CSA in Parameter Recovery for Structural Equation Models With Ordinal Observed Variables
A simulation based comparative study was designed to compare two alternative approaches to structural equation modeling—generalized structured component analysis (GSCA) with the alternating least squares (ALS) estimator vs. covariance structure analysis (CSA) with the maximum likelihood (ML) estimator or the weighted least squares mean and variance adjusted (WLSMV) estimator—in terms of parameter recovery with ordinal observed variables. The simulated conditions included the number of response categories in observed variables, distribution of ordinal observed variables, sample size, and model misspecification. The simulation outcomes focused on average root mean square error (RMSE) and average relative bias (RB) in parameter estimates. The results indicated that, by and large, GSCA-ALS recovered structural path coefficients more accurately than CSA-ML and CSA-WLSMV in either a correctly or incorrectly specified model, regardless of the number of response categories, observed variable distribution, and sample size. In terms of loadings, CSA-WLSMV outperformed GSCA-ALS and CSA-ML in almost all conditions. Implications and limitations of the current findings are discussed, as well as suggestions for future research
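The two simulation outcomes named in this abstract, root mean square error (RMSE) and relative bias (RB), can be sketched for a single parameter across replications as follows. The function names and the example numbers are hypothetical, and the formulas are the conventional definitions, assumed here rather than taken from the study.

```python
import numpy as np

def rmse(estimates, true_value):
    """Root mean square error of replicated estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.sqrt(np.mean((estimates - true_value) ** 2)))

def relative_bias(estimates, true_value):
    """Mean estimate minus the true value, as a fraction of the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float((np.mean(estimates) - true_value) / true_value)

# Hypothetical estimates of one path coefficient whose true value is 0.5.
replications = [0.4, 0.6, 0.55, 0.45]
print(rmse(replications, 0.5), relative_bias(replications, 0.5))
```

Averaging these quantities over all parameters of one type (for example, all structural path coefficients) gives per-condition summaries of the kind the abstract reports.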
Left-Dominant Temporal-Frontal Hypercoupling in Schizophrenia Patients With Hallucinations During Speech Perception
Background: Task-based functional neuroimaging studies of schizophrenia have not yet replicated the increased coordinated hyperactivity in speech-related brain regions that is reported with symptom-capture and resting-state studies of hallucinations. This may be due to suboptimal selection of cognitive tasks. Methods: In the current study, we used a task that allowed experimental manipulation of control over verbal material and compared brain activity between 23 schizophrenia patients (10 hallucinators, 13 nonhallucinators), 22 psychiatric (bipolar), and 27 healthy controls. Two conditions were presented, one involving inner verbal thought (in which control over verbal material was required) and another involving speech perception (SP; in which control over verbal material was not required). Results: A functional connectivity analysis resulted in a left-dominant temporal-frontal network that included speech-related auditory and motor regions and showed hypercoupling in past-week hallucinating schizophrenia patients (relative to nonhallucinating patients) during SP only. Conclusions: These findings replicate our previous work showing generalized speech-related functional network hypercoupling in schizophrenia during inner verbal thought and SP, but extend them by suggesting that hypercoupling is related to past-week hallucination severity scores during SP only, when control over verbal material is not required. This result opens the possibility that practicing control over inner verbal thought processes may decrease the likelihood or severity of hallucinations.
Symptom dimensions of the psychotic symptom rating scales in psychosis: a multisite study
The Psychotic Symptom Rating Scales (PSYRATS) is an instrument designed to quantify the severity of delusions and hallucinations and is typically used in research studies and clinical settings focusing on people with psychosis and schizophrenia. It is comprised of the auditory hallucinations (AHS) and delusions subscales (DS), but these subscales do not necessarily reflect the psychological constructs causing intercorrelation between clusters of scale items. Identification of these constructs is important in some clinical and research contexts because item clustering may be caused by underlying etiological processes of interest. Previous attempts to identify these constructs have produced conflicting results. In this study, we compiled PSYRATS data from 12 sites in 7 countries, comprising 711 participants for AHS and 520 for DS. We compared previously proposed and novel models of underlying constructs using structural equation modeling. For the AHS, a novel 4-dimensional model provided the best fit, with latent variables labeled Distress (negative content, distress, and control), Frequency (frequency, duration, and disruption), Attribution (location and origin of voices), and Loudness (loudness item only). For the DS, a 2-dimensional solution was confirmed, with latent variables labeled Distress (amount/intensity) and Frequency (preoccupation, conviction, and disruption). The within-AHS and within-DS dimension intercorrelations were higher than those between subscales, with the exception of the AHS and DS Distress dimensions, which produced a correlation that approached the range of the within-scale correlations. Recommendations are provided for integrating these underlying constructs into research and clinical applications of the PSYRATS
The impact of family stressors, interparental conflict, and parenting behaviors on children's overt and relational aggression: A focus on Korean families
The purpose of this study was to examine the processes through which family stressors such as negative stressful life events and daily hassles of family living influence aggressive behaviors of Korean school age children. Interparental conflict (overt conflict style) and parenting behaviors were conceptualized as playing an intervening role. Separate dimensions of parenting behaviors (psychological control, harsh discipline, and acceptance) and children's aggressive behaviors (overt and relational aggression) are included in this study. The sample included 349 mothers and children aged 11-12 years old. Mother and child report models were analyzed separately. In the mother report model, interparental conflict was found to be a significant mediator. That is, higher levels of family stressors were associated with higher levels of interparental conflict, which in turn, were related to children's overt and relational aggression. Parenting behaviors did not play an important role as a mediator in the relationship between interparental conflict and children's aggression. Mothers in the present study seemed able to compartmentalize and keep their feelings and negativity arising out of interparental conflict from affecting their parenting behaviors. In the child report model, interparental conflict was directly and indirectly related to children's overt and relational aggression through less effective parenting. Findings supported the spillover hypothesis that negative feelings of parents under conflictual marital relationships spill over into the mother-child relationship, and thus have an impact on children's overt and relational aggression.