    Opening the Black Box of wav2vec Feature Encoder

    Self-supervised models, namely wav2vec and its variants, have shown promising results in various downstream tasks in the speech domain. However, their inner workings are poorly understood, calling for in-depth analyses of what the models learn. In this paper, we concentrate on the convolutional feature encoder, whose latent space is often speculated to represent discrete acoustic units. To analyze the embedding space in a reductive manner, we feed the encoder synthesized audio signals, each a summation of simple sine waves. Through extensive experiments, we conclude that various kinds of information are embedded in the feature encoder representations: (1) fundamental frequency, (2) formants, and (3) amplitude, packed with (4) sufficient temporal detail. Further, the information incorporated in the latent representations is analogous to spectrograms but with a fundamental difference: latent representations construct a metric space, so that closer representations imply acoustic similarity.
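The stimulus construction described above can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes a 16 kHz sampling rate (the rate wav2vec 2.0 checkpoints are trained on), harmonics placed at integer multiples of a chosen fundamental frequency, and a hypothetical helper named `synth_harmonic_tone`. The per-harmonic amplitude list is only a rough stand-in for formant shaping.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumed: wav2vec 2.0 models expect 16 kHz audio


def synth_harmonic_tone(f0, amplitudes, duration=1.0, sr=SAMPLE_RATE):
    """Return a sum of sine waves at integer multiples of f0 (hypothetical helper).

    amplitudes[k] is the amplitude of the (k+1)-th harmonic; shaping this
    envelope is a crude proxy for formant structure (an assumption).
    """
    t = np.arange(int(duration * sr)) / sr  # sample times in seconds
    signal = np.zeros_like(t)
    for k, a in enumerate(amplitudes, start=1):
        signal += a * np.sin(2 * np.pi * k * f0 * t)
    return signal


# A 0.5 s tone with F0 = 200 Hz and a decaying harmonic envelope.
tone = synth_harmonic_tone(f0=200.0, amplitudes=[1.0, 0.5, 0.25], duration=0.5)
```

Varying `f0`, the amplitude envelope, or an overall gain while holding the rest fixed gives the kind of controlled input the abstract describes; the specific values here are illustrative only.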

    Understanding Probe Behaviors through Variational Bounds of Mutual Information

    With the success of self-supervised representations, researchers seek a better understanding of the information encapsulated within a representation. Among various interpretability methods, we focus on classification-based linear probing. We aim to foster a solid understanding of, and provide guidelines for, linear probing by constructing a novel mathematical framework that leverages information theory. First, we connect probing with the variational bounds of mutual information (MI) to relax the probe design, equating linear probing with fine-tuning. Then, we investigate empirical behaviors and practices of probing through our mathematical framework. We analyze the convexity of the layer-wise performance curve, which seemingly violates the data processing inequality. However, we show that intermediate representations can have the largest MI estimate because of the tradeoff between better separability and decreasing MI. We further suggest that the margin of linearly separable representations can serve as a criterion for measuring the "goodness of representation." We also compare accuracy with MI as measuring criteria. Finally, we empirically validate our claims by examining how well self-supervised speech models retain word and phoneme information. Comment: Accepted to ICASSP 2024, implementation available at https://github.com/juice500ml/information_probin
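One standard variational bound linking a probe to mutual information is the Barber-Agakov bound, I(X;Y) >= H(Y) - E[-log q(y|x)], where q is the probe's predictive distribution. The sketch below estimates that bound from a trained probe's held-out predictions. It is an illustration under this assumption only; the paper's framework may use different bounds or estimators, and the function name `mi_lower_bound` is hypothetical.

```python
import numpy as np


def mi_lower_bound(probe_probs, labels, num_classes):
    """Barber-Agakov style estimate of I(X;Y) >= H(Y) - CE(probe), in nats.

    probe_probs: (N, C) array of a trained probe's class probabilities q(y|x)
    on held-out data. labels: (N,) integer array of true classes.
    """
    labels = np.asarray(labels)
    # Plug-in estimate of the label entropy H(Y) from empirical frequencies.
    freqs = np.bincount(labels, minlength=num_classes) / len(labels)
    nonzero = freqs[freqs > 0]
    h_y = -np.sum(nonzero * np.log(nonzero))
    # The probe's cross-entropy upper-bounds H(Y|X), so H(Y) - CE lower-bounds MI.
    eps = 1e-12  # guards against log(0) for confidently wrong predictions
    ce = -np.mean(np.log(probe_probs[np.arange(len(labels)), labels] + eps))
    return h_y - ce
```

With balanced binary labels, a perfect probe gives a bound of about log 2 nats and a probe that always outputs 0.5 gives about zero; a better-calibrated, more accurate probe tightens the bound.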

    Teachers’ Perceptions of Competency-Based Curriculum Implementation, and Government Support: A Mixed Methods Study on Grade 1-5 Teachers in Homabay County, Kenya

    Education reform is necessary as it allows a country to periodically review, revise, and evaluate its education systems and programs. Kenya recently adopted a competency-based education system known as the Competency-Based Curriculum (CBC). This approach allows students to work at their own pace to demonstrate mastery of the competencies required for their chosen field of study. However, previous studies on the implementation stages of CBC, particularly on elementary teacher preparedness, have indicated that teachers' knowledge of CBC is inadequate and that they are ill-prepared, and thus unable to effectively teach and evaluate the new curriculum. Therefore, this study aims to investigate teacher perceptions, self-efficacy in digital technology use, and government support in the implementation of CBC, to identify the challenges teachers face and the support they need to implement the curriculum effectively. The study used a convergent mixed-methods research design to answer the research questions. The participants were grade 1-5 teachers drawn from Homa Bay County. The findings revealed that CBC teachers hold conflicting views about CBC. Among all survey constructs, government resource support had the highest mean, while the need for training in information and communication technology and the provision of digital technology materials to schools were mostly rated unfavorably. Survey respondents indicated moderate agreement with the relevant assertions. The study recommends using perception theory instead of self-efficacy theory to investigate teachers' opinions on the implementation of CBC, as this approach can help create links between the identified issues and outcomes. Further research is needed to examine how parents perceive the implementation of CBC and how their involvement can aid learners' acquisition of the necessary competencies and skills.
Keywords: Teachers’ perceptions, Competency-based curriculum, Government support, Digital technology. DOI: 10.7176/JEP/14-9-09. Publication date: March 31st 202

    Automatic Severity Assessment of Dysarthric speech by using Self-supervised Model with Multi-task Learning

    Automatic assessment of dysarthric speech is essential for sustained treatment and rehabilitation. However, obtaining atypical speech is challenging, often leading to data scarcity. To tackle this problem, we propose a novel automatic severity assessment method for dysarthric speech that uses a self-supervised model in conjunction with multi-task learning. Wav2vec 2.0 XLS-R is jointly trained on two tasks: severity level classification and auxiliary automatic speech recognition (ASR). For the baseline experiments, we employ hand-crafted features, such as eGeMAPS and linguistic features, with SVM, MLP, and XGBoost classifiers. On the Korean dysarthric speech QoLT database, our model outperforms the traditional baseline methods, with a relative increase of 4.79% in classification accuracy. In addition, the proposed model surpasses the model trained without the ASR head, achieving a 10.09% relative improvement. Furthermore, we show how multi-task learning affects severity classification performance by analyzing the latent representations and the regularization effect.
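The joint training can be written as a weighted sum of the two task losses. The abstract does not give the exact form, so the following assumes a cross-entropy loss over severity levels, a CTC loss for the auxiliary ASR head, and a hypothetical weighting factor λ; the second line states the usual definition of relative percentage improvement, as used for figures such as 4.79% and 10.09%.

```latex
\begin{aligned}
\mathcal{L}_{\text{total}} &= \mathcal{L}_{\text{CE}}\big(\hat{y}_{\text{sev}},\, y_{\text{sev}}\big)
  + \lambda\, \mathcal{L}_{\text{CTC}}\big(\hat{\mathbf{w}},\, \mathbf{w}\big), \qquad \lambda \ge 0 \\
\Delta_{\text{rel}} &= \frac{\mathrm{Acc}_{\text{proposed}} - \mathrm{Acc}_{\text{baseline}}}{\mathrm{Acc}_{\text{baseline}}} \times 100\%
\end{aligned}
```

Under this formulation, setting λ = 0 corresponds to training without the ASR head, which is the comparison the abstract reports.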

    Speech Intelligibility Assessment of Dysarthric Speech by using Goodness of Pronunciation with Uncertainty Quantification

    This paper proposes an improved Goodness of Pronunciation (GoP) measure that uses Uncertainty Quantification (UQ) for automatic intelligibility assessment of dysarthric speech. Current GoP methods rely heavily on overconfident neural network predictions, which are unsuitable for assessing dysarthric speech given its significant acoustic differences from healthy speech. To alleviate this problem, UQ techniques were applied to GoP by 1) normalizing the phoneme prediction (entropy, margin, maxlogit, logit-margin) and 2) modifying the scoring function (scaling, prior normalization). As a result, prior-normalized maxlogit GoP achieves the best performance, with relative increases of 5.66%, 3.91%, and 23.65% over the baseline GoP for English, Korean, and Tamil, respectively. Furthermore, a phoneme analysis is conducted to identify which phoneme scores correlate significantly with intelligibility scores in each language. Comment: Accepted to Interspeech 202
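The four normalization scores named above have standard definitions in the uncertainty-quantification literature, sketched below for a segment of frame-level phoneme logits. How the paper aggregates these per-frame scores into a GoP value, and the exact form of its scaling and prior normalization, are not stated in the abstract. The `prior_normalized_logpost` function shows only the classical log-posterior-minus-log-prior form as an assumption, and all function names are hypothetical.

```python
import numpy as np


def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def uq_scores(logits):
    """Per-frame UQ scores from phoneme logits of shape (T, C)."""
    probs = softmax(logits)
    top2_p = np.sort(probs, axis=-1)[:, -2:]   # columns: second-largest, largest
    top2_z = np.sort(logits, axis=-1)[:, -2:]
    return {
        # Higher entropy means a less certain prediction.
        "entropy": -np.sum(probs * np.log(probs + 1e-12), axis=-1),
        "margin": top2_p[:, 1] - top2_p[:, 0],
        "maxlogit": logits.max(axis=-1),
        "logit_margin": top2_z[:, 1] - top2_z[:, 0],
    }


def prior_normalized_logpost(logits, target, log_prior):
    """Mean over frames of log p(target | x_t) - log p(target) (assumed form)."""
    log_probs = np.log(softmax(logits) + 1e-12)
    return float(np.mean(log_probs[:, target] - log_prior[target]))
```

Unlike softmax probabilities, the logit-based scores (maxlogit, logit-margin) are not squashed toward 1, which is one reason they are often preferred when a model tends to be overconfident.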

    Entrepreneurship education for women through project-based flipped learning: The impact of innovativeness and risk-taking on course satisfaction

    PURPOSE: The primary aim of this research is to explore the correlation between learners’ characteristics and the perceived value of, and satisfaction with, Project-Based Flipped Learning (PBFL) methodologies. A secondary objective is to investigate how these PBFL methodologies can be employed to enhance the quality of entrepreneurship education for women.
    METHODOLOGY: During the first semester of 2018, 80 students enrolled in the Communication Society class took part in a longitudinal study involving bi-weekly online surveys administered before the semester’s conclusion. The survey instruments used 5-point Likert scales. The data were analyzed using structural equation modeling, which allowed examination of both the pre- and post-change scores and the structural properties of their relationships with overall course satisfaction. For statistical evaluation, the study employed Generalized Structured Component Analysis (GSCA), a component-based SEM technique, to support a robust interpretation of the data.
    FINDINGS: We sought to understand the effects of learners’ characteristics, specifically innovativeness and risk-taking, on course satisfaction in PBFL. We found that female learners’ innovativeness positively influenced their perception of the project’s entertainment and educational value, which in turn increased their preference for PBFL and their course satisfaction. Interestingly, risk-taking did not significantly influence perceived project value, which offers insight into the role of personality traits in learning outcomes.
    IMPLICATIONS: Our study advances entrepreneurship education theory by highlighting the key role of learner innovativeness in PBFL course satisfaction, calling for a nuanced examination of personality traits in educational contexts. Further, we question the established importance of risk-taking, calling for a critical reassessment in this domain. These theoretical contributions challenge prevailing assumptions, enrich scholarly discourse, and open new avenues for research. On the practical side, our findings emphasize the importance of fostering innovativeness in women’s entrepreneurship education. They underscore the need for a strategically tailored, creative learning environment, with the potential to significantly enhance learner engagement and satisfaction. In sum, our research offers theoretical insights and actionable strategies for improving the practice of entrepreneurship education.
    ORIGINALITY AND VALUE: Our research presents a novel approach to fostering women entrepreneurs in the media sector through PBFL. This focus on the intersection of gender, media entrepreneurship, and PBFL distinguishes our study from existing literature. Furthermore, our findings offer educators guidance for enhancing female entrepreneurship education, thereby enriching the pedagogical landscape of this domain.

    A Comparative Study on the Performance of GSCA and CSA in Parameter Recovery for Structural Equation Models With Ordinal Observed Variables

    A simulation-based comparative study was designed to compare, in terms of parameter recovery with ordinal observed variables, two alternative approaches to structural equation modeling: generalized structured component analysis (GSCA) with the alternating least squares (ALS) estimator versus covariance structure analysis (CSA) with either the maximum likelihood (ML) estimator or the weighted least squares mean and variance adjusted (WLSMV) estimator. The simulated conditions included the number of response categories in observed variables, the distribution of ordinal observed variables, sample size, and model misspecification. The simulation outcomes focused on average root mean square error (RMSE) and average relative bias (RB) in parameter estimates. The results indicated that, by and large, GSCA-ALS recovered structural path coefficients more accurately than CSA-ML and CSA-WLSMV in both correctly and incorrectly specified models, regardless of the number of response categories, observed variable distribution, and sample size. In terms of loadings, CSA-WLSMV outperformed GSCA-ALS and CSA-ML in almost all conditions. Implications and limitations of the current findings are discussed, along with suggestions for future research.
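The abstract reports average RMSE and average relative bias without giving formulas. Under the conventional definitions, for a parameter with true value θ and estimates θ̂_r over R replications, RMSE is sqrt((1/R) Σ_r (θ̂_r − θ)²) and relative bias is (1/R) Σ_r (θ̂_r − θ)/θ; the sketch below averages both across parameters. Treat this as an assumption about the study's computation (some studies report relative bias as a percentage or average its absolute value), and note that `rmse_and_relative_bias` is a hypothetical name.

```python
import numpy as np


def rmse_and_relative_bias(estimates, true_values):
    """Average RMSE and average relative bias across parameters.

    estimates: (R, P) array, R simulation replications x P parameters.
    true_values: (P,) array of population values (must be nonzero for RB).
    """
    est = np.asarray(estimates, dtype=float)
    theta = np.asarray(true_values, dtype=float)
    rmse = np.sqrt(np.mean((est - theta) ** 2, axis=0))  # per-parameter RMSE
    rb = np.mean((est - theta) / theta, axis=0)          # per-parameter relative bias
    return rmse.mean(), rb.mean()                         # averaged over parameters
```

Because positive and negative relative biases can cancel when averaged, RMSE is the more informative of the two for overall accuracy, while RB indicates systematic over- or underestimation.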

    Left-Dominant Temporal-Frontal Hypercoupling in Schizophrenia Patients With Hallucinations During Speech Perception

    Background: Task-based functional neuroimaging studies of schizophrenia have not yet replicated the increased coordinated hyperactivity in speech-related brain regions that has been reported in symptom-capture and resting-state studies of hallucinations. This may be due to suboptimal selection of cognitive tasks. Methods: In the current study, we used a task that allowed experimental manipulation of control over verbal material and compared brain activity among 23 schizophrenia patients (10 hallucinators, 13 nonhallucinators), 22 psychiatric (bipolar) controls, and 27 healthy controls. Two conditions were presented: one involving inner verbal thought (in which control over verbal material was required) and another involving speech perception (SP; in which control over verbal material was not required). Results: A functional connectivity analysis identified a left-dominant temporal-frontal network that included speech-related auditory and motor regions and showed hypercoupling in past-week hallucinating schizophrenia patients (relative to nonhallucinating patients) during SP only. Conclusions: These findings replicate our previous work showing generalized speech-related functional network hypercoupling in schizophrenia during inner verbal thought and SP, and extend it by suggesting that hypercoupling is related to past-week hallucination severity scores during SP only, when control over verbal material is not required. This result raises the possibility that practicing control over inner verbal thought processes may decrease the likelihood or severity of hallucinations.

    Symptom dimensions of the psychotic symptom rating scales in psychosis: a multisite study

    The Psychotic Symptom Rating Scales (PSYRATS) is an instrument designed to quantify the severity of delusions and hallucinations, and it is typically used in research studies and clinical settings focusing on people with psychosis and schizophrenia. It comprises the auditory hallucinations subscale (AHS) and the delusions subscale (DS), but these subscales do not necessarily reflect the psychological constructs that cause intercorrelation among clusters of scale items. Identifying these constructs is important in some clinical and research contexts because item clustering may be caused by underlying etiological processes of interest. Previous attempts to identify these constructs have produced conflicting results. In this study, we compiled PSYRATS data from 12 sites in 7 countries, comprising 711 participants for the AHS and 520 for the DS. We compared previously proposed and novel models of underlying constructs using structural equation modeling. For the AHS, a novel 4-dimensional model provided the best fit, with latent variables labeled Distress (negative content, distress, and control), Frequency (frequency, duration, and disruption), Attribution (location and origin of voices), and Loudness (loudness item only). For the DS, a 2-dimensional solution was confirmed, with latent variables labeled Distress (amount/intensity) and Frequency (preoccupation, conviction, and disruption). Within-AHS and within-DS dimension intercorrelations were higher than those between subscales, with the exception of the AHS and DS Distress dimensions, whose correlation approached the range of the within-scale correlations. Recommendations are provided for integrating these underlying constructs into research and clinical applications of the PSYRATS.

    The impact of family stressors, interparental conflict, and parenting behaviors on children's overt and relational aggression: A focus on Korean families

    The purpose of this study was to examine the processes through which family stressors, such as negative stressful life events and daily hassles of family living, influence the aggressive behaviors of Korean school-age children. Interparental conflict (overt conflict style) and parenting behaviors were conceptualized as playing an intervening role. Separate dimensions of parenting behaviors (psychological control, harsh discipline, and acceptance) and of children's aggressive behaviors (overt and relational aggression) were included in this study. The sample included 349 mothers and children aged 11-12 years. Mother and child report models were analyzed separately. In the mother report model, interparental conflict was found to be a significant mediator: higher levels of family stressors were associated with higher levels of interparental conflict, which, in turn, were related to children's overt and relational aggression. Parenting behaviors did not play an important mediating role in the relationship between interparental conflict and children's aggression. Mothers in the present study seemed able to compartmentalize, keeping the feelings and negativity arising from interparental conflict from affecting their parenting behaviors. In the child report model, interparental conflict was related to children's overt and relational aggression both directly and indirectly through less effective parenting. The findings supported the spillover hypothesis that parents' negative feelings in conflictual marital relationships spill over into the mother-child relationship and thus affect children's overt and relational aggression.