414 research outputs found

    Detecting Rater Effects Using Many-Facet Rasch Models and Bootstrap Techniques

    The quality of ratings provided by expert raters in evaluating language learners’ constructed responses in performance assessment is typically investigated by means of statistical modeling. Several rater effects, including severity/leniency, central tendency, and randomness, have been well documented in the psychometrics literature (Myford & Wolfe, 2003). This study applies Many-Facet Rasch Models to detect these rater effects in an in-house speaking assessment for international teaching assistants (ITAs) at a US university. The goal is to evaluate the extent to which the models, estimation procedures, and statistics/numerical indices adopted here work as intended in this context. Two simulation studies are conducted in which model parameters are simulated from different distributions, and a parametric bootstrap procedure is applied to assess the statistical properties (i.e., consistency, variability, and mean squared error) of the parameter estimates and fit statistics. Then, the model parameters are estimated from the actual data, and the estimates are compared across estimation procedures (Joint Maximum Likelihood (JML) vs. Marginal Maximum Likelihood (MML)) and computational implementations (R vs. Facets). The parametric bootstrap procedure is also applied to estimate the sampling distributions of the parameters and fit statistics through replications. Finally, the indices for rater effects detection are compared using both numerical summaries and plotting techniques. Results indicated that, when the model parameters and rater effects were simulated, the estimated severity parameters and the fit statistics were sensitive in detecting the intended effects. The MML estimation method showed some superiority over the JML estimation method in terms of statistical consistency and variability, but neither estimation method was free of bias. This was also true when the actual data were analyzed. Moreover, in terms of detecting the centrality or randomness effects in the actual data, evidence from the fit statistics could be used in conjunction with other indices from Facets and visualization techniques. However, the bootstrap results for the fit statistics indicated that, when the empirical distributions of the fit statistics were considered, disagreements between MML and JML were relatively large and the rule-of-thumb critical ranges of the fit statistics may be questionable.
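The simulation-and-bootstrap idea in this abstract can be illustrated with a minimal sketch. It assumes a rating-scale form of the many-facet Rasch model, log(P_k / P_{k-1}) = theta - severity - threshold_k, and it uses each rater's mean simulated rating as a stand-in statistic; the study itself re-estimates model parameters with JML and MML, which is not reproduced here. All function names and parameter values are hypothetical.

```python
import numpy as np

def category_probs(theta, severity, thresholds):
    """Category probabilities under an assumed rating-scale many-facet Rasch model.

    log(P_k / P_{k-1}) = theta - severity - thresholds[k-1], for k = 1..K.
    """
    steps = theta - severity - np.asarray(thresholds, dtype=float)
    # Cumulative sums of the step logits give unnormalized log-probabilities;
    # category 0 has log-numerator 0 by convention.
    log_num = np.concatenate(([0.0], np.cumsum(steps)))
    log_num -= log_num.max()  # subtract the max for numerical stability
    probs = np.exp(log_num)
    return probs / probs.sum()

def expected_score(theta, severity, thresholds):
    """Expected rating for one person-rater pair."""
    probs = category_probs(theta, severity, thresholds)
    return float(np.dot(np.arange(len(probs)), probs))

def bootstrap_rater_means(thetas, severities, thresholds, n_reps=200, seed=0):
    """Parametric bootstrap: simulate ratings from fixed parameters and
    collect each rater's mean rating in every replication."""
    rng = np.random.default_rng(seed)
    n_cat = len(thresholds) + 1
    means = np.empty((n_reps, len(severities)))
    for r in range(n_reps):
        for j, sev in enumerate(severities):
            ratings = [
                rng.choice(n_cat, p=category_probs(t, sev, thresholds))
                for t in thetas
            ]
            means[r, j] = np.mean(ratings)
    return means

# Hypothetical setup: 30 examinees, one lenient and one severe rater, 3 categories.
abilities = np.linspace(-2.0, 2.0, 30)
replicated_means = bootstrap_rater_means(abilities, [-1.0, 1.0], [-1.0, 1.0], n_reps=50)
```

The spread of each column of `replicated_means` approximates the sampling variability of that rater's statistic under the assumed parameters.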

    Influence of mass unbalancing of three-cylinder engine on idle vibration based on powertrain model

    This paper proposes a model of three-cylinder engine excitation that includes the characteristics of the mass unbalance. A powertrain model is then established to analyze the response of the powertrain mounts under different strategies for balancing the mass unbalance at idle. Simulations based on the powertrain model demonstrate how different balancing strategies for three-cylinder engines influence the design of powertrain mounts.

    Talking Face Generation by Adversarially Disentangled Audio-Visual Representation

    Talking face generation aims to synthesize a sequence of face images that correspond to a clip of speech. This is a challenging task because face appearance variation and semantics of speech are coupled together in the subtle movements of the talking face regions. Existing works either construct a specific face appearance model for specific subjects or model the transformation between lip motion and speech. In this work, we integrate both aspects and enable arbitrary-subject talking face generation by learning a disentangled audio-visual representation. We find that the talking face sequence is actually a composition of both subject-related information and speech-related information. These two spaces are then explicitly disentangled through a novel associative-and-adversarial training process. This disentangled representation has the advantage that both audio and video can serve as inputs for generation. Extensive experiments show that the proposed approach generates realistic talking face sequences on arbitrary subjects with much clearer lip motion patterns than previous work. We also demonstrate that the learned audio-visual representation is extremely useful for the tasks of automatic lip reading and audio-video retrieval. Comment: AAAI Conference on Artificial Intelligence (AAAI 2019) Oral Presentation. Code, models, and video results are available on our webpage: https://liuziwei7.github.io/projects/TalkingFace.htm

    Solutions of stationary Kirchhoff equations involving nonlocal operators with critical nonlinearity in ℝ^N

    In this paper, we consider the existence and multiplicity of solutions for fractional Schrödinger equations with critical nonlinearity in ℝ^N. We use the fractional version of Lions' second concentration-compactness principle and the concentration-compactness principle at infinity to prove that the (PS)_c condition holds locally. Under suitable assumptions, we prove that the problem has at least one solution and, for any m ∈ ℕ, at least m pairs of solutions. Moreover, these solutions can converge to zero in some Sobolev space as ε → 0.

    Modeling statistics ITAs’ speaking performances in a certification test

    In light of the ever-increasing capability of computer technology and advancement in speech and natural language processing techniques, automated speech scoring of constructed responses is gaining popularity in many high-stakes assessment and low-stakes educational settings. Automated scoring is a highly interdisciplinary and complex subject, and there is much unknown about the strengths and weaknesses of automated speech scoring systems (Evanini & Zechner, 2020). Research in automated speech scoring has been centralized around a few proprietary systems owned by large testing companies. Consequently, existing systems only serve large-scale standardized assessment purposes. Application of automated scoring technologies in local assessment contexts is much desired but rarely realized because the system’s inner workings have remained unfamiliar to many language assessment professionals. Moreover, assumptions about the reliability of human scores, on which automated scoring systems are trained, are untenable in many local assessment situations, where a myriad of factors would work together to co-determine the human scores. These factors may include the rating design, the test takers’ abilities, and the raters’ specific rating behaviors (e.g., severity/leniency, internal consistency, and application of the rating scale). In an attempt to apply automated scoring procedures to a local context, the primary purpose of this study is to develop and evaluate an appropriate automated speech scoring model for a local certification test of international teaching assistants (ITAs). To meet this goal, this study first implemented feature extraction and selection based on existing automated speech scoring technologies and the scoring rubric of the local speaking test. 
Then, the reliability of the human ratings was investigated based on both Classical Test Theory (CTT) and Item Response Theory (IRT) frameworks, focusing on detecting potential rater effects that could negatively impact the quality of the human scores. Finally, by experimenting and comparing a series of statistical modeling options, this study investigated the extent to which the association between the automatically extracted features and the human scores could be statistically modeled to offer a mechanism that reflects the multifaceted nature of the performance assessment in a unified statistical framework. The extensive search for the speech or linguistic features, covering the sub-domains of fluency, pronunciation, rhythm, vocabulary, grammar, content, and discourse cohesion, revealed that a small set of useful variables could be identified. A large number of features could be effectively summarized as single latent factors that showed reasonably high associations with the human scores. Reliability analysis of human scoring indicated that both inter-rater reliability and intra-rater reliability were acceptable, and through a fine-grained IRT analysis, several raters who were prone to the central tendency or randomness effects were identified. Model fit indices, model performance in prediction, and model diagnostics results in the statistical modeling indicated that the most appropriate approach to model the relationship between the features and the final human scores was a cumulative link model (CLM). In contrast, the most appropriate approach to model the relationship between the features and the ratings from the multiple raters was a cumulative link mixed model (CLMM). These models suggested that higher ability levels were significantly related to the lapse of time, faster speech with fewer disfluencies, more varied and sophisticated vocabulary, more complex syntactic structures, and fewer rater effects. 
Based on the model’s prediction on unseen data, the rating-level CLMM achieved an accuracy of 0.64, a Pearson correlation of 0.58, and a quadratically-weighted kappa of 0.57, as compared to the human ratings on the 3-point scale. Results from this study could be used to inform the development, design, and implementation of a prototypical automated scoring system for prospective ITAs, as well as to provide empirical evidence for future scale development, rater training, and support for assessment-related instruction for the testing program and diagnostic feedback for the ITA test takers.
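The three agreement measures reported above (accuracy, Pearson correlation, quadratically weighted kappa) can be computed as in the following minimal sketch. It assumes the 3-point scale is coded as integers 0, 1, 2. The function names and the example ratings are hypothetical and are not data from the study.

```python
import numpy as np

def quadratic_weighted_kappa(y_true, y_pred, n_categories):
    """Cohen's kappa with quadratic disagreement weights."""
    y_true = np.asarray(y_true, dtype=int)
    y_pred = np.asarray(y_pred, dtype=int)
    # Observed confusion matrix of human (rows) vs. model (columns) scores.
    observed = np.zeros((n_categories, n_categories))
    for t, p in zip(y_true, y_pred):
        observed[t, p] += 1
    # Expected counts if the two sets of scores were independent.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0)) / observed.sum()
    idx = np.arange(n_categories)
    weights = (idx[:, None] - idx[None, :]) ** 2 / (n_categories - 1) ** 2
    return 1.0 - (weights * observed).sum() / (weights * expected).sum()

def accuracy(y_true, y_pred):
    """Proportion of exact matches."""
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def pearson_r(y_true, y_pred):
    """Pearson correlation between the two score vectors."""
    return float(np.corrcoef(y_true, y_pred)[0, 1])

# Hypothetical human and predicted ratings on a 0-2 scale.
human = [0, 1, 2, 1, 0, 2, 1, 1]
model = [0, 1, 1, 1, 0, 2, 2, 1]
qwk = quadratic_weighted_kappa(human, model, n_categories=3)
acc = accuracy(human, model)
r = pearson_r(human, model)
```

Quadratic weights penalize a two-category disagreement four times as heavily as an adjacent-category one, which is why this kappa can differ noticeably from plain accuracy on an ordinal scale.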

    On the importance of heavy fields in pseudo-scalar inflation

    Pseudo-scalar inflation coupled with U(1) gauge fields through the Chern-Simons term has been extensively studied. However, new physics arising from UV theories may still influence the pseudo-scalar field at low-energy scales, potentially impacting predictions of inflation. In the realm of effective field theory (EFT), we investigated axion inflation, where operators from heavy fields are also present, in addition to the axion and gauge fields. The integrated-out fields have two significant effects: a non-linear dispersion regime and a coupling of heavy modes to the Chern-Simons term. The first effect changes the propagation of the curvature fluctuation, while the second results in additional operators that contribute to the curvature fluctuation via inverse decay. We derived the power spectrum and the magnitude of equilateral non-Gaussianity in this low-energy EFT. We found that the second effect could become significant as the mass of the heavy fields approaches the Hubble scale. Comment: 40 pages, 10 figures; Add Section 5; Publication versio

    Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis

    In this paper, we propose binary sparse convolutional networks called BSC-Net for efficient point cloud analysis. We empirically observe that the sparse convolution operation causes larger quantization errors than standard convolution. However, conventional network quantization methods directly binarize the weights and activations in sparse convolution, resulting in a performance drop due to the significant quantization loss. In contrast, we search for the optimal subset of convolution operations that activates the sparse convolution at various locations to alleviate quantization error, and the performance gap between real-valued and binary sparse convolutional networks is closed without complexity overhead. Specifically, we first present the shifted sparse convolution, which fuses the information in the receptive field for the active sites that match the pre-defined positions. Then we employ differentiable search strategies to discover the optimal positions for active site matching in the shifted sparse convolution, and the quantization errors are significantly alleviated for efficient point cloud analysis. For fair evaluation of the proposed method, we empirically select the recent advances that are beneficial for sparse convolution network binarization to construct a strong baseline. The experimental results on ScanNet and NYU Depth v2 show that our BSC-Net achieves significant improvement upon our strong baseline and outperforms the state-of-the-art network binarization methods by a remarkable margin without additional computation overhead for binarizing sparse convolutional networks. Comment: Accepted to CVPR202
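The notion of quantization error from direct binarization can be made concrete with a minimal sketch. It assumes a generic sign-plus-scale scheme, where weights W are approximated by alpha * sign(W) with alpha the mean absolute weight. This is a common baseline formulation and is not the shifted sparse convolution or the search procedure of BSC-Net; the function names are hypothetical.

```python
import numpy as np

def binarize(weights):
    """Approximate real-valued weights by alpha * sign(weights).

    alpha is the mean absolute weight, which minimizes the squared
    error for a fixed sign pattern.
    """
    w = np.asarray(weights, dtype=float)
    alpha = np.abs(w).mean()
    signs = np.where(w >= 0, 1.0, -1.0)  # zeros are mapped to +1
    return alpha, signs

def quantization_error(weights):
    """Squared error between the weights and their binary approximation."""
    w = np.asarray(weights, dtype=float)
    alpha, signs = binarize(w)
    return float(np.sum((w - alpha * signs) ** 2))

# Weights of equal magnitude binarize without loss; uneven magnitudes do not.
uniform_error = quantization_error([0.5, -0.5, 0.5])
uneven_error = quantization_error([1.0, -3.0])
```

The error grows with the spread of weight magnitudes, which gives one intuition for why operations with more irregular value distributions lose more under binarization.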