376 research outputs found

    Detecting Rater Effects Using Many-Facet Rasch Models and Bootstrap Techniques

    The quality of ratings provided by expert raters in evaluating language learners’ constructed responses in performance assessment is typically investigated by means of statistical modeling. Several rater effects, including severity/leniency, central tendency, and randomness, have been well documented in the psychometrics literature (Myford & Wolfe, 2003). This study applies many-facet Rasch models (MFRM) to detect these rater effects in an in-house speaking assessment for international teaching assistants (ITAs) at a US university. The goal is to evaluate the extent to which the models, estimation procedures, and statistics/numerical indices adopted in this study work as intended in this context. Two simulation studies are conducted in which different model parameters are simulated from different distributions, and a parametric bootstrap procedure is applied to assess the statistical properties (i.e., consistency, variability, and mean squared error) of the parameter estimates and fit statistics. The model parameters are then estimated from the actual data, and the estimates are compared across estimation procedures (Joint Maximum Likelihood (JML) vs. Marginal Maximum Likelihood (MML)) and computational implementations (R vs. Facets). The parametric bootstrap procedure is also applied to estimate the sampling distributions of the parameters and fit statistics through replications. Finally, the indices for detecting rater effects are compared using both numerical summaries and plotting techniques. Results indicated that, when the model parameters and rater effects were simulated, the estimated severity parameters and the fit statistics were sensitive in detecting the intended effects. MML estimation showed some superiority over JML in terms of statistical consistency and variability, but neither estimation method was free of bias; the same held when the actual data were analyzed. Moreover, for detecting centrality or randomness effects in the actual data, evidence from the fit statistics could be used in conjunction with other indices from Facets and visualization techniques. However, the bootstrap results indicated that, when the empirical distributions of the fit statistics were considered, disagreements between MML and JML were relatively large, and the rule-of-thumb critical ranges for the fit statistics may be questionable.
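    For context, the rating-scale form of the many-facet Rasch model underlying this kind of analysis (Linacre, 1989) can be written as follows; this is the textbook formulation, not necessarily the exact specification fitted in the study:

```latex
\log\frac{P_{njk}}{P_{nj(k-1)}} = \theta_n - \alpha_j - \tau_k ,
```

    where P_njk is the probability that rater j awards examinee n a score in category k, θ_n is examinee ability, α_j is rater severity, and τ_k is the threshold between categories k−1 and k. A parametric bootstrap in this setting simulates replicate data sets from the fitted parameters and refits the model to each replicate; the sketch below assumes only this generic recipe, with `fit_mfrm` a hypothetical stand-in for a JML or MML estimator such as those provided by R packages or Facets:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_ratings(theta, alpha, tau, rng):
    """Draw ordinal ratings from a rating-scale MFRM: the unnormalized
    log-probability of category k is k*(theta_n - alpha_j) minus the
    cumulative sum of the thresholds tau_1..tau_k."""
    K = len(tau) + 1                                   # number of categories
    cum_tau = np.concatenate(([0.0], np.cumsum(tau)))
    data = np.empty((len(theta), len(alpha)), dtype=int)
    for n, th in enumerate(theta):
        for j, a in enumerate(alpha):
            logits = np.arange(K) * (th - a) - cum_tau
            p = np.exp(logits - logits.max())          # stable softmax
            data[n, j] = rng.choice(K, p=p / p.sum())
    return data

def bootstrap_severity(theta_hat, alpha_hat, tau_hat, fit_mfrm, B=500):
    """Parametric bootstrap: simulate from the fitted parameters, refit with
    a user-supplied estimator (hypothetical here), and return the B x n_raters
    matrix of severity estimates approximating their sampling distribution."""
    draws = [fit_mfrm(simulate_ratings(theta_hat, alpha_hat, tau_hat, rng))
             for _ in range(B)]
    return np.asarray(draws)
```

    Comparing the empirical spread of these bootstrap draws under JML versus MML is one way to operationalize the consistency/variability comparison described above.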

    Influence of mass unbalancing of three-cylinder engine on idle vibration based on powertrain model

    This paper proposes a model of three-cylinder engine excitation that includes the characteristics of the mass unbalance. A powertrain model is then established to analyze the response of the powertrain mounts under different balancing strategies for the mass unbalance during the idle condition. Simulations based on the powertrain model demonstrate how the different balancing strategies of three-cylinder engines influence the design of powertrain mounts.
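    For background (a standard result, not derived in the abstract): an inline three-cylinder engine with 120° crank spacing has fully balanced first- and second-order reciprocating forces but unbalanced first- and second-order pitching moments. With reciprocating mass m_rec per cylinder, crank radius r, cylinder spacing a, engine speed ω, and connecting-rod ratio λ = r/l, the classical moment amplitudes are:

```latex
M_{\mathrm{I}} = \sqrt{3}\, m_{\mathrm{rec}}\, r\, \omega^{2} a ,
\qquad
M_{\mathrm{II}} = \sqrt{3}\, \lambda\, m_{\mathrm{rec}}\, r\, \omega^{2} a .
```

    How much of M_I a given strategy cancels (e.g., with a balance shaft or offset counterweights on the flywheel and pulley) determines the residual excitation that the powertrain mounts must absorb at idle.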

    Solutions of stationary Kirchhoff equations involving nonlocal operators with critical nonlinearity in RN

    In this paper, we consider the existence and multiplicity of solutions for fractional Schrödinger equations with critical nonlinearity in RN. We use the fractional version of Lions' second concentration-compactness principle and the concentration-compactness principle at infinity to prove that the (PS)_c condition holds locally. Under suitable assumptions, we prove that the problem has at least one solution and, for any m ∈ N, at least m pairs of solutions. Moreover, these solutions converge to zero in some Sobolev space as ε → 0.
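    The abstract does not display the equation; a representative stationary fractional Kirchhoff problem with critical nonlinearity matching the title would read as follows (a generic model of this class, not necessarily the paper's exact problem):

```latex
\Big(a + b\,[u]_{s}^{2}\Big)(-\Delta)^{s}u + V(x)\,u
  = \lambda f(x,u) + |u|^{2_{s}^{*}-2}u
  \quad \text{in } \mathbb{R}^{N},
```

    where [u]_s^2 = ∬ |u(x) − u(y)|^2 / |x − y|^{N+2s} dx dy is the Gagliardo seminorm, (−Δ)^s is the fractional Laplacian, and 2_s^* = 2N/(N − 2s) is the fractional critical Sobolev exponent (0 < s < 1, N > 2s). It is the critical term |u|^{2_s^*−2}u that breaks compactness and forces the concentration-compactness arguments mentioned above.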

    Modeling statistics ITAs’ speaking performances in a certification test

    In light of the ever-increasing capability of computer technology and advances in speech and natural language processing, automated scoring of constructed spoken responses is gaining popularity in many high-stakes assessment and low-stakes educational settings. Automated scoring is a highly interdisciplinary and complex subject, and much remains unknown about the strengths and weaknesses of automated speech scoring systems (Evanini & Zechner, 2020). Research in automated speech scoring has centered on a few proprietary systems owned by large testing companies; consequently, existing systems only serve large-scale standardized assessment purposes. Application of automated scoring technologies in local assessment contexts is much desired but rarely realized, because the systems' inner workings have remained unfamiliar to many language assessment professionals. Moreover, assumptions about the reliability of the human scores on which automated scoring systems are trained are untenable in many local assessment situations, where a myriad of factors work together to co-determine the human scores. These factors may include the rating design, the test takers' abilities, and the raters' specific rating behaviors (e.g., severity/leniency, internal consistency, and application of the rating scale). In an attempt to apply automated scoring procedures to a local context, the primary purpose of this study is to develop and evaluate an appropriate automated speech scoring model for a local certification test of international teaching assistants (ITAs). To meet this goal, the study first implemented feature extraction and selection based on existing automated speech scoring technologies and the scoring rubric of the local speaking test. Then, the reliability of the human ratings was investigated within both Classical Test Theory (CTT) and Item Response Theory (IRT) frameworks, focusing on detecting potential rater effects that could negatively affect the quality of the human scores. Finally, by experimenting with and comparing a series of statistical modeling options, the study investigated the extent to which the association between the automatically extracted features and the human scores could be statistically modeled to reflect the multifaceted nature of the performance assessment in a unified statistical framework. The extensive search for speech and linguistic features, covering the sub-domains of fluency, pronunciation, rhythm, vocabulary, grammar, content, and discourse cohesion, revealed that a small set of useful variables could be identified. A large number of features could be effectively summarized as single latent factors that showed reasonably high associations with the human scores. Reliability analysis of the human scoring indicated that both inter-rater and intra-rater reliability were acceptable, and a fine-grained IRT analysis identified several raters who were prone to central tendency or randomness effects. Model fit indices, predictive performance, and model diagnostics indicated that the most appropriate approach to modeling the relationship between the features and the final human scores was a cumulative link model (CLM), whereas the most appropriate approach to modeling the relationship between the features and the ratings from the multiple raters was a cumulative link mixed model (CLMM). These models suggested that higher ability levels were significantly related to the lapse of time, faster speech with fewer disfluencies, more varied and sophisticated vocabulary, more complex syntactic structures, and fewer rater effects. On unseen data, the rating-level CLMM achieved an accuracy of 0.64, a Pearson correlation of 0.58, and a quadratically-weighted kappa of 0.57 against the human ratings on the 3-point scale. Results from this study can inform the development, design, and implementation of a prototype automated scoring system for prospective ITAs, and can provide empirical evidence to support scale development, rater training, assessment-related instruction for the testing program, and diagnostic feedback for ITA test takers.
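    To make the final modeling choice concrete: a cumulative link (mixed) model for an ordinal rating Y on a K-point scale relates the cumulative response probabilities to the extracted features and, in the CLMM, a rater-specific random effect. This is the generic textbook form, not the study's fitted specification:

```latex
\Pr(Y_{ij} \le k \mid u_j)
  = F\!\big(\theta_k - \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} - u_j\big),
\qquad u_j \sim \mathcal{N}(0, \sigma_u^{2}),
\quad k = 1, \dots, K-1,
```

    where θ_1 < … < θ_{K−1} are ordered thresholds, x_ij collects the speech features of examinee i as scored by rater j, F is a logistic or probit link, and dropping u_j recovers the plain CLM. The quadratically-weighted kappa used to evaluate the predictions can be computed with scikit-learn; the ratings below are made up for illustration:

```python
from sklearn.metrics import cohen_kappa_score

human = [1, 2, 3, 2, 1, 3, 2, 2]   # hypothetical human ratings, 3-point scale
model = [1, 2, 2, 2, 1, 3, 3, 2]   # hypothetical model predictions
print(cohen_kappa_score(human, model, weights="quadratic"))
```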

    Binarizing Sparse Convolutional Networks for Efficient Point Cloud Analysis

    In this paper, we propose binary sparse convolutional networks, called BSC-Net, for efficient point cloud analysis. We empirically observe that the sparse convolution operation causes larger quantization errors than standard convolution, so conventional network quantization methods that directly binarize the weights and activations in sparse convolution suffer a performance drop due to the significant quantization loss. Instead, we search for the optimal subset of convolution operations that activates the sparse convolution at various locations to alleviate the quantization error, closing the performance gap between real-valued and binary sparse convolutional networks without complexity overhead. Specifically, we first present a shifted sparse convolution that fuses the information in the receptive field for the active sites that match pre-defined positions. We then employ differentiable search strategies to discover the optimal positions for active-site matching in the shifted sparse convolution, which significantly alleviates the quantization errors for efficient point cloud analysis. For a fair evaluation of the proposed method, we empirically select recent advances that are beneficial for sparse convolutional network binarization to construct a strong baseline. Experimental results on ScanNet and NYU Depth v2 show that BSC-Net achieves significant improvements over this strong baseline and outperforms state-of-the-art network binarization methods by a remarkable margin without additional computation overhead. (Comment: Accepted to CVPR 2023.)
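    To illustrate the quantization loss the abstract refers to, the sketch below applies standard XNOR-Net-style 1-bit weight binarization (W ≈ α · sign(W)) and measures the resulting relative error; this is the generic binarization scheme, not BSC-Net's shifted sparse convolution or position search:

```python
import numpy as np

rng = np.random.default_rng(0)

def binarize(w):
    """XNOR-Net-style binarization: keep only the sign of each weight,
    scaled by the mean absolute value so the L1 norm is preserved."""
    alpha = np.abs(w).mean()
    return alpha * np.sign(w)

# A toy 3x3x3 conv kernel bank (64 output channels, flattened taps).
w = rng.normal(size=(64, 27))
err = np.linalg.norm(w - binarize(w)) / np.linalg.norm(w)
print(f"relative binarization error: {err:.3f}")   # ~0.6 for Gaussian weights
```

    Comparing such errors for kernels gathered at different active-site positions is, at a high level, the kind of signal a position search like BSC-Net's could exploit.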