    Antiinflammatory and Antioxidant Flavonoids and Phenols from Cardiospermum halicacabum (倒地鈴 Dào Dì Líng)

    Seventeen compounds, quercetin-3-O-α-L-rhamnoside (1), kaempferol-3-O-α-L-rhamnoside (2), apigenin-7-O-β-D-glucuronide (3), apigenin 7-O-β-D-glucuronide methyl ester (4), apigenin 7-O-β-D-glucuronide ethyl ester (5), chrysoeriol (6), apigenin (7), kaempferol (8), luteolin (9), quercetin (10), methyl 3,4-dihydroxybenzoate (11), p-coumaric acid (12), 4-hydroxybenzoic acid (13), hydroquinone (14), protocatechuic acid (15), gallic acid (16), and indole-3-carboxylic acid (17), were isolated from the ethanol extract of Taiwanese Cardiospermum halicacabum. All chemical structures were determined by physical and extensive spectroscopic analyses, including 1H Nuclear Magnetic Resonance (NMR) spectroscopy, 13C NMR, 1H-1H Correlation Spectroscopy (COSY), Heteronuclear Multiple Quantum Coherence (HMQC), Heteronuclear Multiple-Bond Correlation (HMBC), and Nuclear Overhauser Effect Spectroscopy (NOESY), as well as by comparison with literature values. Furthermore, a High-Performance Liquid Chromatography-Photodiode Array Detector (HPLC-DAD) fingerprint profile was established to determine the major constituents of the EtOAc extract and the retention times of the isolated compounds. All isolated compounds were also evaluated for antiinflammatory and antioxidant activities.

    When can Multidimensional Item Response Theory (MIRT) Models be a Solution for Differential Item Functioning (DIF)? A Monte Carlo Simulation Study

    Thesis (Ph.D.)--University of Washington, 2015.

    The present study was designed to examine whether multidimensional item response theory (MIRT) models might be useful in controlling for differential item functioning (DIF) when estimating primary ability, or whether traditional (and simpler) unidimensional item response theory (UIRT) models with DIF items removed are sufficient for accurately estimating primary ability. Researchers have argued that the leading cause of DIF is the inclusion of “multidimensional” test items. That is, tests thought to be unidimensional (measuring one latent, or unobserved, construct or trait per item) are actually measuring at least one other latent trait besides the one of interest. Additionally, most “problem” DIF is likely due to items measuring multiple traits that are noncompensatory in nature: to get an item correct, an examinee needs a sufficient amount of all relevant traits (one trait cannot compensate for another). However, few studies have conducted empirical research on MIRT models; of the few that have, none examined the use of MIRT models for the purpose of controlling for DIF, and none empirically compared the performance of compensatory and noncompensatory MIRT models. The present study contributes new information on the performance of these methodologies for multidimensional test items by addressing the following main research question: How accurately do UIRT and MIRT models recover the primary ability (θ1) for focal and reference groups?

    The data in this simulation study were generated for a test with 40 items and 2,000 examinees, and assumed a 2-parameter logistic (2PL), 2-dimensional, noncompensatory case. Five conditions were manipulated: between-dimension correlation (0 and 0.3), reference-to-focal group size balance (1:1 and 9:1), primary dimension discrimination level (0.5 and 0.8), secondary dimension discrimination level (0.2 and 0.5), and percentage of DIF items (0%, 10%, 20%, and 30%; all DIF favored the reference group). Five model approaches were then applied for IRT calibration, with results saved and averaged for each condition: Approach 1 (UIRTd): UIRT, no items removed from analysis; Approach 2 (UIRTnds): UIRT, after removing DIF-detected items (using Mantel–Haenszel with standard criterion p-value ≤ 0.05); Approach 3 (UIRTndl): UIRT, after removing DIF-detected items (using Mantel–Haenszel with liberal criterion p-value ≤ 0.10); Approach 4 (MIRTc): compensatory MIRT, no items removed from analysis; and Approach 5 (MIRTnc): noncompensatory MIRT, no items removed from analysis.

    The impact of these modeling approaches and manipulated conditions on the accuracy of primary ability estimates was the focus of the investigation. Accuracy was judged by bias, calculated in the usual way as the mean difference, across the 500 replications, between the estimated θ̂1 and the true primary θ1 used to generate the data. Analyses of variance (ANOVAs) on model-derived mean ability estimates were then used to identify main effects and simple interactions among modeling approaches and conditions.

    As expected, the ANOVA results showed that for the focal group the UIRTd model (no items removed from analysis) yielded the worst bias compared to all other models: the focal group’s primary ability was consistently underestimated and the reference group’s primary ability was consistently overestimated. Using the UIRTnds and UIRTndl models (DIF-detected items removed from analyses, one with the standard alpha level and the other with a liberal alpha level) led to the smallest bias, and using the two types of MIRT models (MIRTc and MIRTnc) led to slightly more bias than the two UIRT models with DIF items removed, but these differences were not significant (i.e., the only model that differed from the others was the UIRT model that completely ignored DIF). In other words, the simpler UIRT approach works as well as the complex MIRT approaches, but only for researchers willing to remove items with DIF prior to calibration; for those with limited item pools, the MIRT approach works just as well without removing DIF items.
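
    As a concrete illustration of the design described above, the sketch below generates item responses from a 2-dimensional noncompensatory 2PL model and computes the bias of a unidimensional EAP estimate of the primary ability. It is a minimal base-R sketch with illustrative parameter values (40 items, 2,000 examinees, discriminations of 0.8 and 0.5), not the study's actual code; the DIF and group-size conditions are omitted.

```r
## Minimal base-R sketch of the data-generating model and the bias criterion
## described above. Parameter values are illustrative; the DIF and group-size
## conditions of the study are omitted for brevity.
set.seed(1)
n_items   <- 40
n_persons <- 2000
a1 <- rep(0.8, n_items)        # primary-dimension discriminations
a2 <- rep(0.5, n_items)        # secondary-dimension discriminations
b1 <- rnorm(n_items)           # primary-dimension difficulties
b2 <- rnorm(n_items)           # secondary-dimension difficulties

theta1 <- rnorm(n_persons)     # primary ability (the estimation target)
theta2 <- rnorm(n_persons)     # secondary (nuisance) ability

# Noncompensatory 2PL: the response probability is the product of the two
# dimension-specific 2PL terms, so strength on one trait cannot offset
# weakness on the other.
p <- plogis(sweep(outer(theta1, b1, "-"), 2, a1, "*")) *
     plogis(sweep(outer(theta2, b2, "-"), 2, a2, "*"))
x <- matrix(rbinom(length(p), 1, p), n_persons, n_items)

# EAP estimate of theta1 under a unidimensional 2PL evaluated at the
# generating primary-dimension parameters (i.e., ignoring theta2).
quad <- seq(-4, 4, length.out = 61)
w    <- dnorm(quad); w <- w / sum(w)
pq   <- plogis(sweep(outer(quad, b1, "-"), 2, a1, "*"))   # quad points x items
loglik <- x %*% t(log(pq)) + (1 - x) %*% t(log(1 - pq))   # persons x quad points
post   <- exp(loglik - apply(loglik, 1, max))             # avoid underflow
post   <- sweep(post, 2, w, "*")
post   <- post / rowSums(post)
theta1_hat <- as.vector(post %*% quad)

mean(theta1_hat - theta1)      # bias: mean difference between estimate and truth
```

    The study's full design additionally manipulates DIF, group sizes, and the competing calibration approaches described above.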

    Stability of Item Parameters in Equating Items

    Thesis (Master's)--University of Washington, 2012.

    This thesis investigates item factors that may cause item parameter instability. The primary concern of standardized testing is the accuracy of test score interpretation and the appropriateness of test score use across multiple tests. Equating is therefore essential for any testing program: it is a statistical process used to adjust scores on test forms and establish comparability between alternate forms. Under the assumption of item parameter invariance, the statistical properties of the common items are expected to be stable across forms. Content and context effects on item parameter estimates appear most likely to violate this assumption and are the focus of the discussion. Context effects, such as item type, position, adjacency to different kinds of items, wording or appearance, and arrangement, as well as content effects, such as instructional and curricular emphasis, have been found to affect item parameter estimates. The data for this study came from the state-level Washington Assessment of Student Learning (WASL) tenth-grade mathematics exams administered from 1999 to 2004. Item factors were labeled first; after labeling item characteristics, test equating was conducted and suspect items were identified. Two methods, the Robust Z statistic and the signed area between item characteristic curves, were used to detect items that demonstrate item parameter drift. The thesis presents the results of these analyses; patterns in the features of unstable items are described, and suggestions are made for future item development and for the selection of anchor items.
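
    For reference, the Robust Z check mentioned above can be computed in a few lines. The sketch below is a minimal base-R illustration with made-up anchor-item difficulties (not WASL data); it uses the common form of the statistic in which 0.74 times the interquartile range serves as a robust standard deviation.

```r
## Minimal base-R sketch of the Robust Z check for item parameter drift.
## b_old and b_new are illustrative anchor-item difficulty estimates from two
## forms (not WASL values).
b_old <- c(-1.20, -0.40, 0.10, 0.60, 1.30, 0.90, -0.80, 0.20)
b_new <- c(-1.05, -0.52, 0.21, 0.55, 2.10, 0.97, -0.70, 0.14)

d <- b_new - b_old                              # difficulty shifts for anchor items
robust_z <- (d - median(d)) / (0.74 * IQR(d))   # 0.74 * IQR as a robust SD

which(abs(robust_z) > 1.645)                    # flag drifting items (item 5 here)
```

    The signed-area method mentioned alongside it instead integrates the signed difference between the two item characteristic curves over the ability scale.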

    The Effects of Collapsing Ordered Categorical Variables on Tests of Measurement Invariance

    Cross-cultural comparisons of latent variable means demand equivalent loadings and intercepts or thresholds. Although equivalence testing generally emphasizes items as originally designed, researchers sometimes modify the response options of categorical items; for example, substantive research interests may drive decisions to reduce the number of item categories. Further, categorical multiple-group confirmatory factor analysis (MG-CFA) methods generally require that the number of indicator categories be equal across groups, and categories with few observations in at least one group can therefore cause problems. In the current paper, we examine the impact of collapsing ordinal response categories in MG-CFA. An empirical analysis and a complementary simulation study suggest meaningful impacts on model fit due to collapsing categories. We also found reduced scale reliability, measured as a function of Fisher's information. Our findings further illustrate artifactual fit improvement, pointing to the possibility of data dredging for improved model-data consistency in challenging invariance contexts with large numbers of groups.
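
    To make the manipulation concrete, the sketch below shows one way a 5-point ordered item might be collapsed to 3 categories in base R, the kind of recoding whose consequences the paper examines. The data, category probabilities, and cut points are purely illustrative.

```r
## Minimal base-R sketch of collapsing a 5-point ordered item to 3 categories.
## Simulated data; the sparse outer categories are merged with their neighbours.
set.seed(2)
item <- factor(sample(1:5, 500, replace = TRUE,
                      prob = c(0.05, 0.10, 0.35, 0.35, 0.15)),
               levels = 1:5, ordered = TRUE)

# Recode 1-2 -> 1, 3 -> 2, 4-5 -> 3 so that every remaining category keeps
# a reasonable number of observations.
collapsed <- cut(as.integer(item), breaks = c(0, 2, 3, 5),
                 labels = 1:3, ordered_result = TRUE)

table(original = item, collapsed = collapsed)   # check the mapping
```

    In a categorical MG-CFA the same recoding would be applied in every group, since these models require an equal number of indicator categories across groups; that constraint is what makes collapsing attractive when some groups have sparse categories.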

    lsasim: an R package for simulating large-scale assessment data

    This article provides an overview of the R package lsasim, designed to facilitate the generation of data that mimic a large-scale assessment context. The package features functions for simulating achievement data according to a number of common IRT models with known parameters. A clear advantage of lsasim over other simulation software is that the achievement data, in the form of item responses, can arise from multiple-matrix sampled test designs. Furthermore, lsasim offers the possibility of simulating data that adhere to general properties found in background questionnaires (mostly ordinal, correlated variables that are also related to varying degrees to some latent trait). Although the background questionnaire data can be linked to the test responses, all aspects of lsasim can function independently, affording researchers a high degree of flexibility in terms of possible research questions and the part of an assessment that is of most interest.
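
    The core idea behind a multiple-matrix sampled design, which lsasim automates, can be illustrated in a few lines of base R: each examinee is administered only one booklet of item blocks, and responses are generated from an IRT model with known parameters. The sketch below is not the lsasim API, just a hand-rolled illustration with made-up parameters; see the package documentation for the actual functions.

```r
## Hand-rolled base-R illustration of a multiple-matrix sampled design with
## known 2PL parameters. This is NOT the lsasim API; names and values are
## illustrative only.
set.seed(3)
n_items   <- 30
n_persons <- 900
a <- runif(n_items, 0.75, 1.25)    # known discriminations
b <- rnorm(n_items)                # known difficulties
theta <- rnorm(n_persons)          # latent achievement

# Three blocks of 10 items rotated into three booklets of two blocks each.
blocks   <- split(1:n_items, rep(1:3, each = 10))
booklets <- list(c(1, 2), c(2, 3), c(1, 3))
assigned <- sample(1:3, n_persons, replace = TRUE)   # one booklet per examinee

resp <- matrix(NA_integer_, n_persons, n_items)      # NA = item not administered
for (i in 1:n_persons) {
  items <- unlist(blocks[booklets[[assigned[i]]]])
  p <- plogis(a[items] * (theta[i] - b[items]))
  resp[i, items] <- rbinom(length(items), 1, p)
}

mean(is.na(resp))   # roughly one third of responses are missing by design
```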

    Sensitivity of the RMSD for detecting item-level misfit in low-performing countries

    Although the root mean squared deviation (RMSD) is a popular statistical measure for evaluating country-specific item-level misfit (i.e., differential item functioning [DIF]) in international large-scale assessments, this paper shows that its sensitivity for detecting misfit may depend strongly on the proficiency distribution of the countries considered. Specifically, items for which most respondents in a country have a very low (or high) probability of providing a correct answer will rarely be flagged by the RMSD as showing misfit, even if very strong DIF is present. With many international large-scale assessment initiatives moving toward covering a more heterogeneous group of countries, this raises issues for the ability of the RMSD to detect item-level misfit, especially in low-performing countries that are not well aligned with the overall difficulty level of the test. This may put one at risk of incorrectly assuming measurement invariance to hold, and may also inflate estimated between-country differences in proficiency. The degree to which the RMSD is able to detect DIF in low-performing countries is studied using both an empirical example from PISA 2015 and a simulation study.
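
    The mechanism is easy to see in a small numeric example. The sketch below, a base-R simplification with illustrative parameter values, computes the RMSD as a density-weighted distance between an international item response function and a group-specific one carrying strong uniform DIF; weighting by a low-performing country's proficiency density shrinks the statistic even though the DIF effect is unchanged. (The operational RMSD compares observed group-level proportions to the model curve; the density-weighted form here is a simplification of that idea.)

```r
## Minimal base-R sketch of the RMSD as a density-weighted distance between an
## international item response function and a group-specific one. Illustrative
## values only; not the operational PISA computation.
quad <- seq(-6, 6, length.out = 121)

p_int   <- plogis(1.0 * (quad - 1.5))         # international curve for a hard item
p_group <- plogis(1.0 * (quad - 1.5) - 1.0)   # group curve with strong uniform DIF

rmsd <- function(dens) {
  w <- dens / sum(dens)                       # weight by the group's proficiency density
  sqrt(sum(w * (p_group - p_int)^2))
}

rmsd(dnorm(quad, mean =  0))   # average-performing country: sizable RMSD
rmsd(dnorm(quad, mean = -2))   # low-performing country: much smaller RMSD,
                               # although the DIF effect is identical
```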

    Checking equity: Why differential item functioning analysis should be a routine part of developing conceptual assessments

    We provide a tutorial on differential item functioning (DIF) analysis, an analytic method useful for identifying potentially biased items in assessments. After explaining a number of methodological approaches, we test for gender bias in two scenarios that demonstrate why DIF analysis is crucial for developing assessments, particularly because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness. First, a significant difference between groups on total scores can exist even when items are not biased, as we illustrate with data collected during the validation of the Homeostasis Concept Inventory. Second, item bias can exist even when the two groups have exactly the same distribution of total scores, as we illustrate with a simulated data set. We also present a brief overview of how DIF analysis has been used in the biology education literature to illustrate the way DIF items need to be reevaluated by content experts to determine whether they should be revised or removed from the assessment. Finally, we conclude by arguing that DIF analysis should be used routinely to evaluate items in developing conceptual assessments. These steps will ensure more equitable—and therefore more valid—scores from conceptual assessments.
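
    As an illustration of the kind of item-level check such a tutorial walks through, the sketch below runs a Mantel-Haenszel DIF test for a single studied item in base R, stratifying on a rest score. The data are simulated, the stratification is deliberately coarse, and Mantel-Haenszel is shown here as one common DIF method, not necessarily the tutorial's exact procedure.

```r
## Minimal base-R sketch of a Mantel-Haenszel DIF check for one studied item,
## stratifying on the rest score. Simulated data for illustration only.
set.seed(4)
n <- 1000
group <- rep(c("reference", "focal"), each = n / 2)
theta <- rnorm(n)

b <- seq(-1.5, 1.5, length.out = 10)            # ten Rasch-type items
x <- matrix(rbinom(n * 10, 1, plogis(outer(theta, b, "-"))), n, 10)

# Re-generate item 10 for the focal group with a 0.8-logit disadvantage (uniform DIF).
focal <- group == "focal"
x[focal, 10] <- rbinom(sum(focal), 1, plogis(theta[focal] - b[10] - 0.8))

strata <- cut(rowSums(x[, -10]), breaks = c(-1, 3, 6, 9))   # coarse rest-score strata
tab <- table(response = x[, 10], group = group, strata = strata)
mantelhaen.test(tab)   # a small p-value flags item 10 as a DIF candidate
```

    Flagged items would then be reviewed by content experts to decide whether to revise or remove them, as the tutorial emphasizes.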

    Changes in Digital Learning During a Pandemic: Findings From the ICILS Teacher Panel

    Findings from the ICILS Teacher Panel 2020 study.