
    Examination of Parameter Estimation Using Recursive Bayesian Analysis in Simulated Item Response Theory Applications

    By Robert Hendrick. For the past several years, high-stakes testing has been the predominant indicator used to assess students' academic ability. School systems, teachers, parents, and students depend on the accuracy of the academic ability estimates, designated θ, produced by item response theory (IRT) computer programs. In this study, three-parameter logistic (3PL) IRT estimates of academic ability were obtained from the BILOG-MG and WinBUGS computer programs, which were used to compare non-informative and informative priors in θ estimation. The rationale for comparing the output of these two programs is that they rest on different underlying statistical theory, and there may be a notable difference in the accuracy of θ estimation when WinBUGS uses an informative prior to analyze skewed populations. In particular, the θ estimates of BILOG-MG, using traditional IRT analysis with non-informative priors in each situation, and the θ estimates of WinBUGS, using Recursive Bayesian Analysis (RBA) with informative priors, are compared to the true simulated θ values using Root Mean Square Errors (RMSEs). To make this comparison, Monte Carlo computer simulation is used across three occasions within three conditions, giving nine comparison situations. For the priors and data generated, results show similar θ estimation accuracy for a normally distributed latent trait (RMSE = 0.35), more accurate θ estimation using RBA than traditional analysis (RMSEs of 0.36 compared to 0.76) when the latent trait distributions are skewed in a similar direction, and less accurate θ estimation using RBA than traditional analysis (RMSEs of 1.48 compared to 0.80) when extremely skewed negative then positive distributions are used in a longitudinal setting. Implications for further research include extensions to other IRT models, developing prior elicitation equations, and applying Bayesian informative prior elicitation in BILOG-MG.

    The analysis of Iran universities’ 2003-2004 entrance examination to detect biased items

    Item bias, or differential item functioning (DIF), refers to the situation in which examinees of equal ability, as measured by the test, who belong to different groups do not have equal probabilities of answering an item correctly. The existence of biased items decreases the validity of the test. In this study, the range of item difficulty across the surveyed groups was used as a method for detecting item bias in the Persian literature subtest of the 2003-2004 Entrance Examination to Universities of Iran. For this purpose, the report cards of 5000 participants (each group of 1000 examinees) from three provinces, namely Yazd, Azerbaijan Sharghi, and Kurdistan, were analyzed as sample groups using the computer program BILOG-MG. Out of 25 items, two (numbers 9 and 10) showed bias between gender groups, both in favour of the female group, and were identified as biased items. Four items (numbers 2, 7, 9, and 12) showed bias among linguistic groups.

    Examining the Two Categorical Datas by Jmetrik, Bilog-Mg and Irtpro with Application of Mathematics Exam

    The aim of this study was to examine dichotomously scored final exam data from a mathematics course under Item Response Theory, using the 2-Parameter Logistic Model, and to determine ability estimates and their standard errors with different programs. The study involves a comparative interpretation of some descriptive statistics and analyses. The research is therefore characterized as a relational model, which is one of the general survey models. For this purpose, 771 students’ responses to a 20-point final achievement exam were analyzed with the BILOG, IRT PRO and JMETRİK programs. Item Response Theory assumptions were checked with the SPSS and Factor 9.3 programs. The analysis indicated that all of the IRT assumptions were met and that the two-parameter logistic model was the most appropriate model for the data set. The study also found a statistically significant relationship, at the .01 level, between the estimated parameters related to individual ability and error. In particular, compared to the others, there was a significant relationship between JMETRİK and IRT PRO. Suggestions were made for future research to reanalyze the response patterns in the same data set with different models and methods.

    Assessment of Item Parameter Drift of Known Items in a University Placement Exam

    This study investigated the possibility of item parameter drift (IPD) in a calculus placement examination administered to approximately 3,000 students at a large university in the United States. A single form of the exam was administered continuously for a period of two years, possibly allowing later examinees to have prior knowledge of specific items on the exam. An analysis of IPD was conducted to explore evidence of possible item exposure. Two assumptions concerning item exposure were made: 1) item recall and item exposure are positively correlated, and 2) item exposure results in the items becoming easier over time. Special consideration was given to two contextual item characteristics: 1) item location within the test, specifically items at the beginning and end of the exam, and 2) the use of an associated diagram. The hypotheses stated that these item characteristics would make the items easier to recall and, therefore, more likely to be exposed, resulting in item drift. BILOG-MG 3 was used to calibrate the items and assess for IPD. No evidence was found to support the hypotheses that the items located at the beginning of the test or with an associated diagram drifted as a result of item exposure. Three items among the last ten on the exam drifted significantly and became easier, consistent with item exposure. However, in this study, the possible effects of item exposure could not be separated from the effects of other potential factors such as speededness, curriculum changes, better test preparation on the part of subsequent examinees, or guessing. Dissertation/Thesis, M.A. Educational Psychology 201

    Vertical equating of curriculum-based tests in reading, mathematics, and language arts

    Vertical equating of a large urban school district's curriculum-based achievement tests for students in grades one through five was performed using the BILOG-MG software package produced by Zimowski, Muraki, Mislevy, and Bock. This software extends the application of the item response theory (IRT) approach to test analysis to testing situations involving multiple groups. In BILOG-MG, item parameter estimates were derived using a marginal maximum a posteriori approach in combination with an expectation-maximization (EM) algorithm. Estimates of examinee ability were produced using the Bayesian, or expected a posteriori, estimate with a normal prior distribution. Linkage of successive test forms was accomplished by administering a common set of items to students at two successive grade levels, thereby providing estimation of the relative mean of each group of respondents, based on their responses both to the common items and to their own grade-level items. Using aggregated data, a raw score to scale score conversion was created that placed each possible raw score on a grade-level test onto a continuum that extended across the grades.
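The expected a posteriori (EAP) ability estimate mentioned in the abstract above can be illustrated with a small sketch. This is not the BILOG-MG implementation: it assumes a two-parameter logistic (2PL) item model, a standard normal prior, an evenly spaced quadrature grid, and hypothetical item parameters, none of which are specified in the abstract.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response (model assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, n_points=81, lo=-4.0, hi=4.0):
    """Posterior mean of theta under a standard normal prior, by grid quadrature."""
    step = (hi - lo) / (n_points - 1)
    num = 0.0
    den = 0.0
    for i in range(n_points):
        theta = lo + i * step
        prior = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) density
        like = 1.0
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            like *= p if u == 1 else (1.0 - p)
        weight = prior * like
        num += theta * weight
        den += weight
    return num / den  # normalizing constants cancel in the ratio

# Hypothetical (discrimination, difficulty) pairs, not taken from the study.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
all_correct = eap_estimate([1, 1, 1], items)
mixed = eap_estimate([1, 1, 0], items)
all_wrong = eap_estimate([0, 0, 0], items)
print(all_wrong, mixed, all_correct)
```

Because the prior pulls the estimate toward zero, the EAP stays finite even for all-correct or all-incorrect response patterns, which is one reason it is commonly used for scoring.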

    Analysis of English Competency Test (Test of English Proficiency) Results Using the 1-Parameter (1PL), 2-Parameter (2PL), and 3-Parameter (3PL) Logistic Models

    This study aims to analyze the results of the Test of English Proficiency (TOEP) using the 1-, 2-, and 3-Parameter Logistic (PL) models. The study began by collecting, through documentation, the TOEP trial results of all participants in the test across Indonesia. The collected data were then compiled and analyzed with the Bilog MG computer software. The Bilog MG program was used in this study to obtain output providing information on estimates of participants' ability level (measure of difficulty), measurement error (standard error of measurement), data-model fit (infit and outfit), and the item discrimination correlation (point biserial). The program provides sufficient information to analyze items with IRT (Item Response Theory). The results show that the 3-Parameter Logistic (3PL) model is the most suitable model for providing information on estimates of participants' ability level (measure of difficulty), measurement error (standard error of measurement), data-model fit (infit and outfit), and the item discrimination correlation (point biserial), because under this model more items show this information.

    Characteristics of Student Ability Based on the Chemistry School Examination at Senior High Schools in Teluk Ambon Baguala District Using Classical Test Theory (CTT) and Item Response Theory (IRT) with the Rasch Model

    This study aims to determine the characteristics of student ability based on the chemistry school examination at senior high schools in Teluk Ambon Baguala District using Classical Test Theory (CTT) and the Item Response Theory (IRT) Rasch model. The research subjects were all answer sheets of participants in the chemistry school examination. This is a quantitative study using an ex post facto approach. Analysis with the classical test theory approach showed that 80% of items had well-functioning difficulty levels, 100% of items had discrimination that did not yet meet requirements, and 100% of items had well-functioning distractors, with a test reliability index of 0.897. Analysis with the item response theory Rasch model approach showed that 17 items (85%) fit the model, the maximum information function was 4.12 at θ = -0.41, and the standard error of measurement (SEM) was 0.43. Keywords: School Examination, Classical Test Theory, Item Response Theory, Rasch.

    Estimating the Usability of E-commerce Sites by the Maximum Likelihood Method

    This article investigates the performance of the Maximum Likelihood (ML) method in estimating the usability of e-commerce sites and compares the performance of the Bilog-MG® and Excel® software in estimating this usability. For this, we used real data from a study on the degree of usability of 361 e-commerce sites, to which 32 items were applied, calibrated by the unidimensional two-parameter logistic model (MLU2) of Item Response Theory (IRT). The ML estimation process was carried out in the BILOG-MG® and Excel® software. The results showed that the ML method is flawed when there is a constant pattern of response, which may occur during application of the first items of the questionnaire. However, the method performs well when the pattern of responses is not constant. Moreover, the process prepared in Excel® performed better than the conventional software BILOG-MG®. The item parameters also influence ML estimation.
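The difficulty with constant response patterns noted in the abstract above can be illustrated with a small sketch. It assumes a two-parameter logistic model, hypothetical item parameters, and a plain grid search in place of whatever optimizer BILOG-MG® or the authors' Excel® procedure uses. For an all-positive pattern the log-likelihood keeps increasing with θ, so the search simply stops at the upper bound of the grid instead of at an interior maximum.

```python
import math

def log_likelihood(theta, responses, items):
    """2PL log-likelihood of a response pattern at trait level theta."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total

def grid_mle(responses, items, lo=-6.0, hi=6.0, n_points=241):
    """Grid point with the highest log-likelihood (stand-in for a real optimizer)."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))

# Hypothetical (discrimination, difficulty) pairs, not taken from the study.
items = [(1.0, -1.0), (1.2, 0.0), (0.8, 1.0)]
mixed_hat = grid_mle([1, 1, 0], items)     # interior maximum exists
constant_hat = grid_mle([1, 1, 1], items)  # runs to the grid boundary
print(mixed_hat, constant_hat)
```

Widening the grid would push the estimate for the constant pattern further out, which is the sense in which no finite ML estimate exists for such patterns.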