    Generalizability Theory and a Comparison of Results from G and D Studies Computed with the SPSS and GENOVA Programs

    In generalizability theory, the magnitudes of several sources of measurement error and of their interactions can be estimated simultaneously in a single analysis, yielding one reliability value for the study. In classical test theory, by contrast, a separate reliability value is obtained for each source of variance when more than one source is present. Generalizability theory is an extension of classical test theory that considers multiple sources of measurement error simultaneously. Analyses based on generalizability theory have generally been carried out with the GENOVA software package. However, the program is difficult to use and complex to understand, which is a major limitation for researchers conducting generalizability studies. Mushquash and O’Connor (2006) described how to use SPSS (as well as SAS and MATLAB) to conduct generalizability theory analyses. This study gives an overview of generalizability theory and its terminology. It also presents the generalizability and dependability coefficients obtained in both the generalizability (G) study and the decision (D) studies with GENOVA and with SPSS, using an illustrative example to compare the results of the two programs.
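
    As a rough illustration of how a D study turns variance components into the two coefficients mentioned above, the following sketch assumes a crossed persons x items design. The function name and the variance components are hypothetical and are not taken from the study or from GENOVA or SPSS output.

```python
def d_study_coefficients(var_p, var_i, var_pi, n_items):
    """Return (G, Phi) for a crossed persons x items design.

    var_p  : person (universe score) variance component
    var_i  : item variance component
    var_pi : person x item interaction (plus residual) component
    n_items: number of items assumed in the D study
    """
    relative_error = var_pi / n_items            # error for relative decisions
    absolute_error = (var_i + var_pi) / n_items  # error for absolute decisions
    g = var_p / (var_p + relative_error)         # generalizability coefficient
    phi = var_p / (var_p + absolute_error)       # dependability coefficient
    return g, phi


# Hypothetical variance components, used only to show the calculation.
components = {"var_p": 0.50, "var_i": 0.10, "var_pi": 0.40}
for n in (5, 10, 20):
    g, phi = d_study_coefficients(n_items=n, **components)
    print(f"n_items={n:2d}  G={g:.3f}  Phi={phi:.3f}")
```

    Because item variance enters only the absolute error term, Phi cannot exceed G in this design, and both rise as more items are added.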

    The Use of Rasch Model in Likert Types Scales: An Application on the Fear of Negative Evaluation Scale-Student Form (FNE-SF)

    This study aims to contribute to the literature an example of how the Rasch model can be used to examine the psychometric properties of Likert-type scales and to compute difference-based statistics on the scores obtained from them. The study was conducted with a study group of 367 secondary school students. Data were collected using the Fear of Negative Evaluation Scale-Student Form (FNE-SF) and the Academic Expectations Stress Inventory (AESI) and were analyzed according to the Rasch model with the FACETS program. The findings show that, in the unidimensional 16-item model of the FNE-SF, students and items were distinguished from one another with high reliability, the five-point rating structure of the scale worked effectively, the fit statistics fell within acceptable limits, and the observed and expected test characteristic curves overlapped to a considerable degree. The many-facet Rasch analysis showed that students’ fear of being negatively evaluated by their teachers or classmates in the classroom differed according to their AESI scores, and this finding was interpreted as evidence for the criterion validity of the FNE-SF.

    An Analysis of Peer Assessment through Many Facet Rasch Model

    This study analyses peer assessment through the many-facet Rasch model (MFRM). The research was carried out with 91 undergraduate students and the lecturer teaching the course. The data were collected with a holistic rubric used by 6 peers and the lecturer to rate the projects prepared by 85 students taking the course. The study examines the raters, the measures of the rated students, the criteria used in rating, and the extent to which the rubric fulfils its function. It also investigates the effect of peers’ achievement levels on the process. The results showed that raters differed in strictness and generosity and that students were adequately distinguished in terms of the property measured. In addition, a very high reliability value was estimated for the criteria, which was interpreted as showing that they functioned reliably in distinguishing between students’ performances. Analysis of the achievement levels of the peer raters showed that ratings given by high-achieving students differed significantly from those given by students with medium or low achievement. Finally, views about peer assessment were generally positive. Keywords: peer assessment, many facet Rasch model, levels of peer achievement, rubri
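
    To make the idea of rater strictness concrete, the sketch below computes category probabilities under a many-facet rating scale model in which the log-odds of adjacent categories equal ability minus criterion difficulty minus rater severity minus a step threshold. The function names and all parameter values are illustrative assumptions, not estimates from the study.

```python
import math


def mfrm_category_probs(ability, difficulty, severity, thresholds):
    """Probabilities of categories 0..K under a many-facet rating scale model.

    thresholds holds the K step thresholds F_1..F_K, so that
    log(P_k / P_{k-1}) = ability - difficulty - severity - F_k.
    """
    logits = [0.0]  # category 0 corresponds to an empty sum
    for step in thresholds:
        logits.append(logits[-1] + (ability - difficulty - severity - step))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(probs):
    """Expected rating implied by a list of category probabilities."""
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical values: the same student and criterion, two different raters.
steps = [-1.0, 0.0, 1.0]
lenient = mfrm_category_probs(0.5, 0.0, -1.0, steps)
strict = mfrm_category_probs(0.5, 0.0, 1.0, steps)
print(expected_score(lenient), expected_score(strict))
```

    Under these assumptions, a stricter rater (larger severity) yields a lower expected rating for the same student, which is the kind of rater difference the analysis separates from student ability.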

    Study of test equating on the common item non-equivalent group design

    This research aims to test the statistical equivalence of different forms of a test that are administered at the same time. For this purpose, a common-item equating design for non-equivalent groups was used. The common-item non-equivalent groups design is used because of problems that may arise with the security and administration of tests in which different forms are applied. The data set consisted of the responses given by students in Turkey’s sample of the PISA 2009 administration. The data from 761 15-year-old students who answered booklets 3 and 10 of the science literacy test were analyzed with the Tucker linear, Levine linear, frequency estimation equipercentile, and Braun-Holland linear equating methods. The weighted mean squared error (WMSE) indices obtained from the equating procedures were 0.046 for Tucker linear equating, 0.072 for Levine linear equating, 0.049 for frequency estimation equipercentile equating, and 0.034 for Braun-Holland linear equating. Based on the WMSE indices, the Braun-Holland linear equating method was the most appropriate for equating booklets 3 and 10 of the PISA 2009 science subtest.
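
    One of the methods named above, Tucker linear equating, can be sketched as follows using the standard textbook formulas for a synthetic population. The function and variable names are hypothetical, the score lists are made up, and this is not the code or data used in the study.

```python
from statistics import fmean


def _var(values):
    """Population variance (the same divisor is used throughout)."""
    m = fmean(values)
    return fmean([(v - m) ** 2 for v in values])


def _cov(a, b):
    """Population covariance of two equally long score lists."""
    ma, mb = fmean(a), fmean(b)
    return fmean([(x - ma) * (y - mb) for x, y in zip(a, b)])


def tucker_linear_equating(x1, v1, y2, v2, w1=None):
    """Return a function mapping Form X scores onto the Form Y scale.

    x1, v1: total and common-item scores of group 1 (took Form X)
    y2, v2: total and common-item scores of group 2 (took Form Y)
    w1    : synthetic population weight of group 1
            (assumed proportional to sample size when omitted)
    """
    if w1 is None:
        w1 = len(x1) / (len(x1) + len(y2))
    w2 = 1.0 - w1
    g1 = _cov(x1, v1) / _var(v1)  # regression slope of X on V in group 1
    g2 = _cov(y2, v2) / _var(v2)  # regression slope of Y on V in group 2
    d_mean = fmean(v1) - fmean(v2)
    d_var = _var(v1) - _var(v2)
    mu_x = fmean(x1) - w2 * g1 * d_mean
    mu_y = fmean(y2) + w1 * g2 * d_mean
    var_x = _var(x1) - w2 * g1 ** 2 * d_var + w1 * w2 * g1 ** 2 * d_mean ** 2
    var_y = _var(y2) + w1 * g2 ** 2 * d_var + w1 * w2 * g2 ** 2 * d_mean ** 2
    slope = (var_y / var_x) ** 0.5
    return lambda x: slope * (x - mu_x) + mu_y


# Made-up scores: Form Y is uniformly two points easier than Form X.
x1, v1 = [10, 12, 15, 18, 20], [3, 4, 5, 6, 7]
y2, v2 = [12, 14, 17, 20, 22], [3, 4, 5, 6, 7]
to_y_scale = tucker_linear_equating(x1, v1, y2, v2)
print(to_y_scale(14))
```

    With identical common-item performance in both groups, the only adjustment left in this toy example is the constant difference between the two forms.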

    Bernstein Collocation Method for Solving Nonlinear Fredholm-Volterra Integrodifferential Equations in the Most General Form

    A collocation method based on Bernstein polynomials defined on the interval [a,b] is developed for the approximate solution of the nonlinear Fredholm-Volterra integrodifferential equation (FVIDE) in its most general form. Using collocation points and the quasilinearization technique, the method reduces the problem to a linear FVIDE. Numerical examples are given to demonstrate the applicability, accuracy, and efficiency of the proposed method.
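
    The building block of such a method, the Bernstein basis on a general interval [a,b], can be sketched as below. The helper names are hypothetical, and the equally spaced collocation points are an assumption made for illustration; the abstract does not say which points the authors use.

```python
from math import comb


def bernstein_basis(n, k, x, a=0.0, b=1.0):
    """k-th Bernstein polynomial of degree n on [a, b], evaluated at x."""
    return comb(n, k) * (x - a) ** k * (b - x) ** (n - k) / (b - a) ** n


def bernstein_eval(coeffs, x, a=0.0, b=1.0):
    """Evaluate the Bernstein expansion sum_k coeffs[k] * B_{k,n}(x)."""
    n = len(coeffs) - 1
    return sum(c * bernstein_basis(n, k, x, a, b) for k, c in enumerate(coeffs))


def collocation_points(n, a=0.0, b=1.0):
    """Equally spaced points on [a, b] (an assumed, illustrative choice)."""
    return [a + (b - a) * i / n for i in range(n + 1)]


# The basis functions sum to one at any point of the interval.
print(sum(bernstein_basis(4, k, 1.3, 1.0, 2.0) for k in range(5)))
```

    In a collocation scheme, the unknown solution is written as such an expansion and the equation is enforced at the collocation points, giving a system for the coefficients.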

    The Effect of Leachate on the Compacted and Consolidated Clay Soils

    Solid waste landfills are a major potential threat to groundwater quality. Water present in the waste, rainwater infiltrating during or after landfilling, and groundwater penetration can all generate leachate, a waste liquid that carries contaminants from the waste. Clay soils are natural materials used as liners in landfill areas to minimize permeability. Some contaminants in the leachate can alter compacted clay soils and increase or decrease their permeability. This study investigates the effects of leachate on the permeability of compacted and consolidated clay soils in order to evaluate their effectiveness as liners in preventing groundwater contamination. To determine the removal capability of these soils, some metal ions (Fe(II), Mn(II)) were also measured in the influent and effluent of a lab-scale reactor. According to the results, Fe(II) and Mn(II) removal efficiency increases with time. Fe(OH)3 and MnO2 precipitates on the clay soil particles increase the oxidation rate through an autocatalytic effect. At the beginning, contamination was associated with some decrease in the permeability of the compacted and consolidated clay soils. Over time, however, the results show that leachate may cause an increase in permeability.

    A Comparison of the Logistic Regression and Contingency Table Methods for Simultaneous Detection of Uniform and Nonuniform DIF

    In this study, we investigate the logistic regression (LR), Mantel-Haenszel (MH), and Breslow-Day (BD) procedures for the simultaneous detection of both uniform and nonuniform differential item functioning (DIF). A simulation study was used to assess the Type I error rate and power of a combined decision rule (CDR), which assesses DIF by combining the decisions made with BD and MH, and to compare them with those of LR. The results revealed that while the Type I error rate of CDR was consistently below the nominal alpha level, the Type I error rate of LR was high for the conditions having unequal ability distributions. In addition, the power of CDR was consistently higher than that of LR across all forms of DIF.
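
    As a small illustration of the contingency table side of this comparison, the sketch below computes the Mantel-Haenszel common odds ratio across matched score levels and its transformation to the ETS delta scale. The function names and the counts are hypothetical; the combined decision rule and the Breslow-Day test are not implemented here.

```python
import math


def mantel_haenszel_odds_ratio(tables):
    """Common odds ratio across 2x2 tables, one per matched score level.

    Each table is (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def mh_delta(odds_ratio):
    """MH D-DIF on the ETS delta scale: -2.35 * ln(alpha_MH)."""
    return -2.35 * math.log(odds_ratio)


# Made-up counts for two score levels.
tables = [(30, 10, 20, 20), (40, 10, 30, 20)]
alpha_mh = mantel_haenszel_odds_ratio(tables)
print(alpha_mh, mh_delta(alpha_mh))
```

    An odds ratio near 1 (delta near 0) suggests no uniform DIF; the statistic is not designed to detect nonuniform DIF, which is why the abstract pairs MH with BD.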

    The Comparison of Reliability According to Generalizability Theory and Classical Test Theory on Random Data

    This study presents a framework that focuses on the similarities and differences between generalizability theory (GT) and classical test theory (CTT). The example consists of 125 students, 18 items, and 4 raters, but all of the data were generated completely at random. These completely random data, which differ from the data used in previous research, were used to assess and compare GT and CTT. The results show that Cronbach's alpha and the G-coefficient are very low and identical for the single-facet design (s x i), and that the Phi-coefficient and the (1) coefficient were .457 and .456, respectively, for the two-facet design (s x i x r), which was not examined with CTT.
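
    The agreement between Cronbach's alpha and the G-coefficient in a single-facet design can be checked numerically. The sketch below simulates its own random ratings for 125 students and 18 items; these are not the study's data, and the function names are hypothetical.

```python
import random
from statistics import fmean, variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a students x items score matrix."""
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def g_coefficient_s_by_i(scores):
    """Relative G-coefficient for a crossed s x i design via ANOVA mean squares."""
    n_s, n_i = len(scores), len(scores[0])
    grand = fmean([v for row in scores for v in row])
    row_means = [fmean(row) for row in scores]
    col_means = [fmean([row[j] for row in scores]) for j in range(n_i)]
    ss_s = n_i * sum((m - grand) ** 2 for m in row_means)
    ss_i = n_s * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ms_s = ss_s / (n_s - 1)
    ms_res = (ss_total - ss_s - ss_i) / ((n_s - 1) * (n_i - 1))
    var_s = (ms_s - ms_res) / n_i  # may be negative for purely random data
    return var_s / (var_s + ms_res / n_i)


# Simulated random ratings on a 1-5 scale (illustrative only).
random.seed(0)
scores = [[random.randint(1, 5) for _ in range(18)] for _ in range(125)]
alpha = cronbach_alpha(scores)
g = g_coefficient_s_by_i(scores)
print(alpha, g)
```

    The two values coincide up to floating-point error because both reduce to one minus the ratio of the residual mean square to the student mean square.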

    Studying Reliability of Open Ended Mathematics Items According to the Classical Test Theory and Generalizability Theory

    In this study, classical test theory and generalizability theory were used to determine the reliability of scores obtained from a mathematics achievement measure. Twenty-four open-ended mathematics items from TIMSS-1999 were administered to 203 students in the spring semester of 2007. The internal consistency of the scores was 0.92. To determine interrater consistency, Kendall's coefficient of concordance was calculated as 0.52. The generalizability coefficient for the mathematics scores was 0.92 and the phi coefficient was 0.90. The rater variance component accounted for 2.1% of the total variance. Taken together, the results indicate that the measure was reliable for determining students' mathematics achievement. Although the mean scores of the four raters differed, their scores were consistent.