Method versatility in analysing human attitudes towards technology
Various research domains are facing new challenges brought about by growing volumes of data. To make optimal use of them, and to increase the reproducibility of research findings, method versatility is required. Method versatility is the ability to flexibly apply widely varying data analytic methods depending on the study goal and the dataset characteristics.
Method versatility is an essential characteristic of data science, but in other areas of research, such as educational science or psychology, its importance is yet to be fully accepted. Versatile methods can enrich the repertoire of specialists who validate psychometric instruments, conduct data analysis of large-scale educational surveys, and communicate their findings to the academic community, activities that correspond to three stages of the research cycle: measurement, research per se, and communication. In this thesis, the studies related to these stages share a common theme of human attitudes towards technology, a topic of vital importance in our age of ever-increasing digitization.
The thesis is based on four studies, in which method versatility is introduced in four different ways: the consecutive use of methods, the toolbox choice, the simultaneous use, and the range extension. In the first study, different methods of psychometric analysis are used consecutively to reassess psychometric properties of a recently developed scale measuring affinity for technology interaction. In the second, the random forest algorithm and hierarchical linear modeling, as tools from machine learning and statistical toolboxes, are applied to data analysis of a large-scale educational survey related to students’ attitudes to information and communication technology. In the third, the challenge of selecting the number of clusters in model-based clustering is addressed by the simultaneous use of model fit, cluster separation, and the stability of partition criteria, so that generalizable separable clusters can be selected in the data related to teachers’ attitudes towards technology. The fourth reports the development and evaluation of a scholarly knowledge graph-powered dashboard aimed at extending the range of scholarly communication means.
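As an illustration of the third study's simultaneous-use idea, here is a minimal sketch in Python, assuming scikit-learn and entirely placeholder data; BIC for model fit, the silhouette coefficient for separation, and a bootstrap adjusted Rand index for stability stand in for whichever specific criteria the thesis combines:

```python
# A minimal sketch (not the thesis code): score each candidate number of
# clusters on model fit (BIC), separation (silhouette), and stability
# (adjusted Rand index against a bootstrap refit). All data are placeholders.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score, adjusted_rand_score

rng = np.random.default_rng(0)
# placeholder "survey" data with two latent groups
X = np.vstack([rng.normal(0, 1, (150, 4)), rng.normal(3, 1, (150, 4))])

for k in range(2, 7):
    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    labels = gmm.predict(X)
    bic = gmm.bic(X)                    # model fit: lower is better
    sep = silhouette_score(X, labels)   # separation: higher is better
    # stability: refit on a bootstrap resample, compare the two partitions
    idx = rng.choice(len(X), size=len(X), replace=True)
    boot = GaussianMixture(n_components=k, random_state=1).fit(X[idx])
    stab = adjusted_rand_score(labels, boot.predict(X))
    print(f"k={k}: BIC={bic:.0f}  silhouette={sep:.2f}  stability={stab:.2f}")
```

A value of k that does well on all three criteria at once is the kind of generalizable, separable solution the study aims for.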
The findings of the thesis can be helpful for increasing method versatility in various research areas. They can also facilitate methodological advancement of academic training in data analysis and aid further development of scholarly communication in accordance with open science principles.
Group Formation Optimization for Peer Assessment Using Item Response Theory and Integer Programming
In recent years, large-scale e-learning environments such as Massive Open Online Courses (MOOCs) have become increasingly popular. In such environments, peer assessment, which is mutual assessment among learners, has been used to evaluate reports and programming assignments. When the number of learners increases, as in MOOCs, peer assessment is often conducted by dividing learners into multiple groups to reduce the assessment workload. In this case, however, the accuracy of peer assessment depends on how the groups are formed. To solve this problem, this study proposes a group optimization method based on item response theory (IRT) and integer programming. The group optimization method is formulated as an integer programming problem that maximizes the Fisher information, a widely used index of ability assessment accuracy in IRT. Experimental results show, however, that this method alone cannot sufficiently improve accuracy compared with random group formation. To overcome this limitation, this study introduces the concept of external raters, defined as peer raters who belong to different groups, and proposes an external rater selection method that assigns a few appropriate external raters to each learner after the groups have been formed using the proposed group optimization method. The external rater selection method is formulated as an integer programming problem that maximizes the lower bound of the Fisher information of the learners' ability estimates given by the external raters. Experimental results using both simulated and real-world peer assessment data show that the introduction of external raters sufficiently improves accuracy, and that the proposed IRT-based external rater selection method significantly outperforms random selection in ability assessment accuracy.
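A minimal sketch of the external-rater selection step, using the PuLP modeller with an entirely placeholder Fisher-information matrix. For simplicity it maximizes total information rather than the lower bound the study optimizes, and the learner/rater counts and the constants R and C are invented for illustration:

```python
# Hypothetical sketch: assign external raters to learners as an integer
# program. `info[j][i]` would come from a fitted rater IRT model; here it
# is random placeholder data.
import numpy as np
import pulp

rng = np.random.default_rng(0)
n_learners, n_raters = 20, 20
R = 2  # external raters assigned to each learner (assumed)
C = 3  # maximum learners each external rater may assess (assumed)
info = rng.uniform(0.1, 1.0, size=(n_raters, n_learners))

prob = pulp.LpProblem("external_rater_selection", pulp.LpMaximize)
x = [[pulp.LpVariable(f"x_{j}_{i}", cat="Binary")
      for i in range(n_learners)] for j in range(n_raters)]

# objective: total Fisher information contributed by the selected raters
prob += pulp.lpSum(info[j][i] * x[j][i]
                   for j in range(n_raters) for i in range(n_learners))
for i in range(n_learners):  # each learner receives exactly R external raters
    prob += pulp.lpSum(x[j][i] for j in range(n_raters)) == R
for j in range(n_raters):    # workload cap per rater
    prob += pulp.lpSum(x[j][i] for i in range(n_learners)) <= C

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print("total information:", pulp.value(prob.objective))
```

Replacing the summed objective with a maximin (lower-bound) objective, as in the paper, requires an auxiliary variable and per-learner constraints, but the assignment structure stays the same.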
Explanation-by-Example Based on Item Response Theory
Intelligent systems that use Machine Learning classification algorithms are increasingly common in everyday society. However, many systems rely on black-box models that cannot explain their own predictions. This situation leads researchers in the field, and society at large, to the following question: how can I trust the prediction of a model I cannot understand? In this sense, XAI has emerged as a field of AI that aims to create techniques capable of explaining a classifier's decisions to the end user. As a result, several techniques have been proposed, among them Explanation-by-Example, which has a few initiatives consolidated by the community currently working with XAI. This research explores Item Response Theory (IRT) as a tool for explaining models and measuring the reliability of the Explanation-by-Example approach. To this end, four datasets with different levels of complexity were used, with a Random Forest model as the hypothesis under test. On the test set, 83.8% of the errors came from instances that IRT flagged as unreliable for the model.

Comment: 15 pages, 5 figures, 3 tables, submitted to the BRACIS'22 conference
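A rough sketch of the underlying idea, assuming a response matrix of several classifiers' correctness on shared test instances. The per-instance difficulty below is a simple proportion-based stand-in for a fitted IRT model, not the paper's actual estimator:

```python
# Hypothetical sketch: rows = classifiers, columns = test instances,
# 1 = correct. Flag predictions on instances "harder" than the target
# model's ability as unreliable.
import numpy as np

rng = np.random.default_rng(0)
responses = (rng.uniform(size=(30, 200)) > 0.3).astype(int)  # placeholder

ability = responses.mean(axis=1)           # crude per-classifier ability
difficulty = 1.0 - responses.mean(axis=0)  # crude per-instance difficulty

target = 0                                 # classifier under scrutiny
unreliable = difficulty > ability[target]  # instances beyond its ability
print(f"{unreliable.mean():.1%} of instances flagged as unreliable")
```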
Theoretical and Practical Advances in Computer-based Educational Measurement
This open access book presents a large number of innovations in the world of operational testing. It brings together different but related areas and provides insight into their possibilities, advantages, and drawbacks. The book addresses not only improvements in the quality of educational measurement and innovations in (inter)national large-scale assessments, but also several advances in psychometrics, improvements in computerized adaptive testing, and examples of the impact of new technology on assessment. Due to its nature, the book will appeal to a broad audience within the educational measurement community. It contributes to theoretical knowledge and also pays attention to the practical implementation of innovations in testing technology.
New measurement paradigms
This collection of New Measurement Paradigms papers represents a snapshot of the variety of measurement methods in use at the time of writing across several projects funded by the National Science Foundation (US) through its REESE and DR K–12 programs. All of the projects are developing and testing intelligent learning environments that seek to carefully measure and promote student learning, and the purpose of this collection of papers is to describe and illustrate the use of several measurement methods employed to achieve this. The papers are deliberately short because they are designed to introduce the methods in use and not to be a textbook chapter on each method.
The New Measurement Paradigms collection is designed to serve as a reference point for researchers working on projects that create e-learning environments in which judgments must be made about students' levels of knowledge and skills, or for those who are interested in such methods but have not yet delved into them.
Facilitating Variable-Length Computerized Classification Testing Via Automatic Racing Calibration Heuristics
Thesis (Ph.D.), Indiana University, School of Education, 2015.

Computer Adaptive Tests (CATs) have been used successfully with standardized tests. However, CATs are rarely practical for assessment in instructional contexts, because large numbers of examinees are required a priori to calibrate items using item response theory (IRT). Computerized Classification Tests (CCTs) provide a practical alternative to IRT-based CATs. CCTs show promise for instructional contexts, since far fewer examinees are required for item parameter estimation. However, there is a paucity of clear guidelines indicating when items are sufficiently calibrated in CCTs.
Is there an efficient and accurate CCT algorithm which can estimate item parameters adaptively? Automatic Racing Calibration Heuristics (ARCH) was invented as a new CCT method and was empirically evaluated in two studies.
Monte Carlo simulations were run on previous administrations of a computer literacy test, consisting of 85 items answered by 104 examinees. Simulations resulted in determination of thresholds needed by the ARCH method for parameter estimates. These thresholds were subsequently used in 50 sets of computer simulations in order to compare accuracy and efficiency of ARCH with the sequential probability ratio test (SPRT) and with an enhanced method called EXSPRT. In the second study, 5,729 examinees took an online plagiarism test, where ARCH was implemented in parallel with SPRT and EXSPRT for comparison.
Results indicated that new statistics were needed by ARCH to establish thresholds and to determine when ARCH could begin. The ARCH method resulted in test lengths significantly shorter than those of SPRT and slightly longer than those of EXSPRT, without sacrificing the accuracy of classifying examinees as masters or nonmasters.
This research was the first of its kind to evaluate the ARCH method. ARCH appears to be a viable CCT method that could be particularly useful in massive open online courses (MOOCs). Additional studies with different test content and contexts are needed.
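For context, a minimal sketch of the SPRT baseline that ARCH is compared against, with placeholder mastery/nonmastery response probabilities and error rates; the ARCH heuristics themselves are not reproduced here:

```python
# Sequential probability ratio test for master/nonmaster classification:
# accumulate the log-likelihood ratio over dichotomous item responses and
# stop as soon as a Wald boundary is crossed.
import math

def sprt(responses, p_master=0.8, p_nonmaster=0.6, alpha=0.05, beta=0.05):
    upper = math.log((1 - beta) / alpha)  # cross upward -> "master"
    lower = math.log(beta / (1 - alpha))  # cross downward -> "nonmaster"
    llr = 0.0
    for n, correct in enumerate(responses, start=1):
        if correct:
            llr += math.log(p_master / p_nonmaster)
        else:
            llr += math.log((1 - p_master) / (1 - p_nonmaster))
        if llr >= upper:
            return "master", n
        if llr <= lower:
            return "nonmaster", n
    return "undecided", len(responses)

print(sprt([1] * 12))  # decides "master" after 11 correct responses
```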
Toward a Robust and Universal Crowd Labeling Framework
The advent of fast and economical computers with large electronic storage has led to large volumes of data, most of which are unlabeled. While computers provide expeditious, accurate, and low-cost computation, they still lag behind in many tasks that require human intelligence, such as labeling medical images, videos, or text. Consequently, current research focuses on combining computer accuracy with human intelligence to complete labeling tasks. In most cases labeling needs to be done by domain experts; however, because of the variability in expertise, experience, and intelligence of human beings, experts can be scarce.
As an alternative to using domain experts, help is sought from non-experts, also known as the crowd, to complete tasks that cannot be readily automated. Since crowd labelers are non-experts, multiple labels per instance are acquired for quality purposes. The final label is obtained by combining these multiple labels. It is very common that the ground truth, the instance difficulty, and the labeler ability are all unknown. The aggregation task therefore becomes a "chicken and egg" problem to start with.
Despite the fact that much research using machine learning and statistical techniques has been conducted in this area (e.g., [Dekel and Shamir, 2009; Hovy et al., 2013a; Liu et al., 2012; Donmez and Carbonell, 2008]), many questions remain unresolved, including: (a) What are the best ways to evaluate labelers? (b) It is common to use expert-labeled instances (ground truth) to evaluate labeler ability (e.g., [Le et al., 2010; Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012; Khattak and Salleb-Aouissi, 2013]); what should the cardinality of the set of expert-labeled instances be for an accurate evaluation? (c) Which factors other than labeler expertise (e.g., difficulty of the instance, prevalence of a class, bias of a labeler toward a particular class) can affect labeling accuracy? (d) Is there an optimal way to combine multiple labels to get the best labeling accuracy? (e) Should the labels provided by oppositional/malicious labelers be discarded and those labelers blocked? Or is there a way to use the "information" they provide? (f) How can labelers and instances be evaluated if the ground truth is not known with certitude?
In this thesis, we investigate these questions. We present methods that rely on a few expert-labeled instances (usually 0.1%–10% of the dataset) to estimate various parameters using a frequentist and a Bayesian approach. The estimated parameters are then used for label aggregation to produce one final label per instance.
In the first part of this thesis, we propose a method called Expert Label Injected Crowd Estimation (ELICE) and extend it to different versions and variants. ELICE is based on a frequentist approach to estimating the underlying parameters. The first version of ELICE estimates the parameters, i.e., labeler expertise and data instance difficulty, using the accuracy of crowd labelers on expert-labeled instances [Khattak and Salleb-Aouissi, 2011; Khattak and Salleb-Aouissi, 2012]. The multiple labels for each instance are combined using weighted majority voting, where the weights are scores of labeler reliability on a given instance, obtained by passing the parameters through the logistic function.
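A minimal sketch of this weighted-vote aggregation, with invented parameter values; the exact functional form combining expertise and difficulty inside the logistic is an assumption here, not the published ELICE equations:

```python
# Hypothetical sketch: weight each crowd label by a logistic reliability
# score built from labeler expertise (alpha) and instance difficulty (beta).
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

alpha = np.array([1.5, 0.2, -0.8])  # labeler expertise estimates (placeholder)
beta = 0.4                           # difficulty of this instance (placeholder)
labels = np.array([1, 1, -1])        # crowd labels in {-1, +1}

weights = logistic(alpha * (1.0 - beta))   # assumed reliability scoring
aggregated = np.sign(np.sum(weights * labels))
print(aggregated)  # +1: the more reliable labelers outvote the weak one
```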
In the second version of ELICE [Khattak and Salleb-Aouissi, 2013], we introduce entropy as a way to estimate the uncertainty of labeling. This makes it possible to differentiate between good, random, and oppositional/malicious labelers. The aggregation step of ELICE version 2 flips the label (for binary classification) provided by an oppositional/malicious labeler, thus utilizing information that is generally discarded by other labeling methodologies.
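A sketch of the flipping idea, under the simplifying assumption that a labeler's reliability is judged by accuracy on the expert-labeled items alone:

```python
# Hypothetical sketch: a labeler whose accuracy on expert-labeled instances
# is well below chance is treated as oppositional, and their labels are
# inverted rather than discarded. (A random labeler hovers near 0.5 and
# carries little information either way.)
import numpy as np

crowd = np.array([0, 0, 1, 0, 1])  # one labeler's answers on expert items
truth = np.array([1, 1, 0, 1, 0])  # the expert labels
accuracy = np.mean(crowd == truth)  # 0.0 here: perfectly oppositional

if accuracy < 0.5:                  # below chance: flip, don't discard
    informative_labels = 1 - crowd
else:
    informative_labels = crowd
print(informative_labels)           # now matches the expert labels
```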
Both versions of ELICE have a cluster-based variant in which, rather than making a random choice of instances from the whole dataset, clusters of data are first formed using any clustering approach, e.g., K-means. Then an equal number of instances from each cluster is chosen randomly to obtain expert labels. This is done to ensure equal representation of each class in the test dataset.
Besides taking advantage of expert-labeled instances, the third version of ELICE [Khattak and Salleb-Aouissi, 2016] incorporates pairwise/circular comparisons of labelers to labelers and instances to instances. The idea here is to improve accuracy by using the crowd labels, which, unlike expert labels, are available for the whole dataset and may provide a more comprehensive view of labeler ability and instance difficulty. This is especially helpful when the domain experts do not agree on one label and the ground truth is not known for certain. Incorporating information beyond the expert labels can therefore provide better results.
We test the performance of ELICE on simulated labels as well as real labels obtained from Amazon Mechanical Turk. Results show that ELICE is effective compared to state-of-the-art methods. All versions and variants of ELICE are capable of delaying the phase transition. The main contribution of ELICE is that it makes use of all the information available from the crowd and the experts. We also present a theoretical framework to estimate the number of expert-labeled instances needed to achieve a certain labeling accuracy, together with experiments demonstrating the utility of this theoretical bound.
In the second part of this thesis, we present Crowd Labeling Using Bayesian Statistics (CLUBS) [Khattak and Salleb-Aouissi, 2015; Khattak et al., 2016b; Khattak et al., 2016a], a new approach to crowd labeling that estimates labeler and instance parameters along with label aggregation. Our approach is inspired by Item Response Theory (IRT). We introduce new parameters and refine the existing IRT parameters to fit the crowd labeling scenario. The main challenge is that, unlike in IRT, the ground truth is not known in the crowd labeling case and has to be estimated based on the parameters. To overcome this challenge, we acquire expert labels for a small fraction of the instances in the dataset. Our model estimates the parameters based on the expert-labeled instances, and the estimated parameters are then used for weighted aggregation of crowd labels for the rest of the dataset. Experiments conducted on synthetic data and real datasets with crowd labels of heterogeneous quality show that our methods perform better than many state-of-the-art crowd labeling methods.
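As a reference point, the 2PL item response function that CLUBS adapts to crowd labeling, with labeler ability in place of examinee ability and instance difficulty in place of item difficulty; the parameter names are illustrative, and the actual CLUBS parameterization is richer:

```python
# Hypothetical sketch of the IRT starting point: the 2PL probability that
# labeler j labels instance i correctly, given ability theta_j, difficulty
# b_i, and discrimination a_i.
import math

def p_correct(theta_j, b_i, a_i=1.0):
    return 1.0 / (1.0 + math.exp(-a_i * (theta_j - b_i)))

print(p_correct(theta_j=1.2, b_i=0.5))  # an able labeler, moderate instance
```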
We also conduct significance tests between our methods and other state-of-the-art methods to check whether the differences in accuracy are statistically significant. The results show the superiority of our methods in most cases. Moreover, we present experiments demonstrating the impact of the accuracy of the final aggregated labels when they are used as training data. The results essentially emphasize the need for highly accurate aggregated labels.
In the last part of the thesis, we review past and contemporary research related to crowd labeling. We conclude with the future of crowd labeling and further research directions. To summarize, in this thesis we have investigated different methods for estimating crowd labeling parameters and using them for label aggregation. We hope that our contribution will be useful to the crowd labeling community.