60 research outputs found
Learning Latent Characteristics of Data and Models using Item Response Theory
A supervised machine learning model is trained with a large set of labeled training data and evaluated on a smaller, but still large, set of test data. Especially with deep neural networks (DNNs), the complexity of the model requires that an extremely large data set be collected to prevent overfitting. These models often do not take into account specific attributes of the training examples, but instead treat each one equally during training. This is because it is difficult to model latent traits of individual examples at the scale of hundreds of thousands or millions of data points. However, there exists a family of psychometric methods that can model attributes of specific examples and can greatly improve model training and evaluation in the supervised learning process.
Item Response Theory (IRT) is a well-studied psychometric methodology for scale construction and evaluation. IRT jointly models human ability and example characteristics, such as difficulty, based on human response data. We introduce new evaluation metrics for both humans and machine learning models built using IRT, and propose new methods for applying IRT to machine learning-scale data.
We use IRT to make contributions to the machine learning community in the following areas: (i) new test sets for evaluating machine learning models with respect to a human population, (ii) new insights about how deep-learning models learn by tracking example difficulty and training conditions, (iii) new methods for data selection and curriculum building to improve model training efficiency, and (iv) a new test of electronic health literacy built with questions extracted from de-identified patient Electronic Health Records (EHRs).
We first introduce two new evaluation sets built and validated using IRT. These are the first IRT test sets to be applied to natural language processing tasks, and they allow for more comprehensive comparison of NLP models. Second, by modeling the difficulty of test set examples, we identify patterns that emerge when training deep neural network models that are consistent with human learning patterns. Specifically, as models are trained with larger training sets, they learn easy test set examples more quickly than hard ones. Third, we present a method for using soft labels on a subset of training data to improve deep learning model generalization. We show that fine-tuning a trained deep neural network with as little as 0.1% of the training data can improve generalization in terms of test set accuracy. Fourth, we propose a new method for estimating IRT example and model parameters that scales to much larger data sets than previously possible, accommodating the large data sets required for deep learning. This allows IRT models to be learned at machine learning scale, with hundreds of thousands of examples and large ensembles of machine learning models, using the response patterns of machine learning models in place of human response patterns. Fifth, we introduce a dynamic curriculum learning process that estimates model competency during training and adaptively selects training data appropriate for learning at the given epoch. Finally, we introduce the ComprehENotes test, the first test of EHR comprehension for humans. The test is an accurate measure for identifying individuals with low EHR note comprehension ability, and it validates the effectiveness of previously self-reported patient comprehension evaluations.
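As a rough illustration of the kind of model the abstract refers to (not the dissertation's actual implementation), the widely used two-parameter logistic (2PL) IRT model gives the probability of a correct response as a logistic function of the respondent's ability and the item's discrimination and difficulty:

```python
import math

def irt_2pl(theta, a, b):
    """Two-parameter logistic (2PL) IRT model: probability that a
    respondent with ability `theta` answers correctly an item with
    discrimination `a` and difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# An item whose difficulty matches the respondent's ability is
# answered correctly half the time.
p_matched = irt_2pl(theta=0.0, a=1.0, b=0.0)   # 0.5

# For the same respondent, easy items (low b) are more likely to be
# answered correctly than hard items (high b).
p_easy = irt_2pl(theta=0.0, a=1.0, b=-2.0)
p_hard = irt_2pl(theta=0.0, a=1.0, b=2.0)
```

In the machine-learning setting described above, the "respondents" are trained models and `theta`, `a`, and `b` are estimated jointly from the models' graded responses rather than from human response data.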
Item Parameter Drift as an Indication of Differential Opportunity to Learn: An Exploration of Item Flagging Methods & Accurate Classification of Examinees
The presence of outlying anchor items is an issue faced by many testing agencies. The decision to retain or remove an item is a difficult one, especially when item removal calls the content representation of the anchor set into question. Additionally, the reason for the aberrancy is not always clear; if an item's performance has changed due to improvements in instruction, then removing the anchor item may not be appropriate and might produce misleading conclusions about the proficiency of the examinees. This study was conducted in two parts: a simulation study and an empirical data analysis. In both, the effect on examinee classification was investigated when the decision was made to remove or retain aberrant anchor items. Three methods of detection were explored: (1) delta plots, (2) IRT b-parameter plots, and (3) the RPU method. In the simulation study, the degree of aberrancy and the ability distribution of examinees were manipulated, and five aberrant item schemes were employed. In the empirical data analysis, archived statewide science achievement data suspected of differential opportunity to learn between administrations were re-analyzed using the various item parameter drift detection methods. The results of both the simulation and the empirical data study support eliminating flagged items from linking when a matrix-sampling design is used and the anchor contains a large number of items. While neither the delta plot nor the IRT b-parameter plot method produced results that would overwhelmingly support its use, it is recommended that both methods be employed in practice until further research is conducted on alternatives such as the RPU method: classification accuracy increases when such methods are employed and flagged items are removed, and growth is most often not misrepresented by doing so.
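A minimal sketch of the idea behind b-parameter comparison for drift detection: estimate each anchor item's IRT difficulty in two administrations, adjust for the overall scale shift, and flag items whose adjusted shift is large. The mean-shift linking and the 0.5-logit threshold below are illustrative assumptions, not values taken from the study:

```python
def flag_drifted_items(b_old, b_new, threshold=0.5):
    """Flag anchor items whose IRT b-parameter (difficulty) shifted
    by more than `threshold` logits between two administrations,
    after removing the mean shift (a crude linking adjustment).
    Returns the indices of flagged items."""
    n = len(b_old)
    mean_shift = sum(new - old for old, new in zip(b_old, b_new)) / n
    flagged = []
    for i, (old, new) in enumerate(zip(b_old, b_new)):
        if abs((new - old) - mean_shift) > threshold:
            flagged.append(i)
    return flagged

# Item 2 became much easier between administrations (e.g., due to
# improved instruction), so it stands out against the common shift.
b_year1 = [0.0, 0.5, 1.2, -0.3]
b_year2 = [0.1, 0.6, 0.1, -0.2]
print(flag_drifted_items(b_year1, b_year2))  # → [2]
```

Whether such a flagged item should then be dropped from the anchor set is exactly the retain-or-remove decision the study investigates.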
Effect of time span and task load on pilot mental workload
Two sets of experiments were run to examine how the mental workload of a pilot might be measured. The effects of continuous manual control activity versus discrete assigned mental tasks (including the length of time between receiving an assignment and executing it) were examined. The first experiment evaluated the strengths and weaknesses of measuring mental workload with an objective performance measure (altitude deviations) and five subjective ratings (activity level, complexity, difficulty, stress, and workload). The second set of experiments built upon the first by increasing workload intensities and adding another performance measure: airspeed deviation. The results are discussed for both low- and high-experience pilots.
Desired fertility and the impact of population policies
Ninety percent of the differences across countries in total fertility rates are accounted for solely by differences in women's reported desired fertility. Using desired fertility constructed from both retrospective and prospective questions, together with instrumental variables estimation, it is shown that this strong result is affected neither by ex-post rationalization of births nor by the dependence of desired fertility on contraceptive access or cost. Moreover, despite the obvious role of contraception as a proximate determinant of fertility, the additional effect of contraceptive availability or family planning on fertility is quantitatively small and explains very little cross-country variation. These empirical results are consistent with theories in which fertility is determined by parents' choices about children within the social, educational, economic, and cultural environment that parents, and especially women, face. They contradict theories that assert a large causal role for the expansion of contraception in the reduction of fertility.
A Geography of Water Matters in the Ord Catchment, Northern Australia
This thesis examines water matters in the Ord catchment. It shows how social, environmental, cultural and economic dynamics are manifest in water matters. In so doing, it critiques material and discursive practices that create environmental injustices, and highlights efforts underway to remedy those. The thesis makes two major contributions. The first is to dissect water politics in the Ord through the prism of how water matters, from water supply and sanitation to water allocations for cultural flows. The second is to demonstrate a theoretical means towards this end, by combining political ecology and environmental justice with a Masseyian spatial approach. Water, as a physical substance, makes invisible power relations tangible. To consider this, the thesis marries political ecology, with its focus on how power and politics help shape human-environment relationships, to environmental justice. A politics of difference informs the particular type of environmental justice drawn on here: it asks whether there is recognition of difference, plurality of participation, and equity in the distribution of benefits in environmental matters (Schlosberg, 2004). This nuanced theoretical terrain blends well with a Masseyian spatial approach that acknowledges places as made of 'loose ends and missing links' (Massey, 2005:12). The latter holds that places are never finished and are always being made, while the former analyses how power relations operate through processes. The thesis presents water matters as contested yet crucial to making sense of social-environmental matters; through contextualising governance transformations and current water dilemmas, the shape of this contestation becomes clear. This involves spaces where interests come together, and spaces where interests remain apart. These gaps are renegotiated through instruments such as the Ord Final Agreement. However, fraught water matters do persist, in part due to the complex place-based politics of water in the Ord, which include Indigenous politics, environmental contestation, development processes, and a recent colonising history.
An Investigation on the Procedural Rhetoric of Curated Difficulty
The discussion of difficulty is a prominent topic within the realm of play. In terms of accessibility, the difficulty associated with any form of play is crucial. In video games, difficulty can enhance the immersion and interactions between a player and the game; however, it can also act as a barrier to entry and gatekeep experiences from individuals with disabilities. Developers use difficulty as a tool to deliver different forms of narrative and rhetoric to their audience. The field of ludic rhetoric studies how rhetoric can be implemented and facilitated within contexts of play; this work offers a case study of how developers use difficulty to communicate with audiences. Understanding the procedural rhetoric involved in fine-tuning difficulty illuminates the relationship between accessibility and authorial intent. Developers fine-tune difficulty through the curation of alternative mechanics, systems, and experiences. Each of these categories represents an adjustable vector that game developers use to create accessible experiences. Learning how these vectors adjust difficulty can reveal how other forms of rhetoric can be adapted to their audiences while maintaining authorial intent.
From pixels to people : recovering location, shape and pose of humans in images
Humans are at the centre of a significant amount of research in computer vision. Endowing machines with the ability to perceive people from visual data is an immense scientific challenge with a high degree of direct practical relevance. Success in automatic perception can be measured at different levels of abstraction, and this will depend on which intelligent behaviour we are trying to replicate: the ability to localise persons in an image or in the environment, understanding how persons are moving at the skeleton and at the surface level, interpreting their interactions with the environment including with other people, and perhaps even anticipating future actions. In this thesis we tackle different sub-problems of the broad research area referred to as "looking at people", aiming to perceive humans in images at different levels of granularity. We start with bounding box-level pedestrian detection: We present a retrospective analysis of methods published in the decade preceding our work, identifying various strands of research that have advanced the state of the art. With quantitative experiments, we demonstrate the critical role of developing better feature representations and having the right training distribution. We then contribute two methods based on the insights derived from our analysis: one that combines the strongest aspects of past detectors and another that focuses purely on learning representations. The latter method outperforms more complicated approaches, especially those based on hand-crafted features. We conclude our work on pedestrian detection with a forward-looking analysis that maps out potential avenues for future research. We then turn to pixel-level methods: Perceiving humans requires us to both separate them precisely from the background and identify their surroundings. To this end, we introduce Cityscapes, a large-scale dataset for street scene understanding.
This has since established itself as a go-to benchmark for segmentation and detection. We additionally develop methods that relax the requirement for expensive pixel-level annotations, focusing on the task of boundary detection, i.e. identifying the outlines of relevant objects and surfaces. Next, we make the jump from pixels to 3D surfaces, from localising and labelling to fine-grained spatial understanding. We contribute a method for recovering 3D human shape and pose, which marries the advantages of learning-based and model-based approaches. We conclude the thesis with a detailed discussion of benchmarking practices in computer vision. Among other things, we argue that the design of future datasets should be driven by the general goal of combinatorial robustness besides task-specific considerations.
Papers on education. II, Problems of textbook effectivity
• Preface.
• A. Elango, I. Unt. Theory of education in Tartu University.
• I. Unt. School textbooks and individualized instruction
• U. Läänemets. How to find material for foreign language textbooks
• U. Läänemets. How to evaluate the quality of language textbooks and ascertain their suitability for practical learning.
• J. Mikk. Studies on Teaching Material Readability.
• H. Kukemelk, J. Mikk. The prognosticating effectivity of learning a text in physics.
• H. Kukemelk. The dependence of the learning time on the text characteristics
• M. Lepik. Mathematical verbal problems: differences in solving difficulties.
• E. Mikk. A morphological analysis program for the Estonian language.
• J. Mikk, E. Mikk, J. Tirmaste. Computerized readability analysis of textbooks of English.
• Appendix. Papers on Textbook Problems in the Collections of Papers of Tartu University "Soviet Pedagogy and School". http://tartu.ester.ee/record=b1057432~S1*es