Essays on Data Science: Computational Measurement for Learning and Teaching

Abstract

To study teaching and learning at a large scale, we must introduce new methods for the analysis of rich, unstructured data -- such as audio, video, and transcribed text -- from classrooms. In this dissertation, I develop and apply computational and statistical methods to measure teaching and learning processes captured in two unstructured sources: student writing and transcripts of teacher speech from classroom lessons. My first essay introduces Coupled Likelihood Estimation (CLE), a method that improves the precision of parameter estimates in models of unstructured data features while requiring fewer expert-labeled observations. It combines information from limited samples of expert-labeled data with larger samples of machine-labeled data. CLE leverages the geometric structure of the joint likelihood from both identifying (labeled) and non-identifying (unlabeled) data, approximately constraining parameter estimates to a surface defined by the unlabeled data's likelihood. Simulations demonstrate that CLE is unbiased, reduces root mean squared error, and yields narrower confidence intervals compared to existing methods, in some cases achieving efficiency gains equivalent to doubling the expert-labeled sample size. An application estimating the effect of an educational intervention on student writing quality illustrates CLE’s practical utility, producing estimates closer to an oracle benchmark using only 18\% of the expert-labeled data. By amplifying the value of limited labeled data, CLE lowers barriers to high-quality inference in resource-constrained domains such as healthcare, education, and policy evaluation. The method’s broad applicability, theoretical guarantees, and computational approach offer a pathway to cost-effective, reliable analyses in settings where researchers face high labeling costs.
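To give a stylized sense of the coupling idea, the sketch below is my own illustration, not the dissertation's estimator: a Gaussian toy model in which a small expert-labeled sample identifies a mean, and a quadratic penalty (weight `lam`, my assumption) pulls the estimate toward the maximizer of the machine-label likelihood, acting as a soft constraint in the spirit of the surface described above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: true mean writing quality is 2.0; machine labels
# are plentiful but slightly biased, while expert labels are scarce.
true_mean = 2.0
expert = rng.normal(true_mean, 1.0, size=40)           # expert-labeled sample
machine = rng.normal(true_mean + 0.1, 1.0, size=4000)  # machine-labeled sample

def coupled_estimate(expert, machine, lam=0.5):
    """Grid-minimize a penalized negative log-likelihood: the expert data
    identify the mean, while a quadratic penalty pulls the estimate toward
    the maximizer of the machine-label likelihood (a soft constraint)."""
    grid = np.linspace(0.0, 4.0, 4001)
    nll_expert = ((expert[:, None] - grid) ** 2).sum(axis=0)
    penalty = len(expert) * (grid - machine.mean()) ** 2
    return grid[np.argmin(nll_expert + lam * penalty)]

est = coupled_estimate(expert, machine)
# In this Gaussian toy the optimum reduces to a precision-weighted average
# of the two sample means, so the estimate lies between them.
```

For a Gaussian mean the penalized optimum is just a weighted average of the two sample means; the actual CLE machinery, per the abstract, exploits the geometry of the joint likelihood rather than a fixed penalty weight.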
My second essay leverages natural language processing techniques to study the use of mathematical vocabulary in elementary math classrooms. My collaborators and I develop a rules-based computational measure of mathematical vocabulary use. We find that teachers differ substantially in the amount of mathematical vocabulary they model for their students: students of a teacher at the 75th percentile were exposed to 28 more mathematical terms per lesson (4,480 per year) than students of a teacher at the 25th percentile. Observed characteristics explain very little of this variation in teachers' mathematical vocabulary use. Finally, students randomly assigned to teachers who used more mathematical vocabulary in previous years scored higher on standardized tests of mathematics, suggesting that teachers who expose their students to more mathematical vocabulary are more effective teachers of mathematics. Across value-added studies, a teacher one standard deviation above the mean in effectiveness raises math scores by between .10 and .15 standard deviations \citep{BacherHicks2023}; our estimate of the effect of being assigned to a teacher who uses one standard deviation more mathematical language accounts for roughly half of this variation, indicating that our measure is a powerful predictor of teacher effectiveness.

In my third essay, I develop Contextual Value Separation (CVS), a general method for identifying words used differently between pre-specified subsets of documents in large text corpora. CVS combines contextual embeddings with machine learning classifiers, permutation testing, and statistical adjustments for multiple comparisons. Whereas current methods identify words that predict membership in a given class of documents, CVS reveals cases where separate classes of documents use the same word in differing ways.
For example, experienced and novice math teachers may use a mathematical vocabulary term with similar frequency but in markedly different ways or contexts. The approach can search over a specified set of target words or over the entire vocabulary of the corpus. For each target word, CVS infers how consistently its contextual embeddings differ by subset. Because vocabularies are large, the method includes multiple-testing correction to control the false-discovery rate, typically yielding a small set of words whose usage varies between the document classes. After identifying the words whose usage most consistently differs, example usages from each subset are extracted for qualitative examination. Because CVS operates on embeddings, it extends readily to other forms of unstructured data that can be encoded into vectors, such as audio and video. The method can be used as an exploratory tool for hypothesis generation, to test a priori hypotheses, or to detect treatment effects on textual outcomes in experimental settings. To demonstrate the method, I analyze a set of transcripts from upper elementary mathematics lessons and identify two ways that teachers with larger impacts on math scores use mathematical vocabulary differently: they make more use of the mathematical meanings of polysemous terms, and they more often ask students to engage with questions related to the terms.

As a collection, these three essays reveal the promise of computational methods for analyzing text and other rich, unstructured data sources, and they contribute several novel findings in the field of education regarding mathematical vocabulary and effective teaching.
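The per-word pipeline described above can be sketched in miniature. This is my own stylized sketch, not the dissertation's implementation: random vectors stand in for contextual embeddings, a leave-one-out nearest-centroid rule stands in for the machine-learning classifier, and all names (`separation_score`, the example words) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def separation_score(emb, labels):
    """Leave-one-out nearest-centroid accuracy: how separable are the two
    document classes in this word's occurrence-embedding space?"""
    correct = 0
    for i in range(len(emb)):
        mask = np.ones(len(emb), dtype=bool)
        mask[i] = False
        c0 = emb[mask & (labels == 0)].mean(axis=0)
        c1 = emb[mask & (labels == 1)].mean(axis=0)
        pred = int(np.linalg.norm(emb[i] - c1) < np.linalg.norm(emb[i] - c0))
        correct += int(pred == labels[i])
    return correct / len(emb)

def permutation_pvalue(emb, labels, n_perm=200):
    """Compare the observed separation to a label-permutation null."""
    observed = separation_score(emb, labels)
    null = [separation_score(emb, rng.permutation(labels)) for _ in range(n_perm)]
    return (1 + sum(s >= observed for s in null)) / (n_perm + 1)

def benjamini_hochberg(pvals, q=0.10):
    """Step-up FDR control: reject the k smallest p-values, where k is the
    largest rank with p_(k) <= q * k / m."""
    p = np.asarray(pvals)
    m = len(p)
    order = np.argsort(p)
    below = p[order] <= q * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = int(np.nonzero(below)[0].max())
        rejected[order[: k + 1]] = True
    return rejected

# Synthetic corpus: 40 occurrences per word, 8-dim "embeddings", 20 from
# each document class. Only "product" is truly used differently: its
# class-1 occurrence embeddings are shifted along one dimension.
labels = np.array([0] * 20 + [1] * 20)
words = {w: rng.normal(size=(40, 8)) for w in ["product", "lesson", "today"]}
words["product"][labels == 1, 0] += 1.5

pvals = np.array([permutation_pvalue(words[w], labels) for w in words])
flagged = benjamini_hochberg(pvals, q=0.10)  # "product" should be flagged
```

Flagged words would then have example usages pulled from each subset for qualitative inspection, as the abstract describes.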
From a statistical point of view, CLE introduces a new way to leverage large amounts of machine-labeled data, which, in addition to its value for educational research, can lower the cost of research in several other domains, such as phenotyping electronic health records.

This dissertation was published in Harvard University's DASH repository.
