Magic Markup: Maintaining Document-External Markup with an LLM
Text documents, including programs, typically have human-readable semantic
structure. Historically, programmatic access to these semantics has required
explicit in-document tagging. Especially in systems where the text has
execution semantics, such tagging is an opt-in feature that is hard to support
properly. Today, language models offer a new method: metadata can be bound to
entities in changing text using a model's human-like understanding of
semantics, with no requirements on the document structure. This method expands
the applications of document annotation, a fundamental operation in program
writing, debugging, maintenance, and presentation. We contribute a system that
employs an intelligent agent to re-tag modified programs, enabling rich
annotations to automatically follow code as it evolves. We also contribute a
formal problem definition, an empirical synthetic benchmark suite, and our
benchmark generator. Our system achieves an accuracy of 90% on our benchmarks
and can replace a document's tags in parallel at a rate of 5 seconds per tag.
While there remains significant room for improvement, we find performance
reliable enough to justify further exploration of applications.
Comment: 10 pages; 2 figures; to be published in the 2024
Conference Companio
rTisane: Externalizing conceptual models for data analysis increases engagement with domain knowledge and improves statistical model quality
Statistical models should accurately reflect analysts' domain knowledge about
variables and their relationships. While recent tools let analysts express
these assumptions and use them to produce a resulting statistical model, it
remains unclear what analysts want to express and how externalization impacts
statistical model quality. This paper addresses these gaps. We first conduct an
exploratory study of analysts using a domain-specific language (DSL) to express
conceptual models. We observe a preference for detailing how variables relate
and a desire to allow, and then later resolve, ambiguity in their conceptual
models. We leverage these findings to develop rTisane, a DSL for expressing
conceptual models augmented with an interactive disambiguation process. In a
controlled evaluation, we find that rTisane's DSL helps analysts engage more
deeply with and accurately externalize their assumptions. rTisane also leads to
statistical models that match analysts' assumptions, maintain analysis intent,
and better fit the data.