286 research outputs found
Elimination of Spurious Ambiguity in Transition-Based Dependency Parsing
We present a novel technique to remove spurious ambiguity from transition
systems for dependency parsing. Our technique chooses a canonical sequence of
transition operations (computation) for a given dependency tree. Our technique
can be applied to a large class of bottom-up transition systems, including,
for instance, those of Nivre (2004) and Attardi (2006).
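The core idea of a canonical computation can be illustrated with a small sketch. The function below is not the paper's algorithm; it shows one common way to remove spurious ambiguity in the arc-standard system of Nivre (2004), namely always reducing as early as possible, so each projective tree maps to exactly one transition sequence. The function name and input format are invented for illustration.

```python
def canonical_computation(heads):
    """Return the canonical arc-standard transition sequence for a tree.

    heads maps each word 1..n to its head index (0 = artificial root);
    the tree is assumed projective. Spurious ambiguity is resolved by
    reducing (building an arc) as soon as it is possible, rather than
    shifting first.
    """
    n = len(heads)
    # Count of still-unattached dependents for every node (incl. root 0).
    remaining = {h: 0 for h in range(n + 1)}
    for dep, head in heads.items():
        remaining[head] += 1

    stack, buf, seq = [], list(range(1, n + 1)), []
    while buf or len(stack) > 1:
        if len(stack) >= 2:
            s1, s0 = stack[-2], stack[-1]
            # Left-arc: s0 is the head of s1 and s1 has collected
            # all of its own dependents.
            if heads[s1] == s0 and remaining[s1] == 0:
                seq.append("LEFT-ARC")
                stack.pop(-2)
                remaining[s0] -= 1
                continue
            # Right-arc: s1 is the head of s0, symmetrically.
            if heads[s0] == s1 and remaining[s0] == 0:
                seq.append("RIGHT-ARC")
                stack.pop()
                remaining[s1] -= 1
                continue
        # No reduction is possible yet: shift the next word.
        seq.append("SHIFT")
        stack.append(buf.pop(0))
    return seq
```

For the toy tree where word 2 is the root and words 1 and 3 both attach to it, the unique canonical sequence is SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC.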
Conversation Trees: A Grammar Model for Topic Structure in Forums
Online forum discussions proceed differently from face-to-face conversations, and any single thread on an online forum contains posts on different subtopics. This work aims to characterize the content of a forum thread as a conversation tree of topics. We present models that jointly perform two tasks: segment a thread into subparts, and assign a topic to each part. Our core idea is a definition of topic structure using probabilistic grammars. By leveraging the flexibility of two grammar formalisms, Context-Free Grammars and Linear Context-Free Rewriting Systems, our models create desirable structures for forum threads: our topic segmentation is hierarchical, links non-adjacent segments on the same topic, and jointly labels the topic during segmentation. We show that our models outperform a number of tree generation baselines.
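The joint segment-and-label task can be illustrated with a deliberately trivial baseline (not the paper's grammar model): group consecutive posts that share a topic label into segments. Unlike the LCFRS-based model described above, this flat version cannot link non-adjacent segments on the same topic. The function name and label format are invented for illustration.

```python
def segment_thread(post_topics):
    """Group consecutive posts with the same topic label into segments.

    post_topics: list of per-post topic labels, in thread order.
    Returns a flat list of (topic, [post indices]) segments; a grammar-based
    model would additionally arrange these into a hierarchy and could tie
    together non-adjacent segments with the same topic.
    """
    segments = []
    for i, topic in enumerate(post_topics):
        if segments and segments[-1][0] == topic:
            segments[-1][1].append(i)   # extend the current segment
        else:
            segments.append((topic, [i]))  # open a new segment
    return segments
```

For example, the label sequence A, A, B, A yields three segments, with the two A segments left unlinked, which is exactly the limitation the non-adjacent linking addresses.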
Obfuscation for Privacy-preserving Syntactic Parsing
The goal of homomorphic encryption is to encrypt data such that another party
can operate on it without being explicitly exposed to the content of the
original data. We introduce an idea for a privacy-preserving transformation on
natural language data, inspired by homomorphic encryption. Our primary tool is
{\em obfuscation}, relying on the properties of natural language. Specifically,
a given English text is obfuscated using a neural model that aims to preserve
the syntactic relationships of the original sentence so that the obfuscated
sentence can be parsed instead of the original one. The model works at the word
level, and learns to obfuscate each word separately by changing it into a new
word that has a similar syntactic role. The text obfuscated by our model leads
to better performance on three syntactic parsers (two dependency and one
constituency parsers) in comparison to an upper-bound random substitution
baseline. More specifically, the results demonstrate that as more terms are
obfuscated (by their part of speech), the substitution upper bound
significantly degrades, while the neural model maintains a relatively high
performing parser. All of this is done without much sacrifice of privacy
compared to the random substitution upper bound. We also further analyze the
results, and discover that the substituted words have similar syntactic
properties, but different semantic content, compared to the original words.Comment: Accepted to IWPT 202
Shared and distinct transcriptional programs underlie the hybrid nature of iNKT cells
Invariant natural killer T (iNKT) cells are innate-like T lymphocytes that act as critical regulators of the immune response. To better characterize this population, we profiled iNKT cell gene expression during ontogeny and in peripheral subsets as part of the Immunological Genome Project (ImmGen). High-resolution comparative transcriptional analyses defined developmental and subset-specific iNKT cell gene expression programs. In addition, iNKT cells were found to share an extensive transcriptional program with natural killer (NK) cells, similar in magnitude to that shared with major histocompatibility complex (MHC)-restricted T cells. Strikingly, the NK-iNKT program also operated constitutively in γδT cells and in adaptive T cells following activation. Together, our findings highlight a core effector program regulated distinctly in innate and adaptive lymphocytes.
Can Large Language Model Summarizers Adapt to Diverse Scientific Communication Goals?
In this work, we investigate the controllability of large language models
(LLMs) on scientific summarization tasks. We identify key stylistic and content
coverage factors that characterize different types of summaries such as paper
reviews, abstracts, and lay summaries. By controlling stylistic features, we
find that non-fine-tuned LLMs outperform humans in the MuP review generation
task, both in terms of similarity to reference summaries and human preferences.
Also, we show that we can improve the controllability of LLMs with
keyword-based classifier-free guidance (CFG) while achieving lexical overlap
comparable to strong fine-tuned baselines on arXiv and PubMed. However, our
results also indicate that LLMs cannot consistently generate long summaries
with more than 8 sentences. Furthermore, these models exhibit limited capacity
to produce highly abstractive lay summaries. Although LLMs demonstrate strong
generic summarization competency, sophisticated content control without costly
fine-tuning remains an open problem for domain-specific applications.
Comment: ACL 2024 camera ready
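Classifier-free guidance over text generation, as commonly formulated, interpolates between conditional and unconditional next-token logits. The minimal sketch below shows that standard formulation only; it is not the paper's implementation, and the function and parameter names are illustrative.

```python
def cfg_logits(cond_logits, uncond_logits, guidance_scale):
    """Combine conditional and unconditional next-token logits with CFG.

    cond_logits:   logits given the prompt plus control keywords
    uncond_logits: logits given the prompt without the keywords
    guidance_scale: 0.0 ignores the keywords, 1.0 uses only the
    conditional distribution, and values > 1.0 push generation further
    toward the keyword-conditioned behavior.
    """
    return [u + guidance_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]
```

With guidance_scale = 1.0 the result reduces to the conditional logits, and raising the scale amplifies whatever the keywords change in the distribution.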
