GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Author: Gong Mackenzie
Liu Yan
Liu Yang
Peng Siyao
Yu Yue
Zeldes Amir
Zhu Yilun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity.
Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019

arXiv.org e-Print Archive

Crossref

Sparsify-then-Classify: From Internal Neurons of Large Language Models To Efficient Text Classifiers

Author: Anderson Ashton
Jiao Difan
Liu Yilun
Publication date: 27/11/2023

Among the many tasks that Large Language Models (LLMs) have revolutionized is text classification. However, existing approaches for applying pretrained LLMs to text classification predominantly rely on using single token outputs from only the last layer of hidden states. As a result, they suffer from limitations in efficiency, task-specificity, and interpretability. In our work, we contribute an approach that uses all internal representations by employing multiple pooling strategies on all activation and hidden states. Our novel lightweight strategy, Sparsify-then-Classify (STC), first sparsifies task-specific features layer-by-layer, then aggregates across layers for text classification. STC can be applied as a seamless plug-and-play module on top of existing LLMs. Our experiments on a comprehensive set of models and datasets demonstrate that STC not only consistently improves the classification performance of pretrained and fine-tuned models, but is also more efficient for both training and inference, and is more intrinsically interpretable.
Comment: 23 pages, 5 figures, 8 tables. Code available at https://github.com/difanj0713/Sparsify-then-Classif

arXiv.org e-Print Archive

When transcription meets recombination: a lesson from the human RECQ protein complexes

Author: Conaway Joan W
Liu Yilun
Publication venue: Biology Reports Ltd

Since the cloning of the first human RECQ gene, RECQ1, more than 15 years ago, RECQ helicases have been a major focus in cancer research. Recent studies of human RECQ protein complexes are providing insight into their roles in various DNA metabolic pathways that protect the integrity of our genome.

Crossref

PubMed Central