MICK: A Meta-Learning Framework for Few-shot Relation Classification with Small Training Data
Few-shot relation classification seeks to classify incoming query instances
after seeing only a few support instances. This ability is gained by training
with a large amount of in-domain annotated data. In this paper, we tackle an even
harder problem by further limiting the amount of data available at training
time. We propose a few-shot learning framework for relation classification,
which is particularly powerful when the training data is very small. In this
framework, models not only strive to classify query instances, but also seek
underlying knowledge about the support instances to obtain better instance
representations. The framework also includes a method for aggregating
cross-domain knowledge into models by open-source task enrichment.
Additionally, we construct a brand new dataset: the TinyRel-CM dataset, a
few-shot relation classification dataset in the health domain with purposely small
training data and challenging relation classes. Experimental results
demonstrate that our framework brings performance gains for most underlying
classification models, outperforms the state-of-the-art results given small
training data, and achieves competitive results with sufficiently large
training data.
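The support/query setup described in the abstract can be sketched as a prototype-based classifier, a common baseline for few-shot classification (this is an illustrative assumption, not necessarily the MICK framework itself); the embeddings, relation labels, and function names below are hypothetical:

```python
import numpy as np

def prototypes(support_emb, support_labels):
    """Compute the mean embedding per relation class (the class 'prototype')."""
    classes = sorted(set(support_labels))
    return classes, np.stack([
        support_emb[np.array(support_labels) == c].mean(axis=0)
        for c in classes
    ])

def classify(query_emb, support_emb, support_labels):
    """Assign each query instance to the class with the nearest prototype."""
    classes, protos = prototypes(support_emb, support_labels)
    # Euclidean distance from every query embedding to every prototype
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Toy 2-way, 2-shot episode with 4-dimensional instance embeddings
support = np.array([[1.0, 0.0, 0.0, 0.0], [0.9, 0.1, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0], [0.0, 0.1, 0.9, 0.0]])
labels = ["treats", "treats", "causes", "causes"]
queries = np.array([[0.95, 0.05, 0.0, 0.0], [0.0, 0.0, 0.95, 0.05]])
print(classify(queries, support, labels))  # → ['treats', 'causes']
```

With very small training data, as targeted here, the quality of the instance representations dominates, which is why the framework's focus on extracting knowledge from the support instances matters.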
Preface
The University of Pennsylvania Working Papers in Linguistics (PWPL) is an occasional series published by the Penn Linguistics Club, the graduate student organization of the Linguistics Department of the University of Pennsylvania. The series has included volumes of previously unpublished work, or work in progress, by linguists with an ongoing affiliation with the Department, as well as volumes of papers from the NWAV conference and the Penn Linguistics Colloquium.
We thank the Graduate Students Association Council of the University of Pennsylvania for financial support.
This volume is the result of the combined efforts of many people. Papers were selected and reviewed for content under the direction of the issue editors. Atissa Banuazizi did most of the legwork for collecting the papers, and the PWPL editors carried out the production of the actual volume. Special thanks are due to Hikyoung Lee for her production help, expert proofreading, and amazing post-its. All remaining errors are the responsibility of the series editors or the authors, as the case may be.
Parallel Aligned Treebank Corpora at LDC: Methodology, Annotation and Integration
Proceedings of the Workshop on Annotation and Exploitation of Parallel Corpora (AEPC 2010).
Editors: Lars Ahrenberg, Jörg Tiedemann and Martin Volk.
NEALT Proceedings Series, Vol. 10 (2010), 14-23.
© 2010 The editors and contributors.
Published by the Northern European Association for Language Technology (NEALT), http://omilia.uio.no/nealt.
Electronically published at Tartu University Library (Estonia), http://hdl.handle.net/10062/15893
Linguistic Resources for Effective, Affordable, Reusable Speech-to-Text
This paper describes ongoing efforts at the Linguistic Data Consortium to create shared evaluation resources for improved speech-to-text technology. The DARPA EARS Program (Effective, Affordable, Reusable Speech-to-Text) is focused on enabling core STT technology to produce rich, highly accurate output in a range of languages and speaking styles. The aggressive EARS program goals motivate new approaches to corpus creation and distribution. EARS research sites require multilingual broadcast news and telephone speech, transcripts, and annotations at a much higher volume than for any previous technology program. In response to these demands, LDC has developed new corpora for training and evaluating speech-to-text systems in English, Arabic and Chinese, and to support systems that distinguish speakers, identify and repair disfluencies, and punctuate text to improve readability.