2,400 research outputs found

Component-Enhanced Chinese Character Embeddings

Authors: Li Sujian, Li Wenjie, Li Yanran, Sun Fei
Publication date: 01/01/2015

Distributed word representations are very useful for capturing semantic information and have been successfully applied in a variety of NLP tasks, especially for English. In this work, we develop two novel component-enhanced Chinese character embedding models and their bigram extensions. Unlike English word embeddings, our models draw on the internal composition of Chinese characters, whose components often serve as inherent semantic indicators. Evaluations on both word similarity and text classification demonstrate the effectiveness of our models.

Comment: 6 pages, 2 figures, conference, EMNLP 201

arXiv.org e-Print Archive

CiteSeerX

The Hong Kong Polytechnic University Pao Yue-kong Library

Crossref
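The abstract does not spell out how components enter the embedding. As a minimal sketch, assuming a character's vector is simply averaged with the vectors of its components (a simplification, not the paper's training objective), the toy Python below shows why shared components can pull related characters together. The vectors, the decomposition table, and the function names are hypothetical; in a real model the vectors are learned from a corpus.

```python
import math
from typing import Dict, List

Vector = List[float]

# Hypothetical toy tables; a real model learns these vectors from a corpus.
CHAR_EMB: Dict[str, Vector] = {
    "河": [0.2, 0.4, 0.0],  # "river"
    "湖": [0.0, 0.4, 0.2],  # "lake"
}
COMP_EMB: Dict[str, Vector] = {
    "氵": [0.6, 0.6, 0.6],  # water radical, shared by both characters
}
# Hypothetical decomposition of each character into its components.
COMPONENTS: Dict[str, List[str]] = {"河": ["氵"], "湖": ["氵"]}


def component_enhanced(char: str) -> Vector:
    """Average the character vector with the vectors of its components."""
    vectors = [CHAR_EMB[char]] + [COMP_EMB[c] for c in COMPONENTS.get(char, [])]
    # zip(*vectors) walks the vectors dimension by dimension
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))
```

With these toy numbers, the shared water radical 氵 raises the cosine similarity between 河 (river) and 湖 (lake) from 0.8 for the raw character vectors to 0.98 for the component-enhanced ones.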

Better, Faster, Stronger Sequence Tagging Constituent Parsers

Authors: Abdou Mostafa, Søgaard Anders, Vilares David
Publication date: 01/01/2019

Sequence tagging models for constituent parsing are faster, but less accurate than other types of parsers. In this work, we address the following weaknesses of such constituent parsers: (a) high error rates around closing brackets of long constituents, (b) large label sets, leading to sparsity, and (c) error propagation arising from greedy decoding. To effectively close brackets, we train a model that learns to switch between tagging schemes. To reduce sparsity, we decompose the label set and use multi-task learning to jointly learn to predict sublabels. Finally, we mitigate issues from greedy decoding through auxiliary losses and sentence-level fine-tuning with policy gradient. Combining these techniques, we clearly surpass the performance of sequence tagging constituent parsers on the English and Chinese Penn Treebanks, and reduce their parsing time even further. On the SPMRL datasets, we observe even greater improvements across the board, including a new state of the art on Basque, Hebrew, Polish and Swedish.

Comment: NAACL 2019 (long papers). Contains corrigendu

arXiv.org e-Print Archive

Crossref

Copenhagen University Research Information System
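The abstract assumes the reader knows how a constituent tree becomes one tag per word. The sketch below shows one such encoding. It is assumed here to be the common-ancestor scheme from earlier sequence-tagging parsing work, not code from this paper. Each word is tagged with three values: the number of ancestors it shares with the next word, that count relative to the previous word's count, and the nonterminal of their lowest common ancestor. The sketch computes both the absolute and the relative variant without reproducing the paper's rule for switching between them, and it ignores unary chains. Because each tag is a tuple, its fields can also be read as sublabels that a multi-task model could predict separately. The tree representation and function names are illustrative assumptions.

```python
from typing import Iterator, List, Tuple

# A tree is (label, children); each child is a word (str) or another tree.
Tree = Tuple[str, list]
Node = Tuple[Tuple[int, ...], str]  # (child-index path from the root, label)


def leaf_paths(tree: Tree, prefix: Tuple[int, ...] = ()) -> Iterator[Tuple[str, List[Node]]]:
    """Yield each word with the list of its ancestor nodes, root first."""
    label, children = tree
    node: Node = (prefix, label)
    for i, child in enumerate(children):
        if isinstance(child, str):
            yield child, [node]
        else:
            for word, ancestors in leaf_paths(child, prefix + (i,)):
                yield word, [node] + ancestors


def encode(tree: Tree) -> List[Tuple[str, int, int, str]]:
    """Tag each word except the last as (word, absolute n, relative n, LCA label)."""
    leaves = list(leaf_paths(tree))
    tags = []
    prev_n = 0
    for (word, a), (_, b) in zip(leaves, leaves[1:]):
        n = 0  # number of ancestors shared by this word and the next one
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        tags.append((word, n, n - prev_n, a[n - 1][1]))
        prev_n = n
    return tags
```

For (S (NP The cat) (VP sat (PP on (NP the mat)))), written as nested tuples, encode returns ('The', 2, 2, 'NP'), ('cat', 1, -1, 'S'), ('sat', 2, 1, 'VP'), ('on', 3, 1, 'PP'), ('the', 4, 1, 'NP'). The relative values stay small, while the absolute ones grow with tree depth.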

Object-oriented Neural Programming (OONP) for Document Understanding

Authors: Cui Haotian, Liu Xianggen, Lu Zhengdong, Yan Yukun, Zheng Daqi
Publication date: 01/01/2018

We propose Object-oriented Neural Programming (OONP), a framework for semantically parsing documents in specific domains. OONP reads a document and parses it into a predesigned object-oriented data structure (referred to in this paper as an ontology) that reflects the domain-specific semantics of the document. An OONP parser models semantic parsing as a decision process: a neural-net-based Reader goes through the document sequentially, building and updating an intermediate ontology that summarizes its partial understanding of the text read so far. OONP supports a rich family of operations (both symbolic and differentiable) for composing the ontology, and a wide variety of forms (both symbolic and differentiable) for representing the state and the document. An OONP parser can be trained with supervision of different forms and strengths, including supervised learning (SL), reinforcement learning (RL), and a hybrid of the two. Our experiments on both synthetic and real-world document parsing tasks show that OONP can learn to handle fairly complicated ontologies with modest amounts of training data.

Comment: accepted by ACL 201

arXiv.org e-Print Archive

Crossref
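The decision process described in the OONP abstract can be illustrated with a toy loop. A Reader scans the document token by token, and at each step it picks an action that either creates a new object in the ontology or updates an existing one. In this sketch the neural Reader is replaced by a hypothetical hand-written rule policy. The action names (NEW, UPDATE, PASS), the Person schema, and the function names are illustrative assumptions, and only symbolic operations are shown.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Obj:
    """One object in the intermediate ontology (hypothetical schema)."""
    kind: str
    props: Dict[str, str] = field(default_factory=dict)


def toy_policy(token: str, ontology: List[Obj]) -> Tuple[str, ...]:
    """Hypothetical rule-based stand-in for the learned neural Reader."""
    if token.istitle():
        return ("NEW", "Person", "name", token)
    if token.isdigit() and ontology:
        return ("UPDATE", "age", token)
    return ("PASS",)


def parse(document: str) -> List[Obj]:
    """Read tokens left to right, applying one action per token to the ontology."""
    ontology: List[Obj] = []
    for token in document.split():
        action = toy_policy(token, ontology)
        if action[0] == "NEW":
            _, kind, key, value = action
            ontology.append(Obj(kind, {key: value}))
        elif action[0] == "UPDATE":
            _, key, value = action
            ontology[-1].props[key] = value  # update the most recently created object
    return ontology
```

Calling parse("Alice is 30 and Bob is 25") yields two Person objects, {'name': 'Alice', 'age': '30'} and {'name': 'Bob', 'age': '25'}. In OONP itself, the policy would be learned with SL, RL, or a hybrid rather than written by hand.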

Evaluation of Automatic Text Summarization Using Synthetic Facts

Author: Ahn Jaewook
Publication venue: DigitalCommons@CalPoly
Publication date: 01/06/2022

Automatic text summarization has achieved remarkable success with the development of deep neural networks and the availability of standardized benchmark datasets, and current models can generate fluent, human-like summaries. However, the unreliability of existing evaluation metrics hinders their practical use and slows progress in the field. To address this issue, we propose an automatic, reference-free text summarization evaluation system based on dynamically generated synthetic facts. We hypothesize that a system that evaluates summaries against synthetic documents whose facts are 100% known can provide natural interpretability and high feasibility in measuring factual consistency and comprehensiveness. To our knowledge, ours is the first system to measure the overall quality of text summarization models in terms of factual consistency, comprehensiveness, and compression rate. We validate our system by comparing its correlation with human judgment against that of existing n-gram overlap-based metrics, such as ROUGE and BLEU, and of a BERT-based metric, BERTScore. In our experimental evaluation of PEGASUS, BART, and T5, our system outperforms current evaluation metrics in measuring factual consistency by a noticeable margin and shows statistically significant results in measuring comprehensiveness and overall summary quality.

DigitalCommons@CalPoly
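The core idea of evaluating against synthetic facts is that every fact in the generated document is known in advance, so a summary can be scored without a human reference. The sketch below makes several assumptions. Facts are whole sentences matched exactly. Consistency is the share of summary sentences that are known facts. Comprehensiveness is the share of known facts the summary covers. Compression rate is summary length over document length, counted in words. These definitions, the filler-sentence generator, and the helper names are hypothetical. The abstract does not describe how the actual system generates or matches facts.

```python
import random
from typing import Dict, List


def make_document(facts: List[str], n_filler: int, seed: int = 0) -> str:
    """Hypothetical generator: mix the known fact sentences with filler sentences."""
    rng = random.Random(seed)
    sentences = facts + [f"Filler sentence {i}." for i in range(n_filler)]
    rng.shuffle(sentences)  # order varies, but the set of facts stays fully known
    return " ".join(sentences)


def split_sentences(text: str) -> List[str]:
    """Naive splitter on '.'; breaks on decimals and abbreviations."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]


def score(summary: str, document: str, facts: List[str]) -> Dict[str, float]:
    known = set(facts)
    claimed = split_sentences(summary)
    supported = [s for s in claimed if s in known]  # exact match stands in for fact checking
    return {
        # share of summary sentences that are known facts
        "consistency": len(supported) / len(claimed) if claimed else 0.0,
        # share of known facts that the summary covers
        "comprehensiveness": len(set(supported)) / len(known),
        # summary length over document length, in whitespace tokens
        "compression": len(summary.split()) / len(document.split()),
    }
```

Consider the facts "Mira founded Lumen in 2001.", "Lumen is based in Oslo." and "Lumen has 40 staff.", mixed with three filler sentences. The summary "Mira founded Lumen in 2001. Lumen is based in Bergen." scores 0.5 consistency, because one of its two sentences is a known fact. It also scores 1/3 comprehensiveness and 10/23 compression.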