26 research outputs found
SEQ^3: Differentiable Sequence-to-Sequence-to-Sequence Autoencoder for Unsupervised Abstractive Sentence Compression
Neural sequence-to-sequence models are currently the dominant approach in
several natural language processing tasks, but require large parallel corpora.
We present a sequence-to-sequence-to-sequence autoencoder (SEQ^3), consisting
of two chained encoder-decoder pairs, with words used as a sequence of discrete
latent variables. We apply the proposed model to unsupervised abstractive
sentence compression, where the first and last sequences are the input and
reconstructed sentences, respectively, while the middle sequence is the
compressed sentence. Constraining the length of the latent word sequences
forces the model to distill important information from the input. A pretrained
language model, acting as a prior over the latent sequences, encourages the
compressed sentences to be human-readable. Continuous relaxations enable us to
sample from categorical distributions, allowing gradient-based optimization,
unlike alternatives that rely on reinforcement learning. The proposed model
does not require parallel text-summary pairs, achieving promising results in
unsupervised sentence compression on benchmark datasets.
Comment: Accepted to NAACL 201
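The continuous relaxation mentioned in the abstract above can be illustrated with a Gumbel-softmax sample. The abstract does not name the relaxation it uses, so the choice of Gumbel-softmax, the function name `gumbel_softmax`, and the `temperature` parameter are all assumptions made for illustration. This is a minimal standard-library sketch, not the authors' implementation.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=random):
    """Return a relaxed (soft) one-hot sample over the given logits.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    noisy = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logs are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        # Gumbel(0, 1) noise: -log(-log(U)) with U ~ Uniform(0, 1).
        gumbel = -math.log(-math.log(u))
        noisy.append((logit + gumbel) / temperature)

    # Numerically stable softmax over the perturbed, temperature-scaled logits.
    peak = max(noisy)
    exps = [math.exp(x - peak) for x in noisy]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Lower temperatures push the sample closer to a discrete one-hot vector.
    print(gumbel_softmax([2.0, 0.5, -1.0], temperature=0.5))
```

Because the output is a smooth function of the logits, gradients can flow through the sampling step, which is the property the abstract contrasts with reinforcement-learning alternatives.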
Evaluation Measures for Hierarchical Classification: a unified view and novel approaches
Hierarchical classification addresses the problem of classifying items into a
hierarchy of classes. An important issue in hierarchical classification is the
evaluation of different classification algorithms, which is complicated by the
hierarchical relations among the classes. Several evaluation measures have been
proposed for hierarchical classification using the hierarchy in different ways.
This paper studies the problem of evaluation in hierarchical classification by
analyzing and abstracting the key components of the existing performance
measures. It also proposes two alternative generic views of hierarchical
evaluation and introduces two corresponding novel measures. The proposed
measures, along with the state-of-the-art ones, are empirically tested on three
large datasets from the domain of text classification. The empirical results
illustrate the undesirable behavior of existing approaches and how the proposed
methods overcome most of these problems across a range of cases.
Comment: Submitted to journa
LSHTC: A Benchmark for Large-Scale Text Classification
LSHTC is a series of challenges which aims to assess the performance of
classification systems in large-scale classification with a large number of
classes (up to hundreds of thousands). This paper describes the datasets that
have been released along with the LSHTC series. The paper details the construction
of the datasets and the design of the tracks, as well as the evaluation measures
that we implemented and a quick overview of the results. All of these datasets
are available online, and runs may still be submitted on the online server of
the challenges.
Recommended from our members
Does the Mechanism of Lymph Node Invasion Affect Survival in Patients with Pancreatic Ductal Adenocarcinoma?
Background: Lymph node metastases are prognostically significant in pancreatic ductal adenocarcinoma. Little is known about the significance of direct lymph node invasion. Aim: The aim of this study is to determine whether direct lymph node invasion has the same prognostic significance as regional nodal metastases. Methods: Retrospective review of patients resected between 1/1/1993 and 7/31/2008. “Direct” was defined as tumor extension into adjacent nodes, and “regional” was defined as metastases to peripancreatic nodes. Results: Overall, 517 patients underwent pancreatic resection for adenocarcinoma, of whom 89 had one positive node (direct 26, regional 63), and 79 had two positive nodes (direct 6, regional 68, both 5). Overall survival of node-negative patients was improved compared to patients with positive nodes (N0 30.8 months vs. N1 16.4 months; p < 0.001). There was no survival difference for patients with direct vs. regional lymph node invasion (p = 0.67). Patients with one positive node had a better overall survival compared to patients with ≥2 positive nodes (22.3 and 15 months, respectively; p < 0.001). The lymph node ratio (+LN/total LN) was prognostically significant after Cox regression (p < 0.001). Conclusions: Isolated direct invasion occurs in 20% of patients with one to two positive nodes. Node involvement by metastasis and by direct invasion are equally significant predictors of reduced survival. Both the number of positive nodes and the lymph node ratio are significant prognostic factors.