Linguistic Structured Sparsity in Text Categorization
We introduce three linguistically motivated structured regularizers based on parse trees, topics, and hierarchical word clusters for text categorization. These regularizers impose linguistic bias in feature weights, enabling us to incorporate prior knowledge into conventional bag-of-words models. We show that our structured regularizers consistently improve classification accuracies compared to standard regularizers that penalize features in isolation (such as lasso, ridge, and elastic net regularizers) on a range of datasets for various text prediction problems: topic classification, sentiment analysis, and forecasting.
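The abstract does not spell out the regularizer's exact form, but a common instantiation of structured sparsity over linguistically defined feature groups is the group lasso. The sketch below (plain NumPy; the groups and the penalty weight `lam` are hypothetical stand-ins) shows the penalty and its proximal step, which can zero out an entire parse-tree constituent, topic, or word cluster at once.

```python
import numpy as np

def group_lasso_penalty(w, groups, lam=1.0):
    """Group-lasso penalty: lam * sum_g sqrt(|g|) * ||w_g||_2.

    `groups` is a list of index arrays, e.g. bag-of-words features that share
    a parse-tree constituent, topic, or word cluster (hypothetical grouping).
    """
    return lam * sum(np.sqrt(len(g)) * np.linalg.norm(w[g]) for g in groups)

def prox_group_lasso(w, groups, step, lam=1.0):
    """Proximal step: shrink each group's weights toward zero jointly,
    so whole linguistic groups can be removed from the model together."""
    w = w.copy()
    for g in groups:
        norm = np.linalg.norm(w[g])
        threshold = np.sqrt(len(g)) * lam * step
        w[g] = 0.0 if norm <= threshold else w[g] * (1 - threshold / norm)
    return w

# toy usage: 6 bag-of-words features split into two groups
w = np.array([0.3, -0.1, 0.05, 0.8, -0.02, 0.01])
groups = [np.array([0, 1, 2]), np.array([3, 4, 5])]
print(group_lasso_penalty(w, groups, lam=0.1))
print(prox_group_lasso(w, groups, step=0.5, lam=0.1))
```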
Neural Discourse Structure for Text Categorization
We show that discourse structure, as defined by Rhetorical Structure Theory
and provided by an existing discourse parser, benefits text categorization. Our
approach uses a recursive neural network and a newly proposed attention
mechanism to compute a representation of the text that focuses on salient
content, from the perspective of both RST and the task. Experiments consider
variants of the approach and illustrate its strengths and weaknesses. Comment: ACL 2017 camera-ready version
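The abstract only names the ingredients (a recursive network over an RST discourse tree plus attention over its nodes), not the exact architecture. Below is a minimal NumPy sketch under that reading: EDU vectors are composed bottom-up over a binary discourse tree, and a learned attention vector weights all node representations into one document vector. `W_comp`, `v_attn`, and the toy tree are hypothetical stand-ins, not the paper's parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 50  # embedding size (arbitrary)

# hypothetical parameters of the composition and attention functions
W_comp = rng.normal(scale=0.1, size=(d, 2 * d))
v_attn = rng.normal(scale=0.1, size=d)

def compose(tree):
    """Bottom-up composition over a binary discourse tree.
    Leaves are EDU vectors; internal nodes combine their two children."""
    if isinstance(tree, np.ndarray):          # leaf: an EDU embedding
        return tree, [tree]
    left, right = tree
    l_vec, l_nodes = compose(left)
    r_vec, r_nodes = compose(right)
    parent = np.tanh(W_comp @ np.concatenate([l_vec, r_vec]))
    return parent, l_nodes + r_nodes + [parent]

def attend(nodes):
    """Attention over tree nodes: softmax over scores, weighted sum."""
    scores = np.array([v_attn @ n for n in nodes])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return sum(w * n for w, n in zip(weights, nodes))

# toy document with three EDUs: ((EDU1, EDU2), EDU3)
edus = [rng.normal(size=d) for _ in range(3)]
tree = ((edus[0], edus[1]), edus[2])
_, nodes = compose(tree)
doc_vec = attend(nodes)   # document representation for a downstream classifier
print(doc_vec.shape)
```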
Dependency-based Convolutional Neural Networks for Sentence Embedding
In sentence modeling and classification, convolutional neural network
approaches have recently achieved state-of-the-art results, but all such
efforts process word vectors sequentially and neglect long-distance
dependencies. To exploit both deep learning and linguistic structures, we
propose a tree-based convolutional neural network model which exploits various
long-distance relationships between words. Our model improves the sequential
baselines on all three sentiment and question classification tasks, and
achieves the highest published accuracy on TREC. Comment: this paper has been accepted by ACL 2015
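The abstract does not detail the convolution, so the following NumPy sketch only illustrates the general idea of convolving over dependency arcs rather than a sliding window: each token's embedding is concatenated with its head's embedding, passed through a shared filter bank, and max-pooled into a fixed-size sentence vector. The toy sentence, head indices, and filter matrix `W` are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_filters = 50, 8

tokens = ["the", "movie", "was", "surprisingly", "good"]
emb = {w: rng.normal(size=d) for w in tokens}
# hypothetical dependency heads: index of each token's head (-1 = root)
heads = [1, 2, -1, 4, 2]

W = rng.normal(scale=0.1, size=(n_filters, 2 * d))  # filters over (word, head) pairs

def dep_conv(tokens, heads):
    """Convolve over (word, head) pairs from the dependency tree instead of
    a sequential window, then max-pool over positions."""
    feats = []
    for i, tok in enumerate(tokens):
        head_vec = emb[tokens[heads[i]]] if heads[i] >= 0 else np.zeros(d)
        feats.append(np.tanh(W @ np.concatenate([emb[tok], head_vec])))
    return np.max(np.stack(feats), axis=0)

print(dep_conv(tokens, heads))
```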
Sparse Overcomplete Word Vector Representations
Current distributed representations of words show little resemblance to
theories of lexical semantics. The former are dense and uninterpretable, the
latter largely based on familiar, discrete classes (e.g., supersenses) and
relations (e.g., synonymy and hypernymy). We propose methods that transform
word vectors into sparse (and optionally binary) vectors. The resulting
representations are more similar to the interpretable features typically used
in NLP, though they are discovered automatically from raw corpora. Because the
vectors are highly sparse, they are computationally easy to work with. Most
importantly, we find that they outperform the original vectors on benchmark
tasks. Comment: Proceedings of ACL 2015
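As a rough illustration of the sparse-coding transformation described here (the paper's own optimizer differs in its details), the snippet below learns an overcomplete dictionary with sparse codes using scikit-learn's `DictionaryLearning`, then optionally binarizes the resulting vectors. The random matrix `X` stands in for pretrained dense word vectors.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# toy "word vectors": 100 words x 25 dense dimensions (random stand-ins)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 25))

# learn an overcomplete dictionary (more atoms than input dimensions)
# with sparse codes; alpha controls the sparsity of the codes
dl = DictionaryLearning(n_components=100, alpha=1.0, max_iter=20, random_state=0)
sparse_vecs = dl.fit_transform(X)      # shape (100 words, 100 atoms), mostly zeros

binary_vecs = (sparse_vecs > 0).astype(np.int8)   # optional binarization
print((sparse_vecs != 0).mean(), binary_vecs.sum(axis=1)[:5])
```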
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
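A minimal example of the first class the survey discusses, the term-document matrix: build a tf-idf weighted matrix with scikit-learn and compare documents by cosine similarity. The toy corpus is an illustrative assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "the cat sat on the mat",
    "a cat and a dog played",
    "stock markets fell sharply today",
]

# term-document matrix: rows are documents, columns are vocabulary terms
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# document similarity in the vector space (the term-document class of VSMs)
print(cosine_similarity(X))
```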
Asynchronous Training of Word Embeddings for Large Text Corpora
Word embeddings are a powerful approach for analyzing language and have been
widely popular in numerous tasks in information retrieval and text mining.
Training embeddings over huge corpora is computationally expensive because the
input is typically sequentially processed and parameters are synchronously
updated. Previously proposed distributed architectures for asynchronous training either focus on scaling vocabulary size and dimensionality or suffer from expensive synchronization latencies.
In this paper, we propose a scalable approach that instead partitions the input space, allowing training to scale to massive text corpora without sacrificing the quality of the embeddings. Our training procedure involves no parameter synchronization except a final sub-model merge phase that typically executes in a few minutes. Our distributed training scales seamlessly to large corpus sizes, and models trained by our distributed procedure achieve comparable, and on some NLP benchmarks up to 45% better, performance than the baseline while requiring only a fraction of the baseline's training time. Finally, we show that the approach is robust to missing words in sub-models and can effectively reconstruct word representations. Comment: This paper contains 9 pages and has been accepted at WSDM 2019
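The abstract does not describe the merge phase precisely, so the following sketch only illustrates the partition-then-merge idea: sub-models are trained independently on corpus partitions (training not shown) and their word vectors are then combined, here by simple averaging over shared words, which is an assumption rather than the paper's actual procedure.

```python
import numpy as np

def merge_submodels(submodels):
    """Merge word-vector sub-models trained independently on corpus partitions.

    `submodels` is a list of dicts mapping word -> vector. Words appearing in
    several partitions are averaged; the paper's real merge step may differ
    (this averaging is an illustrative assumption).
    """
    merged, counts = {}, {}
    for model in submodels:
        for word, vec in model.items():
            if word in merged:
                merged[word] += vec
                counts[word] += 1
            else:
                merged[word] = vec.copy()
                counts[word] = 1
    return {w: v / counts[w] for w, v in merged.items()}

# toy usage with two hypothetical sub-models
a = {"cat": np.ones(4), "dog": np.zeros(4)}
b = {"cat": np.zeros(4), "fish": np.ones(4)}
print(merge_submodels([a, b])["cat"])   # averaged across both partitions
```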