When Are Tree Structures Necessary for Deep Learning of Representations?
Recursive neural models, which use syntactic parse trees to recursively
generate representations bottom-up, are a popular architecture. But there have
not been rigorous evaluations showing for exactly which tasks this syntax-based
method is appropriate. In this paper we benchmark {\bf recursive} neural models
against sequential {\bf recurrent} neural models (simple recurrent and LSTM
models), enforcing apples-to-apples comparison as much as possible. We
investigate 4 tasks: (1) sentiment classification at the sentence level and
phrase level; (2) matching questions to answer-phrases; (3) discourse parsing;
(4) semantic relation extraction (e.g., {\em component-whole} between nouns).
Our goal is to understand better when, and why, recursive models can
outperform simpler models. We find that recursive models help mainly on tasks
(like semantic relation extraction) that require associating headwords across a
long distance, particularly on very long sequences. We then introduce a method
for allowing recurrent models to achieve similar performance: breaking long
sentences into clause-like units at punctuation and processing them separately
before combining. Our results thus help us understand the limitations of both
classes of models, and suggest directions for improving recurrent models.
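A minimal sketch of the clause-splitting idea described above, assuming a PyTorch LSTM encoder; all names and sizes here are illustrative, not the paper's code:

```python
# Minimal sketch of the clause-splitting idea: encode each clause-like
# unit separately with an LSTM, then combine the clause vectors.
import torch
import torch.nn as nn

class ClauseLSTMEncoder(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def encode_clause(self, token_ids):
        # token_ids: LongTensor of shape (1, clause_len)
        _, (h, _) = self.lstm(self.embed(token_ids))
        return h[-1]  # (1, hidden_dim): final hidden state of the clause

    def forward(self, sentence_ids, punctuation_ids):
        # Break the token sequence into clause-like units at punctuation.
        clauses, current = [], []
        for tok in sentence_ids:
            current.append(tok)
            if tok in punctuation_ids:
                clauses.append(current)
                current = []
        if current:
            clauses.append(current)
        # Encode each clause independently, then combine by averaging.
        vecs = [self.encode_clause(torch.tensor([c])) for c in clauses]
        return torch.mean(torch.cat(vecs, dim=0), dim=0)
```

The point of the design is that each clause is short enough for the recurrent encoder to handle reliably, and the cheap combination step replaces the long-range dependencies the model would otherwise have to carry across the whole sentence.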
Named Entity Sequence Classification
Named Entity Recognition (NER) aims at locating and classifying named
entities in text. In some use cases of NER, including cases where detected
named entities are used in creating content recommendations, it is crucial to
have a reliable confidence level for the detected named entities. In this work
we study the problem of finding confidence levels for detected named entities.
We refer to this problem as Named Entity Sequence Classification (NESC). We
frame NESC as a binary classification problem and use NER together with
recurrent neural networks to estimate the probability that a candidate named
entity is a real named entity. We apply this approach to Tweet texts and show
how named entities can be found in Tweets with high confidence levels.
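A hedged sketch of how NESC can be framed as binary classification over a candidate span; the GRU encoder and pooling choice are assumptions, not the paper's exact model:

```python
# Illustrative sketch of NESC as binary classification: an RNN reads
# the tweet, the hidden states over the candidate span are pooled,
# and a sigmoid estimates P(candidate is a real named entity).
import torch
import torch.nn as nn

class NESCClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, 1)

    def forward(self, tweet_ids, span_start, span_end):
        # tweet_ids: (1, seq_len); [span_start, span_end) marks the candidate.
        states, _ = self.rnn(self.embed(tweet_ids))      # (1, seq_len, h)
        span_repr = states[0, span_start:span_end].mean(dim=0)
        return torch.sigmoid(self.out(span_repr))        # confidence in [0, 1]
```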
On Generalization and Regularization in Deep Learning
Why do large neural networks generalize so well on complex tasks such as image
classification or speech recognition? What exactly is the role of
regularization for them? These are arguably among the most important open
questions in machine learning today. In a recent and thought-provoking paper
[C. Zhang et al.], the authors performed a number of numerical experiments that
hint at the need for novel theoretical concepts to account for this phenomenon.
The paper stirred quite a lot of excitement in the machine learning community,
but at the same time it created some confusion, as discussions on
OpenReview.net testify. The aim of this pedagogical paper is to make this debate accessible
to a wider audience of data scientists without advanced theoretical knowledge
in statistical learning. The focus here is on explicit mathematical definitions
and on a discussion of relevant concepts, not on proofs for which we provide
references.
Comment: 11 pages, 3 figures, pedagogical paper
Fast Reading Comprehension with ConvNets
State-of-the-art deep reading comprehension models are dominated by recurrent
neural nets. Their sequential nature is a natural fit for language, but it also
precludes parallelization within an instance and often becomes the bottleneck
when deploying such models in latency-critical scenarios. This is particularly
problematic for longer texts. Here we present a convolutional architecture as
an alternative to these recurrent architectures. Using simple dilated
convolutional units in place of recurrent ones, we achieve results comparable
to the state of the art on two question answering tasks, while at the same time
achieving up to two orders of magnitude speedups for question answering.
Comment: 15 pages, 10 figures, submitted to ICLR 2018
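A minimal sketch of the core substitution, replacing recurrent units with stacked dilated 1-D convolutions; layer sizes are illustrative, not the paper's:

```python
# Stacked dilated convolutions as a drop-in sequence encoder: each
# layer doubles the dilation, so the receptive field grows
# exponentially with depth while all positions compute in parallel.
import torch
import torch.nn as nn

class DilatedConvEncoder(nn.Module):
    def __init__(self, channels=128, num_layers=4, kernel_size=3):
        super().__init__()
        layers = []
        for i in range(num_layers):
            dilation = 2 ** i
            layers.append(nn.Conv1d(
                channels, channels, kernel_size,
                padding=dilation * (kernel_size - 1) // 2,
                dilation=dilation))
            layers.append(nn.ReLU())
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, seq_len, channels); Conv1d expects channels first.
        return self.net(x.transpose(1, 2)).transpose(1, 2)
```

The parallelism across positions, rather than any change in modeling power, is where the claimed speedup over recurrent units comes from.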
Mining fine-grained opinions on closed captions of YouTube videos with an attention-RNN
Video reviews are the natural evolution of written product reviews. In this
paper we target this phenomenon and introduce the first dataset created from
closed captions of YouTube product review videos as well as a new attention-RNN
model for aspect extraction and joint aspect extraction and sentiment
classification. Our model provides state-of-the-art performance on aspect
extraction without requiring hand-crafted features on the SemEval ABSA corpus,
while it outperforms the baseline on the joint task. In our
dataset, the attention-RNN model outperforms the baseline for both tasks, but
we observe important performance drops for all models in comparison to SemEval.
These results, as well as further experiments on domain adaptation for aspect
extraction, suggest that differences between speech and written text, which
have been discussed extensively in the literature, also extend to the domain of
product reviews, where they are relevant for fine-grained opinion mining.Comment: 8th Workshop on Computational Approaches to Subjectivity, Sentiment &
Social Media Analysis (WASSA
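A hedged reconstruction of what an attention-RNN tagger for aspect extraction can look like, assuming a BiLSTM encoder with self-attention and BIO tags; this is an illustrative sketch, not the authors' exact architecture:

```python
# A BiLSTM encodes the caption; for each token, attention over all
# hidden states builds a context vector that is concatenated with the
# token state before BIO tagging of aspect terms.
import torch
import torch.nn as nn

class AttentionRNNTagger(nn.Module):
    def __init__(self, vocab_size, num_tags=3, embed_dim=100, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                           bidirectional=True)
        self.attn = nn.Linear(2 * hidden_dim, 2 * hidden_dim)
        self.tagger = nn.Linear(4 * hidden_dim, num_tags)  # B / I / O

    def forward(self, token_ids):
        states, _ = self.rnn(self.embed(token_ids))           # (1, n, 2h)
        scores = self.attn(states) @ states.transpose(1, 2)   # (1, n, n)
        context = torch.softmax(scores, dim=-1) @ states      # (1, n, 2h)
        return self.tagger(torch.cat([states, context], dim=-1))
```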
AppLP: A Dialogue on Applications of Logic Programming
This document describes the contributions of the 2016 Applications of Logic
Programming Workshop (AppLP), which was held on October 17 and associated with
the International Conference on Logic Programming (ICLP) in Flushing, New York
City.
Comment: David S. Warren and Yanhong A. Liu (Editors). 33 pages. Including
summaries by Christopher Kane and abstracts or position papers by M. Aref, J.
Rosenwald, I. Cervesato, E.S.L. Lam, M. Balduccini, J. Lobo, A. Russo, E.
Lupu, N. Leone, F. Ricca, G. Gupta, K. Marple, E. Salazar, Z. Chen, A. Sobhi,
S. Srirangapalli, C.R. Ramakrishnan, N. Bj{\o}rner, N.P. Lopes, A.
Rybalchenko, and P. Tarau.
Harry Potter and the Action Prediction Challenge from Natural Language
We explore the challenge of action prediction from textual descriptions of
scenes, a testbed to approximate whether text inference can be used to predict
upcoming actions. As a case study, we consider the world of the Harry Potter
fantasy novels and the task of inferring what spell will be cast next given a fragment of a
story. Spells act as keywords that abstract actions (e.g. 'Alohomora' to open a
door) and denote a response to the environment. This idea is used to
automatically build HPAC, a corpus containing 82,836 samples and 85 actions. We
then evaluate different baselines. Among the tested models, an LSTM-based
approach obtains the best performance for frequent actions and large scene
descriptions, but approaches such as logistic regression behave well on
infrequent actions.
Comment: NAACL 2019 (short papers)
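A small sketch of the HPAC-style setup as text classification, shown here with the logistic-regression baseline the paper mentions; the fragments and labels below are toy examples for illustration only:

```python
# Given a story fragment, predict the next spell as the class label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy fragments paired with the spell cast immediately after them
# (fabricated examples, not actual HPAC samples).
fragments = ["the door was locked shut", "he raised his wand at the troll"]
spells = ["Alohomora", "Wingardium Leviosa"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(fragments, spells)
print(model.predict(["she found the gate locked"]))
```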
Flexible Operator Embeddings via Deep Learning
Integrating machine learning into the internals of database management
systems requires significant feature engineering, a human effort-intensive
process to determine the best way to represent the pieces of information that
are relevant to a task. In addition to being labor intensive, the process of
hand-engineering features must generally be repeated for each data management
task, and may make assumptions about the underlying database that are not
universally true. We introduce flexible operator embeddings, a deep learning
technique for automatically transforming query operators into feature vectors
that are useful for multiple data management tasks and are custom-tailored to
the underlying database. Our approach works by taking advantage of an
operator's context, resulting in a neural network that quickly transforms
sparse representations of query operators into dense, information-rich feature
vectors. Experimentally, we show that our flexible operator embeddings perform
well across a number of data management tasks, using both synthetic and
real-world datasets.
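An illustrative sketch of the core idea, assuming a simple feed-forward mapping from a sparse operator encoding to a dense embedding; the feature layout and layer sizes are assumptions for the example, not the paper's design:

```python
# A small network maps a sparse encoding of a query operator (its
# type, predicates, and context statistics) to a dense, reusable
# feature vector for downstream data management tasks.
import torch
import torch.nn as nn

class OperatorEmbedder(nn.Module):
    def __init__(self, sparse_dim=512, embed_dim=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(sparse_dim, 128), nn.ReLU(),
            nn.Linear(128, embed_dim))

    def forward(self, sparse_operator):
        # sparse_operator: (batch, sparse_dim) mostly-zero encoding.
        return self.net(sparse_operator)  # (batch, embed_dim) dense vector
```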
Green Machine Learning via Augmented Gaussian Processes and Multi-Information Source Optimization
Searching for accurate Machine and Deep Learning models is a computationally
expensive and energy-hungry process. A strategy that has recently been gaining
importance as a way to drastically reduce the computational time and energy
consumed is to exploit the availability of different information sources, with
different computational costs and different "fidelity", typically smaller
portions of a large dataset. The multi-source optimization strategy fits into
the scheme of Gaussian Process based Bayesian Optimization. An Augmented
Gaussian Process method exploiting multiple information sources (namely,
AGP-MISO) is proposed. The Augmented Gaussian Process is trained using only
"reliable" information among available sources. A novel acquisition function is
defined according to the Augmented Gaussian Process. Computational results are
reported related to the optimization of the hyperparameters of a Support Vector
Machine (SVM) classifier using two sources: a large dataset - the most
expensive one - and a smaller portion of it. A comparison with a traditional
Bayesian Optimization approach to optimize the hyperparameters of the SVM
classifier on the large dataset only is reported.
Comment: 22 pages, 4 figures, submitted to Soft Computing - Special Issue on
"Optimization methods for decision making: advances and applications"