Neural Discourse Structure for Text Categorization
We show that discourse structure, as defined by Rhetorical Structure Theory
and provided by an existing discourse parser, benefits text categorization. Our
approach uses a recursive neural network and a newly proposed attention
mechanism to compute a representation of the text that focuses on salient
content, from the perspective of both RST and the task. Experiments consider
variants of the approach and illustrate its strengths and weaknesses. Comment: ACL 2017 camera-ready version
Image-based Text Classification using 2D Convolutional Neural Networks
We propose a new approach to text classification
in which we consider the input text as an image and apply
2D Convolutional Neural Networks to learn the local and
global semantics of the sentences from the variations of the
visual patterns of words. Our approach demonstrates that
it is possible to get semantically meaningful features from
images with text without using optical character recognition
and sequential processing pipelines, techniques that traditional
natural language processing algorithms require. To validate
our approach, we present results for two applications: text
classification and dialog modeling. Using a 2D Convolutional
Neural Network, we were able to outperform the state-of-the-art
accuracy results for a Chinese text classification task and
achieved promising results for seven English text classification
tasks. Furthermore, our approach outperformed the memory
networks without match types when using out-of-vocabulary
entities from Task 4 of the bAbI dialog dataset
Content management by keywords: An analytical Study
Various methods of content analysis are described, with special emphasis on keyword analysis. The paper is based on an analytical study of 97 keywords extracted from titles and abstracts of 70 research articles from INSPEC, taking ten from each year from 2000 to 2006, in decreasing order of relevance, on Fermi Liquid, which is a specific subject under Condensed Matter Physics. Only the keywords beginning with the letters 'A' to 'F' are considered for this study. The keywords are indexed to critically examine their physical structure, which is composed of three fundamental kernels, viz. key phrase, modulator and qualifier. The key phrase reflects the central concept, which is usually post-coordinated by the modulator to amend the central concept in accordance with the relevant context. The qualifier comes after the modulator to describe the particular state of the central concept and/or amended concept. The keywords are further classified into 36 classes on the basis of 10 parameters, of which 4 parameters are intrinsic, i.e. associativeness, chronological appearance, frequency of occurrence and category; and the remaining 6 parameters are extrinsic, i.e. clarity of meaning, type of meaning, scope of meaning, level of perception, mode of creation and area of occurrence. The number of classes under the 4 intrinsic parameters is 16, while the number under the 6 extrinsic parameters is 20. A new taxonomy of keywords is proposed here that will help to analyze the research trend of a subject and also identify potential research areas under its scope
How did the discussion go: Discourse act classification in social media conversations
We propose a novel attention-based hierarchical LSTM model to classify
discourse act sequences in social media conversations, aimed at mining data
from online discussion using textual meanings beyond sentence level. What makes
the task unique is the complete categorization of possible pragmatic
roles in informal textual discussions, in contrast to the extraction of
question-answer pairs, stance detection, or sarcasm identification, which are
highly role-specific tasks. An early attempt was made on a Reddit discussion
dataset. We train our model on the same data, and present test results on two
different datasets, one from Reddit and one from Facebook. Our proposed model
outperformed the previous one in terms of domain independence; without using
platform-dependent structural features, our hierarchical LSTM with word
relevance attention mechanism achieved F1-scores of 71% and 66%, respectively,
in predicting discourse roles of comments in Reddit and Facebook discussions.
The efficiency of recurrent and convolutional architectures in learning
discourse representations on the same task is presented and analyzed,
with different word and comment embedding schemes. Our attention mechanism
enables us to inquire into relevance ordering of text segments according to
their roles in discourse. We present a human annotator experiment to unveil
important observations about modeling and data annotation. Equipped with our
text-based discourse identification model, we inquire into how heterogeneous
non-textual features like location, time, leaning of information etc. play
their roles in characterizing online discussions on Facebook
Neurocognitive Informatics Manifesto.
Informatics studies all aspects of the structure of natural and artificial information systems. Theoretical and abstract approaches to information have made great advances, but human information processing is still unmatched in many areas, including information management, representation and understanding. Neurocognitive informatics is a new, emerging field that should help to improve the matching of artificial and natural systems, and inspire better computational algorithms to solve problems that are still beyond the reach of machines. In this position paper examples of neurocognitive inspirations and promising directions in this area are given