How did the discussion go: Discourse act classification in social media conversations
We propose a novel attention-based hierarchical LSTM model to classify
discourse act sequences in social media conversations, aimed at mining data
from online discussions using textual meaning beyond the sentence level. What
makes the task unique is that it covers the complete set of possible pragmatic
roles in informal textual discussions, in contrast to question-answer
extraction, stance detection, or sarcasm identification, which are highly
role-specific tasks. An earlier attempt was made on a Reddit discussion
dataset. We train our model on the same data and present test results on two
different datasets, one from Reddit and one from Facebook. Our proposed model
outperformed the previous one in terms of domain independence: without using
platform-dependent structural features, our hierarchical LSTM with a word
relevance attention mechanism achieved F1-scores of 71% and 66% in predicting
the discourse roles of comments in Reddit and Facebook discussions,
respectively. We also present and analyze how efficiently recurrent and
convolutional architectures learn discursive representations on the same task,
using different word and comment embedding schemes. Our attention mechanism
lets us examine how text segments are ranked by relevance according to their
roles in discourse. We present a human annotator experiment that reveals
important observations about modeling and data annotation. Equipped with our
text-based discourse identification model, we examine how heterogeneous
non-textual features such as location, time, and leaning of information
contribute to characterizing online discussions on Facebook.
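To make "word relevance attention" concrete, here is a minimal sketch of attention pooling, the step in a hierarchical model that collapses per-word hidden states into a single comment vector. It assumes a common formulation (u_i = tanh(W h_i + b), relevance = u_i · u_w, softmax weights); the abstract does not specify the authors' exact equations, and the names `attention_pool`, `W`, `b`, and `u_w` and all numbers below are hypothetical. The LSTM itself is omitted: `hidden` stands in for its outputs.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply matrix W (rows x cols) by vector x (length cols)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden: List[Vector], W: Matrix, b: Vector,
                   u_w: Vector) -> Tuple[Vector, Vector]:
    """Pool per-word hidden states into one comment vector (hypothetical sketch).

    Each h_i is projected to u_i = tanh(W h_i + b); its relevance score is the
    dot product u_i . u_w with a learned context vector u_w. Softmax turns the
    scores into weights alpha_i, and the comment vector is sum_i alpha_i * h_i.
    """
    scores = []
    for h in hidden:
        u = [math.tanh(z + bi) for z, bi in zip(matvec(W, h), b)]
        scores.append(sum(ui * ci for ui, ci in zip(u, u_w)))
    alphas = softmax(scores)
    dim = len(hidden[0])
    pooled = [sum(a * h[d] for a, h in zip(alphas, hidden)) for d in range(dim)]
    return pooled, alphas


# Toy example: three "words" with 2-d hidden states and hand-set parameters.
hidden = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
u_w = [2.0, 0.0]  # context vector that favours the first hidden dimension
vec, weights = attention_pool(hidden, W, b, u_w)
print("attention weights:", weights)  # the first word gets the largest weight
print("comment vector:", vec)
```

Because the weights are explicit, they can be read back per word, which is what allows inspecting how text segments are ranked by relevance. In a hierarchical model the pooled comment vectors would then feed a second, comment-level sequence model.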
mARC: Memory by Association and Reinforcement of Contexts
This paper introduces Memory by Association and Reinforcement of Contexts
(mARC). mARC is a novel data modeling technology rooted in the second
quantization formulation of quantum mechanics. It is an all-purpose, incremental,
and unsupervised data storage and retrieval system that can be applied to all
types of signals or data, structured or unstructured, textual or not. mARC can
be applied to a wide range of information classification and retrieval
problems, such as e-Discovery or contextual navigation. It can also be
formulated in the artificial life framework, a.k.a. Conway's "Game of Life"
theory. In contrast to Conway's approach, the objects evolve in a massively
multidimensional space. To begin evaluating the potential of mARC, we built a
mARC-based Internet search engine demonstrator with contextual functionality.
We compare the behavior of the mARC demonstrator with Google search in terms of
both performance and relevance. We find that the mARC search engine
demonstrator outperforms Google search by an order of magnitude in response
time while providing more relevant results for some classes of queries.
Predicting Speech Acts in MOOC Forum Posts Using Conditional Random Fields
Massive Open Online Courses (MOOCs) have emerged as a way to reach large numbers of students by providing course materials as free online resources. The popularity of these courses is reflected in high enrollment numbers; however, given their high attrition rates, it is unclear how successful MOOCs are at educating their students. One cause may be instructors' inability to manage the large number of students who enroll. While discussion forums are available for students to seek help, instructors are unable to monitor the large number of posts written in these forums. This study investigates the effectiveness of using machine learning models to classify posts into speech acts as a way to help instructors monitor these discussion forums. Speech acts describe the purpose of a post and may indicate common functions such as asking questions or raising issues. A linear classifier is compared against a conditional random field (CRF) classifier, which can leverage contextual information about the forum to make predictions. The results show that CRFs outperform the simpler linear classifier, suggesting that casting this problem as a sequence labeling task is fruitful both for predicting speech acts and for automatically identifying posts of interest.
Master of Science in Information Science
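The advantage of sequence labeling over a per-post linear classifier can be illustrated with Viterbi decoding for a linear-chain CRF: instead of picking each post's best label independently, it finds the label sequence that maximizes emission plus transition scores across the whole thread. This is a minimal decoding-only sketch; the label set, the emission and transition scores, and the function name `viterbi` are hypothetical and hand-set, not learned weights or features from the study, and CRF training is not shown.

```python
from typing import Dict, List, Tuple


def viterbi(emissions: List[Dict[str, float]],
            transitions: Dict[Tuple[str, str], float],
            labels: List[str]) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][y] is the (log-space) score of label y for post t;
    transitions[(y_prev, y)] scores label y following y_prev (default 0).
    """
    # best[t][y]: best total score of any label path ending in y at post t
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in labels:
            prev, score = max(
                ((yp, best[t - 1][yp] + transitions.get((yp, y), 0.0))
                 for yp in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Trace back from the best final label to recover the full path.
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(emissions) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


# Hypothetical thread of two posts; the second post is ambiguous on its own.
labels = ["question", "answer", "issue"]
emissions = [
    {"question": 2.0, "answer": 0.5, "issue": 1.0},
    {"question": 1.0, "answer": 0.9, "issue": 0.2},
]
transitions = {("question", "answer"): 1.5, ("question", "question"): -0.5}

# A per-post classifier takes the argmax of each post's scores independently.
independent = [max(labels, key=e.get) for e in emissions]
print(independent)                                 # ['question', 'question']
print(viterbi(emissions, transitions, labels))     # ['question', 'answer']
```

The transition score for "answer follows question" flips the label of the ambiguous second post, which is the kind of forum context a per-post linear classifier cannot use.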
- …