Robust Modeling of Epistemic Mental States
This work identifies and advances some research challenges in the analysis of
facial features and their temporal dynamics with epistemic mental states in
dyadic conversations. The epistemic states considered are Agreement, Concentration,
Thoughtful, Certain, and Interest. In this paper, we perform a number of
statistical analyses and simulations to identify the relationship between
facial features and epistemic states. Non-linear relations are found to be more
prevalent, while temporal features derived from original facial features have
demonstrated a strong correlation with intensity changes. Then, we propose a
novel prediction framework that takes facial features and their nonlinear
relation scores as input and predicts different epistemic states in videos. The
prediction of epistemic states is boosted when the classification of emotion
changing regions such as rising, falling, or steady-state is incorporated with
the temporal features. The proposed predictive models can predict the epistemic
states with significantly improved accuracy: correlation coefficient (CoERR)
for Agreement is 0.827, for Concentration 0.901, for Thoughtful 0.794, for
Certain 0.854, and for Interest 0.913.
Comment: Accepted for publication in Multimedia Tools and Applications, Special
Issue: Socio-Affective Technologies
Ontology of core data mining entities
In this article, we present OntoDM-core, an ontology of core data mining
entities. OntoDM-core defines the most essential data mining entities in a three-layered
ontological structure comprising a specification, an implementation and an application
layer. It provides a representational framework for the description of mining
structured data, and in addition provides taxonomies of datasets, data mining tasks,
generalizations, data mining algorithms and constraints, based on the type of data.
OntoDM-core is designed to support a wide range of applications/use cases, such as
semantic annotation of data mining algorithms, datasets and results; annotation of
QSAR studies in the context of drug discovery investigations; and disambiguation of
terms in text mining. The ontology has been thoroughly assessed following the practices
in ontology engineering, is fully interoperable with many domain resources and
is easy to extend.
Predicting the Quality of Short Narratives from Social Media
An important and difficult challenge in building computational models for
narratives is the automatic evaluation of narrative quality. Quality evaluation
connects narrative understanding and generation as generation systems need to
evaluate their own products. To circumvent difficulties in acquiring
annotations, we employ upvotes in social media as an approximate measure for
story quality. We collected 54,484 answers from a crowd-powered
question-and-answer website, Quora, and then used active learning to build a
classifier that labeled 28,320 answers as stories. To predict the number of
upvotes without the use of social network features, we create neural networks
that model textual regions and the interdependence among regions, which serve
as strong benchmarks for future research. To the best of our knowledge, this is the
first large-scale study for automatic evaluation of narrative quality.
Comment: 7 pages, 2 figures. Accepted at the 2017 IJCAI conference
Automated assessment of non-native learner essays: Investigating the role of linguistic features
Automatic essay scoring (AES) refers to the process of scoring free text
responses to given prompts, considering human grader scores as the gold
standard. Writing such essays is an essential component of many language and
aptitude exams. Hence, AES became an active and established area of research,
and there are many proprietary systems used in real life applications today.
However, not much is known about which specific linguistic features are useful
for prediction and how much of this is consistent across datasets. This article
addresses that by exploring the role of various linguistic features in
automatic essay scoring using two publicly available datasets of non-native
English essays written in test taking scenarios. The linguistic properties are
modeled by encoding lexical, syntactic, discourse and error types of learner
language in the feature set. Predictive models are then developed using these
features on both datasets and the most predictive features are compared. While
the results show that the feature set used results in good predictive models
with both datasets, the question "what are the most predictive features?" has a
different answer for each dataset.
Comment: Article accepted for publication at: International Journal of
Artificial Intelligence in Education (IJAIED). To appear in early 2017
(journal url: http://www.springer.com/computer/ai/journal/40593)
The Origins of Computational Mechanics: A Brief Intellectual History and Several Clarifications
The principal goal of computational mechanics is to define pattern and
structure so that the organization of complex systems can be detected and
quantified. Computational mechanics developed from efforts in the 1970s and
early 1980s to identify strange attractors as the mechanism driving weak fluid
turbulence via the method of reconstructing attractor geometry from measurement
time series and, in the mid-1980s, to estimate equations of motion directly from
complex time series. In providing a mathematical and operational definition of
structure, it addressed weaknesses of these early approaches to discovering
patterns in natural systems.
Since then, computational mechanics has led to a range of results from
theoretical physics and nonlinear mathematics to diverse applications---from
closed-form analysis of Markov and non-Markov stochastic processes that are
ergodic or nonergodic and their measures of information and intrinsic
computation to complex materials and deterministic chaos and intelligence in
Maxwellian demons to quantum compression of classical processes and the
evolution of computation and language.
This brief review clarifies several misunderstandings and addresses concerns
recently raised regarding early works in the field (1980s). We show that
misguided evaluations of the contributions of computational mechanics are
groundless and stem from a lack of familiarity with its basic goals and from a
failure to consider its historical context. For all practical purposes, its
modern methods and results largely supersede the early works. This not only
renders recent criticism moot and shows the solid ground on which computational
mechanics stands but, most importantly, shows the significant progress achieved
over three decades and points to the many intriguing and outstanding challenges
in understanding the computational nature of complex dynamic systems.
Comment: 11 pages, 123 citations;
http://csc.ucdavis.edu/~cmg/compmech/pubs/cmr.ht
Predicting Good Configurations for GitHub and Stack Overflow Topic Models
Software repositories contain large amounts of textual data, ranging from
source code comments and issue descriptions to questions, answers, and comments
on Stack Overflow. To make sense of this textual data, topic modelling is
frequently used as a text-mining tool for the discovery of hidden semantic
structures in text bodies. Latent Dirichlet allocation (LDA) is a commonly used
topic model that aims to explain the structure of a corpus by grouping texts.
LDA requires multiple parameters to work well, and there are only rough and
sometimes conflicting guidelines available on how these parameters should be
set. In this paper, we contribute (i) a broad study of parameters to arrive at
good local optima for GitHub and Stack Overflow text corpora, (ii) an
a-posteriori characterisation of text corpora related to eight programming
languages, and (iii) an analysis of corpus feature importance via per-corpus
LDA configuration. We find that (1) popular rules of thumb for topic modelling
parameter configuration are not applicable to the corpora used in our
experiments, (2) corpora sampled from GitHub and Stack Overflow have different
characteristics and require different configurations to achieve good model fit,
and (3) we can predict good configurations for unseen corpora reliably. These
findings support researchers and practitioners in efficiently determining
suitable configurations for topic modelling when analysing textual data
contained in software repositories.
Comment: to appear as full paper at MSR 2019, the 16th International
Conference on Mining Software Repositories