Annotation Graphs and Servers and Multi-Modal Resources: Infrastructure for Interdisciplinary Education, Research and Development
Annotation graphs and annotation servers offer infrastructure to support the
analysis of human language resources in the form of time-series data such as
text, audio and video. This paper outlines areas of common need among empirical
linguists and computational linguists. After reviewing examples of data and
tools used or under development for each of several areas, it proposes a common
framework for future tool development, data annotation and resource sharing
based upon annotation graphs and servers.
Comment: 8 pages, 6 figures
Tweeting your Destiny: Profiling Users in the Twitter Landscape around an Online Game
Social media has become a major communication channel for communities
centered around video games. Consequently, social media offers a rich data
source to study online communities and the discussions evolving around games.
Towards this end, we explore a large-scale dataset consisting of over 1 million
tweets related to the online multiplayer shooter Destiny and spanning a time
period of about 14 months using unsupervised clustering and topic modelling.
Furthermore, we correlate Twitter activity of over 3,000 players with their
playtime. Our results contribute to the understanding of online player
communities by identifying distinct player groups with respect to their Twitter
characteristics, describing subgroups within the Destiny community, and
uncovering broad topics of community interest.
Comment: Accepted at IEEE Conference on Games 201
Computational challenges, innovations and future of Scottish corpora
This chapter discusses the computational challenges and innovations encountered in the development of the Scottish corpora (the Scottish Corpus of Texts & Speech and the Corpus of Modern Scottish Writing), considers how tools for corpus analysis can encourage new audiences and complement existing resources, and explores possible future technological advances for corpus creation and exploitation.
Selected Information Management Resources for Implementing New Knowledge Environments: An Annotated Bibliography
This annotated bibliography reviews scholarly work in the area of building and analyzing digital document collections with the aim of establishing a baseline of knowledge for work in the field of digital humanities. The bibliography is organized around three main topics: data stores, text corpora, and analytical facilitators. Each of these is then further divided into sub-topics to provide a broad snapshot of modern information management techniques for building and analyzing digital document collections.
Argumentation Mining in User-Generated Web Discourse
The goal of argumentation mining, an evolving research field in computational
linguistics, is to design methods capable of analyzing people's argumentation.
In this article, we go beyond the state of the art in several ways. (i) We deal
with actual Web data and take up the challenges given by the variety of
registers, multiple domains, and unrestricted noisy user-generated Web
discourse. (ii) We bridge the gap between normative argumentation theories and
argumentation phenomena encountered in actual data by adapting an argumentation
model tested in an extensive annotation study. (iii) We create a new gold
standard corpus (90k tokens in 340 documents) and experiment with several
machine learning methods to identify argument components. We offer the data,
source codes, and annotation guidelines to the community under free licenses.
Our findings show that argumentation mining in user-generated Web discourse is
a feasible but challenging task.
Comment: Cite as: Habernal, I. & Gurevych, I. (2017). Argumentation Mining in
User-Generated Web Discourse. Computational Linguistics 43(1), pp. 125-17
Distributional semantic modeling: a revised technique to train term/word vector space models applying the ontology-related approach
We design a new technique for distributional semantic modeling that uses a
neural network-based approach to learn distributed term representations (or
term embeddings) and, as a result, term vector space models. The technique is
inspired by the recent ontology-related approach (using different types of
contextual knowledge such as syntactic knowledge, terminological knowledge,
semantic knowledge, etc.) to the identification of terms (term extraction) and
relations between them (relation extraction), called semantic pre-processing
technology (SPT). Our
method relies on automatic term extraction from natural language texts and
the subsequent formation of problem-oriented or application-oriented (also
deeply annotated) text corpora in which the fundamental entity is the term
(including non-compositional and compositional terms). This gives us an
opportunity to change over from distributed word representations (or word
embeddings) to distributed term representations (or term embeddings). This
transition will make it possible to generate more accurate semantic maps of
different subject domains (and of relations between input terms, which is
useful for exploring clusters and oppositions or for testing hypotheses about
them). The
semantic map can be represented as a graph using Vec2graph - a Python library
for visualizing word embeddings (term embeddings in our case) as dynamic and
interactive graphs. The Vec2graph library coupled with term embeddings will not
only improve accuracy in solving standard NLP tasks, but also update the
conventional concept of automated ontology development. The main practical
result of our work is a development kit (a set of toolkits provided as web
service APIs and a web application), which provides all necessary routines for
the basic linguistic pre-processing and the semantic pre-processing of
natural language texts in Ukrainian for future training of term vector space
models.
Comment: In English, 9 pages, 2 figures. Not published yet. Prepared for
special issue (UkrPROG 2020 conference) of the scientific journal "Problems
in programming" (Founder: National Academy of Sciences of Ukraine, Institute
of Software Systems of NAS Ukraine)