Measuring, Predicting and Visualizing Short-Term Change in Word Representation and Usage in VKontakte Social Network
Language in social media is extremely dynamic: new words emerge, trend and
disappear, while the meaning of existing words can fluctuate over time. Such
dynamics are especially notable during a period of crisis. This work addresses
several important tasks of measuring, visualizing and predicting short-term
text representation shift, i.e. the change in a word's contextual semantics,
and contrasting such shift with surface-level word dynamics, or concept drift,
observed in social media streams. Unlike previous approaches to learning word
representations from text, we study the relationship between short-term concept
drift and representation shift on a large social media corpus: VKontakte posts
in Russian collected during the Russia-Ukraine crisis in 2014-2015. Our novel
contributions include quantitative and qualitative approaches to (1) measure
short-term representation shift and contrast it with surface-level concept
drift; (2) build predictive models to forecast short-term shifts in meaning
from previous meaning as well as from concept drift; and (3) visualize
short-term representation shift for example keywords to demonstrate the
practical use of our approach to discover and track meaning of newly emerging
terms in social media. We show that short-term representation shift can be
accurately predicted up to several weeks in advance. Our unique approach to
modeling and visualizing word representation shifts in social media can be used
to explore and characterize specific aspects of the streaming corpus during
crisis events and potentially improve other downstream classification tasks
including real-time event detection
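As a rough illustration of the contrast drawn in this abstract, the sketch below measures representation shift as the cosine distance between a word's embedding in consecutive time slices, and surface-level drift as the change in its relative frequency. Both measures are assumptions chosen for illustration; the abstract does not state the exact metrics used. The function names are hypothetical, and the embeddings are assumed to live in a comparable vector space across time slices.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def representation_shift(embeddings_by_slice):
    """Distance between a word's embeddings in consecutive time slices.

    Assumption: the embeddings were trained or aligned so that vectors
    from different slices are directly comparable.
    """
    return [
        cosine_distance(prev, curr)
        for prev, curr in zip(embeddings_by_slice, embeddings_by_slice[1:])
    ]


def frequency_drift(word_counts, total_counts):
    """Change in a word's relative frequency between consecutive slices.

    Used here as a stand-in for surface-level dynamics (an assumption).
    """
    rel = [w / t for w, t in zip(word_counts, total_counts)]
    return [curr - prev for prev, curr in zip(rel, rel[1:])]


# Toy data: three weekly slices for one hypothetical keyword.
weekly_vectors = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifts = representation_shift(weekly_vectors)
drift = frequency_drift([10, 20, 20], [1000, 1000, 1000])
```

In this toy data the word's frequency changes between the first two weeks while its vector stays put, and the vector changes between the last two weeks while its frequency stays put, which is the kind of divergence the two measures are meant to separate.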
The Cult of Word Fasting
This short story replaces the Kickshaws feature for this issue
On Experiencing Meaning: Irreducible Cognitive Phenomenology and Sinewave Speech
Upon first hearing sinewaves, all that can be discerned are beeps and whistles. But after hearing the original speech, the beeps and whistles sound like speech. The difference between these two episodes undoubtedly involves an alteration in phenomenal character. O’Callaghan (2011) argues that this alteration is non-sensory, but he leaves open the possibility of attributing it to some other source, e.g. cognition. I discuss whether the alteration in phenomenal character involved in sinewave speech provides evidence for cognitive phenomenology. I defend both the existence of cognitive phenomenology and the phenomenal contrast method, as each concerns the case presented here
Learning Visual Reasoning Without Strong Priors
Achieving artificial visual reasoning - the ability to answer image-related
questions which require a multi-step, high-level process - is an important step
towards artificial general intelligence. This multi-modal task requires
learning a question-dependent, structured reasoning process over images from
language. Standard deep learning approaches tend to exploit biases in the data
rather than learn this underlying structure, while leading methods learn to
visually reason successfully but are hand-crafted for reasoning. We show that a
general-purpose, Conditional Batch Normalization approach achieves
state-of-the-art results on the CLEVR Visual Reasoning benchmark with a 2.4%
error rate. We outperform the next best end-to-end method (4.5%) and even
methods that use extra supervision (3.1%). We probe our model to shed light on
how it reasons, showing it has learned a question-dependent, multi-step
process. Previous work has operated under the assumption that visual reasoning
calls for a specialized architecture, but we show that a general architecture
with proper conditioning can learn to visually reason effectively.
Comment: Full AAAI 2018 paper is at arXiv:1709.07871. Presented at ICML 2017's
Machine Learning in Speech and Language Processing Workshop. Code is at
http://github.com/ethanjperez/fil
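The general idea of conditional batch normalization mentioned in this abstract can be sketched as follows: features are normalized per channel over the batch, then scaled and shifted by values predicted from a conditioning input such as a question embedding. This is a minimal pure-Python sketch under stated assumptions, not the authors' implementation; the single linear layers that predict the scale and shift, and all function names, are hypothetical.

```python
import math


def batch_norm(features, eps=1e-5):
    """Normalize each channel to zero mean and unit variance over the batch."""
    n = len(features)
    channels = len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(channels)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in features) / n
        for j in range(channels)
    ]
    return [
        [(row[j] - means[j]) / math.sqrt(variances[j] + eps) for j in range(channels)]
        for row in features
    ]


def linear(weights, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def conditional_batch_norm(features, conditions, gamma_layer, beta_layer):
    """Scale and shift normalized features using the conditioning input.

    gamma_layer and beta_layer are (weights, bias) pairs; each maps one
    conditioning vector to one value per channel (assumed, for illustration).
    """
    normalized = batch_norm(features)
    output = []
    for row, cond in zip(normalized, conditions):
        gamma = linear(*gamma_layer, cond)
        beta = linear(*beta_layer, cond)
        output.append([g * x + b for g, x, b in zip(gamma, row, beta)])
    return output


# Toy batch: two examples, two channels, one-dimensional conditioning input.
features = [[1.0, 2.0], [3.0, 6.0]]
conditions = [[1.0], [0.0]]
gamma_layer = ([[2.0], [2.0]], [1.0, 1.0])
beta_layer = ([[0.0], [0.0]], [0.5, 0.5])
out = conditional_batch_norm(features, conditions, gamma_layer, beta_layer)
```

Because the scale and shift depend on the conditioning vector, the same visual features are modulated differently for different questions, without any reasoning-specific module.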
Towards a corpus-based, statistical approach of translation quality: measuring and visualizing linguistic deviance in student translations
In this article we present a corpus-based statistical approach to measuring translation quality, more particularly translation acceptability, by comparing the features of translated and original texts. We discuss initial findings that aim to support and objectify formative quality assessment. To that end, we extract a multitude of linguistic and textual features from both student and professional translation corpora that consist of many different translations by several translators in two different genres (fiction, news) and in two translation directions (English to French and French to Dutch). The numerical information gathered from these corpora is exploratively analysed with Principal Component Analysis, which enables us to identify stable, language-independent linguistic and textual indicators of student translations compared to translations produced by professionals. The differences between these types of translation are subsequently tested by means of ANOVA. The results clearly indicate that the proposed methodology is indeed capable of distinguishing between student and professional translations. It is claimed that this deviant behaviour indicates an overall lower translation quality in student translations: student translations tend to score lower at the acceptability level, that is, they deviate significantly from target-language norms and conventions. In addition, the proposed methodology is capable of assessing the acceptability of an individual student’s translation – a smaller linguistic distance between a given student translation and the norm set by the professional translations correlates with higher quality. The methodology is also able to provide objective and concrete feedback about the divergent linguistic dimensions in their text
ANNIS: a linguistic database for exploring information structure
In this paper, we discuss the design and implementation of our first version of the database "ANNIS" (ANNotation of Information Structure). For research based on empirical data, ANNIS provides a uniform environment for storing this data together with its linguistic annotations. A central database promotes standardized annotation, which facilitates interpretation and comparison of the data. ANNIS is used through a standard web browser and offers tier-based visualization of data and annotations, as well as search facilities that allow for cross-level and cross-sentential queries. The paper motivates the design of the system, characterizes its user interface, and provides an initial technical evaluation of ANNIS with respect to data size and query processing
Visualizing the Boni dialects with Historical Glottometry
This paper deals with the historical relations between dialects of Boni, a Cushitic language of Kenya and Somalia. Boni forms the subject of Volume 10 of the Language and Dialect Atlas of Kenya (Heine & Möhlig 1982). Heine presents evidence for three subgroups within Boni, as well as several areas of convergence between dialects belonging to different proposed subgroups. In reviewing his evidence, I find that two of the three splits are not supported by the data, and therefore his conclusions on convergence must also be reinterpreted. Given the presence of numerous intersecting isoglosses, the tree diagram is an inappropriate model for describing the relations between Boni dialects, and I turn to Historical Glottometry (Kalyan & François 2018) to provide a visualization of the data