18,415 research outputs found
Sensing Subjective Well-being from Social Media
Subjective Well-being(SWB), which refers to how people experience the quality
of their lives, is of great use to public policy-makers as well as economic,
sociological research, etc. Traditionally, the measurement of SWB relies on
time-consuming and costly self-report questionnaires. Nowadays, people are
motivated to share their experiences and feelings on social media, so we
propose to sense SWB from the vast user generated data on social media. By
utilizing 1785 users' social media data with SWB labels, we train machine
learning models that are able to "sense" individual SWB from users' social
media. Our model, which attains the state-by-art prediction accuracy, can then
be used to identify SWB of large population of social media users in time with
very low cost.Comment: 12 pages, 1 figures, 2 tables, 10th International Conference, AMT
2014, Warsaw, Poland, August 11-14, 2014. Proceeding
Understanding confounding effects in linguistic coordination: an information-theoretic approach
We suggest an information-theoretic approach for measuring stylistic
coordination in dialogues. The proposed measure has a simple predictive
interpretation and can account for various confounding factors through proper
conditioning. We revisit some of the previous studies that reported strong
signatures of stylistic accommodation, and find that a significant part of the
observed coordination can be attributed to a simple confounding effect - length
coordination. Specifically, longer utterances tend to be followed by longer
responses, which gives rise to spurious correlations in the other stylistic
features. We propose a test to distinguish correlations in length due to
contextual factors (topic of conversation, user verbosity, etc.) and
turn-by-turn coordination. We also suggest a test to identify whether stylistic
coordination persists even after accounting for length coordination and
contextual factors
Recognition of nonmanual markers in American Sign Language (ASL) using non-parametric adaptive 2D-3D face tracking
This paper addresses the problem of automatically recognizing linguistically significant nonmanual expressions in American Sign Language from video. We develop a fully automatic system that is able to track facial expressions and head movements, and detect and recognize facial events continuously from video. The main contributions of the proposed framework are the following: (1) We have built a stochastic and adaptive ensemble of face trackers to address factors resulting in lost face track; (2) We combine 2D and 3D deformable face models to warp input frames, thus correcting for any variation in facial appearance resulting from changes in 3D head pose; (3) We use a combination of geometric features and texture features extracted from a canonical frontal representation. The proposed new framework makes it possible to detect grammatically significant nonmanual expressions from continuous signing and to differentiate successfully among linguistically significant expressions that involve subtle differences in appearance. We present results that are based on the use of a dataset containing 330 sentences from videos that were collected and linguistically annotated at Boston University
The Missing Link between Morphemic Assemblies and Behavioral Responses:a Bayesian Information-Theoretical model of lexical processing
We present the Bayesian Information-Theoretical (BIT) model of lexical processing: A mathematical model illustrating a novel approach to the modelling of language processes. The model shows how a neurophysiological theory of lexical processing relying on Hebbian association and neural assemblies can directly account for a variety of effects previously observed in behavioural experiments. We develop two information-theoretical measures of the distribution of usages of a morpheme or word, and use them to predict responses in three visual lexical decision datasets investigating inflectional morphology and polysemy. Our model offers a neurophysiological basis for the effects of
morpho-semantic neighbourhoods. These results demonstrate how distributed patterns of activation naturally result in the arisal of symbolic structures. We conclude by arguing that the modelling framework exemplified here, is
a powerful tool for integrating behavioural and neurophysiological results
Inducing Compact but Accurate Tree-Substitution Grammars
Tree substitution grammars (TSGs) are a compelling alternative to context-free grammars for modelling syntax. However, many popular techniques for estimating weighted TSGs (under the moniker of Data Oriented Parsing) suffer from the problems of inconsistency and over-fitting. We present a theoretically principled model which solves these problems using a Bayesian non-parametric formulation. Our model learns compact and simple grammars, uncovering latent linguistic structures (e.g., verb subcategorisation), and in doing so far out-performs a standard PCFG.
Diffusion of Lexical Change in Social Media
Computer-mediated communication is driving fundamental changes in the nature
of written language. We investigate these changes by statistical analysis of a
dataset comprising 107 million Twitter messages (authored by 2.7 million unique
user accounts). Using a latent vector autoregressive model to aggregate across
thousands of words, we identify high-level patterns in diffusion of linguistic
change over the United States. Our model is robust to unpredictable changes in
Twitter's sampling rate, and provides a probabilistic characterization of the
relationship of macro-scale linguistic influence to a set of demographic and
geographic predictors. The results of this analysis offer support for prior
arguments that focus on geographical proximity and population size. However,
demographic similarity -- especially with regard to race -- plays an even more
central role, as cities with similar racial demographics are far more likely to
share linguistic influence. Rather than moving towards a single unified
"netspeak" dialect, language evolution in computer-mediated communication
reproduces existing fault lines in spoken American English.Comment: preprint of PLOS-ONE paper from November 2014; PLoS ONE 9(11) e11311
Combining vocal tract length normalization with hierarchial linear transformations
Recent research has demonstrated the effectiveness of vocal tract length normalization (VTLN) as a rapid adaptation technique for statistical parametric speech synthesis. VTLN produces speech with naturalness preferable to that of MLLR-based adaptation techniques, being much closer in quality to that generated by the original av-erage voice model. However with only a single parameter, VTLN captures very few speaker specific characteristics when compared to linear transform based adaptation techniques. This paper pro-poses that the merits of VTLN can be combined with those of linear transform based adaptation in a hierarchial Bayesian frame-work, where VTLN is used as the prior information. A novel tech-nique for propagating the gender information from the VTLN prior through constrained structural maximum a posteriori linear regres-sion (CSMAPLR) adaptation is presented. Experiments show that the resulting transformation has improved speech quality with better naturalness, intelligibility and improved speaker similarity. Index Terms — Statistical parametric speech synthesis, hidden Markov models, speaker adaptation, vocal tract length normaliza-tion, constrained structural maximum a posteriori linear regression 1
Making "fetch" happen: The influence of social and linguistic context on nonstandard word growth and decline
In an online community, new words come and go: today's "haha" may be replaced
by tomorrow's "lol." Changes in online writing are usually studied as a social
process, with innovations diffusing through a network of individuals in a
speech community. But unlike other types of innovation, language change is
shaped and constrained by the system in which it takes part. To investigate the
links between social and structural factors in language change, we undertake a
large-scale analysis of nonstandard word growth in the online community Reddit.
We find that dissemination across many linguistic contexts is a sign of growth:
words that appear in more linguistic contexts grow faster and survive longer.
We also find that social dissemination likely plays a less important role in
explaining word growth and decline than previously hypothesized
Grip Force Reveals the Context Sensitivity of Language-Induced Motor Activity during “Action Words
Studies demonstrating the involvement of motor brain structures in language processing typically focus on \ud
time windows beyond the latencies of lexical-semantic access. Consequently, such studies remain inconclusive regarding whether motor brain structures are recruited directly in language processing or through post-linguistic conceptual imagery. In the present study, we introduce a grip-force sensor that allows online measurements of language-induced motor activity during sentence listening. We use this tool to investigate whether language-induced motor activity remains constant or is modulated in negative, as opposed to affirmative, linguistic contexts. Our findings demonstrate that this simple experimental paradigm can be used to study the online crosstalk between language and the motor systems in an ecological and economical manner. Our data further confirm that the motor brain structures that can be called upon during action word processing are not mandatorily involved; the crosstalk is asymmetrically\ud
governed by the linguistic context and not vice versa
- …