The Use of Lexical and Referential Cues in Children's Online Interpretation of Adjectives
Recent research on moment-to-moment language comprehension has revealed striking differences between adults and preschool children. Adults rapidly use the referential principle to resolve syntactic ambiguity, assuming that modification is more likely when there are 2 possible referents for a definite noun phrase. Young children do not. We examine the scope of this phenomenon by exploring whether children use the referential principle to resolve another form of ambiguity. Scalar adjectives (big, small) are typically used to refer to an object when contrasting members of the same category are present in the scene (big and small coins). In the present experiment, 5-year-olds and adults heard instructions like "Point to the big (small) coin" while their eye-movements were measured to displays containing 1 or 2 coins. Both groups rapidly recruited the meaning of the adjective to distinguish between referents of different sizes. Critically, like adults, children were quicker to look to the correct item in trials containing 2 possible referents compared with 1. Nevertheless, children's sensitivity to the referential principle was substantially delayed compared to adults', suggesting possible differences in the recruitment of this top-down cue. The implications of current and previous findings are discussed with respect to the development of the architecture of language comprehension.
Syntactic Topic Models
The syntactic topic model (STM) is a Bayesian nonparametric model of language
that discovers latent distributions of words (topics) that are both
semantically and syntactically coherent. The STM models dependency parsed
corpora where sentences are grouped into documents. It assumes that each word
is drawn from a latent topic chosen by combining document-level features and
the local syntactic context. Each document has a distribution over latent
topics, as in topic models, which provides the semantic consistency. Each
element in the dependency parse tree also has a distribution over the topics of
its children, as in latent-state syntax models, which provides the syntactic
consistency. These distributions are convolved so that the topic of each word
is likely under both its document and syntactic context. We derive a fast
posterior inference algorithm based on variational methods. We report
qualitative and quantitative studies on both synthetic data and hand-parsed
documents. We show that the STM is a more predictive model of language than
current models based only on syntax or only on topics.
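The combination step described in the abstract above can be illustrated with a small sketch. It assumes that "convolved" means a pointwise product of the document-level topic distribution and the syntactic (parent-to-child) topic distribution, followed by renormalization. That reading, the function name, and the example numbers are all assumptions made for illustration and are not taken from the paper.

```python
def combine_topic_distributions(doc_topics, syntax_topics):
    """Pointwise product of two topic distributions, renormalized to sum to 1.

    doc_topics:    hypothetical document-level topic probabilities
    syntax_topics: hypothetical topic probabilities given the parent's topic
    """
    if len(doc_topics) != len(syntax_topics):
        raise ValueError("both distributions must cover the same topics")
    products = [d * s for d, s in zip(doc_topics, syntax_topics)]
    total = sum(products)
    if total == 0:
        raise ValueError("the distributions share no topic with nonzero mass")
    return [p / total for p in products]


# Illustrative numbers only: three topics.
theta_d = [0.7, 0.2, 0.1]    # document prefers topic 0
pi_parent = [0.1, 0.8, 0.1]  # syntactic context prefers topic 1
combined = combine_topic_distributions(theta_d, pi_parent)
```

In this toy case the combined distribution favors topic 1, which has appreciable mass under both the document and the syntactic context, while topic 0 is penalized for being unlikely in the syntactic context.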
Corpus approaches to language in the media
The main aim of this chapter is to offer an overview of research that has adopted the methodology of Corpus Linguistics to study aspects of language use in the media. The overview begins by introducing the key principles and analytical tools adopted in corpus research. To demonstrate the contribution of corpus approaches to media linguistics, a selection of recent corpus studies is subsequently discussed. The final section summarises the strengths and limitations of corpus approaches and discusses avenues for further research.
Towards an Indexical Model of Situated Language Comprehension for Cognitive Agents in Physical Worlds
We propose a computational model of situated language comprehension based on
the Indexical Hypothesis that generates meaning representations by translating
amodal linguistic symbols to modal representations of beliefs, knowledge, and
experience external to the linguistic system. This Indexical Model incorporates
multiple information sources, including perceptions, domain knowledge, and
short-term and long-term experiences during comprehension. We show that
exploiting diverse information sources can alleviate ambiguities that arise
from contextual use of underspecific referring expressions and unexpressed
argument alternations of verbs. The model is being used to support linguistic
interactions in Rosie, an agent implemented in Soar that learns from
instruction. Comment: Advances in Cognitive Systems 3 (2014)
Distributional Effects of Gender Contrasts Across Categories
This paper proposes a methodology for comparing grammatical contrasts across categories with the tools of distributional semantics. After outlining why such a comparison is relevant to current theoretical work on gender and other morphosyntactic features, we present intrinsic and extrinsic predictability as instruments for analyzing semantic contrasts between pairs of words. We then apply our method to a dataset of gender pairs of French nouns and adjectives. We find that, while the distributional effect of gender is overall less predictable for nouns than for adjectives, it is heavily influenced by semantic properties of the adjectives.
Detecting and Monitoring Hate Speech in Twitter
Social Media are sensors in the real world that can be used to measure the pulse of societies.
However, the massive and unfiltered feed of messages posted in social media is a phenomenon that
nowadays raises social alarms, especially when these messages contain hate speech targeted to a
specific individual or group. In this context, governments and non-governmental organizations
(NGOs) are concerned about the possible negative impact that these messages can have on individuals
or on the society. In this paper, we present HaterNet, an intelligent system currently being used by
the Spanish National Office Against Hate Crimes of the Spanish State Secretariat for Security that
identifies and monitors the evolution of hate speech in Twitter. The contributions of this research
are many-fold: (1) It introduces the first intelligent system that monitors and visualizes, using social
network analysis techniques, hate speech in Social Media. (2) It introduces a novel public dataset on
hate speech in Spanish consisting of 6000 expert-labeled tweets. (3) It compares several classification
approaches based on different document representation strategies and text classification models. (4)
The best approach is a combined LSTM+MLP neural network that takes as input the
tweet's word, emoji, and expression tokens' embeddings enriched by the tf-idf, and obtains an area
under the curve (AUC) of 0.828 on our dataset, outperforming previous methods presented in the
literature. The work by Quijano-Sanchez was supported by the Spanish Ministry of Science and Innovation
grant FJCI-2016-28855. The research of Liberatore was supported by the Government of Spain, grant MTM2015-65803-R, and by the European Union's Horizon 2020 Research and Innovation Programme, under the Marie Sklodowska-Curie grant agreement No. 691161 (GEOSAFE). All the financial support is gratefully acknowledged.
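For readers unfamiliar with the AUC metric reported in the HaterNet abstract above, the following minimal sketch computes the area under the ROC curve as the probability that a randomly chosen positive example is scored above a randomly chosen negative one (ties count as half). The labels and scores are invented for illustration and have no connection to the HaterNet dataset or model.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier output: 1 = hate speech, 0 = not hate speech.
labels = [1, 0, 1, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8]
auc = roc_auc(labels, scores)
```

Here five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6. The pairwise form is quadratic in the number of examples and is meant only to make the definition concrete.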
Analysis of characteristics of semantics of spoken language in normally developing Hindi speaking children
Background: There appears to be a lack of databases and a dearth of studies focusing on the characteristics of semantics in Hindi speaking school aged children. Such a database will be useful for building vocabulary for language disordered children and for constructing AAC boards for non-verbal children. Hence, it is essential to study the characteristics of semantics of normally developing children. This paper focuses on describing the semantic characteristics of spoken language in Hindi speaking children. Methods: 200 normally developing Hindi speaking children within the age group of 3-7 years were shown and instructed to describe three validated pictures of daily events. The responses were recorded and transcribed. Analyses included type-token ratio, frequency of occurrence and comparisons between different word classes. Results: The percentage of nouns is highest, followed by verbs, pronouns, and adjectives. Frequency of occurrence of words increases with increase in age. The common words with high frequency of occurrence are hÆ, huÌ, rÎhe, rÎha, rÎhi, dƷa, Ér, khel, gaÉi, log, pe, ke. There appears to be a marked increase in different classes of words, one at 4 yrs of age (after Sr. KG) and the other at 6 yrs of age (standard I). Conclusions: One of the highlighting features of this study is the huge database of semantics (of spoken language) collected from 200 school going children. Creating such a database and utilizing it for assessing language of the disordered population appears to be the need of the hour.
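The type-token ratio and frequency-of-occurrence analyses named in the Methods above can be sketched as follows. The token list is a made-up example built from a few of the words listed in the Results; it is not data from the study, and the function name is hypothetical.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct word types divided by the total number of tokens."""
    if not tokens:
        raise ValueError("token list must not be empty")
    return len(set(tokens)) / len(tokens)


# Hypothetical transcribed response, already split into word tokens.
tokens = ["log", "khel", "pe", "log", "khel", "ke"]

ttr = type_token_ratio(tokens)   # 4 types over 6 tokens
frequencies = Counter(tokens)    # frequency of occurrence per word
```

A higher ratio indicates a more varied vocabulary for a sample of a given length. Because the ratio tends to fall as samples grow longer, comparisons are usually made across samples of similar size.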