20 research outputs found
Predicting Lexical Norms: A Comparison between a Word Association Model and Text-Based Word Co-occurrence Models
In two studies we compare a distributional semantic model derived from word co-occurrences and a word association-based model in their ability to predict properties that affect lexical processing. We focus on age of acquisition, concreteness, and three affective variables, namely valence, arousal, and dominance, since all these variables have been shown to be fundamental in word meaning. In both studies we use a model based on data obtained in a continued free word association task to predict these variables. In Study 1 we directly compare this model to a word co-occurrence model based on syntactic dependency relations to see which model is better at predicting the variables under scrutiny in Dutch. In Study 2 we replicate our findings in English and compare our results to those reported in the literature. In both studies we find the word association-based model fit to predict diverse word properties. Especially in the case of predicting affective word properties, we show that the association model is superior to the distributional model.
When cheating is an honest mistake
Dishonesty is an intriguing phenomenon, studied extensively across various disciplines due to its impact on people’s lives as well as society in general. To examine dishonesty in a controlled setting, researchers have developed a number of experimental paradigms. One of the most popular approaches in this regard is the matrix task, in which participants receive matrices wherein they have to find two numbers that sum to 10 (e.g., 4.81 and 5.19), under time pressure. In a next phase, participants need to report how many matrices they solved correctly, giving them the opportunity to cheat by exaggerating their performance in order to get a larger reward. Here, we argue, on both theoretical and empirical grounds, that the matrix task is ill-suited to study dishonest behavior, primarily because it conflates cheating with honest mistakes. We therefore recommend that researchers use different paradigms to examine dishonesty, and treat (previous) findings based on the matrix task with due caution.
Subjective Dutch word ratings for semantic gender and the relationship with lexical and affective variables
We collected ratings for 24,038 Dutch words for semantic gender. Every word was rated by 10 men and 10 women, and every participant rated one of four lists of ca. 6,000 words. Correlations with other lexical and affective variables were low, suggesting that semantic gender is an independent variable.
Predicting affective word variables
We compared two semantic models in the quality of predicting human ratings of a number of variables with an emphasis on affective variables. We used an external language model, where the meaning of words is derived from the context in which they occur in a text corpus, and an internal language model based on data from a continuous free word association task. Using these models we predict valence, arousal, and dominance (three affective variables argued to be important in semantics), and concreteness and age of acquisition (AoA). Furthermore, we also test how these models can account for the theory that abstract words make more use of affective information.
We used all Dutch words that were available in the different datasets that were involved. For these 2,831 words, we predicted our variables of interest using two methods: (1) the projected location on a direction identified as the variable of interest (using property fitting) in multidimensional representations obtained by applying multi-dimensional scaling on pairwise similarities, and (2) using the mean of the k-nearest neighbors of the target item based on pairwise similarities. We used these two methods for both semantic models varying the same set of parameters, and analyses were cross-validated using a leave-one-out approach. For all affective variables and parameter values (dimensionality of MDS, and values of k), we found higher agreement between prediction and human ratings using the association model. Using the best method and parameters, we obtained correlations of .92, .85, and .85, for valence, arousal, and dominance, respectively. The corresponding correlations based on the text corpus were .80, .74, and .67, respectively. For concreteness and AoA, the highest correlations obtained from both types of data were similar, .88 and .73, respectively.
After doing a median split on concreteness that separates the data into relatively concrete and relatively abstract words, the prediction of all affective variables for abstract words was better than the predictions for concrete words when using the association model. For the text corpus model, this was not the case: only the prediction of valence was better for abstract words. Predictions of arousal were considerably worse for abstract words. For dominance, there was no difference between abstract and concrete words.
All in all, we showed that the word association model is better at capturing affective word variables than the text corpus model we used, and that predictions based on word associations align with the theory that abstract words make more use of affective information.
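The k-nearest-neighbor method with leave-one-out cross-validation described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `knn_loo_predict` and `pearson`, the toy similarity matrix, and the toy ratings are all assumptions made for the example.

```python
import math

def knn_loo_predict(sim, ratings, k):
    """Predict each item's rating as the mean rating of its k most similar
    other items (leave-one-out: the target never counts as its own neighbor)."""
    n = len(ratings)
    preds = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Sort the remaining items by similarity to item i, most similar first.
        neighbors = sorted(others, key=lambda j: sim[i][j], reverse=True)[:k]
        preds.append(sum(ratings[j] for j in neighbors) / len(neighbors))
    return preds

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy data: four words, pairwise similarities, and human ratings.
toy_sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
toy_ratings = [6.0, 7.0, 2.0, 3.0]

toy_preds = knn_loo_predict(toy_sim, toy_ratings, k=1)
agreement = pearson(toy_ratings, toy_preds)
```

The agreement between prediction and human ratings is then summarized by the correlation, which is the kind of figure the abstract reports for each variable.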
Comparing predictions of lexical norm data obtained using word associations and word collocation
Introduction: We compared the quality of prediction of word variables based on a Dutch word association corpus and a text corpus. We derived estimates for three affective word variables: valence, arousal, and dominance, and two non-affective word variables: concreteness and age of acquisition (AoA). Material and methods: For 2,831 words with ratings on each variable, we used three methods to generate the predictions. All three methods rely on the similarity between pairs of words, which was obtained both using word associations and word collocation: (1) using projections on a dimension identified as the variable in question through property fitting in a multidimensional representation of the pairwise similarities, (2) using the mean of the variable for the k-nearest neighbors determined by the pairwise similarities, and (3) using the k-nearest neighbors values, weighted according to their proximity. Results and Conclusions: For all variables except concreteness and AoA, estimates were superior when based on word associations. Differences between the predictions of the three methods were small, although method three consistently yielded the best predictions. Based on the word association corpus it yielded correlations of .92, .85, and .85, for valence, arousal, and dominance, respectively. Its corresponding correlations based on the text corpus were .80, .74, and .67, respectively. For concreteness and AoA, both the association and the text corpus yielded correlations of .88 and .73, respectively. Based on these results, we believe word associations are better at capturing human ratings of affective word variables.
Email: Hendrik Vankrunkelsven, [email protected]
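The third method in the abstract above, k-nearest neighbors weighted by proximity, might look like the sketch below. The abstract does not specify the weighting scheme, so using the raw similarity as the weight is an assumption of this example, as are the function name `weighted_knn_predict` and the toy data.

```python
def weighted_knn_predict(sim, ratings, target, k):
    """Predict the rating of item `target` as the similarity-weighted mean
    of the ratings of its k most similar other items.

    Assumption: weights are the raw similarities; the source only says the
    neighbors are 'weighted according to their proximity'."""
    others = [j for j in range(len(ratings)) if j != target]
    neighbors = sorted(others, key=lambda j: sim[target][j], reverse=True)[:k]
    total_weight = sum(sim[target][j] for j in neighbors)
    return sum(sim[target][j] * ratings[j] for j in neighbors) / total_weight

# Hypothetical toy data: four words, pairwise similarities, and human ratings.
example_sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
example_ratings = [6.0, 7.0, 2.0, 3.0]

# Word 2 is close to word 3 (0.8) and only weakly related to word 1 (0.2),
# so its prediction is pulled mostly toward the rating of word 3.
pred_word2 = weighted_knn_predict(example_sim, example_ratings, target=2, k=2)
```

Compared with an unweighted mean, distant neighbors contribute less, which matters most when k is large relative to the number of truly similar words.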
Predicting lexical norms using a word association corpus
Obtaining norm scores for subjective properties of words can be quite cumbersome as it requires a considerable investment proportional to the size of the word set. We present a method to predict norm scores for large word sets from a word association corpus. We use similarities between word pairs, derived from this corpus, to construct a semantic space. Starting from norm scores for a subset of the words, we retrieve the direction in the space that optimally reflects the norm data associated with the words. All other words in the semantic space are then orthogonally projected onto this direction, providing predictions for these words on the variable of interest. In this study, we predict valence, arousal, dominance, age of acquisition, and concreteness and show that the predictions correlate strongly with the judgments of human raters. Furthermore, we show that our predictions are superior to those derived using other methods.
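The idea of retrieving a direction from a rated subset and projecting the remaining words onto it can be sketched with an ordinary least-squares fit. This is only an illustration under stated assumptions: the space is a hypothetical two-dimensional one (the abstract does not give the dimensionality), the coordinates and ratings are invented, and `fit_direction` and `predict_rating` are names made up for this example.

```python
def fit_direction(coords, ratings):
    """Least-squares fit of ratings ~ b + w1*x + w2*y on 2-D coordinates.

    The weight vector (w1, w2) is the direction in the space along which
    the rated property changes most; b is the intercept."""
    n = len(coords)
    mx = sum(c[0] for c in coords) / n
    my = sum(c[1] for c in coords) / n
    mr = sum(ratings) / n
    sxx = sum((c[0] - mx) ** 2 for c in coords)
    syy = sum((c[1] - my) ** 2 for c in coords)
    sxy = sum((c[0] - mx) * (c[1] - my) for c in coords)
    sxr = sum((c[0] - mx) * (r - mr) for c, r in zip(coords, ratings))
    syr = sum((c[1] - my) * (r - mr) for c, r in zip(coords, ratings))
    # Solve the 2x2 normal equations for the centered data.
    det = sxx * syy - sxy ** 2
    w1 = (sxr * syy - sxy * syr) / det
    w2 = (sxx * syr - sxy * sxr) / det
    b = mr - w1 * mx - w2 * my
    return (w1, w2), b

def predict_rating(point, w, b):
    """Predicted rating: a linear function of the point's projection on w."""
    return b + w[0] * point[0] + w[1] * point[1]

# Hypothetical rated subset: coordinates in the space and their norm scores.
rated_coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
rated_scores = [5.0, 7.0, 4.0, 6.0]

direction, intercept = fit_direction(rated_coords, rated_scores)

# Hypothetical unrated word, located at (2, 1) in the same space.
new_word_score = predict_rating((2.0, 1.0), direction, intercept)
```

Because the prediction depends on a word's position only through its projection onto the fitted direction, any word with coordinates in the space receives a score, whether or not it was part of the rated subset.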
Comparing predictions of lexical norm data obtained using word associations and word collocation
We compared the quality of prediction of word variables based on a Dutch word association corpus and a text corpus. We derived estimates for valence, arousal, dominance, concreteness, and age of acquisition (AoA) for 2,831 words. Based on the similarity between words, we (1) used projections on a dimension identified as the variable in question in a multidimensional representation, and (2) used the k-nearest neighbors values, weighted according to their proximity. Estimates were superior when based on word associations. Differences between the predictions of the two methods were small. Predictions based on the word association corpus yielded correlations of .92, .85, and .85 for valence, arousal, and dominance, respectively. The corresponding correlations based on the text corpus were .80, .74, and .67. For concreteness and AoA, both the association and the text corpus yielded correlations of .88 and .73, respectively. This suggests word associations are better at capturing human ratings of affective word variables.
Sound-symbolism effects in the absence of awareness. A replication study
People have been shown to link particular sounds with particular shapes. For instance, the round-sounding non-word bouba tends to be associated with curved shapes, whereas the sharp-sounding non-word kiki is deemed to be related to angular shapes. This tendency of people to associate sounds and shapes has been observed across different languages. In the present study, we re-examined the claim of Hung, Styles, and Hsieh (2017) that such sound-shape mappings can occur before observers become aware of the visual stimuli. More precisely, we replicated their first experiment in which congruent and incongruent stimuli (e.g., bouba presented in a round or an angular shape, respectively) were rendered invisible through continuous flash suppression. The results showed that congruent combinations, on average, broke suppression faster than incongruent combinations, thus providing converging evidence for Hung and colleagues’ assertions. Collectively, these findings now provide a solid basis from which to explore the boundary conditions of the effect.