13 research outputs found
Validation of an open source, remote web‐based eye‐tracking method (WebGazer) for research in early childhood
Measuring eye movements remotely via the participant's webcam promises to be an attractive methodological addition to in-person eye-tracking in the lab. However, there is a lack of systematic research comparing remote web-based eye-tracking with in-lab eye-tracking in young children. We report a multi-lab study that compared these two measures in an anticipatory looking task with toddlers using WebGazer.js and jsPsych. Results of our remotely tested sample of 18-27-month-old toddlers (N = 125) revealed that web-based eye-tracking successfully captured goal-based action predictions, although the proportion of goal-directed anticipatory looking was lower than in the in-lab sample (N = 70). As expected, the attrition rate was substantially higher in the web-based sample (42%) than in the in-lab sample (10%). Excluding trials based on visual inspection of the match between time-locked gaze coordinates and the participant's webcam video overlaid on the stimuli was an important preprocessing step to reduce noise in the data. We discuss the use of this remote web-based method in comparison with other current methodological innovations. Our study demonstrates that remote web-based eye-tracking can be a useful tool for testing toddlers, facilitating recruitment of larger and more diverse samples; a caveat to consider is the larger drop-out rate.
Polysemy resolution with word embedding models and data visualization : the case of adverbial postpositions -ey, -eyse, and -(u)lo in Korean
This dissertation reports computational accounts of resolving word-level polysemy in a lesser-studied language, Korean. Postpositions, which are characterized by multiple form-function mappings and are thus polysemous in nature, pose a challenge to automatic analysis and to model performance in identifying their functions. In this project, I enhance the existing word-level embedding classification models (Positive Pointwise Mutual Information and Singular Value Decomposition; Skip-Gram and Negative Sampling) by taking the context window into account, and introduce a sentence-level embedding classification model (Bidirectional Encoder Representations from Transformers (BERT)) under the scheme of Distributional Semantic Modeling. I then develop two visualization systems that show (i) relationships between the postpositions and their co-occurring words for the word-level embedding models, and (ii) clusters of sentences for the sentence-level embedding model. These visualization systems make it easier to understand how the classification models classify the intended functions of these postpositions. Results show that, whereas the performance of the word-level embedding models is modulated by the size of the training corpora containing specific functions of the postpositions, the sentence-level embedding model performs in a stable way (i.e., it is less affected by corpus size) and simulates how humans recognize the polysemy of Korean adverbial postpositions more appropriately than the word-level embedding models do.
Limits on Neural Networks: Agent-First Strategy in Child Comprehension
This study investigates how neural networks reveal developmental trajectories of child language, focusing on the Agent-First strategy in comprehension of an active transitive construction in Korean. We develop three models (LSTM; BERT; GPT-2) and measure their classification performance on the test stimuli used in Shin (2021) involving scrambling and omission of constructional components at varying degrees. Results show that, despite some compatibility of these models’ performance with the children’s response patterns, their performance does not fully approximate the children’s utilisation of this strategy, demonstrating by-model and by-condition asymmetries. This study’s findings suggest that neural networks can utilise information about formal co-occurrences to access the intended message to a certain degree, but the outcome of this process may be substantially different from how a child (as a developing processor) engages in comprehension. This implies some limits on what neural networks can reveal about the developmental trajectories of child language.
Neural network modelling on Korean monolingual children’s comprehension of suffixal passive construction in Korean
This study explores a GPT-2 architecture’s capacity to capture monolingual children’s comprehension behaviour in Korean, a language underexplored in this context. We examine its performance in processing a suffixal passive construction involving verbal morphology and the interpretive procedures driven by that morphology. Through model fine-tuning via patching and hyperparameter variations, we assess the resulting models’ classification accuracy on test items used in Shin (2022a). Results show discrepancies in simulating children’s response patterns, highlighting the limitations of neural networks in capturing child language features. This prompts further investigation into computational models’ capacity to elucidate developmental trajectories of child language that have been unveiled through corpus-based or experimental research.
How can we capture Multiword Expressions?
PreechVis: Visual profiling using multiple-word combinations
Words in a corpus carry features and information, and visualizing them can improve the user's understanding. A unit of analysis may be a single word or a combination of words taken together; the latter is referred to as a multiword expression (MWE). Analyzing both single words and multiword expressions with visualization can yield more accurate results and more information than analyzing single words alone. Interactive visualization is useful for analyzing multiword expressions because the following features are of interest to linguists: (1) showing combinations of POS patterns, (2) exploring the results according to the POS combination pattern, and (3) searching the source corpus for verification. We therefore propose PreechVis, an interactive visualization tool that includes all of the requisite functions for an analysis using multiword expressions (http://ressources.modyco.fr/sm/PreechVisMWE/). For the present study, the corpus comprised 957 speeches, 164,646 sentences, and 3,698,617 tokens from 43 U.S. Presidents, from George Washington to Barack Obama. PreechVis is divided into two views. The first view combines a Sunburst and RadVis. The Sunburst presents the POS and its combination patterns for each gram. In RadVis, the Presidents are positioned according to their frequency values; when a President is selected, the frequency value is displayed on the Sunburst to improve the user's understanding. In the second view, the user can confirm and verify the details of the result using the Wordcloud. The two views are synchronized with each other and are easy to change by selecting grams, issues, and presidents. Through experiments and case studies on the U.S. Presidents' speeches, we verified the effectiveness and usability of PreechVis.
Characterizing discourse genres using prosodic features in a reference treebank of spoken French
Rhapsodie is a 33,000-word treebank of spoken French that is annotated for syntax and prosody. It breaks down into 57 five-minute samples produced by 89 male and female speakers. The discourse profile of each sample is captured by six variables: event structure (dialogue vs. monologue), social context (public vs. private), genre (argumentation, description, narrative, oratory, and procedural), interactivity (interactive, non-interactive, and semi-interactive), channel (broadcasting and face-to-face), and planning type (planned, semi-spontaneous, and spontaneous). The prosodic profile of each sample is captured by two sets of three variables. The first set consists of primary (i.e. structurally objective) variables, namely the mean number per second of pauses (fPause), conversational overlaps (fOverlap), and gap fillers (fEuh). The second set is based on a model consisting of secondary variables determined a priori by the authors because they are likely to occur in certain discourse genres. They are the mean numbers per second of prosodic prominences (fProm), intonational periods (fIPE), and intonation packages (fIPA). Our main research question is whether discourse types in French can be characterized and ultimately predicted by prosodic features. We also address two side questions. First, does the fact that the corpus is relatively small, heterogeneous, and not necessarily balanced affect the representativeness of our results? Second, are the secondary prosodic features representative of discourse genres? We compiled a data table that consists of 57 observations (the corpus samples) and the twelve variables listed above. We visualized the table with RhapVis, a tool we designed for this purpose (http://ressources.modyco.fr/sm/RhapVis/), explored it with principal component analysis (http://ressources.modyco.fr/sm/RhapVis/PCA.html), and looked for confirmed tendencies with non-parametric one-way ANOVAs (Kruskal-Wallis H tests). Our exploration shows that argumentative and narrative sequences are prosodically marked, whereas descriptive and procedural sequences are not. A discourse genre is prosodically marked when it is characterized by a high frequency of prosodic features, namely the simultaneous occurrence of overlaps, prominences, and intonation packages. We also claim that a discourse genre is prosodically marked when it is atypical with respect to the other speech genres. This is the case with oratory speech, which is characterized by a high frequency of intonational periods and pauses and is consequently isolated from the other types. These results were partially confirmed by the ANOVAs. Focusing on primary variables, running an ANOVA on fPause showed a significant main effect of Genre (p < 0.05). Further inspection indicates that the lowest fPause score was found in Narration (M = 0.32; SD = 0.04) and the highest in Oratory (M = 0.42; SD = 0.01). For fOverlap, the main effect of Genre reached the level of significance (p < 0.001), indicating that fOverlap also varies according to Genre. The descriptive data showed that the fOverlap score was highest for Argumentation (M = 0.05, SD = 0.04) and Narration (M = 0.02, SD = 0.01). Conversely, no overlap was found in either the Oratory or the Procedural samples.
Biomimetic Chitin-Silk Hybrids: An Optically Transparent Structural Platform for Wearable Devices and Advanced Electronics
The cuticles of insects and marine crustaceans are fascinating models for man-made advanced functional composites. The excellent mechanical properties of these biological structures rest on the exquisite self-assembly of natural ingredients, such as biominerals, polysaccharides, and proteins. Among them, the two commonly found building blocks in the model biocomposites are chitin nanofibers and silk-like proteins with β-sheet structure. Despite being wholly organic, the chitinous protein complex plays a key role in the biocomposites by contributing to the overall mechanical robustness and structural integrity. Moreover, the chitinous protein complex alone, without biominerals, is optically transparent (e.g., dragonfly wings), making it a brilliant model material system for engineering applications where optical transparency is essential. Here, inspired by the chitinous protein complex of arthropod cuticles, an optically transparent biomimetic composite that hybridizes chitin nanofibers and silk fibroin (β-sheet) is introduced, and its potential as a biocompatible structural platform for emerging wearable devices (e.g., smart contact lenses) and advanced displays (e.g., transparent plastic cover window) is demonstrated.