13 research outputs found

    Validation of an open source, remote web‐based eye‐tracking method (WebGazer) for research in early childhood

    Measuring eye movements remotely via the participant's webcam promises to be an attractive methodological addition to in-person eye-tracking in the lab. However, there is a lack of systematic research comparing remote web-based eye-tracking with in-lab eye-tracking in young children. We report a multi-lab study that compared these two measures in an anticipatory looking task with toddlers using WebGazer.js and jsPsych. Results from our remotely tested sample of 18- to 27-month-old toddlers (N = 125) revealed that web-based eye-tracking successfully captured goal-based action predictions, although the proportion of goal-directed anticipatory looking was lower than in the in-lab sample (N = 70). As expected, the attrition rate was substantially higher in the web-based sample (42%) than in the in-lab sample (10%). Excluding trials based on visual inspection of how well the time-locked gaze coordinates matched the participant's webcam video overlaid on the stimuli was an important preprocessing step to reduce noise in the data. We discuss the use of this remote web-based method in comparison with other current methodological innovations. Our study demonstrates that remote web-based eye-tracking can be a useful tool for testing toddlers, facilitating recruitment of larger and more diverse samples; a caveat to consider is the larger drop-out rate.

    Polysemy resolution with word embedding models and data visualization : the case of adverbial postpositions -ey, -eyse, and -(u)lo in Korean

    This dissertation reports computational accounts of resolving word-level polysemy in a lesser-studied language, Korean. Postpositions, which are characterized by multiple form-function mappings and are thus polysemous in nature, pose a challenge to automatic analysis and to model performance in identifying their functions. In this project, I enhance the existing word-level embedding classification models (Positive Pointwise Mutual Information and Singular Value Decomposition; Skip-Gram and Negative Sampling) by taking the context window into account, and introduce a sentence-level embedding classification model (Bidirectional Encoder Representations from Transformers (BERT)) under the scheme of Distributional Semantic Modeling. I then develop two visualization systems that show (i) relationships between the postpositions and their co-occurring words for the word-level embedding models, and (ii) clusters of sentences for the sentence-level embedding model. These visualization systems make it easier to understand how the classification models classify the intended functions of these postpositions. Results show that, whereas the performance of the word-level embedding models is modulated by the size of the training corpora containing specific functions of the postpositions, the sentence-level embedding model performs in a stable way (i.e., it is less affected by corpus size) and simulates how humans recognize the polysemy of Korean adverbial postpositions more appropriately than the word-level embedding models do.
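
As a rough illustration of the word-level side of the abstract above, the sketch below counts co-occurrences inside a symmetric context window and converts them to Positive Pointwise Mutual Information (PPMI) scores. The SVD step, the Skip-Gram model, and BERT are not shown. The function names, the placeholder tokens, and the window sizes are assumptions made for this example; they do not come from the dissertation.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window):
    """Count (target, context) pairs within +/- `window` tokens of each target."""
    pair_counts = Counter()
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pair_counts[(target, tokens[j])] += 1
    return pair_counts


def ppmi(pair_counts):
    """PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c)))) for each observed pair."""
    total = sum(pair_counts.values())
    word_totals = Counter()
    context_totals = Counter()
    for (word, context), n in pair_counts.items():
        word_totals[word] += n
        context_totals[context] += n
    scores = {}
    for (word, context), n in pair_counts.items():
        pmi = math.log2(n * total / (word_totals[word] * context_totals[context]))
        scores[(word, context)] = max(0.0, pmi)  # negative associations are clipped to 0
    return scores


# Placeholder tokens only (hypothetical); a real corpus would be tokenized text.
toy_sentences = [["x", "p", "y"], ["x", "p", "z"], ["w", "q", "z"]]

for window in (1, 2):
    scores = ppmi(cooccurrence_counts(toy_sentences, window))
    print(window, len(scores), round(scores[("x", "p")], 3))
```

Widening the window adds more distant pairs such as ("x", "y"), which changes both the set of scored pairs and the individual scores. That is one way the choice of context window can influence a word-level model.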

    La résolution de la polysémie à l'aide de modèles de vecteur de mots et la visualisation de données : le cas des postpositions adverbiales -ey, -eyse, et -(u)lo en coréen

    This dissertation reports computational accounts of resolving word-level polysemy in a lesser-studied language, Korean. Postpositions, which are characterized by multiple form-function mappings and are thus polysemous in nature, pose a challenge to automatic analysis and to model performance in identifying their functions. In this project, I enhance the existing word-level embedding classification models (Positive Pointwise Mutual Information and Singular Value Decomposition; Skip-Gram and Negative Sampling) by taking the context window into account, and introduce a sentence-level embedding classification model (Bidirectional Encoder Representations from Transformers (BERT)) under the scheme of Distributional Semantic Modeling. I then develop two visualization systems that show (i) relationships between the postpositions and their co-occurring words for the word-level embedding models, and (ii) clusters of sentences for the sentence-level embedding model. These visualization systems make it easier to understand how the classification models classify the intended functions of these postpositions. Results show that, whereas the performance of the word-level embedding models is modulated by the size of the training corpora containing specific functions of the postpositions, the sentence-level embedding model performs in a stable way (i.e., it is less affected by corpus size) and simulates how humans recognize the polysemy of Korean adverbial postpositions more appropriately than the word-level embedding models do.

    How can we capture Multiword Expressions?


    PreechVis: Visual profiling using multiple-word combinations

    Words in a corpus carry features and information, and visualizing them can improve the user's understanding. A unit of analysis may be a single word or a combination of words that function together; the latter is referred to as a multiword expression (MWE). Analyzing both single words and multiword expressions with visualization can yield more accurate results and more information than analyzing single words alone. Interactive visualization is useful for analyzing multiword expressions because the following features are of interest to linguists: (1) showing combinations of part-of-speech (POS) patterns, (2) exploring the results according to the POS combination pattern, and (3) searching the source corpus for verification. We therefore propose PreechVis, an interactive visualization tool that includes all of the functions required for an analysis using multiword expressions (http://ressources.modyco.fr/sm/PreechVisMWE/). For the present study, the corpus comprised 957 speeches, 164,646 sentences, and 3,698,617 tokens from 43 U.S. Presidents, from George Washington to Barack Obama. PreechVis is divided into two views. The first view combines a Sunburst and a RadVis. The Sunburst presents the POS tags and their combination patterns for each gram. In the RadVis, the Presidents are positioned according to their frequency values. In addition, when a President is selected, the frequency value is displayed on the Sunburst to improve the user's understanding. In the second view, the user can simultaneously confirm and verify the details of the result using the Wordcloud. The two views are synchronized with each other and can easily be changed according to the selected grams, issues, and presidents. Through experiments and case studies on U.S. presidential speeches, we verified the effectiveness and usability of PreechVis.

    Caractérisation de genres discursifs à l'aide de traits prosodiques dans un treebank de référence en français parlé

    Rhapsodie is a 33,000-word treebank of spoken French that is annotated for syntax and prosody. It breaks down into 57 five-minute samples produced by 89 male and female speakers. The discourse profile of each sample is captured by six variables: event structure (dialogue vs. monologue), social context (public vs. private), genre (argumentation, description, narrative, oratory, and procedural), interactivity (interactive, non-interactive, and semi-interactive), channel (broadcasting and face-to-face), and planning type (planned, semi-spontaneous, and spontaneous). The prosodic profile of each sample is captured by two sets of three variables. The first set consists of primary (i.e. structurally objective) variables, namely the mean number per second of pauses (fPauses), conversational overlaps (fOverlap), and gap fillers (fEuh). The second set is based on a model consisting of secondary variables determined a priori by the authors because they are likely to occur in certain discourse genres. They are the mean numbers per second of prosodic prominences (fProm), intonational periods (fIPE), and intonation packages (fIPA). Our main research question is whether discourse types in French can be characterized and ultimately predicted by prosodic features. We also address two side questions. First, does the fact that the corpus is relatively small, heterogeneous, and not necessarily balanced affect the representativeness of our results? Second, are the secondary prosodic features representative of discourse genres? We compiled a data table that consists of 57 observations (the corpus samples) and the twelve variables listed above. We visualized the table with RhapVis, a tool we designed for this purpose (http://ressources.modyco.fr/sm/RhapVis/), explored it with principal component analysis (http://ressources.modyco.fr/sm/RhapVis/PCA.html), and looked for confirmed tendencies with non-parametric one-way ANOVAs (Kruskal-Wallis H tests). Our exploration shows that argumentative and narrative sequences are prosodically marked, whereas descriptive and procedural sequences are not. A discourse genre is prosodically marked when it is characterized by a high frequency of prosodic features, namely the simultaneous occurrence of overlaps, prominences, and intonation packages. We also claim that a discourse genre is prosodically marked when it is atypical with respect to the other speech genres. This is the case with oratory speech, which is characterized by a high frequency of intonational periods and pauses and is consequently isolated from the other types. These results were partially confirmed by the ANOVAs. Focusing on primary variables, running an ANOVA on fPause showed a significant main effect of Genre (p < 0.05). Further inspection indicates that while the lowest fPause score was found in Narration (M = 0.32; SD = 0.04), the highest score was observed in Oratory (M = 0.42; SD = 0.01). For fOverlap, the main effect of Genre reached the level of significance (p < 0.001), indicating that fOverlap also varies according to Genre. The descriptive data showed that the fOverlap scores were highest for Argumentation (M = 0.05, SD = 0.04) and Narration (M = 0.02, SD = 0.01). Conversely, no overlap was found in either the Oratory or the Procedural samples.
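
The abstract above tests for genre effects with Kruskal-Wallis H tests. As a minimal sketch of how that statistic is computed, the code below ranks the pooled observations, sums the ranks per group, and applies the standard tie correction. The function names and the numbers in `toy_groups` are invented for illustration; they are not the Rhapsodie data, and the sketch stops at the H statistic without computing a p-value.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1 to N, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H; assumes at least two distinct values overall."""
    pooled = [value for group in groups for value in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    offset = 0
    for group in groups:
        rank_sum = sum(ranks[offset:offset + len(group)])
        weighted += rank_sum ** 2 / len(group)
        offset += len(group)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1.0 - tie_term / (n_total ** 3 - n_total)
    return h / correction


# Invented per-sample rates for three hypothetical groups (not from the study).
toy_groups = {
    "group_a": [0.31, 0.35, 0.29],
    "group_b": [0.40, 0.44, 0.41],
    "group_c": [0.33, 0.37, 0.36],
}
print(kruskal_wallis_h(list(toy_groups.values())))
```

Under the null hypothesis, H is approximately chi-square distributed with k - 1 degrees of freedom for k groups, so a large H relative to that distribution indicates that at least one group differs from the others.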

    Biomimetic Chitin-Silk Hybrids: An Optically Transparent Structural Platform for Wearable Devices and Advanced Electronics

    The cuticles of insects and marine crustaceans are fascinating models for man-made advanced functional composites. The excellent mechanical properties of these biological structures rest on the exquisite self-assembly of natural ingredients, such as biominerals, polysaccharides, and proteins. Among them, the two commonly found building blocks in the model biocomposites are chitin nanofibers and silk-like proteins with β-sheet structure. Despite being wholly organic, the chitinous protein complex plays a key role in the biocomposites by contributing to the overall mechanical robustness and structural integrity. Moreover, the chitinous protein complex alone, without biominerals, is optically transparent (e.g., dragonfly wings), making it a brilliant model material system for engineering applications where optical transparency is essential. Here, inspired by the chitinous protein complex of arthropod cuticles, an optically transparent biomimetic composite that hybridizes chitin nanofibers and silk fibroin (β-sheet) is introduced, and its potential as a biocompatible structural platform for emerging wearable devices (e.g., smart contact lenses) and advanced displays (e.g., transparent plastic cover window) is demonstrated.