4 research outputs found

Évaluation sur mesure de modèles distributionnels sur un corpus spécialisé : comparaison des approches par contextes syntaxiques et par fenêtres graphiques.

Authors: Hathout Nabil, Sajous Franck, Tanguy Ludovic
Publication venue: ATALA (Association pour le Traitement Automatique des Langues)
Publication date: 01/01/2015

Distributional semantics models can be built using a simple bag-of-words representation of a word's contexts (window-based) or using more complex syntactic information (syntax-based). Previous studies have compared their relative efficiency without coming to a definitive conclusion, but such an examination has never been performed on small, specialised corpora. We have run a set of such comparative experiments based on a collection of French NLP articles and a custom-made gold standard. These experiments show a better global performance of syntax-based models, as long as syntactic information is processed with appropriate care.

Distributional models can be built either by considering only the graphical co-occurrence between words or by using syntactic relations of varying complexity. Although systematic comparisons have never decisively favoured one approach over the other, they have rarely been carried out on a small corpus or on specialised language. We propose a range of experiments aimed at observing a set of distributional models built from a small corpus of French articles in the field of NLP. A dataset was specifically designed to evaluate the different configurations. These experiments show that models that take syntactic information into account in a reasonable way obtain better results overall.

Scientific Publications of the University of Toulouse II Le Mirail

HAL Descartes

Analyzing Binary Relationships of Identity Labels Using Distributional Semantic Models

Author: Hunter Youngquist
Publication venue: Department of Foreign Languages and Literatures at the University of Verona
Publication date: 01/12/2023

Following the shift towards quantitative, corpus-based analysis in queer linguistics, I examine the usage of identity labels to explore the binary relationships and predicted normative effects in the case of the online community r/lgbt, a subreddit dedicated to minority identity labels and discussion. I analyze the distribution of the most frequent identity labels of the subreddit in a 2-year period with distributional semantic models, vector-based matrices that capture word distributions as numeric representations, showing evidence for various binaries that co-construct each other within the corpus. Additionally, I utilize concordances and collocations to examine the discourses surrounding gender and sexuality in the comments and submissions subcorpora, showing a more queer-aligned perspective in the former and a label-searching perspective in the latter. Finally, the results from these techniques demonstrate the overall complex relationships between the many types of labels currently in use and between the subreddit users and their feelings about adopting specific labels to describe their identities.

Directory of Open Access Journals

SEMANTIQUE DISTRIBUTIONNELLE

Authors: Fabre Cécile, Lenci Alessandro
Publication venue: ATALA (Association pour le traitement automatique des langues)
Publication date: 01/01/2015

This special issue contains state-of-the-art papers on distributional semantic

Archivio della Ricerca - Università di Pisa

Appraising the causal relationship between DNA methylation and type 2 diabetes

Author: Juvinao-Quintero Diana
Publication date: 07/05/2019

Explore Bristol Research