12 research outputs found

    Extracción de patrones de subcategorización de verbos en castellano con análisis de dependencias

    In this paper we present the results of our experiments in the automatic production of verb subcategorization frame (SCF) lexica for Spanish. The work was carried out within the PANACEA project, which aims at the automatic acquisition of lexical information with minimal human intervention. In our experiments, a chain of different tools was used: domain-focused web crawling, text normalization and cleaning, segmentation and tokenization, PoS tagging, dependency parsing and, finally, SCF extraction. The results depend heavily on the output quality of the intervening components, in particular that of the dependency parser, which is the focus of this paper. Nevertheless, the results achieved are in line with the state of the art for other languages in similar experiments. This work was funded by the European Project PANACEA (FP7-ICT-2010-248064).

    ALL Dolphins Are Intelligent and SOME Are Friendly: Probing BERT for Nouns’ Semantic Properties and their Prototypicality

    Large-scale language models encode rich commonsense knowledge acquired through exposure to massive data during pre-training, but their understanding of entities and their semantic properties is unclear. We probe BERT (Devlin et al., 2019) for the properties of English nouns as expressed by adjectives that do not restrict the reference scope of the noun they modify (as in red car), but instead emphasise some inherent aspect (red strawberry). We base our study on psycholinguistics datasets that capture the association strength between nouns and their semantic features. We probe BERT using cloze tasks and in a classification setting, and show that the model has marginal knowledge of these features and their prevalence as expressed in these datasets. We discuss factors that make evaluation challenging and impede drawing general conclusions about the models' knowledge of noun properties. Finally, we show that when tested in a fine-tuning setting addressing entailment, BERT successfully leverages the information needed for reasoning about the meaning of adjective-noun constructions, outperforming previous methods.

    Etude comparative de plongements lexicaux et autres traits pour la détection de la complexité lexicale en français

    Lexical complexity detection is an important step in automatic text simplification, where it is used to identify the lexical items to substitute. In this study, we experiment with word embeddings for measuring the complexity of French words and combine them with other features that have been shown to be well suited to complexity prediction. Our results on a task of ranking synonyms by complexity show that embeddings on their own perform better than many other features, but do not outperform frequency-based systems for this language.

    Explorer l'informativité d'une phrase

    This study is a preliminary exploration of the concept of informativeness (how much information a sentence gives about a word it contains) and its potential benefits to building quality word representations from scarce data. We propose several sentence-level classifiers to predict informativeness, and we perform a manual annotation on a set of sentences. We conclude that these two measures correspond to different notions of informativeness. However, our experiments show that using the classifiers' predictions to train word embeddings has an impact on embedding quality.

    Verb SCF extraction for Spanish with dependency parsing = Extracción de patrones de subcategorización de verbos en castellano con análisis de dependencias

    In this paper we present the results of our experiments in the automatic production of verb subcategorization frame (SCF) lexica for Spanish. The work was carried out within the PANACEA project, which aims at the automatic acquisition of lexical information with minimal human intervention. In our experiments, a chain of different tools was used: domain-focused web crawling, text normalization and cleaning, segmentation and tokenization, PoS tagging, dependency parsing and, finally, SCF extraction. The results depend heavily on the output quality of the intervening components, in particular that of the dependency parser, which is the focus of this paper. Nevertheless, the results achieved are in line with the state of the art for other languages in similar experiments. This work was funded by the European Project PANACEA (FP7-ICT-2010-248064).

    Language disintegration under conditions of severe formal thought disorder

    On current models of the language faculty, the language system is taken to be divided by an interface with systems of thought. However, thought of the type expressed in language is difficult to access in language-independent terms. Potential inter-dependence of the two systems can be addressed by considering language under conditions of pathological changes in the neurotypical thought process. Speech patterns seen in patients with schizophrenia and formal thought disorder (FTD) present an opportunity to do this. Here we reanalyzed a corpus of severely thought-disordered speech with a view to capturing patterns of linguistic disintegration comparatively across hierarchical layers of linguistic organization: (1) referential anomalies, subcategorized by the type of NP involved, (2) argument structure, (3) lexis, and (4) morphosyntax. Results showed significantly higher error proportions in referential anomalies than in all other domains. Morphosyntax and lexis were comparatively least affected, while argument structure was intermediate. No differential impairment was seen in definite vs. indefinite NPs, or in 3rd Person pronouns vs. lexical NPs. Statistically significant differences in error proportions emerged within the domain of pronominals, where covert pronouns were more affected than overt pronouns, and 3rd Person pronouns more than 1st and 2nd Person ones. Moreover, copular clauses were more often anomalous than non-copular ones. These results provide evidence of how language and thought disintegrate together in FTD, with language disintegrating along hierarchical layers of linguistic organization and affecting specific construction types. A relative intactness of language at a procedural, morphosyntactic surface level masks a profound impairment in the referential functioning of language.

    Consum, residus i aigua : guia didàctica (3-5 anys)

    Contains a presentation of the guide, guidance on its use, a section on environmental education in early childhood education, a section on general methodological approaches, and the assessment criteria and activities. The water area contains the teaching unit El conte de na Mariona i na Mariaigua. The consumption and waste area contains three teaching units: En Pap reciclat, Jugam amb les deixalles and Reciclam paper. The four teaching units are organized by topic, so that teachers can select the units that interest them individually or arrange them into a sequence of units according to their own planning. Both areas include basic information on the problems of water, on the one hand, and of consumption and waste, on the other, as well as information for further reading. Govern de les Illes Balears, Conselleria d'Educació i Cultura. Baleares, ES.