331 research outputs found

    How to improve robustness in Kohonen maps and display additional information in Factorial Analysis: application to text mining

    This article is an extended version of a paper presented at the WSOM'2012 conference [1]. We combine factorial projections, the SOM algorithm and graph techniques and apply them to a text mining problem. The corpus contains 8 medieval manuscripts which were used to teach arithmetic techniques to merchants. Among the techniques for Data Analysis, those used for Lexicometry (such as Factorial Analysis) highlight the discrepancies between manuscripts, because they focus on the deviation from independence between words and manuscripts. Still, we also want to discover and characterize the vocabulary common to the whole corpus. Using the properties of stochastic Kohonen maps, which define neighborhoods between inputs in a non-deterministic way, we highlight the words which seem to play a special role in the vocabulary. We call them fickle and use them to improve both Kohonen map robustness and the significance of FCA visualization. Finally, we use graph algorithms to exploit this fickleness for the classification of words.

    Inventario de palabras clave temáticas para la clasificación automática de noticias de televisión

    In the framework of a research project funded by the CAC (Consell de l'Audiovisual de Catalunya), a communication-studies approach was taken to the problem of selecting keywords for the thematic classification of TV news by automatic recognition systems. We apply discourse analysis (around the concept of "theme"), news theory, and lexicometric and information retrieval techniques to define a Comprehensive Keyword Selection Protocol. The work of 4 researchers applying this protocol to a transcribed sample of 698 news items produced a lexicon of 1000 keywords distributed across 15 themes, which was checked statistically with Wilks' Lambda.

    Estudo lexical: "Os Canibais" de Álvaro do Carvalhal

    This work analyzes some of the quantitative and qualitative data of the short story «Os Canibais», by Álvaro do Carvalhal, using Nooj, a computer program for lexical analysis. We carry out a lexical study of the story, based on the statistical analysis of theme words, in order to delimit possible thematic fields and so make the identification of themes feasible. Nooj is a linguistic development environment that makes it possible, on the one hand, to build broad-coverage formal descriptions (dictionaries and grammars) of natural languages and, on the other, to apply those descriptions efficiently to large texts. Because of its capabilities and free availability (it is available online at www.nooj4nlp.net), Nooj is a working tool accessible to any user, since no programming knowledge is needed to produce effective resources or to carry out proficient research. Lexicometry, based on the automated processing of textual corpora, is particularly useful to those interested in the study of textual production, and it opens new research perspectives that go beyond common empiricism, which is often subject to the arbitrariness of the observer's subjective viewpoints. Nooj can also be seen as a proposal for a didactic approach within the new methodologies of language teaching. Undoubtedly, the didactic potential of Nooj is practically unlimited. Its level of proficiency depends on the teacher's creativity and the student's curiosity.

    Un siècle et demi de discours gouvernemental au Canada Contribution de la lexicométrie à l'Histoire politique

    The Canadian Prime Minister opens each session of the federal parliament in Ottawa with a speech from the "throne". Since the origin of these institutions (1867), there have been 128 speeches totalling 260,836 words. The automatic segmentation of the corpus highlights two major turning points defining three periods, each characterized by its own topics and its own vocabulary. The first turning point occurs at the beginning of the Second World War, when the activities of the central government expanded considerably, and the second in 1968 with the arrival in power of Trudeau, who wanted to build a nation and a strong federal government. Within these three main periods, several important events also delimit secondary sequences. Thus, lexicometry provides useful tools for the periodization of political history.

    Legitimizing farmers' new knowledge, learning and practices through communicative action: Application of an agro-environmental policy

    This article examines the role of communication in the process that guides economic actors to integrate the moral obligations implied by adopting sustainability principles in their action choices and to reexamine their practices. We analyze two approaches to implementing agro-environmental measures that encourage farmers to preserve water resources. Verbal interactions between farmers and agricultural advisors, who are part of these policy programs, are analyzed drawing on Jürgen Habermas's theory of communicative action. The discourse analysis used here shows that communicative action encouraged participants to reexamine the validity of the technical, experiential, and normative knowledge that legitimized their reasons for acting. This study brings to light the fact that, in the context of a business primarily oriented towards making a profit, committing to sustainable development does not only operate in technical terms; such a commitment also requires collective validation of the effectiveness of alternative farming practices.

    Looking for French deverbal nouns in an evolving Web (a short history of WAC)

    This paper describes an 8-year-long research effort to automatically collect new French deverbal nouns on the Web. The goal has remained the same: building an extensive and cumulative list of noun-verb pairs where the noun denotes the action expressed by the verb (e.g. production - produce). This list is used both for linguistic research and for NLP applications. The initial method took advantage of the former Altavista search engine, which allowed direct access to unknown word forms. The second technique led us to develop a specific crawler, which raised a number of technical difficulties. In the third experiment, we use a collection of web pages made available to us by a commercial search engine. Through all these stages, the general method has remained the same, and the results are similar and cumulative, although the technical environment has greatly evolved.

    Ruralité et acquisition lexicale au Manitoba : le vocabulaire disponible dans les écoles Saint-Eustache (milieu rural) et Provencher (milieu urbain)

    This article examines salient features in the vocabulary of young Franco-Manitobans living in rural and urban areas. It is based on a series of studies in lexicometry (the statistical analysis of language), in particular on the author’s fieldwork and “lexical availability” study of 8- to 13-year-olds (1990-2006). The corpora built from the survey data comprise the words most frequently used in sixteen conversational contexts (lexical fields such as clothing, school, and occupations). The results analyzed here are drawn specifically from the survey carried out at the Saint-Eustache and Provencher (Saint-Boniface) schools. Among the various criteria included in the study, the author focuses here on the informants’ place of residence (rural/urban). Lexicometric indexes obtained for Saint-Eustache (rural) and Provencher (urban) are compared quantitatively (e.g. number of words per informant) and qualitatively (e.g. semantic content). To conclude, the author offers some suggestions designed to meet some of the specific pedagogical needs of rural children.

    Children's gendered ways of talking about learning to write

    This study attempts to integrate a gender perspective into research on children’s conceptions about learning to write. We analyzed individual interviews with 160 schoolchildren, equally distributed between boys and girls, in the eight grades from kindergarten to seventh grade in elementary school in Argentina, in order to explore gender-related patterns in their conceptions of learning to write. The lexicometric method was applied to the transcriptions of children’s responses. Subsequent qualitative analysis of modal responses revealed distinctive gender differences regarding both the content and the form of responses. We describe and interpret such differences within a theoretical framework that distinguishes two different modes of discourse and thought: the gendered conversational styles studied by Tannen, and the two modes of cognitive functioning proposed by Bruner. Results show that boys tended to adopt a report talk style and to present traits that are close to those proposed by Bruner in his portrait of the logicoparadigmatic mode of thought. Girls, instead, tended to adopt a rapport talk style and to integrate to a greater extent a set of procedures characterizing a narrative modality, by speaking at length of human actions, intentions and feelings. These findings underscore the educational potential of considering gender as an important (and still unexplored) aspect that influences children’s (and most probably teachers’) conceptions of how one learns

    Was Shakespeare's Vocabulary the Richest?

    It is generally assumed that the vocabulary of W. Shakespeare is exceptionally rich and that his work contains a very large number of different words. We present a method to compare the extent of the vocabularies of several authors' works of unequal length. Applied to the theater of Shakespeare's time, it shows that the vocabulary of Shakespeare is not exceptional and that some of his contemporaries, such as B. Jonson or T. Dekker, used a larger vocabulary.

    The Philippine epics and ballads multimedia archive

    Palawan is an island in the Philippines with remarkable heritages of both an archaeological and an intangible nature. Major prehistoric discoveries occurred on the island in the 1960s, and today intensive excavations are ongoing alongside progressive, interdisciplinary research employing new analytical tools. In May 1970 Charles Macdonald (an anthropologist) and I (trained as a linguist and an ethnologist) met the Pala'wan, and since that time, we have both regularly shared in their lives with many faithful returns. But during our very first week of fieldwork, we were invited to attend two simultaneous weddings where we heard for the first time Usuy, a beloved singer of tales and shaman, singing Kudaman. This lengthy narrative (which was performed that night in order to entertain the relatives and friends assembled under the roof of the large meeting house on the eve of the jural discussion related to the marriage alliances) is referred to among the Pala'wan as tultul, a genre-defining term I have proposed to translate as "epic" in contrast to the other eight defined oral genres (see Figure 1) present in the culture of the Highlanders on the southern part of this island.