Etiquetado social y blog-scraping como alternativa para la actualización de vocabularios controlados: Aplicación práctica a un tesauro de Biblioteconomía y Documentación

Author: Mochón Bezares Gonzalo
Méndez Rodríguez Eva
Sorli Rojo Ángela
Publication venue: Instituto de Investigaciones Bibliotecológicas, INIBI. Facultad de Filosofía y Letras, Universidad de Buenos Aires
Publication date: 01/01/2017

E-LIS

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Universidad Carlos III de Madrid e-Archivo

Digital.CSIC

Revistas Científicas de Filo (Facultad de Filosofía y Letras, UBA - Universidad de Buenos Aires)

Etiquetado social y blog-scraping como alternativa para la actualización de vocabularios controlados: aplicación práctica a un tesauro de Biblioteconomía y Documentación

Author: Mochón Bezares Gonzalo
Méndez Rodríguez Eva
Sorli Rojo Ángela
Publication venue: Editorial de la Facultad de Filosofía y Letras - Universidad de Buenos Aires
Publication date: 17/10/2017

The aim of this paper is to compare free-language tags, taken here from specialized blogs on information science, with the unstructured controlled language of keyword lists, in order to determine which is the better source of new terminology for the Tesauro de Biblioteconomía y Documentación (Thesaurus of Librarianship and Documentation). To do this, author tags were extracted from 127 blogs on librarianship and information science using web scraping techniques and compared with the descriptor and identifier lists of the ISOC Biblioteconomía y Documentación database (ISOC-BD). The analysis of author tags in blogs contributed 186 new terms, while the database lists provided 130. It is concluded that free-language tags could be a better and faster way of contributing new terminology to controlled vocabularies than unstructured controlled-language lists.

Revistas Científicas de Filo (Facultad de Filosofía y Letras, UBA - Universidad de Buenos Aires)
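
The comparison step described in the abstract above (checking extracted tags against an existing vocabulary to find terms it lacks) can be illustrated with a minimal Python sketch. This is not the authors' procedure: the normalization rules, the function names `normalize` and `candidate_terms`, the `min_count` parameter, and the sample data are all assumptions made for illustration, and the scraping step itself is left out.

```python
import unicodedata
from collections import Counter


def normalize(term: str) -> str:
    """Casefold, strip accents and collapse whitespace so variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", term.casefold())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.split())


def candidate_terms(source_terms, thesaurus_terms, min_count=1):
    """Return normalized source terms missing from the thesaurus, with their frequency."""
    known = {normalize(t) for t in thesaurus_terms}
    counts = Counter(normalize(t) for t in source_terms)
    return {t: n for t, n in counts.items() if t not in known and n >= min_count}


# Hypothetical sample data, not taken from the study.
thesaurus = ["Bibliotecas", "Catalogación", "Tesauros"]
blog_tags = ["bibliotecas", "Web scraping", "web  scraping", "Folksonomías", "catalogacion"]

new_terms = candidate_terms(blog_tags, thesaurus)
print(new_terms)
```

Running the same function once on blog tags and once on database keyword lists would give two candidate sets whose sizes can be compared, which is the kind of count the abstract reports. A real workflow would probably also need manual review of the candidates, since string matching alone cannot tell synonyms from new concepts.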

Enriching thesauri with hierarchical relationships by pattern matching in dictionaries

Author: Araujo Lourdes
Pérez-Agüera José R.
Publication venue: Springer Verlag
Publication date: 01/01/2006

This paper proposes a pattern matching method applied to dictionaries to identify hierarchical relationships between terms. In this work we focus on this type of relationship because we use it in the automatic generation of thesauri, which are used to improve information retrieval tasks. However, the method can also be applied to identify other semantic relationships. We distinguish two kinds of patterns: structural patterns, composed of a sequence of part-of-speech tags, and key patterns, typical of dictionary entries, composed of some key terms along with some part-of-speech tags. These patterns are automatically extracted from the dictionary entries by means of stochastic techniques. The thesaurus, which has been partially constructed previously, is then extended with the new relationships obtained by applying the patterns to a dictionary. We have based the system evaluation on the results obtained with and without the thesaurus in an information retrieval task proposed by the Cross-Language Evaluation Forum (CLEF). The results of these experiments reveal a clear improvement in performance.

E-LIS
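
The two kinds of patterns named in the abstract above (structural patterns built from part-of-speech tags, and key patterns that mix key terms with tags) can be sketched as follows. This is only an illustration under stated assumptions: the patterns here are written by hand rather than extracted by stochastic techniques as in the paper, the definitions are assumed to be already tagged, and the tag names, the `match_prefix` and `broader_term` helpers, and the sample entries are hypothetical.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag), assumed pre-tagged

# Pattern element: (kind, value, is_target).
# kind "pos" matches a tag, kind "word" matches a literal key term.
# The element flagged is_target marks the candidate broader term.
Pattern = List[Tuple[str, str, bool]]

# Hand-written examples (assumption; the paper extracts patterns automatically).
KEY_PATTERN: Pattern = [
    ("pos", "DET", False),
    ("word", "type", False),
    ("word", "of", False),
    ("pos", "NOUN", True),
]
STRUCTURAL_PATTERN: Pattern = [("pos", "DET", False), ("pos", "NOUN", True)]


def match_prefix(definition: List[Token], pattern: Pattern) -> Optional[str]:
    """Match the pattern against the start of a definition; return the target word."""
    if len(definition) < len(pattern):
        return None
    target = None
    for (word, tag), (kind, value, is_target) in zip(definition, pattern):
        observed = tag if kind == "pos" else word.lower()
        if observed != value:
            return None
        if is_target:
            target = word.lower()
    return target


def broader_term(definition: List[Token], patterns: List[Pattern]) -> Optional[str]:
    """Try patterns in order and return the first candidate broader term found."""
    for pattern in patterns:
        hit = match_prefix(definition, pattern)
        if hit is not None:
            return hit
    return None


# Hypothetical tagged dictionary definitions.
thesaurus_def = [("a", "DET"), ("type", "NOUN"), ("of", "ADP"), ("vocabulary", "NOUN")]
sonnet_def = [("a", "DET"), ("poem", "NOUN"), ("of", "ADP"), ("fourteen", "NUM")]

patterns = [KEY_PATTERN, STRUCTURAL_PATTERN]  # more specific pattern first
print(broader_term(thesaurus_def, patterns))
print(broader_term(sonnet_def, patterns))
```

Trying the key pattern before the structural one is a design choice in this sketch: without it, the first definition would yield "type" rather than "vocabulary" as the broader term.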