4 research outputs found

    The View of Google News from Academia: A Scoping Review

    Google News is an online news service that gathers headlines from various news sources every day and presents them on its main page as if it were a newspaper. The service is free and redistributes information from accredited, selected media outlets, which makes it a major source of readers for those outlets. In academic research, Google News has been studied in various articles that analyze its effects and relevance. On this premise, this study carries out an exploratory systematic review, also known as a "scoping review", to examine the corpus of academic articles published on Google News. The specific objectives were to determine the key ideas and main concepts related to Google News, including the most commonly used methodologies, and to offer evidence-based contributions on Google News and its present and future effects. The results provide an overview of the scientific studies on Google News and their most significant findings.

    New Word Extraction Utilizing Google News Corpuses for Supporting Lexicon-based Chinese Word Segmentation Systems

    Chinese word segmentation is an essential step in Chinese natural language processing because it benefits Chinese text mining and information retrieval. Currently, the lexicon-based Chinese word segmentation scheme is the most widely used method; it can correctly split Chinese sentences into distinct words in real-world applications. However, the word identification ability of the lexicon-based scheme depends heavily on a well-prepared lexicon with enough entries to cover all Chinese words. In particular, this scheme performs poorly on texts that change rapidly over time, such as newspaper articles and web documents, because such documents often contain many new words that a system with a fixed lexicon cannot identify. Moreover, maintaining the lexicon manually is inefficient and time-consuming. To address these problems, this study proposes a novel statistics-based scheme for new word extraction that draws on categorized Google News corpora retrieved automatically from the Google News site, with the aim of improving the word identification ability of lexicon-based Chinese word segmentation systems. Compared with another proposed method, the experimental results indicate that the proposed scheme not only retrieves new words from the categorized Google News corpora more accurately but also obtains a larger number of new words.
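
The lexicon dependence described in this abstract can be illustrated with a small sketch. The abstract does not say which matching algorithm the lexicon-based systems use, so forward maximum matching is assumed here as one common lexicon-based approach; the function name `forward_max_match`, the toy lexicon, and the sample sentence are all hypothetical and not taken from the paper.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy lexicon-based segmentation (assumed technique, not the paper's).

    At each position, take the longest substring found in `lexicon`;
    if nothing matches, emit a single character.
    """
    words = []
    i = 0
    while i < len(sentence):
        # Try the longest candidate first, shrinking down to one character.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy lexicon that lacks the word "谷歌" (Google).
fixed_lexicon = {"我们", "喜欢", "新闻"}
sentence = "我们喜欢谷歌新闻"

# The unknown word is broken into single characters.
print(forward_max_match(sentence, fixed_lexicon))

# Once the new word is added to the lexicon, it is kept whole.
updated_lexicon = fixed_lexicon | {"谷歌"}
print(forward_max_match(sentence, updated_lexicon))
```

With the fixed lexicon the unknown word falls apart into single characters, while adding it to the lexicon restores the expected segmentation. This is the gap that automatic new word extraction is meant to close; the statistics the paper uses to find such words are not given in the abstract and are not sketched here.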
