223 research outputs found

    LLM-based Comment Summarization and Topic Matching for Videos

    Large language models (LLMs) have shown substantial improvements over prior techniques on text summarization and classification benchmarks. LLMs can be used for topic detection on online videos, which can help improve video recommendations. This disclosure describes techniques to parse the comments associated with a particular video in addition to performing video topic detection. The automated parsing enables acquisition of auxiliary information about the context in which the video is made available. The resulting topics/keywords can complement or extend the topics detected from the video and/or its metadata. A combination of video topics and comment topics can be utilized for video recommendations and can identify potential audiences that might not otherwise be matched to a video.
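The combination of comment-derived and metadata-derived topics described above can be sketched as follows. This is a minimal illustration, not the disclosed system: the stopword list, the frequency-based keyword extraction standing in for an LLM, and all names and data are assumptions.

```python
from collections import Counter

# Illustrative stopword list; a real system would use an LLM or a fuller list.
STOPWORDS = {"the", "a", "is", "this", "and", "of", "to"}

def comment_topics(comments, top_k=3):
    """Very rough keyword extraction: most frequent non-stopword tokens."""
    counts = Counter(
        tok for text in comments
        for tok in text.lower().split()
        if tok not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(top_k)]

def combined_topics(video_topics, comments):
    """Union of metadata-derived topics and comment-derived topics."""
    extra = [t for t in comment_topics(comments) if t not in video_topics]
    return list(video_topics) + extra

video_topics = ["cooking", "pasta"]  # hypothetical metadata-derived topics
comments = [
    "the vegan sauce recipe is amazing",
    "best vegan pasta sauce tutorial",
]
print(combined_topics(video_topics, comments))
# → ['cooking', 'pasta', 'vegan', 'sauce', 'recipe']
```

Here the comments surface "vegan", an audience signal absent from the video's own metadata, which is the kind of auxiliary match the disclosure describes.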

    Towards Preemptive Text Edition using Topic Matching on Corpora

    Nowadays, the results of scientific research are only recognized when published in papers in international journals or magazines of the respective area of knowledge. This perspective reflects the importance of having the work reviewed by peers. The revision encompasses a thorough analysis of the work performed, including the quality of writing and whether the study advances the state of the art, among other details. For these reasons, with the publishing of the document, other researchers have an assurance of the high quality of the study presented and can therefore make direct use of the findings in their own work. The publishing of documents creates a cycle of information exchange responsible for speeding up the development of new techniques, theories and technologies, resulting in added value for the entire society. Nonetheless, a detailed revision of the content sent for publication requires additional effort and dedication from its authors. They must make sure that the manuscript is of high quality, since sending a document with mistakes conveys an unprofessional image of the authors, which may result in rejection by the journal or magazine. The objective of this work is to develop an algorithm capable of assisting in the writing of this type of document by proposing suggestions of possible improvements or corrections according to its specific context. The general idea of the proposed solution is for the algorithm to compute suggestions for improvement by comparing the content of the document being written to that of similar published documents in the field. In this context, a study of Natural Language Processing (NLP) techniques used in the creation of models for representing the document and its subjects was performed. NLP provides the tools for creating models to represent the documents and identify their topics. The main concepts include n-grams and topic modeling.
The study also included an analysis of prior work in the field of academic writing: the structure and contents of this type of document, characteristics common to high-quality articles, and tools developed to help with their writing. The developed algorithm combines several tools backed by a collection of documents, together with the logic connecting all components, implemented in the scope of this Master's. The collection consists of the full text of articles from different areas, including Computer Science, Physics and Mathematics, among others. The topics of these documents were extracted and stored in order to be fed to the algorithm. By comparing the topics extracted from the document under analysis with those of the documents in the collection, it is possible to select the closest documents and use them to create suggestions. Through a set of tools for syntactic analysis, synonym search and morphological realization, the algorithm can propose replacing words with those more commonly used in a given field of knowledge. Both objective and subjective tests were conducted on the algorithm. They demonstrate that, in some cases, the algorithm proposes suggestions which bring the terms used in the document closer to the terms most used in the state of the art of a given scientific field. This points towards the idea that use of the algorithm should improve the quality of the documents, as they become more similar to those already published. Even though the improvements to the documents are minimal, they should be understood as a lower bound on the real utility of the algorithm.
This statement is partially justified by the existence of several parsing errors in both the training and test sets, resulting from the parsing of the PDF files of the original articles, which can be improved in a production system. The main contributions of this work include the presentation of the study performed on the state of the art, the design and implementation of the algorithm, and the text editor developed as a proof of concept. The analysis of the specificity of the context, which results from the tests performed on different areas of knowledge, and the large collection of documents gathered during this Master's program are also important contributions of this work.
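The closest-document selection step described in the abstract can be sketched as follows, assuming topics are represented as sparse topic-weight dictionaries and cosine similarity is the matching criterion. The thesis does not specify the exact similarity measure; all names and data here are illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse topic-weight dicts."""
    common = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in common)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def closest_documents(draft_topics, corpus, top_k=2):
    """Rank corpus documents by topic similarity to the draft being written."""
    ranked = sorted(corpus.items(),
                    key=lambda kv: cosine(draft_topics, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Hypothetical topic distributions extracted offline from the collection.
corpus = {
    "paper_a": {"nlp": 0.7, "parsing": 0.3},
    "paper_b": {"physics": 0.9, "optics": 0.1},
    "paper_c": {"nlp": 0.5, "topic_modeling": 0.5},
}
draft = {"nlp": 0.6, "topic_modeling": 0.4}
print(closest_documents(draft, corpus))
# → ['paper_c', 'paper_a']
```

The selected documents would then feed the suggestion stage (syntactic analysis, synonym search, morphological realization), which is not sketched here.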

    Management Responses to Online Reviews: Big Data From Social Media Platforms

    User-generated content from virtual communities helps businesses develop and sustain competitive advantages, which raises the question of how firms can strategically manage that content. This research, which consists of two studies, discusses management response strategies for hotel firms to gain a competitive advantage and improve customer relationship management by leveraging big data, social media analytics, and deep learning techniques. Since the harmful effects of negative reviews are greater than the contribution of positive comments, firms must strategise their responses to intervene in and minimise those damages. Although the current literature includes a large body of research presenting effective response strategies to negative reviews, it mostly overlooks an extensive classification of response strategies. The first study consists of two phases and focuses on comprehensive response strategies to negative reviews only. The first phase is explorative and presents a correlation analysis between response strategies and overall hotel ratings. It also reveals differences in those strategies based on hotel class, average customer rating, and region. The second phase investigates effective response strategies for increasing the subsequent ratings of returning customers using logistic regression analysis. It shows that responses involving statements of admission of mistake(s), specific action, and direct contact requests help increase the subsequent ratings of previously dissatisfied returning customers. In addition, personalising the response for better customer relationship management is particularly difficult due to the significant variability of textual reviews with various topics. The second study examines the impact of personalised management responses to positive and negative reviews on rating growth, integrating a novel multi-topic matching approach with a panel data analysis.
It demonstrates that (a) personalised responses improve future ratings of hotels, and (b) the effect of personalised responses on increasing future ratings is stronger for luxury hotels. Lastly, practical insights are provided.
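The logistic-regression phase can be illustrated with a toy model relating the three response features named above (admission of mistake, specific action, direct contact request) to whether a returning customer's subsequent rating increased. This is a from-scratch sketch on clearly fabricated toy data, not the study's model, dataset, or coefficients.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Plain stochastic-gradient-descent logistic regression (bias + weights)."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
            err = p - yi
            b -= lr * err
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
    return b, w

# Toy features: [admits_mistake, specific_action, direct_contact_request]
X = [[1, 1, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y = [1, 1, 0, 0, 1, 0]  # 1 = returning customer's rating increased
b, w = fit_logistic(X, y)

p_full = sigmoid(b + sum(wj * xj for wj, xj in zip(w, [1, 1, 1])))
p_none = sigmoid(b)
# On this toy set, a response using all three strategies scores higher
# than one using none.
print(p_full > p_none)
```

A real analysis would of course use a fitted model over thousands of review-response pairs with controls, as the study describes.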

    Tracing the evolution of service robotics : Insights from a topic modeling approach

    Funding: CRUE-CSIC transformative agreement; Helmholtz Association (HIRG-0069); Russian Science Foundation (grant number 19-18-00262). Taking robotic patents between 1977 and 2017 and building upon the topic modeling technique, we extract their latent topics, analyze how important these topics are over time, and examine how they are related to each other by looking at how often they are recombined in the same patents. This allows us to differentiate between more and less important technological trends in robotics based on their stage of diffusion and their position in the space of knowledge, represented by a topic graph in which some topics appear isolated while others are highly interconnected. Furthermore, utilizing external reference texts that characterize service robots from a technical perspective, we propose and apply a novel approach to match the constructed topics to service robotics. The matching procedure is based on the frequency and exclusivity of words overlapping between the patents and the reference texts. We identify around 20 topics belonging to service robotics. Our results corroborate earlier findings but also provide novel insights into the content and stage of development of application areas in service robotics. With this study we contribute to a better understanding of the highly dynamic field of robotics as well as to new practices of utilizing the topic modeling approach, matching the resulting topics to external classifications and applying graph-theoretic metrics to them.
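The matching criterion named in the abstract, frequency and exclusivity of overlapping words, can be sketched as a simple scoring function. The exact weighting the authors use is not given here, so this formula (reference-text frequency divided by the number of topics containing the word) and all data are assumptions.

```python
def match_score(topic_words, reference_words, all_topics):
    """Score a topic against a reference text: for each overlapping word,
    add its relative frequency in the reference, discounted by how many
    topics contain it (exclusivity = 1 / number of containing topics)."""
    score = 0.0
    for word in set(topic_words) & set(reference_words):
        freq = reference_words.count(word) / len(reference_words)
        containing = sum(1 for t in all_topics if word in t)
        score += freq / containing
    return score

# Hypothetical patent topics and a reference text describing service robots.
topics = {
    "t1": ["robot", "service", "cleaning", "navigation"],
    "t2": ["motor", "gear", "actuator", "robot"],
}
reference = "service robot for cleaning and service tasks".split()
scores = {name: match_score(words, reference, list(topics.values()))
          for name, words in topics.items()}
best = max(scores, key=scores.get)
print(best)  # → t1
```

The shared word "robot" appears in both topics, so it contributes little to either score; the exclusive words "service" and "cleaning" are what pull topic t1 toward the service-robotics reference, which is the intuition behind the frequency-and-exclusivity criterion.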

    A Topic Coverage Approach to Evaluation of Topic Models

    Topic models are widely used unsupervised models of text capable of learning topics - weighted lists of words and documents - from large collections of text documents. When topic models are used for discovery of topics in text collections, a question that arises naturally is how well the model-induced topics correspond to topics of interest to the analyst. In this paper we revisit and extend a so-far neglected approach to topic model evaluation based on measuring topic coverage - computationally matching model topics with a set of reference topics that models are expected to uncover. The approach is well suited for analyzing models' performance in topic discovery and for large-scale analysis of both topic models and measures of model quality. We propose new measures of coverage and evaluate, in a series of experiments, different types of topic models on two distinct text domains for which interest in topic discovery exists. The experiments include evaluation of model quality, analysis of coverage of distinct topic categories, and analysis of the relationship between coverage and other methods of topic model evaluation. The contributions of the paper include new measures of coverage, insights into both topic models and other methods of model evaluation, and datasets and code to facilitate future research on both topic coverage and other approaches to topic model evaluation.
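The core idea of coverage, computationally matching model topics against reference topics, can be sketched with a simple word-overlap matcher. The Jaccard criterion and the 0.5 threshold here are assumptions for illustration; the paper proposes its own coverage measures. All topic data is invented.

```python
def jaccard(a, b):
    """Jaccard similarity between two sets of topic top-words."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def topic_coverage(model_topics, reference_topics, threshold=0.5):
    """Fraction of reference topics matched by at least one model topic
    whose top-word Jaccard similarity reaches the threshold."""
    covered = sum(
        1 for ref in reference_topics
        if any(jaccard(m, ref) >= threshold for m in model_topics)
    )
    return covered / len(reference_topics)

reference_topics = [["election", "vote", "party"],
                    ["climate", "carbon", "emissions"],
                    ["vaccine", "health", "trial"]]
model_topics = [["vote", "election", "party"],    # exact match of topic 0
                ["carbon", "climate", "energy"],  # partial match of topic 1
                ["sports", "league", "team"]]     # matches nothing
print(round(topic_coverage(model_topics, reference_topics), 3))
# → 0.667 (2 of 3 reference topics are covered)
```

A model that uncovers more of the analyst-defined reference topics scores higher, which is exactly the discovery-oriented comparison between models that the paper argues for.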