LLM-based Comment Summarization and Topic Matching for Videos
Large language models (LLMs) have shown substantial improvements over prior techniques on text summarization and classification benchmarks. LLMs can be used for topic detection in online videos, which can help improve video recommendations. This disclosure describes techniques to parse comments associated with a particular video in addition to performing video topic detection. The automated parsing enables the acquisition of auxiliary information about the context in which the video is made available. The resulting topics/keywords can complement or extend the topics detected from the video and/or the video metadata. A combination of the video topics and comment topics can be utilized for video recommendations and can identify potential audiences that might not otherwise be matched to a video.
Towards Preemptive Text Edition using Topic Matching on Corpora
Nowadays, the results of scientific research are only recognized when published in papers in international journals or magazines of the respective area of knowledge. This perspective reflects the importance of having the work reviewed by peers. The revision encompasses a thorough analysis of the work performed, including the quality of the writing and whether the study advances the state of the art, among other details. For these reasons, with the publishing of the document, other researchers have an assurance of the high quality of the study presented and can, therefore, make direct use of the findings in their own work. The publishing of documents creates a cycle of information exchange responsible for speeding up the development of new techniques, theories, and technologies, resulting in added value for society as a whole.
Nonetheless, a detailed revision of the content submitted for publication requires additional effort and dedication from its authors. They must make sure that the manuscript is of high quality, since submitting a document with mistakes conveys an unprofessional image of the authors, which may result in rejection by the journal or magazine. The objective of this work is to develop an algorithm capable of assisting in the writing of this type of document by proposing possible improvements or corrections according to the document's specific context.
The general idea of the proposed solution is for the algorithm to generate suggestions for improvement by comparing the content of the document being written with that of similar documents already published in the field. In this context, a study of Natural Language Processing (NLP) techniques used to create models for representing a document and its subjects was performed. NLP provides the tools for creating models that represent documents and identify their topics; the main concepts include n-grams and topic modeling. The study also included an analysis of prior work in the field of academic writing: the structure and contents of this type of document, some characteristics common to high-quality articles, and the tools developed to help with their writing.
The developed algorithm combines several tools backed by a collection of documents, together with the logic connecting all components, implemented in the scope of this Master's. The collection consists of the full text of articles from different areas, including Computer Science, Physics, and Mathematics, among others. The topics of these documents were extracted and stored in order to be fed to the algorithm. By comparing the topics extracted from the document under analysis with those from the documents in the collection, it is possible to select the closest documents and use them to create suggestions. Through a set of tools for syntactic analysis, synonym search, and morphological realization, the algorithm is capable of proposing replacements with words that are more commonly used in a given field of knowledge.
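The closest-document selection described above can be sketched with a simple cosine comparison over topic-weight vectors. This is an illustrative simplification, assuming topics are already extracted as plain numeric vectors; the function names (`cosine`, `closest_documents`) are not from the thesis itself:

```python
import math

def cosine(a, b):
    """Cosine similarity between two topic-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def closest_documents(draft_topics, collection, k=5):
    """Rank collection documents by topic similarity to the draft.

    collection: mapping of document id -> topic-weight vector.
    Returns the k most similar document ids, most similar first.
    """
    ranked = sorted(collection,
                    key=lambda doc: cosine(draft_topics, collection[doc]),
                    reverse=True)
    return ranked[:k]
```

Documents returned this way would then feed the suggestion step, e.g. `closest_documents([0.8, 0.2, 0.0], docs, k=2)` picks the two collection entries whose topic mix best matches the draft.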
Both objective and subjective tests were conducted on the algorithm. They demonstrate that, in some cases, the algorithm proposes suggestions that bring the terms used in the document closer to the terms most used in the state of the art of a given scientific field. This suggests that using the algorithm should improve the quality of the documents, as they become more similar to those already published. Even though the improvements to the documents are minimal, they should be understood as a lower bound on the real utility of the algorithm. This is partially explained by the presence of several parsing errors in both the training and test sets, resulting from the parsing of the PDF files of the original articles, which could be reduced in a production system.
The main contributions of this work include the study of the state of the art, the design and implementation of the algorithm, and the text editor developed as a proof of concept. The analysis of context specificity, which results from the tests performed on different areas of knowledge, and the large collection of documents gathered during this Master's program are also important contributions of this work.
Management Responses to Online Reviews: Big Data From Social Media Platforms
User-generated content from virtual communities helps businesses develop and sustain competitive advantages, which raises the question of how firms can strategically manage that content. This research, which consists of two studies, discusses management response strategies that hotel firms can use to gain a competitive advantage and improve customer relationship management by leveraging big data, social media analytics, and deep learning techniques. Since the harmful effects of negative reviews are greater than the contribution of positive comments, firms must strategise their responses to intervene in and minimise that damage. Although the current literature includes a substantial amount of research presenting effective response strategies to negative reviews, it mostly overlooks an extensive classification of response strategies. The first study consists of two phases and focuses on comprehensive response strategies to negative reviews only. The first phase is explorative and presents a correlation analysis between response strategies and the overall ratings of hotels. It also reveals differences in those strategies based on hotel class, average customer rating, and region. The second phase investigates effective response strategies for increasing the subsequent ratings of returning customers using logistic regression analysis. It shows that responses involving admittance of mistake(s), specific actions, and direct contact requests help increase the subsequent ratings of previously dissatisfied returning customers. In addition, personalising responses for better customer relationship management is particularly difficult due to the significant variability of textual reviews covering various topics. The second study examines the impact of personalised management responses to positive and negative reviews on rating growth, integrating a novel multi-topic matching approach with a panel data analysis.
It demonstrates that (a) personalised responses improve the future ratings of hotels and (b) the effect of personalised responses on future ratings is stronger for luxury hotels. Lastly, practical insights are provided.
Tracing the evolution of service robotics : Insights from a topic modeling approach
Taking robotic patents between 1977 and 2017 and building upon the topic modeling technique, we extract their latent topics, analyze how important these topics are over time, and how they are related to each other by looking at how often they are recombined in the same patents. This allows us to differentiate between more and less important technological trends in robotics based on their stage of diffusion and position in the space of knowledge represented by a topic graph, where some topics appear isolated while others are highly interconnected. Furthermore, utilizing external reference texts that characterize service robots from a technical perspective, we propose and apply a novel approach to match the constructed topics to service robotics. The matching procedure is based on the frequency and exclusivity of words overlapping between the patents and the reference texts. We identify around 20 topics belonging to service robotics. Our results corroborate earlier findings, but also provide novel insights on the content and stage of development of application areas in service robotics. With this study we contribute to a better understanding of the highly dynamic field of robotics as well as to new practices of utilizing the topic modeling approach, matching the resulting topics to external classifications, and applying graph-theoretic metrics to them.
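A frequency-and-exclusivity matching score of the kind described in this abstract could look roughly like the following sketch. The function `match_score`, its inputs, and the particular combination of the two factors are hypothetical simplifications for illustration, not the paper's actual procedure:

```python
from collections import Counter

def match_score(topic_words, reference_tokens, background_tokens):
    """Score a topic against a reference corpus using word frequency
    and exclusivity, in the spirit of the matching described above.

    topic_words: top words of a topic.
    reference_tokens: tokenized reference text (e.g. service-robot descriptions).
    background_tokens: tokenized text from all other domains, used to
    measure how exclusive a word is to the reference.
    """
    ref = Counter(reference_tokens)
    bg = Counter(background_tokens)
    score = 0.0
    for w in topic_words:
        # How common the word is in the reference text.
        freq = ref[w] / max(len(reference_tokens), 1)
        # How exclusive the word is to the reference vs. the background.
        total = ref[w] + bg[w]
        exclusivity = ref[w] / total if total else 0.0
        score += freq * exclusivity
    return score
```

Topics whose top words are both frequent in the reference text and rare elsewhere score highest, which matches the intuition of the procedure: a word shared with the reference counts for more when it is exclusive to that domain.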
A Topic Coverage Approach to Evaluation of Topic Models
Topic models are widely used unsupervised models of text capable of learning
topics - weighted lists of words and documents - from large collections of text
documents. When topic models are used for discovery of topics in text
collections, a question that arises naturally is how well the model-induced
topics correspond to topics of interest to the analyst. In this paper we
revisit and extend a so far neglected approach to topic model evaluation based
on measuring topic coverage - computationally matching model topics with a set
of reference topics that models are expected to uncover. The approach is well
suited for analyzing models' performance in topic discovery and for large-scale
analysis of both topic models and measures of model quality. We propose new
measures of coverage and evaluate, in a series of experiments, different types
of topic models on two distinct text domains in which there is interest in
topic discovery. The experiments include evaluation of model quality, analysis
of coverage of distinct topic categories, and the analysis of the relationship
between coverage and other methods of topic model evaluation. The contributions
of the paper include new measures of coverage, insights into both topic models
and other methods of model evaluation, and the datasets and code for
facilitating future research of both topic coverage and other approaches to
topic model evaluation.
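The coverage idea, computationally matching model topics against a set of reference topics, can be illustrated with a minimal sketch. The Jaccard word-overlap match and the 0.5 threshold here are illustrative assumptions, not the coverage measures the paper proposes:

```python
def topic_coverage(model_topics, reference_topics, threshold=0.5):
    """Fraction of reference topics matched by at least one model topic.

    Topics are represented as collections of their top words; the match
    score is the Jaccard overlap of the two word sets.
    """
    def jaccard(a, b):
        return len(a & b) / len(a | b) if a | b else 0.0

    matched = sum(
        1 for ref in reference_topics
        if any(jaccard(set(m), set(ref)) >= threshold for m in model_topics)
    )
    return matched / len(reference_topics)
```

Under this sketch, a model whose topics cover every reference topic scores 1.0, and one that misses half of them scores 0.5, giving the kind of model-level comparison the abstract describes.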