Search CORE

7 research outputs found

Enhancing API Documentation through BERTopic Modeling and Summarization

Authors: Naghshzan AmirHossein; Ratte Sylvie
Publication date: 17/08/2023

As the amount of textual data in various fields, including software development, continues to grow, there is a pressing demand for efficient and effective extraction and presentation of meaningful insights. This paper presents an approach to address this need, focusing on the complexities of interpreting Application Programming Interface (API) documentation. While official API documentation serves as a primary source of information for developers, it is often extensive and not user-friendly. As a result, developers frequently resort to unofficial sources such as Stack Overflow and GitHub. Our approach combines BERTopic for topic modeling with Natural Language Processing (NLP) to automatically generate summaries of API documentation, giving developers a more efficient way to extract the information they need. The produced summaries and topics are evaluated based on their performance, coherence, and interoperability. The findings of this research contribute to the field of API documentation analysis by providing insights into recurring topics, identifying common issues, and generating potential solutions. By improving the accessibility and efficiency of API documentation comprehension, our work aims to enhance the software development process and empower developers with practical tools for navigating complex APIs.

arXiv.org e-Print Archive
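
The abstract above relies on BERTopic for topic modeling. As documented by the BERTopic library, topics are typically labeled with a class-based TF-IDF (c-TF-IDF) weighting: all documents in a topic are merged into one class, and a term's weight is its frequency in that class times log(1 + A / f), where A is the average number of words per class and f is the term's frequency across all classes. The listing does not say how the paper configured BERTopic, so the following is only a minimal plain-Python sketch of that weighting. It assumes documents are already grouped into topics, uses naive whitespace tokenization, and omits the embedding, clustering, and normalization steps; the function names `class_tfidf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter


def class_tfidf(docs_by_topic):
    """Score terms per topic with a class-based TF-IDF weighting.

    docs_by_topic maps a topic label to a list of documents (strings).
    All documents of a topic are treated as one merged class document.
    """
    # Term frequencies per topic (assumption: lowercase whitespace tokens).
    tf = {
        topic: Counter(word for doc in docs for word in doc.lower().split())
        for topic, docs in docs_by_topic.items()
    }
    # Frequency of each term across all topics.
    total = Counter()
    for counts in tf.values():
        total.update(counts)
    # Average number of words per topic.
    avg_words = sum(total.values()) / len(tf)
    # Weight: frequency in the topic * log(1 + avg_words / overall frequency).
    return {
        topic: {t: f * math.log(1 + avg_words / total[t]) for t, f in counts.items()}
        for topic, counts in tf.items()
    }


def top_terms(scores, n=3):
    """Return the n highest-scoring terms per topic (ties broken alphabetically)."""
    return {
        topic: [t for t, _ in sorted(s.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for topic, s in scores.items()
    }
```

Calling `top_terms(class_tfidf(groups), n=5)` on a dictionary of grouped documentation snippets returns candidate label words per topic. Terms that appear in every topic, such as "the", are down-weighted relative to terms concentrated in a single topic.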

Content-location relationships: a framework to explore correlations between space-based and place-based user-generated content

Authors: Painho Marco; Tang Vicente
Publication date: 03/08/2023

Tang, V., & Painho, M. (2023). Content-location relationships: a framework to explore correlations between space-based and place-based user-generated content. International Journal of Geographical Information Science, 37(8), 1840–1871. https://doi.org/10.1080/13658816.2023.2213869

The authors acknowledge the funding from the Portuguese national funding agency for science, research and technology (Fundação para a Ciência e a Tecnologia, FCT) through the CityMe project (EXPL/GES-URB/1429/2021; https://cityme.novaims.unl.pt/) and the project UIDB/04152/2020 - Centro de Investigação em Gestão de Informação (MagIC)/NOVA IMS.

The use of social media and location-based networks through GPS-enabled devices provides geospatial data for a plethora of applications in urban studies. However, the extent to which information found in geo-tagged social media activity corresponds to the spatial context is still a topic of debate. In this article, we developed a framework aimed at retrieving the thematic and spatial relationships between content originating from space-based (Twitter) and place-based (Google Places and OSM) sources of geographic user-generated content, based on topics identified by the embedding-based BERTopic model. The contribution of the framework lies in the combination of methods selected to improve on previous work on content-location relationships. Using the city of Lisbon (Portugal) to test our methodology, we first applied the embedding-based topic model to aggregated textual data from each source. The results revealed the complexity of content-location relationships, which are mostly based on thematic profiles. Nonetheless, the framework can be employed in other cities and extended with other metrics to enrich research exploring the correlation between online discourse and geography.

Repositório da Universidade Nova de Lisboa

Customer Review Analysis

Author: Tueschen Philipp
Publication date: 24/10/2022

Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.

Over the last years, Cerascreen has grown rapidly and expanded into more than 20 countries, always focusing on offering more diverse products, supplements, and services. Unfortunately, the large amount of data it collected during these years was not stored, and valuable insights were lost. In a new initiative, Cerascreen wants to become the most trusted digital predictive health platform. It therefore intends to use its data to understand its customers better and to offer superior products and services according to their needs. The focus of this internship report was to find a way to analyze Cerascreen's customer reviews in order to understand its customers better and respond properly to the feedback given. In addition, since the reviews had not been stored before, this report also deals with review retrieval. An exploratory data analysis of the reviews' ratings and texts was conducted to find the first significant insights. The investigation found that although the overall review consensus was positive, it differed by country, and that the reviews' length was related to their ratings. A topic model was developed to find out more about what customers are talking about. The model was able to find several different topics, including product-, supplement-, and service-specific reviews. Lastly, a newly created key performance indicator for customer satisfaction uses the new insights about the ratings and the review topics, and is partially visualized through a dashboard.

Repositório da Universidade Nova de Lisboa

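As comunidades temáticas e os discursos em torno da contracepção: análise qualitativa e de big data em ciências sociais [Thematic communities and discourses around contraception: qualitative and big data analysis in the social sciences]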

Author: Teixeira Maria Catarina Mateus de Azevedo do Nascimento
Publication date: 09/05/2022

This master's dissertation, carried out within the Master's in Science Communication at the Faculdade de Ciências Sociais e Humanas of the Universidade NOVA de Lisboa, proposes an analysis of discourses about contraception written in Portuguese on the social network Instagram over the last decade. The study aims to identify (i) the thematic communities formed around discourses about contraceptive methods on Instagram; (ii) the main topics addressed in these interactions; and (iii) the way these topics are handled and narrated, from a rhetorical and formal point of view. The data under analysis, a corpus of 103 367 posts comprising more than 12 million words, cover the period from 2012 to 2021 and were extracted from a selection of 27 hashtags.

The volume of data required the application of big data analysis methods in the social sciences, such as natural language processing, network and graph theory, topic modeling, and community detection in networks. To this end, we used a set of computational tools: Phantombuster for extracting posts, and NLTK (Natural Language Toolkit) and spaCy (Industrial-Strength Natural Language Processing), in their versions adapted to the Portuguese language, for text pre-processing. To build the networks and detect communities in them, we first used NetworkX, a Python package for the creation, manipulation, and analysis of the structure, dynamics, and functions of complex networks, followed by Gephi, a specialized tool for accessible visualization, analysis, and manipulation of networks and graphs. Topic modeling was performed with one of the most recent tools in natural language processing, BERTopic, a specific application of Google's BERT architecture.

Finally, the results obtained from these techniques and tools were subjected to a qualitative, or human, analysis of the content of purposively selected samples (i.e., purposive sampling), to interpret the data and connect them with theory. The results suggest the presence of 45 topics, grouped into 8 themes, among which the informal sharing, and production, of knowledge predominates; this knowledge is often mobilized in confrontation with socially established consensus (Fraser, 1990). The sharing of personal experiences with the contraceptive pill, particularly negative ones, is one of the many examples found, and perhaps the most significant in statistical terms, of the use of Instagram as a "safe space" (Clark-Parsons, 2018) conducive to sharing and to the destigmatization of certain themes related to female sexuality (Doshi, 2021), one of the manifestly predominant practices in the discourses surrounding contraception.

Repositório da Universidade Nova de Lisboa
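
The abstract above names NetworkX and Gephi for building networks and detecting communities, but it does not say what the nodes and edges were. The following is a rough illustration only. It assumes hashtags as nodes and co-mention within a post as edges, is written in plain Python rather than NetworkX, and uses connected components as a crude stand-in for a real community-detection algorithm. The function names `cooccurrence_graph` and `components` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(posts):
    """Build an undirected weighted graph of hashtags.

    Nodes are hashtags; an edge weight counts how many posts mention both.
    Assumption: hashtags are whitespace-separated tokens starting with '#'.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for text in posts:
        tags = sorted(
            {w.lower() for w in text.split() if w.startswith("#") and len(w) > 1}
        )
        for tag in tags:
            graph[tag]  # register the hashtag even if it has no neighbours
        for a, b in combinations(tags, 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def components(graph):
    """Group hashtags into connected components via depth-first search."""
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node])
        seen |= group
        groups.append(sorted(group))
    return groups
```

On a real corpus, connected components would likely merge most hashtags into one large group, which is why dedicated community-detection methods are used instead; the sketch only shows the shape of the data flowing from posts to a graph to groups.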

Archives, Access and Artificial Intelligence: Working with Born-Digital and Digitized Archival Collections

Publication venue: Transcript Verlag
Publication date: 01/01/2022

Digital archives are transforming the Humanities and the Sciences. Digitized collections of newspapers and books have pushed scholars to develop new, data-rich methods. Born-digital archives are now better preserved and managed thanks to the development of open-access and commercial software. Digital Humanities have moved from the fringe to the center of academia. Yet, the path from the appraisal of records to their analysis is far from smooth. This book explores crossovers between various disciplines to improve the discoverability, accessibility, and use of born-digital archives and other cultural assets.

SSOAR - Social Science Open Access Repository

Archives, Access and Artificial Intelligence

Publication venue: Walter de Gruyter GmbH
Publication date: 05/05/2022

Directory of Open Access Books (DOAB)

Archives, Access and Artificial Intelligence

Publication venue: Walter de Gruyter GmbH

OAPEN Library