1,825 research outputs found

    Social Media Text Processing and Semantic Analysis for Smart Cities

    With the rise of Social Media, people obtain and share information almost instantly on a 24/7 basis. Many research areas have tried to gain valuable insights from these large volumes of freely available user generated content. With the goal of extracting knowledge from social media streams that might be useful in the context of intelligent transportation systems and smart cities, we designed and developed a framework that provides functionalities for parallel collection of geo-located tweets from multiple pre-defined bounding boxes (cities or regions), including filtering of non-complying tweets, text pre-processing for Portuguese and English, topic modeling, and transportation-specific text classifiers, as well as aggregation and data visualization. We performed an exploratory data analysis of geo-located tweets in 5 different cities: Rio de Janeiro, São Paulo, New York City, London and Melbourne, comprising a total of more than 43 million tweets in a period of 3 months. Furthermore, we performed a large scale topic modelling comparison between Rio de Janeiro and São Paulo. Interestingly, most of the topics are shared between both cities which, despite being in the same country, are considered very different regarding population, economy and lifestyle. We take advantage of recent developments in word embeddings and train such representations from the collections of geo-located tweets. We then use a combination of bag-of-embeddings and traditional bag-of-words to train travel-related classifiers in both Portuguese and English to filter travel-related content from non-related. We created specific gold-standard data to perform empirical evaluation of the resulting classifiers. Results are in line with research work in other application areas by showing the robustness of using word embeddings to learn word similarities that bag-of-words is not able to capture
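    The combination of bag-of-embeddings and bag-of-words features mentioned in this abstract can be illustrated with a minimal sketch. It assumes that "bag-of-embeddings" means the average of the word vectors in a tweet, which is one common reading and is not confirmed by the abstract. The vocabulary, the two-dimensional toy embeddings, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical toy embeddings (assumption). In the described framework the
# vectors would be trained on the collected geo-located tweets.
TOY_EMBEDDINGS = {
    "bus": [0.9, 0.1],
    "train": [0.8, 0.2],
    "traffic": [0.7, 0.3],
    "coffee": [0.1, 0.9],
}
# Hypothetical fixed vocabulary for the bag-of-words part.
VOCABULARY = ["bus", "train", "traffic", "coffee"]
EMBEDDING_DIM = 2


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary word occurs in the token list."""
    counts = Counter(tokens)
    return [float(counts[word]) for word in vocabulary]


def bag_of_embeddings(tokens, embeddings, dim):
    """Average the vectors of all tokens that have an embedding."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def featurize(text):
    """Concatenate both representations into one feature vector."""
    tokens = text.lower().split()
    return bag_of_words(tokens, VOCABULARY) + bag_of_embeddings(
        tokens, TOY_EMBEDDINGS, EMBEDDING_DIM
    )
```

    A vector produced by `featurize` could then be passed to any standard classifier. The concatenation lets the model use exact word counts alongside the similarity information carried by the embeddings.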

    Exploring Sentiment Analysis on Twitter: Investigating Public Opinion on Migration in Brazil from 2015 to 2020

    Technology has reshaped societal interaction and the expression of opinions. Migration is a prominent trend, and analysing social media discussions provides insights into societal perspectives. This thesis explores how events between 2015 and 2020 impacted Brazilian sentiment on Twitter about migrants and refugees. Its aim was to uncover the influence of key sociopolitical events on public sentiment, clarifying how these echoed in the digital realm. Four key objectives guided this research: (a) understanding public opinions on migrants and refugees, (b) investigating how events influenced Twitter sentiment, (c) identifying terms used in migration-related tweets, and (d) tracking sentiment shifts, especially concerning changes in government. Sentiment analysis using VADER (Valence Aware Dictionary and sEntiment Reasoner) was employed to analyse tweet data. The use of computational methods in social sciences is gaining traction, yet no analysis has been conducted before to understand the sentiments of the Brazilian population regarding migration. The analysis underscored Twitter's role in reflecting and shaping public discourse, offering insights into how major events influenced discussions on migration. In conclusion, this study illuminated the landscape of Brazilian sentiment on migration, emphasizing the significance of innovative social media analysis methodologies for policymaking and societal inclusivity in the digital age

    Artificial and Natural Topic Detection in Online Social Networks

    Online Social Networks (OSNs), such as Twitter, offer attractive means of social interaction and communication, but also raise privacy and security issues. OSNs provide valuable information for marketing and competitiveness, based on users' posts and opinions stored in a huge volume of data spanning many themes, topics, and subjects. To mine the topics discussed on an OSN, we present a novel application of the Louvain method to topic modeling, based on modularity-driven community detection in graphs. The proposed approach succeeded in finding topics in five different datasets composed of textual content from Twitter and YouTube. Another important contribution concerns texts posted by spammers. In this case, a particular behavior observed in the graph community architecture (density and degree) indicates a topic's strength and allows it to be classified as natural or artificial, the latter being created by spammers on OSNs
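    The Louvain method greedily searches for a partition of a graph that maximizes modularity. The sketch below does not implement Louvain itself; it only computes the modularity score of a given partition for an undirected, unweighted graph. The word co-occurrence graph in the usage note is a hypothetical example, since the abstract does not say how the graph is built.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to a community label.
    Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the sum of the degrees of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    degree = defaultdict(int)
    internal_edges = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    degree_sum = defaultdict(int)
    for node, d in degree.items():
        degree_sum[community_of[node]] += d
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical co-occurrence graph: two word triangles joined by one edge.
EXAMPLE_EDGES = [
    ("goal", "match"), ("match", "team"), ("goal", "team"),
    ("vote", "poll"), ("poll", "election"), ("vote", "election"),
    ("team", "vote"),
]
TWO_TOPICS = {
    "goal": "sports", "match": "sports", "team": "sports",
    "vote": "politics", "poll": "politics", "election": "politics",
}
ONE_TOPIC = {word: "all" for word in TWO_TOPICS}
```

    On this toy graph, `modularity(EXAMPLE_EDGES, TWO_TOPICS)` is higher than `modularity(EXAMPLE_EDGES, ONE_TOPIC)`, which is why a modularity-driven method would prefer to split the words into two topic communities.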

    Business Intelligence Applied to Sentiment Analysis in a Higher Education Institution

    Social media allows institutions not only to publicize their work and get feedback from the community about it, but also to keep in touch with their alumni network and foster conversations within the academic community. Sentiment analysis, in turn, allows a better understanding of what is being said about a brand and how to improve the use of this communication platform. The main goal of the current work is to build a Business Intelligence System for a Higher Education Institution (HEI) based on content extracted from social media. Posts, likes, dislikes, shares, comments and number of visits were extracted from Facebook, Google Maps Reviews, Instagram, LinkedIn, Student Forums, Twitter and YouTube. With this data and an ETL process, a Data Warehouse (DW) in SQL Server and 17 dashboards in Power BI were developed. The posts with the most likes reported the death of someone from the school, or concerned the school mascot, the pandemic or welcoming new students. Overall, weekends were the days with the most interactions. Students are concerned about accommodation, transport, and the school's academic offer. This analysis allows a better understanding of what is being said about this HEI and how to improve the communication strategy

    TrollBus, An Empirical Study Of Features For Troll Detection

    In today's social network context, the discussion of politics online has become a normal event. Users from all sides of the political spectrum are able to express their opinions freely and discuss their views in various social networks, including Twitter. From 2016 onward, a group of users whose objective is to polarize discussions and sow discord began to gain notoriety in this social network. These accounts are known as Trolls, and they have been linked to several events in recent history, such as interference in elections and the organizing of violent protests. Since their discovery, several approaches have been developed to detect these accounts using machine learning techniques. Existing approaches have used different types of features. The goal of this work is to analyse these approaches, describing their feature sets and comparing them both individually and in groups. To do so, an empirical study was performed, which adapts these features to the Portuguese Twitter community. The necessary data was collected through SocialBus, a tool for the collection, processing and storage of data from social networks, namely Twitter. The set of accounts used to collect the data was obtained from Portuguese political journalists, and the labelling of trolls was performed with a strict set of behavioural rules, aided by a scoring function. A new module for SocialBus was developed, called Trollbus, which performs troll detection in real time. A public dataset was also released. The features of the best model obtained combine an account's profile metadata with the superficial aspects present in its text. The most important feature set turned out to be the numerical aspects of the text, and the single most important feature was the presence of political insults

    Towards Automated Recipe Genre Classification using Semi-Supervised Learning

    Sharing cooking recipes is a great way to exchange culinary ideas and provide instructions for food preparation. However, categorizing raw recipes found online into appropriate food genres can be challenging due to a lack of adequate labeled data. In this study, we present a dataset named the "Assorted, Archetypal, and Annotated Two Million Extended (3A2M+) Cooking Recipe Dataset" that contains two million culinary recipes labeled in respective categories with extended named entities extracted from recipe descriptions. This collection of data includes various features such as title, NER, directions, and extended NER, as well as nine different labels representing genres including bakery, drinks, non-veg, vegetables, fast food, cereals, meals, sides, and fusions. The proposed pipeline named 3A2M+ extends the size of the Named Entity Recognition (NER) list to address missing named entities like heat, time or process from the recipe directions using two NER extraction tools. The 3A2M+ dataset provides a comprehensive solution to various challenging recipe-related tasks, including classification, named entity recognition, and recipe generation. Furthermore, we have applied traditional machine learning, deep learning and pre-trained language models to classify the recipes into their corresponding genre and achieved an overall accuracy of 98.6%. Our investigation indicates that the title feature played a more significant role in classifying the genre

    Dangerous Dice: Playing with Artificial Intelligence and Populism during Brazil's 2018 Election

    With the advent of artificial intelligence and the resurgence of populism, in particular right-wing populism, we see nationalist parties that were once on the fringes of mainstream politics gain power around the world. Putting under the limelight the recent electoral victories of world leaders riding this new wave of populism, we recognize a troubling new reality: the confluence of artificial intelligence and populism allows for election interference through the spread of disinformation, propaganda, and emotionally charged populist rhetoric on social media. This tectonic shift in election tactics used by extreme nationalists presents an existential threat to democracy, with the potential to lead to a dystopian society where the will of the people is replaced by the will of algorithms. The victory of Brazilian President Jair Bolsonaro during the 2018 election and his subsequent presidency brought into focus this new dynamism of political forces: emotionally charged populist rhetoric and AI-manipulated social media. In order to combat this new danger posed by digital populists, such as the danger posed by Bolsonaro to Brazil’s democracy, new policies on artificial intelligence (AI) must be implemented to protect elections. To shape policy on this new emerging technology, it is imperative that governments understand the nature of AI and in particular, the different ways it can be weaponized during election campaigns. However, it is even more critical to inform society as a whole about the consequences AI can cause as despots can use its power to keep the people under draconian control

    A model to improve the Evaluation and Selection of Public Contest's Candidates (Police Officers) based on AI technologies

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Business Analytics. The number of candidates applying to Public Contests is increasing compared to the number of Human Resources employees required for selecting them for Police Forces. This work intends to understand how those Public Institutions can evaluate and select their candidates efficiently during the different phases of the recruitment process, and for this purpose AI approaches are studied. This paper presents two research questions and introduces a corresponding systematic literature review, focusing on AI technologies, so that the reader can understand which are most used and most appropriate to be applied to Police Forces as a complementary recruitment strategy of the National Criminal Investigation Police agency of Portugal (Polícia Judiciária). Design Science Research (DSR) was the methodological approach chosen. The suggestion of a theoretical framework is the main contribution of this study, together with the segmentation of the candidates (future Criminal Inspectors). It also helped to understand the most important facts facing Public Institutions regarding the usage of AI technologies to make decisions about evaluating and selecting candidates. Following the PRISMA methodology guidelines, a systematic literature review and meta-analyses method was adopted to identify how the usage and exploitation of transparent AI can have a positive impact on the recruitment process of a Public Institution, resulting in an analysis of 34 papers published between 2017 and 2021.
The AI-based theoretical framework, applicable within the analysis of literature papers, addresses the problem of how the Institutions can gain insights about their candidates while profiling them; how to obtain more accurate information from the interview phase; and how to reach a more rigorous assessment of their emotional intelligence, providing a better alignment of moral values. In this way, this work aims to advise on improving the decision making of a recruiter in a Police Force Institution, turning it into a more automated and evidence-based decision when it comes to recruiting the adequate candidate for the position