Characterizing Geo-located Tweets in Brazilian Megacities
This work presents a framework for collecting, processing and mining
geo-located tweets in order to extract meaningful and actionable knowledge in
the context of smart cities. We collected and characterized more than 9M tweets
from the two biggest cities in Brazil, Rio de Janeiro and São Paulo. We
performed topic modeling using the Latent Dirichlet Allocation model to produce
an unsupervised distribution of semantic topics over the stream of geo-located
tweets as well as a distribution of words over those topics. We manually
labeled and aggregated similar topics, obtaining a total of 29 different topics
across both cities. Results showed similarities in the majority of topics for
both cities, reflecting similar interests and concerns among the population of
Rio de Janeiro and São Paulo. Nevertheless, some specific topics are more
predominant in one of the cities.
Social Media Text Processing and Semantic Analysis for Smart Cities
With the rise of Social Media, people obtain and share information almost
instantly on a 24/7 basis. Many research areas have tried to gain valuable
insights from these large volumes of freely available user generated content.
With the goal of extracting knowledge from social media streams that might be
useful in the context of intelligent transportation systems and smart cities,
we designed and developed a framework that provides functionalities for
parallel collection of geo-located tweets from multiple pre-defined bounding
boxes (cities or regions), including filtering of non-complying tweets, text
pre-processing for Portuguese and English, topic modeling,
transportation-specific text classification, and data aggregation and
visualization.
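The bounding-box collection and filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' framework: the `BoundingBox` class, the `filter_tweets` helper, the tweet dictionary layout, and the approximate city coordinates are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class BoundingBox:
    """Axis-aligned box in degrees (hypothetical helper, not from the paper)."""
    name: str
    west: float
    south: float
    east: float
    north: float

    def contains(self, lon: float, lat: float) -> bool:
        return self.west <= lon <= self.east and self.south <= lat <= self.north


# Rough, illustrative coordinates only; real deployments would use surveyed boxes.
BOXES: List[BoundingBox] = [
    BoundingBox("Rio de Janeiro", -43.80, -23.08, -43.10, -22.75),
    BoundingBox("São Paulo", -46.83, -24.01, -46.36, -23.36),
]


def assign_city(tweet: Dict, boxes: Iterable[BoundingBox]) -> Optional[str]:
    """Return the name of the first box containing the tweet, or None."""
    coords = tweet.get("coordinates")  # assumed layout: (longitude, latitude)
    if not coords:
        return None  # tweet is not geo-located
    lon, lat = coords
    for box in boxes:
        if box.contains(lon, lat):
            return box.name
    return None


def filter_tweets(tweets: Iterable[Dict], boxes: List[BoundingBox]) -> Iterator[Dict]:
    """Keep only tweets that fall inside one of the boxes, tagged with the city."""
    for tweet in tweets:
        city = assign_city(tweet, boxes)
        if city is not None:
            yield {**tweet, "city": city}


sample = [
    {"text": "praia hoje", "coordinates": (-43.20, -22.90)},
    {"text": "trânsito na marginal", "coordinates": (-46.63, -23.55)},
    {"text": "no location"},
    {"text": "far away", "coordinates": (0.0, 0.0)},
]
kept = list(filter_tweets(sample, BOXES))
```

Tweets without coordinates, or with coordinates outside every box, are treated here as non-complying and dropped; the framework's actual filtering criteria are not given in the abstract.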
We performed an exploratory data analysis of geo-located tweets in five
different cities: Rio de Janeiro, São Paulo, New York City, London and
Melbourne, comprising a total of more than 43 million tweets over a period of
three months. Furthermore, we performed a large-scale topic modelling
comparison between Rio de Janeiro and São Paulo. Interestingly, most of the
topics are shared between both cities, which, despite being in the same
country, are considered very different regarding population, economy and
lifestyle.
We take advantage of recent developments in word embeddings and train such
representations from the collections of geo-located tweets. We then use a
combination of bag-of-embeddings and traditional bag-of-words to train
travel-related classifiers in both Portuguese and English to separate
travel-related content from unrelated content. We created specific
gold-standard data to perform empirical evaluation of the resulting
classifiers. Results are in line with research work in other application areas
by showing the robustness of using word embeddings to learn word similarities
that bag-of-words is not able to capture.
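One way to picture the combination of bag-of-embeddings and bag-of-words features is the sketch below. It assumes that "bag-of-embeddings" means the average of a tweet's word vectors, which the abstract does not specify; the function names, the toy vocabulary, and the two-dimensional embeddings are hypothetical and stand in for vectors trained on the tweet collections.

```python
from typing import Dict, List, Sequence


def bag_of_words(tokens: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Term counts over a fixed vocabulary."""
    return [float(tokens.count(term)) for term in vocabulary]


def bag_of_embeddings(
    tokens: Sequence[str], embeddings: Dict[str, List[float]], dim: int
) -> List[float]:
    """Mean of the embeddings of known tokens (assumed definition)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim  # no known token: fall back to the zero vector
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def combined_features(
    tokens: Sequence[str],
    vocabulary: Sequence[str],
    embeddings: Dict[str, List[float]],
    dim: int,
) -> List[float]:
    """Concatenate sparse counts with the dense averaged embedding."""
    return bag_of_words(tokens, vocabulary) + bag_of_embeddings(tokens, embeddings, dim)


# Toy data for illustration only.
VOCABULARY = ["bus", "late"]
EMBEDDINGS = {
    "bus": [1.0, 0.0],
    "train": [0.8, 0.2],
    "pizza": [0.0, 1.0],
}

features = combined_features(["train", "late"], VOCABULARY, EMBEDDINGS, dim=2)
```

In this toy case "train" is absent from the bag-of-words vocabulary, yet its embedding lies close to that of "bus", so the dense half of the feature vector still carries a travel-related signal. The resulting vectors could then be passed to any standard classifier.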
Health-Related Emergency Disaster Risk Management (Health-EDRM)
Disasters such as earthquakes, cyclones, floods, heat waves, nuclear accidents, and large-scale pollution incidents take lives and cause exceptionally large health problems. The majority of large-scale disasters affect the most vulnerable populations, which often comprise people of extreme ages, in remote living areas, with endemic poverty, and with low literacy. Health-related emergency disaster risk management (Health-EDRM) [1] refers to the systematic analysis and management of health risks surrounding emergencies and disasters; it plays an important role in reducing hazards and vulnerability along with extending preparedness, response, and recovery measures. This concept encompasses risk analyses and interventions, such as accessible early warning systems, timely deployment of relief workers, and the provision of suitable drugs and medical equipment, to decrease the impact of disaster on people before, during, and after disaster events. Disaster risk profiling and interventions can be at the personal/household, community, and system/political levels; they can be targeted at specific health risks including respiratory issues caused by indoor burning, re-emergence of infectious disease due to low vaccination coverage, and gastrointestinal problems resulting from unregulated waste management. Unfortunately, there has been a major gap in the scientific literature regarding Health-EDRM. The aim of this Special Issue of IJERPH was to present papers describing/reporting the latest disaster and health risk analyses, as well as interventions for health-related disaster risk management, in an effort to address this gap and facilitate major global policies and initiatives for disaster risk reduction.
Extracción automática de categorias en tuits (Automatic Category Extraction from Tweets)
This project addresses the creation of a classifier that automatically
discerns which topics are being discussed on Twitter. The Latent Dirichlet
Allocation (LDA) algorithm produces a series of word groupings, but it does not
provide the topic associated with each group of words. This project proposes a
classifier trained on Wikipedia to discern what the topics in the LDA output
are about. The classifier was applied to a dataset of tweets from US cities to
extract the categories that users talk about most.
Villar Lafuente, CJ. (2018). Extracción automática de categorias en tuits. http://hdl.handle.net/10251/147062 TFG
Global Art Market in the Aftermath of COVID-19
Although the global art market has often been resilient to international economic and political events, it has recently faced some of its biggest challenges under the influence of COVID-19. Among other effects, the pandemic and the accompanying restrictive administrative measures taken by world governments have significantly influenced such key economic indicators as gallery employment, art sales, and the organization of international art fairs. The Special Issue "Global Art Market in the Aftermath of COVID-19" studies various economic, social, and political impacts of the COVID-19 pandemic on the global art market’s current state and future evolution.