Describing and Understanding Neighborhood Characteristics through Online Social Media
Geotagged data can be used to describe regions in the world and discover local themes. However, not all data produced within a region is necessarily specifically descriptive of that area. To surface the content that is characteristic of a region, we present the geographical hierarchy model (GHM), a probabilistic model based on the assumption that data observed in a region is a random mixture of content that pertains to different levels of a hierarchy. We apply the GHM to a dataset of 8 million Flickr photos in order to discriminate between content (i.e., tags) that specifically characterizes a region (e.g., a neighborhood) and content that characterizes surrounding areas or more general themes. Knowledge of the discriminative and non-discriminative terms used throughout the hierarchy enables us to quantify the uniqueness of a given region and to compare similar but distant regions. Our evaluation demonstrates that our model improves upon traditional Naive Bayes classification by 47% and hierarchical TF-IDF by 27%. We further highlight the differences and commonalities with human reasoning about what is locally characteristic of a neighborhood, distilled from ten interviews and a survey that covered themes such as time, events, and prior regional knowledge.
Comment: Accepted in WWW 2015, 2015, Florence, Italy
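The GHM's core assumption, that the tags observed in a region are a mixture of content generated at different levels of a geographic hierarchy, can be illustrated with a much simpler two-level model. The sketch below is not the GHM, which models every level of the hierarchy jointly; it is a parsimonious-language-model style EM procedure that assumes a fixed mixing weight `lam` between a region's own tag distribution and a background distribution built from the enclosing levels. The function name, the weight, the smoothing, and the tag counts are all illustrative assumptions.

```python
def parsimonious_local_model(region_counts, background_counts, lam=0.5, iters=50):
    """Split a region's tag distribution into a local part and a background part via EM.

    region_counts: tag -> count inside the region (e.g., one neighborhood).
    background_counts: tag -> count at the enclosing levels (e.g., city and global).
    lam: assumed fixed probability that a token is generated by the local component.
    Returns the estimated local distribution (tag -> probability).
    """
    vocab = set(region_counts) | set(background_counts)
    bg_total = sum(background_counts.values())
    # Add-one smoothing so tags never seen at the enclosing levels keep a nonzero background probability.
    p_bg = {w: (background_counts.get(w, 0) + 1) / (bg_total + len(vocab)) for w in region_counts}

    region_total = sum(region_counts.values())
    p_loc = {w: c / region_total for w, c in region_counts.items()}  # start from the region's own frequencies
    for _ in range(iters):
        # E-step: expected number of each tag's tokens attributed to the local component.
        expected = {
            w: c * lam * p_loc[w] / (lam * p_loc[w] + (1 - lam) * p_bg[w])
            for w, c in region_counts.items()
        }
        # M-step: renormalize the expected counts into a probability distribution.
        total = sum(expected.values())
        p_loc = {w: e / total for w, e in expected.items()}
    return p_loc


# Hypothetical tag counts for one neighborhood and for its enclosing levels.
region = {"fishermanswharf": 30, "sanfrancisco": 40, "usa": 30}
background = {"fishermanswharf": 30, "sanfrancisco": 4000, "usa": 6000, "goldengate": 2000}

local = parsimonious_local_model(region, background)
for tag, p in sorted(local.items(), key=lambda kv: -kv[1]):
    print(f"{tag}: {p:.3f}")
```

Tags that are frequent in the region but rare in the background (here `fishermanswharf`) end up with the largest local probability, while tags that the enclosing levels already explain well (`usa`) lose most of their mass: a miniature version of the discriminative versus non-discriminative split described above.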
A Machine Learning Approach to Predicting Alcohol Consumption in Adolescents From Historical Text Messaging Data
Techniques based on artificial neural networks represent the current state of the art in machine learning due to the availability of improved hardware and large data sets. Here we employ doc2vec, an unsupervised neural network, to capture the semantic content of text messages sent by adolescents during high school, and encode this semantic content as numeric vectors. These vectors condense the text message data into compact, highly usable inputs to a logistic regression classifier in a matter of hours, compared with the tedious and often lengthy task of manually coding data. Using this machine learning approach, we train a logistic regression model to predict adolescents' engagement in substance abuse during distinct life phases with accuracy ranging from 76.5% to 88.1%. We show the effects of grade level and text message aggregation strategy on the efficacy of document embedding generation with doc2vec. Additional examination of the vectorizations of specific terms extracted from the text message data adds quantitative depth to this analysis. We demonstrate that the method used here overcomes traditional natural language processing concerns related to unconventional orthography. These results suggest that the approach described in this thesis is a competitive and efficient alternative to existing methodologies for predicting substance abuse behaviors. This work reveals the potential of applying machine learning to text messaging data to develop automatic intervention strategies against substance abuse and other adolescent challenges.
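The pipeline described above (aggregate each student's messages, turn each aggregate into a fixed-length vector, fit a logistic regression classifier) can be sketched in plain Python. As a deliberate substitution, the sketch replaces doc2vec with an L2-normalized term-frequency vector so that it runs without third-party libraries; in practice one would train a doc2vec model (for example with gensim) instead. The messages, labels, function names, and hyperparameters are hypothetical.

```python
import math
from collections import defaultdict


def aggregate_by_student(messages):
    """Concatenate each student's messages into one document (one possible aggregation strategy)."""
    docs = defaultdict(list)
    for student_id, text in messages:
        docs[student_id].append(text.lower())
    return {sid: " ".join(texts) for sid, texts in docs.items()}


def build_vocab(docs):
    """Map every token seen in the training documents to a column index."""
    tokens = sorted({tok for doc in docs for tok in doc.split()})
    return {tok: i for i, tok in enumerate(tokens)}


def vectorize(doc, vocab):
    """Stand-in for a doc2vec embedding: an L2-normalized term-frequency vector."""
    vec = [0.0] * len(vocab)
    for tok in doc.lower().split():
        if tok in vocab:  # tokens unseen during training are ignored
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X, y, lr=0.5, epochs=300):
    """Fit logistic regression weights and bias with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi  # gradient of the log loss
            w = [wj - lr * err * xj for wj, xj in zip(w, xi)]
            b -= lr * err
    return w, b


def predict_proba(x, w, b):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical messages and outcome labels (1 = reported drinking).
messages = [
    ("s1", "beer party tonight"), ("s1", "so drunk lol"),
    ("s2", "study group at library"), ("s2", "homework due tmrw"),
    ("s3", "bring beer to the party"), ("s4", "library then homework"),
]
labels = {"s1": 1, "s2": 0, "s3": 1, "s4": 0}

docs = aggregate_by_student(messages)
vocab = build_vocab(docs.values())
ids = sorted(docs)
w, b = train_logreg([vectorize(docs[s], vocab) for s in ids], [labels[s] for s in ids])

p_pos = predict_proba(vectorize("party with beer", vocab), w, b)
p_neg = predict_proba(vectorize("homework at library", vocab), w, b)
print(f"{p_pos:.3f} {p_neg:.3f}")
```

Swapping in real document embeddings only changes `vectorize`; the aggregation and classification steps stay the same, which makes it easy to compare aggregation strategies such as per-student versus per-grade documents.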
Semantic Sentiment Analysis of Microblogs
Microblogs and social media platforms are now considered among the most popular forms of online communication. Through a platform like Twitter, much information reflecting people's opinions and attitudes is published and shared among users on a daily basis. This has recently brought great opportunities to companies interested in tracking and monitoring the reputation of their brands and businesses, and to policy makers and politicians to support their assessment of public opinion about their policies or political issues.
A wide range of approaches to sentiment analysis on Twitter and other similar microblogging platforms has recently been built. Most of these approaches rely mainly on the presence of affect words or syntactic structures that explicitly and unambiguously reflect sentiment (e.g., "great", "terrible"). However, these approaches are semantically weak; that is, they do not account for the semantics of words when detecting their sentiment in text. This is problematic because the sentiment of words is, in many cases, associated with their semantics, either through the context in which they occur (e.g., "great" is negative in the context of "pain") or through the conceptual meaning associated with them (e.g., "Ebola" is negative when its associated semantic concept is "Virus").
This thesis investigates the role of words' semantics in sentiment analysis of microblogs, aiming mainly at addressing the above problem. In particular, Twitter is used as a case study of microblogging platforms to investigate whether capturing the sentiment of words with respect to their semantics leads to more accurate sentiment analysis models on Twitter. To this end, several approaches are proposed in this thesis for extracting and incorporating two types of word semantics for sentiment analysis: contextual semantics (i.e., semantics captured from words' co-occurrences) and conceptual semantics (i.e., semantics extracted from external knowledge sources).
Experiments are conducted with both types of semantics by assessing their impact on three popular sentiment analysis tasks on Twitter: entity-level sentiment analysis, tweet-level sentiment analysis, and context-sensitive sentiment lexicon adaptation. Evaluation under each task uses several sentiment lexicons and up to 9 Twitter datasets with different characteristics, and compares against several state-of-the-art sentiment analysis approaches widely used in the literature.
The findings from this body of work demonstrate the value of using semantics in sentiment analysis on Twitter. The proposed approaches, which consider words' semantics for sentiment analysis at both the entity and tweet levels, outperform non-semantic approaches on most datasets.
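Contextual semantics, in the sense of deriving a word's sentiment from the words it co-occurs with, can be illustrated with a deliberately simple scoring rule: blend a word's prior lexicon polarity with the average prior polarity of its co-occurring words. This is a minimal sketch, not one of the thesis's proposed approaches; the toy lexicon `PRIOR`, the blending weight `alpha`, and the example tweets are assumptions.

```python
# Hypothetical context-free (prior) polarity lexicon with scores in [-1, 1].
PRIOR = {"great": 1.0, "love": 1.0, "happy": 0.8, "pain": -1.0, "awful": -1.0, "sick": -0.8}


def contextual_sentiment(target, tweets, prior=PRIOR, alpha=0.3):
    """Blend a word's prior polarity with the mean prior polarity of its co-occurring words.

    alpha weights the prior and (1 - alpha) weights the context; both are assumptions.
    """
    context_scores = []
    for tweet in tweets:
        tokens = tweet.lower().split()
        if target not in tokens:
            continue
        # Collect prior scores of the other lexicon words in the same tweet.
        context_scores.extend(prior[tok] for tok in tokens if tok != target and tok in prior)
    if not context_scores:  # no usable context: fall back to the prior (0.0 if unknown)
        return prior.get(target, 0.0)
    context = sum(context_scores) / len(context_scores)
    return alpha * prior.get(target, 0.0) + (1 - alpha) * context


TWEETS = [
    "great another monday of back pain",
    "great now i feel sick and awful",
    "love this sunny weather so happy",
]

for word in ("great", "love", "weather"):
    print(word, round(contextual_sentiment(word, TWEETS), 3))
```

With these tweets, "great" ends up negative despite its positive prior, echoing the "great"/"pain" example in the abstract, while "weather", which has no prior score, inherits a positive polarity from its context.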
Sentiment analysis in hospitality using text mining: the case of a Portuguese eco-hotel
JEL Classification System: Z32 Tourism and Development; M30 Marketing and Advertising
The rapid development of the Internet and mobile devices has enabled the emergence of travel and hospitality review sites, leading to a large number of customer opinion posts. While such comments may influence future demand for the targeted hotels, hotel managers can also use them to improve the customer experience. Nevertheless, this trend poses a problem: the information is widely scattered, making it almost impossible to extract useful knowledge from it.
In this study, with the aim of facilitating this process, the sentiment classification of customer reviews of an eco-hotel is assessed through a text mining approach that draws on several different review sources. Two dictionaries are compiled to build the lexicon used to parse the 401 reviews collected for a Portuguese eco-hotel between January and August 2015. Then, the latent Dirichlet allocation (LDA) topic modeling algorithm is applied to identify relevant topics that characterize a given hospitality issue by sentiment.
The findings show that accuracy is influenced by the interaction between the LDA-generated topic models and the correct construction of both dictionaries. The results also reveal that text mining can generate new insights into variables that have been extensively studied in the hospitality industry: in the case studied, hotel food generates ordinary positive sentiments, while hospitality generates both ordinary and strong positive feelings. Such results are valuable for hospitality management and validate the proposed approach.
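A lexicon-based pipeline of the kind described here, with two dictionaries merged into one lexicon and sentiment tallied per topic, can be sketched as follows. Everything concrete in it is an assumption: the dictionary entries, the score thresholds that separate "ordinary" from "strong" positive sentiment, and especially the topics, which are fixed keyword sets standing in for the topics an LDA model would learn from the reviews.

```python
from collections import Counter

# Hypothetical dictionaries: a general-purpose one and a hospitality-specific one.
GENERAL_DICT = {"good": 1, "nice": 1, "bad": -1, "excellent": 2, "terrible": -2}
DOMAIN_DICT = {"cozy": 1, "tasty": 1, "welcoming": 2, "noisy": -1}
# Merged lexicon; where both define a word, the domain entry wins.
LEXICON = {**GENERAL_DICT, **DOMAIN_DICT}

# Fixed keyword sets standing in for LDA-generated topics (assumed, not learned here).
TOPICS = {
    "food": {"breakfast", "dinner", "food"},
    "hospitality": {"staff", "host", "reception"},
}


def sentence_score(sentence):
    """Return the sentence's token set and the sum of its lexicon scores."""
    tokens = sentence.lower().replace(",", " ").split()
    return set(tokens), sum(LEXICON.get(tok, 0) for tok in tokens)


def sentiment_label(score):
    """Map a summed score to a coarse label (thresholds are assumptions)."""
    if score >= 2:
        return "strong positive"
    if score == 1:
        return "ordinary positive"
    return "neutral" if score == 0 else "negative"


def sentiment_by_topic(reviews):
    """Count sentiment labels per topic, one vote per sentence that mentions the topic."""
    tally = {topic: Counter() for topic in TOPICS}
    for review in reviews:
        for sentence in review.split("."):
            tokens, score = sentence_score(sentence)
            for topic, keywords in TOPICS.items():
                if tokens & keywords:
                    tally[topic][sentiment_label(score)] += 1
    return tally


reviews = [
    "The breakfast was good. The staff were welcoming and excellent.",
    "Tasty food. Nice host. Very cozy room.",
]
print(sentiment_by_topic(reviews))
```

Replacing `TOPICS` with the top words of each topic from a trained LDA model would keep the rest of the pipeline unchanged.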
Community-Based Behavioral Understanding of Mobility Trends and Public Attitude through Transportation User and Agency Interactions on Social Media in the Emergence of Covid-19
The increased availability of technology-enabled transportation options and modern communication devices (smartphones, in particular) is transforming travel-related decision-making differently across places, points in time, modes of transportation, and socio-economic groups. The emergence of COVID-19 made passenger travel behavior even more complex, forcing an unparalleled worldwide change in how people travel and introducing a new normal. This dissertation explores the potential of social media platforms (SMPs) as a viable alternative to traditional approaches (e.g., travel surveys) for understanding the complex dynamics of people's mobility patterns during the emergence of COVID-19. In this dissertation, we focus on three objectives. First, we introduce a novel approach to developing comparative infographics of emerging transportation trends, applying natural language processing and data-driven techniques to large-scale social media data. Second, we develop a methodology to model community-based travel behavior on Twitter under different socioeconomic and demographic factors at the community level during the emergence of COVID-19, inferring users' demographics to overcome sampling bias. Third, we examine the communication patterns of different transportation agencies on Twitter, in terms of message types, communication sufficiency, consistency, and coordination, by applying text mining techniques and dynamic network analysis.
The methodologies and findings of the dissertation will allow agencies, researchers, and professionals to monitor transportation trends in real time. Potential applications of the work may include: (1) identifying the spatial diversity of public mobility needs and concerns through social media platforms; (2) developing new policies that satisfy diverse needs at different locations; (3) introducing new plans to support and celebrate equity, diversity, and inclusion in the transportation sector that would improve the efficient flow of goods and services; (4) designing new methods to model community-based travel behavior at different scales (e.g., census block, zip code) using social media data, inferring users' socio-economic and demographic properties; and (5) implementing efficient policies to improve existing communication plans, the efficacy of critical information dissemination, and the coordination of different transportation actors, raising awareness among passengers in general and during unprecedented health crises in a fragmented communication world.
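The dynamic network analysis of agency communication mentioned in the third objective can be illustrated by slicing agency-to-agency mentions into time windows and computing normalized degree centrality in each window, a simple proxy for how connected each agency is at a given time. This is a sketch under stated assumptions, not the dissertation's method: the agency names, the mention records, the 7-day window, and the choice of degree centrality as the coordination measure are all hypothetical.

```python
from collections import defaultdict


def windowed_networks(mentions, window_days=7):
    """Group (day, source, target) mentions into one undirected edge set per time window."""
    windows = defaultdict(set)
    for day, source, target in mentions:
        if source != target:  # ignore self-mentions
            windows[day // window_days].add(frozenset((source, target)))
    return dict(windows)


def degree_centrality(edges):
    """Normalized degree centrality: each node's degree divided by (n - 1)."""
    degree = defaultdict(int)
    for edge in edges:
        for node in edge:
            degree[node] += 1
    n = len(degree)
    return {node: d / (n - 1) if n > 1 else 0.0 for node, d in degree.items()}


# Hypothetical mention records: (day index, mentioning agency, mentioned agency).
mentions = [
    (0, "CityTransit", "StateDOT"), (1, "StateDOT", "CityTransit"),
    (2, "CityTransit", "BikeShare"), (8, "BikeShare", "StateDOT"),
]

centrality = {w: degree_centrality(e) for w, e in windowed_networks(mentions).items()}
for window in sorted(centrality):
    print(window, centrality[window])
```

Tracking how an agency's centrality changes from one window to the next is what makes the analysis dynamic; a library such as NetworkX offers many other centrality measures if degree alone is too coarse.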