
    Describing and Understanding Neighborhood Characteristics through Online Social Media

    Geotagged data can be used to describe regions in the world and discover local themes. However, not all data produced within a region is specifically descriptive of that area. To surface the content that is characteristic of a region, we present the geographical hierarchy model (GHM), a probabilistic model based on the assumption that the data observed in a region is a random mixture of content pertaining to different levels of a hierarchy. We apply the GHM to a dataset of 8 million Flickr photos to discriminate between content (i.e., tags) that specifically characterizes a region (e.g., a neighborhood) and content that characterizes surrounding areas or more general themes. Knowing which terms are discriminative and which are non-discriminative at each level of the hierarchy enables us to quantify the uniqueness of a given region and to compare similar but distant regions. Our evaluation demonstrates that our model improves upon traditional Naive Bayes classification by 47% and upon hierarchical TF-IDF by 27%. We further highlight the differences and commonalities with human reasoning about what is locally characteristic of a neighborhood, distilled from ten interviews and a survey that covered themes such as time, events, and prior regional knowledge. Comment: Accepted in WWW 2015, Florence, Italy.
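    The abstract models the tags observed in a region as a random mixture of content from different hierarchy levels. A minimal sketch of that idea follows, assuming the per-level tag distributions and mixture weights are already known (the GHM itself learns them; the function name `level_responsibilities`, the toy distributions, and the smoothing constant are all hypothetical). It uses Bayes' rule to estimate how strongly each level explains a given tag; this is not the authors' implementation.

```python
from typing import Dict, List


def level_responsibilities(
    tag: str,
    level_dists: List[Dict[str, float]],
    mixture_weights: List[float],
    smoothing: float = 1e-6,
) -> List[float]:
    """Return P(level k | tag) for each hierarchy level k via Bayes' rule.

    level_dists[k] holds P(tag | level k), ordered from the most general
    level to the most specific (e.g. [global, city, neighborhood]).
    mixture_weights[k] is the prior weight of level k within the region.
    Tags unseen at a level get a small (hypothetical) smoothing probability.
    """
    joint = [
        weight * dist.get(tag, smoothing)
        for weight, dist in zip(mixture_weights, level_dists)
    ]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical toy distributions for one neighborhood and its ancestors.
GLOBAL = {"sunset": 0.02, "bridge": 0.001}
CITY = {"sunset": 0.01, "bridge": 0.02}
NEIGHBORHOOD = {"sunset": 0.01, "bridge": 0.05, "mural": 0.03}
LEVELS = [GLOBAL, CITY, NEIGHBORHOOD]
WEIGHTS = [0.5, 0.3, 0.2]

for tag in ["sunset", "bridge", "mural"]:
    probs = level_responsibilities(tag, LEVELS, WEIGHTS)
    print(tag, [round(p, 3) for p in probs])
```

    With these toy numbers, "mural" is attributed almost entirely to the neighborhood level, while "sunset" is explained mostly by the global level. This mirrors the abstract's distinction between locally characteristic content and more general themes.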

    A Machine Learning Approach to Predicting Alcohol Consumption in Adolescents From Historical Text Messaging Data

    Techniques based on artificial neural networks represent the current state of the art in machine learning, owing to improved hardware and the availability of large data sets. Here we employ doc2vec, an unsupervised neural network, to capture the semantic content of text messages sent by adolescents during high school and to encode that content as numeric vectors. These vectors condense the text message data into compact inputs for a logistic regression classifier within a matter of hours, compared with the tedious and often lengthy task of coding the data manually. Using this machine learning approach, we train a logistic regression model to predict adolescents' engagement in substance abuse during distinct life phases with accuracy ranging from 76.5% to 88.1%. We show the effects of grade level and text message aggregation strategy on the efficacy of document embedding generation with doc2vec. Additional examination of the vectorizations of specific terms extracted from the text message data adds quantitative depth to this analysis. We demonstrate that the method used here overcomes traditional natural language processing concerns related to unconventional orthography. These results suggest that the approach described in this thesis is a competitive and efficient alternative to existing methodologies for predicting substance abuse behaviors. This work reveals the potential of applying machine learning-based processing of text messaging data to the development of automatic intervention strategies against substance abuse and other adolescent challenges.
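    In the pipeline described above, doc2vec vectors serve as inputs to a logistic regression classifier. The sketch below covers only the classification step and assumes the document vectors have already been computed (in practice they would come from an embedding library such as gensim's `Doc2Vec`). The 2-D vectors and labels are hypothetical toy data, not data from the thesis. The classifier is a from-scratch logistic regression trained by batch gradient descent on the log loss.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.5,
    epochs: int = 500,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one document vector.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5)


# Hypothetical 2-D "document vectors" standing in for doc2vec embeddings;
# label 1 = reported substance use, 0 = none (toy data, not from the study).
X = [[0.9, 1.1], [1.2, 0.8], [1.0, 1.0], [-1.0, -0.9], [-0.8, -1.2], [-1.1, -1.0]]
y = [1, 1, 1, 0, 0, 0]

w, b = train_logistic_regression(X, y)
print(predict(w, b, [1.0, 0.9]), predict(w, b, [-0.9, -1.0]))
```

    Keeping the embedding and classification steps separate makes it easy to swap in real doc2vec vectors of any dimension without changing the classifier.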

    Sentiment analysis in hospitality using text mining: the case of a Portuguese eco-hotel

    JEL Classification System: Z32 Tourism and Development; M30 Marketing and Advertising.
    The rapid development of the Internet and mobile devices enabled the emergence of travel and hospitality review sites, leading to a large number of customer opinion posts. While such comments may influence future demand for the targeted hotels, hotel managers can also use them to improve the customer experience. Nevertheless, this trend poses a problem: the information is widely scattered, making it almost impossible to extract useful knowledge from it. To facilitate this process, this study assesses the sentiment classification of an eco-hotel through a text mining approach using several different sources of customer reviews. Two dictionaries are compiled to build the lexicon used to parse the 401 reviews collected for a Portuguese eco-hotel between January and August 2015. Then, the latent Dirichlet allocation (LDA) modeling algorithm is applied to gather relevant topics that characterize a given hospitality issue by sentiment. The findings show that accuracy is influenced by the interaction between the LDA-generated topic models and the correct construction of both dictionaries. The results also reveal that text mining can generate new insights into variables that have been extensively studied in the hospitality industry; for the case studied, hotel food generates ordinary positive sentiments, while hospitality generates both ordinary and strong positive feelings. Such results are valuable for hospitality management and validate the proposed approach.
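    The abstract does not specify what the two dictionaries contain. The sketch below assumes one plausible split: one dictionary holds sentiment words with a strength score, and the other maps domain terms to hospitality issues such as food. It labels each issue mentioned in a review, sentence by sentence, as ordinary or strong positive. All words, scores, and function names are hypothetical, and the LDA topic step is not shown.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical dictionary 1: sentiment words with a strength score.
SENTIMENT_LEXICON: Dict[str, int] = {
    "good": 1, "nice": 1, "tasty": 1, "clean": 1,
    "excellent": 2, "amazing": 2, "wonderful": 2,
    "bad": -1, "dirty": -1, "rude": -1,
}

# Hypothetical dictionary 2: domain terms mapped to hospitality issues.
ISSUE_LEXICON: Dict[str, str] = {
    "breakfast": "food", "dinner": "food", "food": "food",
    "staff": "hospitality", "host": "hospitality", "welcome": "hospitality",
}


def label(score: int) -> str:
    """Map an aggregated sentence score to a sentiment class."""
    if score >= 2:
        return "strong positive"
    if score > 0:
        return "ordinary positive"
    if score == 0:
        return "neutral"
    return "negative"


def issue_sentiments(review: str) -> Dict[str, List[str]]:
    """Label each issue mentioned in a review, sentence by sentence."""
    results: Dict[str, List[str]] = defaultdict(list)
    for sentence in re.split(r"[.!?]+", review):
        tokens = re.findall(r"[a-z]+", sentence.lower())
        issues = {ISSUE_LEXICON[t] for t in tokens if t in ISSUE_LEXICON}
        if not issues:
            continue  # the sentence mentions no known issue
        score = sum(SENTIMENT_LEXICON.get(t, 0) for t in tokens)
        for issue in sorted(issues):
            results[issue].append(label(score))
    return dict(results)


print(issue_sentiments(
    "The breakfast was good. The staff were amazing and the host was wonderful!"
))
```

    Scoring per sentence, rather than per review, keeps a positive remark about food from being credited to a separate complaint about staff in the same review.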

    Community-Based Behavioral Understanding of Mobility Trends and Public Attitude through Transportation User and Agency Interactions on Social Media in the Emergence of Covid-19

    The increasing availability of technology-enabled transportation options and modern communication devices (smartphones in particular) is transforming travel-related decision-making in different ways across places, points in time, modes of transportation, and socioeconomic groups. The emergence of COVID-19 made the dynamics of passenger travel behavior more complex, forcing an unparalleled worldwide change in human travel behavior and introducing a new normal. This dissertation explores the potential of social media platforms (SMPs) as a viable alternative to traditional approaches (e.g., travel surveys) for understanding the complex dynamics of people’s mobility patterns during the emergence of COVID-19. The dissertation pursues three objectives. First, it introduces a novel approach that applies natural language processing and data-driven techniques to large-scale social media data to develop comparative infographics of emerging transportation trends. Second, it develops a methodology to model community-based travel behavior on Twitter under different socioeconomic and demographic factors at the community level during the emergence of COVID-19, inferring users’ demographics to overcome sampling bias. Third, it examines the communication patterns of different transportation agencies on Twitter in terms of message types, communication sufficiency, consistency, and coordination, by applying text mining techniques and dynamic network analysis. The methodologies and findings of the dissertation will allow agencies, researchers, and professionals to monitor transportation trends in real time.
Potential applications of the work may include: (1) identifying the spatial diversity of public mobility needs and concerns through social media platforms; (2) developing new policies that satisfy diverse needs at different locations; (3) introducing new plans to support and celebrate equity, diversity, and inclusion in the transportation sector that would improve the efficient flow of goods and services; (4) designing new methods to model community-based travel behavior at different scales (e.g., census block, zip code) using social media data from which users’ socioeconomic and demographic properties are inferred; and (5) implementing efficient policies to improve existing communication plans, the efficacy of critical information dissemination, and the coordination of different transportation actors, raising awareness among passengers in general and during unprecedented health crises in a fragmented communication landscape.
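    The third objective applies dynamic network analysis to agency communication. A minimal sketch of the basic data structure follows. It assumes interactions have already been extracted from tweets as (day, posting account, mentioned account) triples, which is a hypothetical schema. It groups them into fixed-length time windows, builds a weighted directed network per window, and reports each account's weighted degree as one simple coordination indicator. The account names, window length, and function names are illustrative; this is not the dissertation's method.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# (day index, posting account, mentioned account); hypothetical schema
Interaction = Tuple[int, str, str]


def time_sliced_networks(
    interactions: Iterable[Interaction], window: int
) -> Dict[int, Counter]:
    """Group interactions into windows of `window` days and count
    directed, weighted edges (source -> target) within each window."""
    networks: Dict[int, Counter] = defaultdict(Counter)
    for day, source, target in interactions:
        if source != target:  # ignore self-mentions
            networks[day // window][(source, target)] += 1
    return dict(networks)


def weighted_degree(network: Counter) -> Dict[str, int]:
    """Total in- plus out-edge weight for each account in one window."""
    degree: Counter = Counter()
    for (source, target), weight in network.items():
        degree[source] += weight
        degree[target] += weight
    return dict(degree)


# Hypothetical agency accounts and mentions, not data from the dissertation.
mentions = [
    (0, "agency_a", "agency_b"),
    (1, "agency_b", "agency_a"),
    (2, "agency_a", "agency_b"),
    (8, "agency_c", "agency_a"),
]
nets = time_sliced_networks(mentions, window=7)
for window_id in sorted(nets):
    print(window_id, weighted_degree(nets[window_id]))
```

    Comparing the per-window networks shows how the set of interacting agencies and the intensity of their exchanges change over time, which is the "dynamic" part of the analysis.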