5 research outputs found
Text Analytics and Spatial Visualization of Social Media Data during the Various Stages of a Disaster: The Case of Hurricane Dorian
Social media platforms have become an increasingly prevalent means of human communication as they have grown in popularity across the world. As a result, more text data is available than ever before. This abundance of information is advancing the fields of text data analysis and natural language processing by allowing the evaluation of larger data sets within a specific context. One context that has not been explored as thoroughly is text data generated during a natural disaster. Having access to the right information at the right time is critical during any phase of a disaster. Data created and shared on social networks can be as essential as other supplies like bottled water and batteries (Aldrich, 2017). One example is the 2016 flood in Louisiana: a flood inundation map was created with the help of the Baton Rouge fire department, public officials, NOAA and air patrol imagery, and social media users, and was further edited by those users (Kim & Hastak, 2018). The focus of this project is to analyze a dataset of tweets related to “HurricaneDorian” collected before, during, and after Hurricane Dorian. The goal is to provide a framework for disaster management through text mining and spatial analysis techniques using Twitter data. Specifically, this research aims to answer the following research questions: How can social media (Twitter) be utilized to identify, process, and comprehend critical elements of an incident or situation during a natural disaster? How do the features of tweets (topic, sentiment, etc.) and the characteristics of users (influence, category, etc.) impact the level of attention received by tweets? Do these characteristics change based on the distance (temporal and spatial) from a natural disaster?
Applying Text Data Analytics Techniques to Wine Reviews
When purchasing a bottle of wine, we usually rely on the smell and flavor descriptions from experts’ wine reviews. Just as with restaurants, critics’ reviews serve as a marketing tool in the wine industry. If your wine consistently impresses the tasters and makes their top picks, there is a good chance customers will follow and buy it. Friberg and Grönqvist (2012) found that the effect of a favorable review peaks in the week after publication, with an increase in demand of 6 percent, and remains significant for more than 20 weeks. This project is part of a student paper for a Text Data Analytics course. The goal of this study is to evaluate the descriptions of wines provided by the tasters. The data was scraped from Wine Enthusiast (https://www.winemag.com/), a world-renowned American multi-channel marketer, during the week of June 15th, 2017. The dataset includes fields such as the type of grapes used to make the wine, the country that the wine is from, the number of points Wine Enthusiast rated the wine on a scale of 1-100, the cost of a bottle of the wine, the taster’s full name, the description of the wine by a sommelier, etc. We use R to analyze the dataset. We will apply exploratory data analysis methods to find insights and interesting facts about this dataset. Specifically, we will count the number of reviews by variety and flavor. Then, we will look at the total number of reviews per taster. Tasters with at least 100,000 reviews can be categorized as master reviewers. We will use this knowledge to categorize reviews based on the reviewers’ expertise and experience. We will also examine which variety of wine received the best score from reviewers and identify the reviewers’ favorite types of wine. We will also calculate the point distribution for each reviewer.
For the description column, the first step will be to pre-process and clean the text: handle missing values, remove special characters, etc. We will apply descriptive text analytics techniques to better understand the tasters’ reviews. We will calculate the most frequently used words in the descriptions, as well as the most important terms based on TF-IDF. We will also apply sentiment analysis and topic modeling techniques to categorize the tasters’ reviews. Sentiment analysis enables us to extract insights from qualitative data, such as wine reviews. These insights can be used by winemakers to respond to their clients’ needs. By detecting positive, neutral, and negative opinions within the text, we can understand how tasters feel about a particular wine and make data-driven decisions. Polarity in sentiment analysis refers to sentiment orientation (positive, neutral, or negative). We will identify the polarity of reviews by taster name, wine variety, and country. Finally, we will use classification methods to identify the common topics in the tasters’ wine descriptions.
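The TF-IDF step described above can be illustrated with a minimal sketch. This is not the authors' code: the abstract states the analysis was done in R, while the sketch below uses plain Python for illustration. The tokenizer, the unsmoothed `log(N / df)` weighting, and the three sample descriptions are all assumptions made for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately crude cleaner)."""
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = log(number of documents / number of documents containing the term)
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append(
            {t: (c / total) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return scores


# Hypothetical review snippets, invented for illustration only.
reviews = [
    "ripe cherry and oak",
    "crisp apple and citrus",
    "dark cherry and spice",
]
scores = tf_idf(reviews)
top_term = max(scores[0], key=scores[0].get)
```

Under this weighting, a word that occurs in every description (here "and") scores zero, while a word unique to one description outranks a word shared by several. That is why TF-IDF surfaces distinctive tasting terms rather than merely frequent ones.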
The Spatial, Temporal, and Individual Dimensions of Child Maltreatment Recurrence in the United States: A Survival Analysis
The current research aims to improve the efficacy of applying data analytics methods such as survival analysis (Coeurderoy, Guilmot, & Vas, 2014) to big human data to create insights into complex social problems, with recurrent child maltreatment as a representative case. Social welfare agencies collect and produce vast volumes of data from various sources that can be utilized to illuminate social issues and facilitate effective solutions (Coulton, Goerge, Putnam-Hornstein, & de Haan, 2015). However, social welfare agencies face several challenges in converting data into analytical power. First, effective analysis of a massive amount of data that requires recording a large number of features associated with diverse individuals can be challenging. Furthermore, even though the models produced by data analytics are inherently predictive, taking preventive action at the individual level in most social problems is very difficult, if not impossible. Additionally, the interaction between humans and society can be highly intricate, making it difficult to determine whether certain features of an individual are essential to the occurrence of an event. Responding to these challenges, the current study proposes three inter-connected categories of features available in most big human data sets: spatial, temporal, and individual/event-related features. These three categories are identified based on two theoretical frameworks, the Routine Activities Theory (Felson & Cohen, 1980; Miró, 2014) and the Fogg Behavior Model (Fogg, 2009). Supported by the results of an empirical study on an extensive, national data set of child maltreatment cases in the United States (US Department of Health and Human Services, 2017), we argue that features in each of these categories can be strong indicators of social welfare-related events. In turn, analysis of these features, individually or jointly, can reveal spatial, temporal, and individual patterns.
Therefore, the current study aims to answer the following research questions: (1) What are the spatial, temporal, and individual-related features to be considered to predict recurrent child maltreatment patterns? (2) What is the relative significance of these features when victims of different geographical locations, time frames, demographic groups, and maltreatment types are considered? Survival analysis was conducted within each of, and across, the victim groups above. Victims’ survival rate with respect to recurrent child maltreatment was examined as the prediction target. Based on the results of the survival analysis, we discuss how to improve the predictive capability of individual features in the use of big human data. The results of our study can be helpful for both researchers and policymakers. Researchers can apply the results to improve the efficacy of data analytics methods on big human data. Researchers can also correlate the observed trends with other social and economic factors to explain and predict the target problem's prevalence. Policymakers may use the resulting temporal trends to fine-tune child welfare policies for the current and subsequent years to optimize child welfare resource allocation.
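As a rough illustration of what a survival rate means in this setting, the sketch below computes a Kaplan-Meier survival curve. The abstract does not say which estimator the study used, so Kaplan-Meier is an assumption here, chosen because it is a standard non-parametric estimator. The durations (months until a recurrent report) and the censoring flags are invented for illustration.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Kaplan-Meier estimate of the survival function.

    durations: time until the event or until the subject left observation.
    observed:  True if the event (here, recurrence) occurred, False if censored.
    Returns a list of (time, survival_probability) at each time with an event.
    """
    events = Counter(t for t, e in zip(durations, observed) if e)
    exits = Counter(durations)  # events plus censorings at each time
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk subjects with no event at time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Everyone who had an event or was censored at t leaves the risk set.
        at_risk -= exits[t]
    return curve


# Hypothetical data: months until recurrence, and whether it was observed.
months = [3, 5, 5, 8, 12, 12]
recurred = [True, True, False, True, False, False]
curve = kaplan_meier(months, recurred)
```

Computing such a curve separately for each victim group (for example by location or maltreatment type) would be one simple way to compare groups, in line with the group-wise analysis the abstract describes.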
Sharing Economy: Application of Structural Topic Models
The sharing economy is also known as collaborative consumption, or the peer-to-peer based activity of acquiring, providing, or sharing goods and services. To improve the consumer-based sharing economy, researchers study customer reviews of their experiences with provided services. Expectation-confirmation theory (ECT) suggests that customers use mental comparison standards to evaluate the real performance of provided services, which ultimately influences customer satisfaction. Recommendations made in customer reviews have emerged as a critical feature of business-to-consumer websites; however, there is a lack of empirical evidence supporting their influence on customer satisfaction. In this study, we contribute to the sharing economy knowledge base by adding a consumer recommendation component to the original EC