
    A Practical Guide for the Effective Evaluation of Twitter User Geolocation

    Geolocating Twitter users---the task of identifying their home locations---serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this paper, we propose a guide for a standardized evaluation of Twitter user geolocation by analyzing fifteen models and two baselines in a controlled experimental setting. Models are evaluated using ten metrics over four geographic granularities. We use rank correlations to assess the effectiveness of these metrics. Our results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. We show that for general evaluations, a range of performance metrics should be reported to ensure that a complete picture of system effectiveness is conveyed. Given the global geographic coverage of this task, we specifically recommend evaluation at both micro and macro levels to measure the impact of the bias in the distribution of users over locations. Although many complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. We propose a suite of statistical analysis tests, based on the employed metric, to ensure that the results are not coincidental. Comment: Accepted in ACM Transactions on Social Computing (TSC). Extended version of the ASONAM 2018 short paper. Please cite the TSC/ASONAM version and not the arXiv version.
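
    As a hedged illustration of the micro-versus-macro recommendation above, the sketch below computes both averages for a toy set of city-level predictions; the city names and the simple accuracy measure are assumptions for illustration, not the paper's data or metric suite.

```python
from collections import defaultdict

# Toy gold/predicted home locations (city-level granularity); purely illustrative.
gold = ["london", "london", "london", "london", "jakarta", "nairobi"]
pred = ["london", "london", "london", "paris",  "london",  "nairobi"]

# Micro accuracy: every user weighted equally, so populous locations dominate.
micro = sum(g == p for g, p in zip(gold, pred)) / len(gold)

# Macro accuracy: compute per-location accuracy first, then average over locations,
# so sparsely represented locations count as much as heavily represented ones.
per_loc = defaultdict(lambda: [0, 0])  # location -> [correct, total]
for g, p in zip(gold, pred):
    per_loc[g][1] += 1
    per_loc[g][0] += int(g == p)
macro = sum(correct / total for correct, total in per_loc.values()) / len(per_loc)

print(f"micro accuracy: {micro:.2f}")  # 0.67 on this toy sample
print(f"macro accuracy: {macro:.2f}")  # 0.58: the London-heavy skew no longer hides errors elsewhere
```

    On skewed samples like this one, a system that only does well on the dominant location looks stronger under micro averaging than under macro averaging, which is why reporting both is recommended.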

    Influence of geographic biases on geolocation prediction in Twitter

    Geolocating Twitter users --- the task of identifying their home locations --- serves a wide range of community and business applications such as managing natural crises, journalism, and public health. While users can record their location on their profiles, more than 34% record fake or sarcastic locations. Twitter allows users to GPS-locate their content; however, less than 1% of tweets are geotagged. Therefore, inferring user location has been an important field of investigation since 2010. This thesis investigates two of the most important factors which can affect the quality of inferring user location: (i) the influence of tweet language; and (ii) the effectiveness of the evaluation process.

    Previous research observed that Twitter users writing in some languages appeared to be easier to locate than those writing in others. It was speculated that the geographic coverage of a language (language bias) --- represented by the number of locations where the tweets of a specific language come from --- played an important role in determining location accuracy; so important was this role that accuracy might be largely predictable by considering language alone. In this thesis, I investigate the influence of language bias on the accuracy of geolocating Twitter users. The analysis, using a large corpus of tweets written in thirteen languages and a re-implementation of a geolocation model that was state of the art at the time, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance in the distribution of Twitter users over locations (population bias) has a greater impact on accuracy than language bias. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. The results suggest both averaging approaches should be used to effectively evaluate geolocation.

    Many approaches have been proposed for automatically geolocating users; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this thesis, I provide a standardized evaluation framework for geolocation systems. The framework is employed to analyze fifteen Twitter user geolocation models and two baselines in a controlled experimental setting. The models comprise the re-implemented model and a variation of it, two locally retrained open-source models, and the results of eleven models submitted to a shared task. Models are evaluated using ten metrics --- out of fourteen employed in previous research --- over four geographic granularities. Rank correlations and thorough statistical analysis are used to assess the effectiveness of these metrics. The results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. For general evaluations, a range of performance metrics should be reported to ensure that a complete picture of system effectiveness is conveyed. Although many complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. A suite of statistical analysis tests is proposed, based on the employed metric, to ensure that the results are not coincidental.
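
    The rank-correlation analysis mentioned above can be sketched as follows; the per-system scores, the two metrics chosen (median error distance and accuracy within 161 km), and the use of SciPy are assumptions for illustration, not the thesis's actual results.

```python
from scipy.stats import spearmanr

# Hypothetical scores for five geolocation systems under two common metrics:
# median error distance in km (lower is better) and Acc@161km (higher is better).
median_error_km = [312.0, 455.0, 298.0, 601.0, 350.0]
acc_at_161km    = [0.41,  0.44,  0.38,  0.22,  0.40]

# Negate the error-based metric so that "larger is better" holds for both
# before comparing the rankings they induce over systems.
rho, p_value = spearmanr([-d for d in median_error_km], acc_at_161km)

print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
# A low or negative rho means the two metrics rank systems differently,
# i.e. the choice of metric can change which system appears best.
```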

    Designing Chatbots for Crises: A Case Study Contrasting Potential and Reality

    Chatbots are becoming ubiquitous technologies, and their popularity and adoption are rapidly spreading. The potential of chatbots in engaging people with digital services is fully recognised. However, the reputation of this technology with regard to usefulness and real impact remains rather questionable. Studies that evaluate how people perceive and utilise chatbots are generally lacking. During the last Kenyan elections, we deployed a chatbot on Facebook Messenger to help people submit reports of violence and misconduct experienced in the polling stations. Even though the chatbot was visited more than 3,000 times, there was a clear mismatch between the users’ perception of the technology and its design. In this paper, we analyse the user interactions and content generated through this application and discuss the challenges and directions for designing more effective chatbots.

    MANTRA: A Topic Modeling-Based Tool to Support Automated Trend Analysis on Unstructured Social Media Data

    The early identification of new and auspicious ideas leads to competitive advantages for companies. To this end, topic modeling can serve as an effective analytical approach for the automated investigation of trends from unstructured social media data. However, existing trend analysis tools do not meet the requirements regarding (a) Product Development, (b) Customer Behavior Analysis, and (c) Market-/Brand-Monitoring as reflected in the extant literature. Thus, based on the requirements for each of these common marketing-related use cases, we derived design principles following design science research and instantiated the artifact “MANTRA” (MArketiNg TRend Analysis). We demonstrated MANTRA on a real-world data set (~1.03 million Yelp reviews) and thereby confirmed notable trends towards vegan and global cuisine. In particular, the demonstration illustrates the importance of meeting the specific requirements of each use case and of flexibly incorporating several external parameters into the trend analysis.
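
    MANTRA's internals are not spelled out in this abstract, so the following is only a minimal sketch of the underlying idea, assuming a scikit-learn LDA topic model and a tiny made-up review corpus: fit topics to review text and track each topic's average weight per year as a trend signal.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Tiny illustrative corpus of (year, review) pairs; a real run would use ~1M Yelp reviews.
reviews = [
    (2016, "great burger and fries, classic diner"),
    (2016, "amazing steak, friendly service"),
    (2018, "delicious vegan bowl with tofu and kale"),
    (2018, "best vegan burger in town, plant based menu"),
    (2019, "vegan ramen and global street food, loved it"),
]

texts = [text for _, text in reviews]
vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(texts)

# Fit a small LDA model; the number of topics is a tunable assumption.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(doc_term)  # per-document topic proportions

# Trend signal: average weight of each topic per year.
years = sorted({year for year, _ in reviews})
for topic_idx in range(lda.n_components):
    trend = {
        y: sum(doc_topics[i][topic_idx] for i, (yr, _) in enumerate(reviews) if yr == y)
           / sum(1 for yr, _ in reviews if yr == y)
        for y in years
    }
    print(f"topic {topic_idx} weight by year: {trend}")
```

    A rising per-year weight for the topic dominated by terms such as "vegan" would be the kind of trend the abstract reports for the Yelp data.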

    Using Physical and Social Sensors in Real-Time Data Streaming for Natural Hazard Monitoring and Response

    Technological breakthroughs in computing over the last few decades have resulted in important advances in natural hazards analysis. In particular, integration of a wide variety of information sources, including observations from spatially-referenced physical sensors and new social media sources, enables better estimates of real-time hazard. The main goal of this work is to utilize innovative streaming algorithms for improved real-time seismic hazard analysis by integrating different data sources and processing tools into cloud applications. In streaming algorithms, a sequence of items from physical and social sensors can be processed in as little as one pass with no need to store the data locally. Massive data volumes can be analyzed in near-real time with reasonable limits on storage space, an important advantage for natural hazard analysis. Seismic hazard maps are used by policymakers to set earthquake-resistant construction standards, by insurance companies to set insurance rates, and by civil engineers to estimate stability and damage potential. This research first focuses on improving probabilistic seismic hazard map production. The result is a series of maps for different frequency bands at significantly increased resolution with much lower latency time that includes a range of high-resolution sensitivity tests. Second, a method is developed for real-time earthquake intensity estimation using joint streaming analysis from physical and social sensors. Automatically calculated intensity estimates from physical sensors such as seismometers use empirical relationships between ground motion and intensity, while those from social sensors employ questionnaires that evaluate ground shaking levels based on personal observations. Neither is always sufficiently precise and/or timely. Results demonstrate that joint processing can significantly reduce the response time to a damaging earthquake and estimate preliminary intensity levels during the first ten minutes after an event. The combination of social media and network sensor data, in conjunction with innovative computing algorithms, provides a new paradigm for real-time earthquake detection, facilitating rapid and inexpensive risk reduction. In particular, streaming algorithms are an efficient method that addresses three major problems in hazard estimation by improving resolution, decreasing processing latency to near real-time standards, and providing more accurate results through the integration of multiple data sets.
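
    As a rough sketch of the one-pass streaming idea described above, the code below keeps a running, weighted intensity estimate over an interleaved stream of physical-sensor and social-sensor reports using constant memory; the report format and the trust weights are illustrative assumptions, not the study's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Report:
    source: str       # "seismometer" or "social"
    intensity: float  # estimated shaking intensity (e.g., an MMI-like scale)

# Illustrative trust weights: instrument readings count more than individual posts.
WEIGHTS = {"seismometer": 1.0, "social": 0.2}

def streaming_intensity(stream):
    """Single pass over the report stream, O(1) memory: a weighted running mean."""
    weighted_sum = 0.0
    weight_total = 0.0
    for report in stream:
        w = WEIGHTS[report.source]
        weighted_sum += w * report.intensity
        weight_total += w
        yield weighted_sum / weight_total  # current best estimate after each report

# Example interleaved stream arriving during the first minutes after an event.
reports = [
    Report("social", 5.0),
    Report("seismometer", 6.2),
    Report("social", 7.0),
    Report("seismometer", 6.0),
]
for estimate in streaming_intensity(reports):
    print(f"running intensity estimate: {estimate:.2f}")
```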

    From social networks to emergency operation centers: A semantic visualization approach

    Social networks are commonly used by citizens as a communication channel for sharing their messages about a crisis situation and by emergency operation centers as a source of information for improving their situation awareness. However, to utilize this source of information, emergency operators and decision makers have to deal with large and unstructured data, the content, reliability, quality, and relevance of which may vary greatly. In this paper, to address this challenge, we propose a visual analytics solution that filters and visualizes relevant information extracted from Twitter. The tool offers multiple visualizations to provide emergency operators with different points of view for exploring the data in order to gain a better understanding of the situation and take informed courses of action. We analyzed the scope of the problem through an exploratory study in which 20 practitioners answered questions about the integration of social networks in the emergency management process. This study inspired the design of a visualization tool, which was evaluated in a controlled experiment to assess its effectiveness for exploring spatial and temporal data. During the experiment, we asked 12 participants to perform 5 tasks related to data exploration and fill in a questionnaire about their experience using the tool. One of the most interesting results obtained from the evaluation concerns the effectiveness of combining several visualization techniques to support different strategies for solving a problem and making decisions. This work was supported by the PACE project grant funded by the Spanish Ministry of Economy and Competitiveness [TIN2016-77690-R].
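
    A minimal sketch of the filter-then-aggregate step that could feed such a temporal visualization, assuming a simple keyword filter and hourly binning; the tweet fields and keywords are illustrative, not the tool's actual pipeline.

```python
from collections import Counter
from datetime import datetime

# Illustrative incoming tweets as (timestamp, text) pairs;
# a real deployment would read from the Twitter streaming API.
tweets = [
    ("2019-03-04T10:12:00", "flooding near the main bridge, road closed"),
    ("2019-03-04T10:47:00", "lovely weather today"),
    ("2019-03-04T11:05:00", "need help, water rising in the old town"),
    ("2019-03-04T11:30:00", "evacuation center open at the sports hall"),
]

CRISIS_KEYWORDS = {"flood", "flooding", "evacuation", "help", "water"}

def is_relevant(text: str) -> bool:
    # Crude keyword filter standing in for the tool's relevance filtering.
    return any(keyword in text.lower() for keyword in CRISIS_KEYWORDS)

# Aggregate relevant tweets into hourly bins for a temporal view.
hourly_counts = Counter(
    datetime.fromisoformat(ts).strftime("%Y-%m-%d %H:00")
    for ts, text in tweets
    if is_relevant(text)
)
for hour, count in sorted(hourly_counts.items()):
    print(hour, count)
```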

    Location-based Social Network for Cities & Neighbourhood Sustainable Development

    Online Social Networks (OSN) are categorized as Web 2.0, which O'Reilly defined in 2004 as the idea of mutually maximizing collective intelligence and added value for each participant through dynamic information sharing and creation. Current trends show that the next big thing in OSN is Location-based Social Networking (LBSN), which is the composite of OSN and Location-based Services (LBS). The goal of this paper is to study Malaysian online social behaviour and to explore the key technologies of LBSN that support the development of neighbourhoods where residents feel a sense of connection to their local community and are able to engage in that community. The problems and opportunities identified are: 1) little research has been done to understand Malaysian online social behaviour in the context of cities and neighbourhood development, 2) modern societies are said to live in a condition of individualism, and 3) Malaysia has a strong networked community and there are a number of social Application Programming Interfaces (API) which provide great opportunities for developers to create applications that support the idea of smart, liveable and sustainable cities. The objectives of the research are: 1) to study Malaysian social behaviour in using Location-based Social Networks (LBSN), motivation for participation and patterns of use, 2) to identify and understand key technologies of LBSN, and 3) to design an engaging LBSN which leverages key technologies for neighbourhood and cities' sustainable development. A survey instrument is used as the data collection tool to investigate Malaysian online social behaviour and gauge views on civil issues such as crime in residential areas. Interviews are also carried out with the owner of an existing crime mapping system to identify gaps and opportunities for improvement. This research discovers that Malaysians are socially active in online community networks and have a strong civic consciousness to make their neighbourhoods work better. Government should look into open data for the benefit of the public. With proper neighbourhood planning, this will contribute to a sustainable community which can help the country's development.