816 research outputs found

    Statistical Methods for the Forensic Analysis of Geolocated Event Data

    Get PDF
    A common question in forensic analysis is whether two observed data sets originated from the same source or from different sources. Statistical approaches to addressing this question have been widely adopted within the forensics community, particularly for DNA evidence. Here we investigate the application of statistical approaches to same-source forensic questions for spatial event data, such as determining the likelihood that two sets of observed GPS locations were generated by the same individual. We develop two approaches to quantify the strength of evidence in this setting. The first is a likelihood ratio approach based on modeling the spatial event data directly. The second approach is to instead measure the similarity of the two observed data sets via a score function and then assess the strength of the observed score resulting in the score-based likelihood ratio. A comparative evaluation using geolocated Twitter event data from two large metropolitan areas shows the potential efficacy of such techniques

    Influence of geographic biases on geolocation prediction in Twitter

    Get PDF
    Geolocating Twitter users --- the task of identifying their home locations --- serves a wide range of community and business applications such as managing natural crises, journalism, and public health. While users can record their location on their profiles, more than 34% record fake or sarcastic locations. Twitter allows users to GPS locate their content, however, less than 1% of tweets are geotagged. Therefore, inferring user location has been an important field of investigation since 2010. This thesis investigates two of the most important factors which can affect the quality of inferring user location: (i) the influence of tweet-language; and (ii) the effectiveness of the evaluation process. Previous research observed that Twitter users writing in some languages appeared to be easier to locate than those writing in others. They speculated that the geographic coverage of a language (language bias) --- represented by the number of locations where the tweets of a specific language come from --- played an important role in determining location accuracy. So important was this role that accuracy might be largely predictable by considering language alone. In this thesis, I investigate the influence of language bias on the accuracy of geolocating Twitter users. The analysis, using a large corpus of tweets written in thirteen languages and a re-implemented state-of-the-art geolocation model back at the time, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance in the distribution of Twitter users over locations (population bias) has a greater impact on accuracy than language bias. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. The results suggest both averaging approaches should be used to effectively evaluate geolocation. Many approaches have been proposed for automatically geolocating users; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this thesis, I provide a standardized evaluation framework for geolocation systems. The framework is employed to analyze fifteen Twitter user geolocation models and two baselines in a controlled experimental setting. The models are composed of the re-implemented model and a variation of it, two locally retrained open source models and the results of eleven models submitted to a shared task. Models are evaluated using ten metrics --- out of fourteen employed in previous research --- over four geographic granularities. Rank correlations and thorough statistical analysis are used to assess the effectiveness of these metrics. The results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. For general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Although a lot of complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. A suite of statistical analysis tests is proposed, based on the employed metric, to ensure that the results are not coincidental

    Multi-class twitter data categorization and geocoding with a novel computing framework

    Get PDF
    This study details the progress in transportation data analysis with a novel computing framework in keeping with the continuous evolution of the computing technology. The computing framework combines the Labeled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with the supporting computing strategy on publicly available Twitter data in determining transportation-related events to provide reliable information to travelers. The analytical approach includes analyzing tweets using text classification and geocoding locations based on string similarity. A case study conducted for the New York City and its surrounding areas demonstrates the feasibility of the analytical approach. Approximately 700,010 tweets are analyzed to extract relevant transportation-related information for one week. The SVM classifier achieves \u3e 85% accuracy in identifying transportation-related tweets from structured data. To further categorize the transportation-related tweets into sub-classes: incident, congestion, construction, special events, and other events, three supervised classifiers are used: L-LDA, SVM, and L-LDA incorporated SVM. Findings from this study demonstrate that the analytical framework, which uses the L-LDA incorporated SVM, can classify roadway transportation-related data from Twitter with over 98.3% accuracy, which is significantly higher than the accuracies achieved by standalone L-LDA and SVM

    Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance

    Get PDF
    We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome—asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not be relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet’s tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and found that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we found some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions

    The behavioral response to Location Based Services: An examination of the influence of social and environmental benefits, and privacy

    Get PDF
    Given the importance tourism has in many economies, this research was designed to study how the social and environmental beneïŹts of Location Based Services (LBS) in the tourism sector inïŹ‚uence user behavior and thus contribute to sustainable development. The objective has been to study LBS as a solution that makes the deployment of tourism activities easier, more useful and improves attitudes towards it, but in a context where trust in privacy and beneïŹts-based sustainable social and environmental development are key. To achieve this, this research identiïŹes what could be the inïŹ‚uence factors in the adoption of mobile applications with Location Based Services from the point of view of the tourism sector, especially if the social and environmental beneïŹts of LBS can help improve usage behavior. We investigated the technological acceptance of LBS in tourism, using Technology Acceptance Model (TAM) as a solid model to explain its adoption. Nine hypotheses were investigatedbycarryingoutasurveyoftravelers(n=277)duringtheirvisittoSeville(Spain). Totest theconceptualmodel’shypotheses,thePartialLeastSquares(PLS)techniquewasappliedtoestimate variance-based structural equations models (SEM).The results of this study indicated that tourists are willing to accept these LBS services within a particular adoption model, where trust in privacy and social and environmental beneïŹts are paramount

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Get PDF
    Overview This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 report paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it.  The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends of social media platforms; and a new section on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique a series of capabilities and insights are considered, the validity and reliability of the method is considered, and how they might be applied to counter-terrorism work explored. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work

    Methods and applications of social media monitoring of mental health during disasters : scoping review

    Get PDF
    Background: With the increasing frequency and magnitude of disasters internationally, there is growing research and clinical interest in the application of social media sites for disaster mental health surveillance. However, important questions remain regarding the extent to which unstructured social media data can be harnessed for clinically meaningful decision-making. Objective: This comprehensive scoping review synthesizes interdisciplinary literature with a particular focus on research methods and applications. Methods: A total of 6 health and computer science databases were searched for studies published before April 20, 2021, resulting in the identification of 47 studies. Included studies were published in peer-reviewed outlets and examined mental health during disasters or crises by using social media data. Results: Applications across 31 mental health issues were identified, which were grouped into the following three broader themes: estimating mental health burden, planning or evaluating interventions and policies, and knowledge discovery. Mental health assessments were completed by primarily using lexical dictionaries and human annotations. The analyses included a range of supervised and unsupervised machine learning, statistical modeling, and qualitative techniques. The overall reporting quality was poor, with key details such as the total number of users and data features often not being reported. Further, biases in sample selection and related limitations in generalizability were often overlooked. Conclusions: The application of social media monitoring has considerable potential for measuring mental health impacts on populations during disasters. Studies have primarily conceptualized mental health in broad terms, such as distress or negative affect, but greater focus is required on validating mental health assessments. There was little evidence for the clinical integration of social media-based disaster mental health monitoring, such as combining surveillance with social media-based interventions or developing and testing real-world disaster management tools. To address issues with study quality, a structured set of reporting guidelines is recommended to improve the methodological quality, replicability, and clinical relevance of future research on the social media monitoring of mental health during disasters. © 2022 Samantha J Teague, Adrian B R Shatte, Emmelyn Weller, Matthew Fuller-Tyszkiewicz, Delyse M Hutchinson
    • 

    corecore