816 research outputs found

Statistical Methods for the Forensic Analysis of Geolocated Event Data

Author: Galbraith Christopher
Smyth Padhraic
Stern Hal S.
Publication venue: Iowa State University Digital Repository
Publication date: 01/06/2020

A common question in forensic analysis is whether two observed data sets originated from the same source or from different sources. Statistical approaches to addressing this question have been widely adopted within the forensics community, particularly for DNA evidence. Here we investigate the application of statistical approaches to same-source forensic questions for spatial event data, such as determining the likelihood that two sets of observed GPS locations were generated by the same individual. We develop two approaches to quantify the strength of evidence in this setting. The first is a likelihood ratio approach based on modeling the spatial event data directly. The second approach instead measures the similarity of the two observed data sets via a score function and then assesses the strength of the observed score, resulting in a score-based likelihood ratio. A comparative evaluation using geolocated Twitter event data from two large metropolitan areas shows the potential efficacy of such techniques.
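As a rough illustration of the score-based likelihood ratio idea described in this abstract, the sketch below scores two point sets and compares the density of that score under same-source and different-source reference scores. Everything here is an assumption made for illustration and is not taken from the paper: the score function (mean nearest-neighbour distance on planar coordinates), the hand-rolled Gaussian kernel density estimate, the bandwidth, and the reference score lists are all hypothetical.

```python
import math

def mean_nearest_distance(a, b):
    """Hypothetical score: average distance from each point in a to its nearest point in b."""
    return sum(min(math.dist(p, q) for q in b) for p in a) / len(a)

def gaussian_kde(samples, bandwidth):
    """Return a density function built from Gaussian kernels centred on the samples."""
    norm = len(samples) * bandwidth * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples) / norm

    return density

def score_based_lr(score, same_scores, diff_scores, bandwidth=0.5):
    """Ratio of the score's estimated density under same-source vs different-source pairs."""
    f_same = gaussian_kde(same_scores, bandwidth)
    f_diff = gaussian_kde(diff_scores, bandwidth)
    return f_same(score) / f_diff(score)

# Made-up reference scores from pairs known to share a source, or known not to.
same_scores = [0.2, 0.3, 0.5, 0.4]
diff_scores = [2.0, 2.5, 3.1, 1.8]

# Two small, made-up sets of planar coordinates to compare.
set_a = [(0.0, 0.0), (1.0, 1.0)]
set_b = [(0.1, 0.0), (1.0, 1.2)]

score = mean_nearest_distance(set_a, set_b)
slr = score_based_lr(score, same_scores, diff_scores)
slr_far = score_based_lr(2.5, same_scores, diff_scores)
print(f"score={score:.3f}, SLR={slr:.1f}, SLR for a distant pair={slr_far:.4f}")
```

A ratio above 1 favours the same-source hypothesis and a ratio below 1 favours different sources. In practice the reference scores would have to come from real labelled pairs, and distances on GPS data would need a geodesic rather than a planar metric.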

Digital Repository @ Iowa State University (ISU)

Influence of geographic biases on geolocation prediction in Twitter

Author: Mourad A
Publication venue: RMIT University

Geolocating Twitter users, the task of identifying their home locations, serves a wide range of community and business applications such as managing natural crises, journalism, and public health. While users can record their location on their profiles, more than 34% record fake or sarcastic locations. Twitter allows users to GPS-locate their content; however, less than 1% of tweets are geotagged. Therefore, inferring user location has been an important field of investigation since 2010. This thesis investigates two of the most important factors which can affect the quality of inferring user location: (i) the influence of tweet language; and (ii) the effectiveness of the evaluation process. Previous research observed that Twitter users writing in some languages appeared to be easier to locate than those writing in others. They speculated that the geographic coverage of a language (language bias), represented by the number of locations where the tweets of a specific language come from, played an important role in determining location accuracy. So important was this role that accuracy might be largely predictable by considering language alone. In this thesis, I investigate the influence of language bias on the accuracy of geolocating Twitter users. The analysis, using a large corpus of tweets written in thirteen languages and a re-implementation of a geolocation model that was state of the art at the time, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance in the distribution of Twitter users over locations (population bias) has a greater impact on accuracy than language bias. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. The results suggest both averaging approaches should be used to effectively evaluate geolocation.
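To make the micro versus macro averaging distinction concrete, here is a minimal sketch under assumed data. The city labels and predictions are invented, the function name is hypothetical, and "macro" is taken here to mean the unweighted mean of per-location accuracy, which may differ from the exact definitions used in the thesis.

```python
from collections import defaultdict

def micro_macro_accuracy(true_labels, predicted_labels):
    """Return (micro, macro) accuracy over location labels.

    micro: fraction of all users located correctly (dominated by populous locations).
    macro: unweighted mean of per-location accuracy (every location counts equally).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, guess in zip(true_labels, predicted_labels):
        total[truth] += 1
        correct[truth] += int(truth == guess)
    micro = sum(correct.values()) / sum(total.values())
    macro = sum(correct[loc] / total[loc] for loc in total) / len(total)
    return micro, macro

# Invented, imbalanced data: 8 users in one city, 2 in another.
true_locations = ["London"] * 8 + ["Leeds"] * 2
# A majority-class baseline that always predicts the most common city.
predictions = ["London"] * 10

micro, macro = micro_macro_accuracy(true_locations, predictions)
print(micro, macro)  # 0.8 0.5
```

The baseline looks strong under micro averaging because it is right for every user in the populous city, while macro averaging exposes that it never locates anyone in the smaller one. Reporting both numbers shows how much of a score is driven by population imbalance.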
Many approaches have been proposed for automatically geolocating users; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this thesis, I provide a standardized evaluation framework for geolocation systems. The framework is employed to analyze fifteen Twitter user geolocation models and two baselines in a controlled experimental setting. The models are composed of the re-implemented model and a variation of it, two locally retrained open source models and the results of eleven models submitted to a shared task. Models are evaluated using ten metrics --- out of fourteen employed in previous research --- over four geographic granularities. Rank correlations and thorough statistical analysis are used to assess the effectiveness of these metrics. The results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. For general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Although a lot of complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. A suite of statistical analysis tests is proposed, based on the employed metric, to ensure that the results are not coincidental

RMIT Research Repository

Multi-class twitter data categorization and geocoding with a novel computing framework

Author: Apon Amy
Chowdhury Mashrur
Khan Sakib Mahmud
Ngo Linh B.
Publication venue: Digital Commons @ West Chester University
Publication date: 28/08/2019

This study details the progress in transportation data analysis with a novel computing framework in keeping with the continuous evolution of computing technology. The computing framework combines the Labeled Latent Dirichlet Allocation (L-LDA)-incorporated Support Vector Machine (SVM) classifier with the supporting computing strategy on publicly available Twitter data in determining transportation-related events to provide reliable information to travelers. The analytical approach includes analyzing tweets using text classification and geocoding locations based on string similarity. A case study conducted for New York City and its surrounding areas demonstrates the feasibility of the analytical approach. Approximately 700,010 tweets are analyzed to extract relevant transportation-related information for one week. The SVM classifier achieves > 85% accuracy in identifying transportation-related tweets from structured data. To further categorize the transportation-related tweets into sub-classes (incident, congestion, construction, special events, and other events), three supervised classifiers are used: L-LDA, SVM, and L-LDA incorporated SVM. Findings from this study demonstrate that the analytical framework, which uses the L-LDA incorporated SVM, can classify roadway transportation-related data from Twitter with over 98.3% accuracy, which is significantly higher than the accuracies achieved by standalone L-LDA and SVM.
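Geocoding by string similarity, as mentioned in this abstract, can be sketched with Python's standard `difflib` module. This is only an illustration of the general idea, not the study's implementation: the gazetteer entries, their approximate coordinates, the `geocode` helper, and the 0.8 similarity cutoff are all assumptions.

```python
import difflib

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
GAZETTEER = {
    "brooklyn bridge": (40.7061, -73.9969),
    "lincoln tunnel": (40.7628, -74.0110),
    "holland tunnel": (40.7274, -74.0210),
}

def geocode(mention, gazetteer=GAZETTEER, cutoff=0.8):
    """Return (matched_name, coordinates) for the closest gazetteer entry, or None.

    Similarity is difflib's ratio; mentions scoring below the cutoff are not geocoded.
    """
    matches = difflib.get_close_matches(
        mention.lower(), list(gazetteer), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return matches[0], gazetteer[matches[0]]

misspelled = geocode("Lincon Tunnel")   # typo in the tweet text
unknown = geocode("Times Square")       # not in this small gazetteer
print(misspelled)
print(unknown)
```

The cutoff trades recall for precision: a lower value tolerates more misspellings but risks mapping a mention to the wrong place, so it would need tuning against labelled examples.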

arXiv.org e-Print Archive

Digital Commons @ West Chester University

Twitter mining using semi-supervised classification for relevance filtering in syndromic surveillance

Author: De La Iglesia Beatriz
Edeghere Obaghe
Edo-Osagie Oduwa
Lake Iain
Smith Gillian
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 18/07/2019

We investigate the use of Twitter data to deliver signals for syndromic surveillance in order to assess its ability to augment existing syndromic surveillance efforts and give a better understanding of symptomatic people who do not seek healthcare advice directly. We focus on a specific syndrome: asthma/difficulty breathing. We outline data collection using the Twitter streaming API as well as analysis and pre-processing of the collected data. Even with keyword-based data collection, many of the tweets collected are not relevant because they represent chatter, or talk of awareness instead of an individual suffering a particular condition. In light of this, we set out to identify relevant tweets to collect a strong and reliable signal. For this, we investigate text classification techniques, and in particular we focus on semi-supervised classification techniques since they enable us to use more of the Twitter data collected while only doing very minimal labelling. In this paper, we propose a semi-supervised approach to symptomatic tweet classification and relevance filtering. We also propose alternative techniques to popular deep learning approaches. Additionally, we highlight the use of emojis and other special features capturing the tweet’s tone to improve the classification performance. Our results show that negative emojis and those that denote laughter provide the best classification performance in conjunction with a simple word-level n-gram approach. We obtain good performance in classifying symptomatic tweets with both supervised and semi-supervised algorithms and find that the proposed semi-supervised algorithms preserve more of the relevant tweets and may be advantageous in the context of a weak signal. Finally, we find some correlation (r = 0.414, p = 0.0004) between the Twitter signal generated with the semi-supervised system and data from consultations for related health conditions.

University of Liverpool Repository

University of East Anglia digital repository

The behavioral response to Location Based Services: An examination of the influence of social and environmental benefits, and privacy

Author: Campón Cerro Ana María
Hernández Mogollón Jose Manuel
Palos Sánchez Pedro Ramiro
Publication venue: 'MDPI AG'
Publication date: 01/01/2017

Given the importance tourism has in many economies, this research was designed to study how the social and environmental benefits of Location Based Services (LBS) in the tourism sector influence user behavior and thus contribute to sustainable development. The objective has been to study LBS as a solution that makes the deployment of tourism activities easier, more useful and improves attitudes towards it, but in a context where trust in privacy and benefits-based sustainable social and environmental development are key. To achieve this, this research identifies what could be the influence factors in the adoption of mobile applications with Location Based Services from the point of view of the tourism sector, especially if the social and environmental benefits of LBS can help improve usage behavior. We investigated the technological acceptance of LBS in tourism, using Technology Acceptance Model (TAM) as a solid model to explain its adoption. Nine hypotheses were investigated by carrying out a survey of travelers (n=277) during their visit to Seville (Spain). To test the conceptual model’s hypotheses, the Partial Least Squares (PLS) technique was applied to estimate variance-based structural equations models (SEM). The results of this study indicated that tourists are willing to accept these LBS services within a particular adoption model, where trust in privacy and social and environmental benefits are paramount.

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

idUS. Depósito de Investigación Universidad de Sevilla

Dehesa. Repositorio Institucional de la Universidad de Extremadura

Where are you talking about? Advances and Challenges of Geographic Analysis of Text with Application to Disease Monitoring

Author: Gritta Milan
Publication venue: University of Cambridge
Publication date: 16/07/2019

The Natural Language Processing task we focus on in this thesis is Geoparsing. Geoparsing is the process of extraction and grounding of toponyms (place names). Consider this sentence: "The victims of the Spanish earthquake off the coast of Malaga were of American and Mexican origin." Four toponyms will be extracted (called Geotagging) and grounded to their geographic coordinates (called Toponym Resolution). However, our research goes further than any previous work by showing how to distinguish the literal place(s) of the event (Spain, Malaga) from other linguistic types/uses such as nationalities (Mexican, American), improving downstream task accuracy. We consolidate and extend the Standard Evaluation Framework, discuss key research problems, then present concrete solutions in order to advance each stage of geoparsing. For geotagging, as well as training a SOTA neural Location-NER tagger, we simplify Metonymy Resolution with a novel minimalist feature extraction combined with an LSTM-based classifier, matching SOTA results. For toponym resolution, we deploy the latest deep learning methods to achieve SOTA performance by augmenting neural models with hitherto unused geographic features called Map Vectors. With each research project, we provide high-quality datasets and system prototypes, further building resources in this field. We then show how these geoparsing advances coupled with our proposed Intra-Document Analysis can be used to associate news articles with locations in order to monitor the spread of public health threats. To this end, we evaluate our research contributions with production data from a real-time downstream application to improve geolocation of news events for disease monitoring. 
The data was made available to us by the Joint Research Centre (JRC), which operates one such system called MediSys that processes incoming news articles in order to monitor threats to public health and make these available to a variety of governmental, business and non-profit organisations. We also discuss steps towards an end-to-end, automated news monitoring system and make actionable recommendations for future work. In summary, the thesis aims are twofold: (1) Generate original geoparsing research aimed at advancing each stage of the pipeline by addressing pertinent challenges with concrete solutions and actionable proposals. (2) Demonstrate how this research can be applied to news event monitoring to increase the efficacy of existing biosurveillance systems, e.g. European Commission’s MediSys. I was generously funded by DREAM CDT, which was funded by NERC of UKRI.

Apollo (Cambridge)

State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

Author: Jamie Bartlett
Louis Reynolds
Publication venue: Demos

Overview: This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it. The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends of social media platforms, and a new section on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique, a series of capabilities and insights is considered, the validity and reliability of the method are assessed, and possible applications to counter-terrorism work are explored. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work.

Analysis and Policy Observatory (APO)

Methods and applications of social media monitoring of mental health during disasters : scoping review

Author: Fuller-Tyszkiewicz Matthew
Hutchinson Delyse
Shatte Adrian
Teague Samantha
Weller Emmelyn
Publication venue: 'JMIR Publications Inc.'
Publication date: 01/01/2022

Background: With the increasing frequency and magnitude of disasters internationally, there is growing research and clinical interest in the application of social media sites for disaster mental health surveillance. However, important questions remain regarding the extent to which unstructured social media data can be harnessed for clinically meaningful decision-making. Objective: This comprehensive scoping review synthesizes interdisciplinary literature with a particular focus on research methods and applications. Methods: A total of 6 health and computer science databases were searched for studies published before April 20, 2021, resulting in the identification of 47 studies. Included studies were published in peer-reviewed outlets and examined mental health during disasters or crises by using social media data. Results: Applications across 31 mental health issues were identified, which were grouped into the following three broader themes: estimating mental health burden, planning or evaluating interventions and policies, and knowledge discovery. Mental health assessments were completed by primarily using lexical dictionaries and human annotations. The analyses included a range of supervised and unsupervised machine learning, statistical modeling, and qualitative techniques. The overall reporting quality was poor, with key details such as the total number of users and data features often not being reported. Further, biases in sample selection and related limitations in generalizability were often overlooked. Conclusions: The application of social media monitoring has considerable potential for measuring mental health impacts on populations during disasters. Studies have primarily conceptualized mental health in broad terms, such as distress or negative affect, but greater focus is required on validating mental health assessments. 
There was little evidence for the clinical integration of social media-based disaster mental health monitoring, such as combining surveillance with social media-based interventions or developing and testing real-world disaster management tools. To address issues with study quality, a structured set of reporting guidelines is recommended to improve the methodological quality, replicability, and clinical relevance of future research on the social media monitoring of mental health during disasters. © 2022 Samantha J Teague, Adrian B R Shatte, Emmelyn Weller, Matthew Fuller-Tyszkiewicz, Delyse M Hutchinson

Federation ResearchOnline