
    Towards Real-Time, Country-Level Location Classification of Worldwide Tweets

    In contrast to much previous work that has focused on location classification of tweets restricted to a specific country, here we undertake the task in a broader context by classifying global tweets at the country level, which is so far unexplored in a real-time scenario. We analyse the extent to which a tweet's country of origin can be determined by making use of eight tweet-inherent features for classification. Furthermore, we use two datasets, collected a year apart from each other, to analyse the extent to which a model trained from historical tweets can still be leveraged for classification of new tweets. With classification experiments on all 217 countries in our datasets, as well as on the top 25 countries, we offer some insights into the best use of tweet-inherent features for an accurate country-level classification of tweets. We find that the use of a single feature, such as the use of tweet content alone -- the most widely used feature in previous work -- leaves much to be desired. Choosing an appropriate combination of both tweet content and metadata can actually lead to substantial improvements of between 20% and 50%. We observe that tweet content, the user's self-reported location and the user's real name, all of which are inherent in a tweet and available in a real-time scenario, are particularly useful to determine the country of origin. We also experiment on the applicability of a model trained on historical tweets to classify new tweets, finding that the choice of a particular combination of features whose utility does not fade over time can actually lead to comparable performance, avoiding the need to retrain. However, the difficulty of achieving accurate classification increases slightly for countries with multiple commonalities, especially for English- and Spanish-speaking countries. Comment: Accepted for publication in IEEE Transactions on Knowledge and Data Engineering (IEEE TKDE).
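
    A minimal sketch of the kind of feature combination the abstract describes, pairing tweet content with user metadata for country-level classification. The column names, the TF-IDF representation and the logistic regression classifier are illustrative assumptions, not the paper's exact setup.

    # Hypothetical sketch: combining tweet content with user metadata
    # (self-reported location, real name) for country classification.
    import pandas as pd
    from sklearn.compose import ColumnTransformer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    # Each row is one tweet with fields available in a real-time scenario.
    tweets = pd.DataFrame({
        "content":       ["Enjoying the sun in Hyde Park", "Camino al estadio Azteca"],
        "user_location": ["London, UK", "CDMX"],
        "user_name":     ["Alice Smith", "Juan Perez"],
        "country":       ["GB", "MX"],   # gold labels for training
    })

    # One TF-IDF vectoriser per feature; the resulting matrices are concatenated.
    features = ColumnTransformer([
        ("content",  TfidfVectorizer(), "content"),
        ("location", TfidfVectorizer(), "user_location"),
        ("name",     TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), "user_name"),
    ])

    model = Pipeline([("features", features), ("clf", LogisticRegression(max_iter=1000))])
    model.fit(tweets.drop(columns="country"), tweets["country"])
    print(model.predict(tweets.drop(columns="country")))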

    Twitter user geolocation using web country noun searches

    Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, its user geolocation is a nontrivial task. This paper presents a purely word-distribution method for Twitter user country geolocation. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with 80% accuracy while being much faster than GTN. Research carried out with the support of resources of the Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundação para a Ciência e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the anonymous reviewers for their helpful suggestions.
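
    As an illustration of the distribution-matching idea behind GTN, the sketch below compares the noun-frequency profile of a user's tweets against per-country interest profiles and picks the closest country. The made-up country profiles and the use of cosine similarity are assumptions for illustration; the paper's actual Google Trends data and matching statistic may differ.

    # Toy example of matching a user's noun frequencies to country profiles.
    from collections import Counter
    import math

    def cosine(a: Counter, b: dict) -> float:
        """Cosine similarity between a noun-frequency Counter and a country profile."""
        dot = sum(a[w] * b.get(w, 0.0) for w in a)
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    # Hypothetical per-country noun profiles (in GTN these would come from
    # Google Trends world country distributions).
    country_profiles = {
        "PT": {"praia": 0.9, "futebol": 0.7, "saudade": 0.5},
        "BR": {"praia": 0.8, "futebol": 0.9, "carnaval": 0.8},
    }

    user_nouns = Counter({"praia": 3, "carnaval": 2, "futebol": 1})
    best = max(country_profiles, key=lambda c: cosine(user_nouns, country_profiles[c]))
    print(best)  # -> "BR" under these made-up profiles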

    Validation of an indirect data collection method to assess airport pavement condition

    The authors acknowledge University of Beira Interior, CERIS - Civil Engineering Research and Innovation for Sustainability (ECI/04625), GEOBIOTEC - GeoBioSciences, GeoTechnologies and GeoEngineering (GEO/04035) and ASA SA, Cape Verde Airports and Air Safety for support and funding of this study. In this study the authors compare two methods for airport asphalt pavement distress data collection applied on the main runway of Amílcar Cabral international airport, located at Sal Island in Cape Verde. The two methods used for testing were traditional visual inspection (on foot) and an indirect method using a vehicle equipped with image capture and recording, lasers and geolocation devices (in-vehicle inspection). The aim of this research is to contribute to the validation of the proposed low-cost in-vehicle pavement distress inspection system with semiautomatic data processing, so that it can be considered for the pavement condition assessment component of airport pavement management systems (APMS). This component is particularly important because the collected distress data make it possible to assess the condition of the pavements and define intervention strategies. Validation of the indirect data collection method is evaluated by statistical comparison of the collected distress data and the pavement condition index (PCI) obtained from both methods. Statistically non-significant differences between the result sets validate the proposed indirect method; however, the analysis revealed two aspects of the proposed system that need improvement, namely the quality of the captured images for identifying distresses of lower severity and inspector training for the proper assignment of severity levels during image analysis. The indirect method offers significant advantages given that the entire runway pavement area is inspected: inspection time is shortened and data collection costs can be reduced. Processing and visualising the results in a GIS environment allows re-evaluation of the in-vehicle dataset, making data interpretation and measurement quality control simpler and faster.
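
    A minimal sketch of the kind of paired statistical comparison used to validate the in-vehicle survey against the on-foot reference: PCI values for the same runway sections are compared with a paired test. The sample values and the choice of a paired t-test are illustrative assumptions, not the study's data or exact procedure.

    # Compare per-section PCI values obtained by the two inspection methods.
    import numpy as np
    from scipy import stats

    # PCI (0-100) for the same runway sections, surveyed with both methods
    # (made-up numbers for illustration only).
    pci_on_foot    = np.array([72, 65, 80, 58, 69, 74, 61, 77])
    pci_in_vehicle = np.array([70, 66, 78, 60, 67, 75, 59, 76])

    t_stat, p_value = stats.ttest_rel(pci_on_foot, pci_in_vehicle)
    print(f"paired t = {t_stat:.2f}, p = {p_value:.3f}")

    # A p-value above the chosen significance level (e.g. 0.05) indicates no
    # statistically significant difference, supporting the indirect method.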

    OSINT from a UK perspective: considerations from the law enforcement and military domains

    Both law enforcement and the military have incorporated the use of open source intelligence (OSINT) into their daily operations. Whilst there are observable similarities in how these organisations employ OSINT, there are also differences between military and policing approaches to the understanding of open source information and the goals for the intelligence gathered from it. In particular, we focus on evaluating similarities and differences in how operational OSINT is understood and approached by British law enforcement agencies and by UK-based MoD researchers and investigators. These observations are gathered with the aim of increasing interoperability, as well as creating opportunities for the specific strengths and competencies of particular organisational approaches to be shared and utilised by both the military and law enforcement.

    Managing Quality of Crowdsourced Data

    The Web is the central medium for discovering knowledge via various sources such as blogs, social media, and wikis. It facilitates access to content provided by a large number of users, regardless of their geographical locations or cultural backgrounds. Such user-generated content is often referred to as crowdsourced data, which provides informational benefit in terms of variety and scale. Yet, the quality of crowdsourced data is hard to manage, due to the inherent uncertainty and heterogeneity of the Web. In this proposal, we summarize prior work on crowdsourced data that studies quality dimensions and techniques to assess data quality. However, such work often lacks mechanisms to collect data with a high quality guarantee and to improve data quality. To overcome these limitations, we propose a research direction that emphasises (1) guaranteeing data quality at collection time, and (2) using expert knowledge to improve data quality in cases where the data has already been collected.