7 research outputs found

    Authenticity of Geo-Location and Place Name in Tweets

    The place name and geo-coordinates of a tweet are supposed to represent the user's location at the time of posting. However, our analysis of a large collection of tweets indicates that these fields may not give the user's correct location at posting time. Our investigation reveals that tweets posted through third-party applications such as Instagram or Swarmapp carry the geo-coordinates of a user-specified location, not the user's current location. Likewise, any place name can be entered by a user to be displayed on a tweet, and it may not match his or her actual location. Our analysis revealed that around 12% of tweets contain place names that differ from the user's real location. These findings should serve as a caution when designing location-based services that use social media.
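    As an illustration (not taken from the paper), the fields discussed above can be inspected in a classic Twitter API v1.1 tweet object: "source" identifies the posting application, while "coordinates" and "place" hold the geo-coordinates and the user-selected place name. The field names are the standard tweet-object fields; the file name and the third-party heuristic are assumptions for the sketch.

        import json

        def inspect_geo_fields(tweet: dict) -> dict:
            """Pull the location-related fields out of a Twitter API v1.1 tweet object."""
            source_html = tweet.get("source", "")   # e.g. '<a href="http://instagram.com">Instagram</a>'
            coords = tweet.get("coordinates")       # GeoJSON point or None
            place = tweet.get("place") or {}        # user-selected place; may differ from the true location
            return {
                "id": tweet.get("id_str"),
                "source_app": source_html,
                "lon_lat": coords["coordinates"] if coords else None,
                "place_name": place.get("full_name"),
                "third_party": any(app in source_html for app in ("Instagram", "Swarm", "Foursquare")),
            }

        if __name__ == "__main__":
            # "tweets.jsonl" is a hypothetical file with one tweet JSON object per line
            with open("tweets.jsonl") as fh:
                for line in fh:
                    info = inspect_geo_fields(json.loads(line))
                    if info["third_party"]:
                        print("possibly user-specified location:", info)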

    Rumour Veracity Estimation with Deep Learning for Twitter

    Twitter has become a fertile ground for rumours, as information can propagate to a very large audience in a very short time. Rumours can create public panic, so timely detection and blocking of rumour information is urgently needed. We propose and compare machine learning classifiers with a deep learning model based on recurrent neural networks for classifying tweets into rumour and non-rumour classes. A total of thirteen features based on tweet text and user characteristics were given as input to the machine learning classifiers, while the deep learning model was trained and tested with textual features and five user-characteristic features. The findings indicate that the deep learning model performs much better than the machine learning based models.
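    A minimal sketch of the kind of architecture described, not the authors' code: an LSTM over tokenized tweet text combined with five numeric user-characteristic features for a binary rumour/non-rumour output. All layer sizes, the vocabulary size, and the sequence length are illustrative assumptions.

        # Illustrative RNN-based rumour classifier (text tokens + user features)
        import numpy as np
        from tensorflow.keras import layers, Model

        VOCAB, SEQ_LEN, N_USER_FEATS = 20000, 40, 5   # assumed sizes

        text_in = layers.Input(shape=(SEQ_LEN,), name="tokens")
        user_in = layers.Input(shape=(N_USER_FEATS,), name="user_feats")

        x = layers.Embedding(VOCAB, 64)(text_in)      # learn word embeddings from tweet text
        x = layers.LSTM(64)(x)                        # recurrent encoding of the tweet
        x = layers.concatenate([x, user_in])          # append user-characteristic features
        x = layers.Dense(32, activation="relu")(x)
        out = layers.Dense(1, activation="sigmoid")(x)   # rumour vs. non-rumour

        model = Model([text_in, user_in], out)
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

        # toy forward pass with random data, just to show the expected input shapes
        model.predict([np.random.randint(0, VOCAB, (2, SEQ_LEN)), np.random.rand(2, N_USER_FEATS)])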

    An intelligent auto-response short message service categorization model using semantic index

    Short message service (SMS) is one of the quickest and easiest ways to communicate, and it is used by businesses, government organizations, and banks to send short messages to large groups of people. Categorizing the SMS messages in a user's inbox under different message types gives receivers a concise view of their content. Earlier studies treated this as a binary problem, labelling messages as ham or spam, which can hide messages that are useful to the end user but get treated as spam. Extensions with multiple labels such as ham, spam, and others are still not sufficient to meet all the needs of end users. Hence, a multi-class SMS categorization is needed that is based on the semantics (information) embedded in each message. This paper introduces an intelligent auto-response model using a semantic index for categorizing SMS messages into five categories: ham, spam, info, transactions, and one-time passwords, using the multi-layer perceptron (MLP) algorithm. In this approach, each SMS is classified into one of the predefined categories. The experiment was conducted on the “multi-class SMS dataset” of 7,398 messages divided into these five classes, and the accuracy obtained was 97%.
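    A small scikit-learn sketch of a five-class SMS categorizer of this general kind, assuming TF-IDF text features feeding a multi-layer perceptron; the toy messages, feature choice, and hyperparameters are assumptions, not the paper's exact pipeline or semantic index.

        # Illustrative 5-class SMS categorizer (TF-IDF + multi-layer perceptron)
        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline

        # toy examples, one per class; the real dataset has 7,398 labelled messages
        texts = [
            "Meeting moved to 3pm, see you there",            # ham
            "WIN a FREE cruise now, click this link!!!",      # spam
            "Your parcel will be delivered tomorrow",         # info
            "Rs. 2,500 debited from your account ending 1234",# transactions
            "Your one-time password is 4821",                 # otp
        ]
        labels = ["ham", "spam", "info", "transactions", "otp"]

        clf = make_pipeline(
            TfidfVectorizer(ngram_range=(1, 2)),              # word and bigram features
            MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0),
        )
        clf.fit(texts, labels)
        print(clf.predict(["Use OTP 9932 to complete your login"]))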

    A deep multi-modal neural network for informative Twitter content classification during emergencies

    People start posting tweets containing text, images, and videos as soon as a disaster hits an area. Analysing these disaster-related tweet texts, images, and videos can help humanitarian response organizations make better decisions and prioritize their tasks. Finding the informative content that can support decision-making within the massive volume of Twitter content is a difficult task and requires a system to filter out the informative items. In this paper, we present a multi-modal approach to identify disaster-related informative content from Twitter streams using text and images together. Our approach is based on long short-term memory (LSTM) and VGG-16 networks and shows a significant improvement in performance, as evident from the validation results on seven different disaster-related datasets. The F1-score ranges from 0.74 to 0.93 when tweet texts and images are used together, whereas with tweet text alone it ranges from 0.61 to 0.92. This result shows that the proposed multi-modal system performs significantly well in identifying disaster-related informative social media content.
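    A compact Keras sketch of the kind of two-branch architecture described: an LSTM over tweet tokens and VGG-16 features over the attached image, concatenated for a binary informative/not-informative output. The layer sizes and the choice to freeze VGG-16 are assumptions, not the paper's exact configuration.

        # Sketch of a text+image (LSTM + VGG-16) informativeness classifier; sizes are illustrative
        from tensorflow.keras import layers, Model
        from tensorflow.keras.applications import VGG16

        VOCAB, SEQ_LEN = 20000, 30

        # text branch: LSTM over token ids
        text_in = layers.Input(shape=(SEQ_LEN,), name="tokens")
        t = layers.Embedding(VOCAB, 100)(text_in)
        t = layers.LSTM(128)(t)

        # image branch: VGG-16 convolutional features (ImageNet weights), frozen here for simplicity
        img_in = layers.Input(shape=(224, 224, 3), name="image")
        vgg = VGG16(include_top=False, weights="imagenet", pooling="avg")
        vgg.trainable = False
        v = vgg(img_in)

        # fuse both modalities and predict informative vs. not informative
        x = layers.concatenate([t, v])
        x = layers.Dense(256, activation="relu")(x)
        out = layers.Dense(1, activation="sigmoid")(x)

        model = Model([text_in, img_in], out)
        model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
        model.summary()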

    Helve'tweet: exploring one million geolocated tweets in Switzerland, February-August 2017

    A social network actively used by 8% of the Swiss population, Twitter allows its users to geolocate their messages. This quantitative exploratory study, based on messages geolocated in Switzerland and written between 18 February and 31 August 2017, follows on from the GEoTweet project devoted to Geneva tweets in 2014-2015. It addresses three research questions to assess the possibilities and limits of using the data provided by the Twitter API for research on Switzerland in the fields of data sociology and information science. The focus is placed more specifically on the exploitation of geolocation data, on the problem of language identification, and on the criteria defining a Swiss tweet from an archiving perspective. After the introduction and literature review, the report presents the methodology used, the biases identified, and the tools created to measure, avoid, or at least minimize them. A concordance was created between Twitter's place.id values and the official list of Swiss communes to compensate for the unverified nature (partly obsolete, partly erroneous) of the geographic data provided by Twitter. Three series of tests were also carried out to verify the reliability of Twitter's language-recognition algorithm on the sample; they show an error margin of 4.25% for the major European languages, which can rise to 92% for an "exotic" language such as Indonesian. The analyses of the tweets and of their authors yielded important results. On the one hand, they show strong variations in their number and linguistic diversity across space and time (e.g., more active accounts in German-speaking Switzerland, but more tweets in French overall; more tweets during holiday periods, but a drop in the proportion of tweets and users in the national languages and in English). On the other hand, the duration and geographic extent of their activity vary widely (e.g., 82% of accounts with fewer than 10 tweets, 68% active during a single month, and 71% active within a single canton). Hypotheses were formulated and tested to explain these results, which relate to the high propensity of German speakers to tweet in English and to the positive effect of leisure time on the desire and opportunity to tweet with geolocation. In the final part, the study proposes avenues for establishing criteria to recognize a Swiss tweet, based on the preceding analyses as well as on experiences in other countries. The international and Swiss context of tweet archiving is discussed, without claiming to propose a method, given the complexity of the sociological, technical, and legal issues involved.
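    A minimal sketch of the place.id-to-commune lookup described above, purely for illustration: the concordance file name and its column names ("place_id", "commune", "bfs_number") are hypothetical, not the study's actual data format.

        # Hypothetical concordance lookup: Twitter place.id -> official Swiss commune
        import csv

        def load_concordance(path: str) -> dict:
            """Map Twitter place.id values to (commune name, BFS number) pairs."""
            with open(path, newline="", encoding="utf-8") as fh:
                return {row["place_id"]: (row["commune"], row["bfs_number"])
                        for row in csv.DictReader(fh)}

        concordance = load_concordance("placeid_to_commune.csv")   # hypothetical file

        def commune_of(tweet: dict):
            place = tweet.get("place") or {}
            return concordance.get(place.get("id"))                # None if not a known Swiss commune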

    Location reference recognition from texts: A survey and comparison

    A vast amount of location information exists in unstructured texts, such as social media posts, news stories, scientific articles, web pages, travel blogs, and historical archives. Geoparsing refers to the process of recognizing location references in texts and identifying their geospatial representations. While geoparsing can benefit many domains, a summary of its specific applications is still missing. Furthermore, a comprehensive review and comparison of existing approaches for location reference recognition, which is the first and a core step of geoparsing, is lacking. To fill these research gaps, this review first summarizes seven typical application domains of geoparsing: geographic information retrieval, disaster management, disease surveillance, traffic management, spatial humanities, tourism management, and crime management. We then review existing approaches for location reference recognition, categorizing them into four groups based on their underlying functional principle: rule-based, gazetteer matching-based, statistical learning-based, and hybrid approaches. Next, we thoroughly evaluate the correctness and computational efficiency of the 27 most widely used approaches for location reference recognition on 26 public datasets with different types of texts (e.g., social media posts and news stories) containing 39,736 location references across the world. The results of this thorough evaluation can help inform future methodological developments for location reference recognition and can guide the selection of appropriate approaches based on application needs.
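    One of the four groups compared, statistical learning-based recognition, can be illustrated with a short spaCy sketch that extracts spans tagged as places; this is a generic example of the technique, not one of the 27 approaches evaluated in the review, and it assumes the small English model is installed.

        # Generic NER-based location reference recognizer (illustrative only)
        import spacy

        nlp = spacy.load("en_core_web_sm")    # assumes: pip install spacy && python -m spacy download en_core_web_sm

        def location_references(text: str):
            """Return (span text, start char, end char) for entities spaCy tags as places."""
            doc = nlp(text)
            return [(ent.text, ent.start_char, ent.end_char)
                    for ent in doc.ents if ent.label_ in ("GPE", "LOC", "FAC")]

        print(location_references("Flooding reported near Lake Geneva and in downtown Lausanne."))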