6 research outputs found

    Twitter user geolocation using web country noun searches

    Several Web and social media analytics require user geolocation data. Although Twitter is a powerful source for social media analytics, geolocating its users is a nontrivial task. This paper presents a method for Twitter user country geolocation based purely on word distributions. In particular, we focus on the frequencies of tweet nouns and their statistical matches with Google Trends world country distributions (GTN method). Several experiments were conducted, using a recently created dataset of 744,830 tweets produced by 3298 users from 54 countries and written in 48 languages. Overall, the proposed GTN approach is competitive when compared with a state-of-the-art world distribution geolocation method. To reduce the number of Google Trends queries, we also tested a machine learning variant (GTN2) that is capable of matching the GTN responses with 80% accuracy while being much faster than GTN. Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia. The work of P. Cortez was supported by FCT - Fundacao para a Ciencia e Tecnologia within the Project Scope UID/CEC/00319/2019. We would also like to thank the anonymous reviewers for their helpful suggestions.

    Comparing Methods to Retrieve Tweets: a Sentiment Approach

    The Internet and social media have become almost unavoidable tools to support research and decision-making processes in various fields. Nevertheless, the collection and use of data retrieved from these types of sources pose different challenges. In a previous paper we compared the efficiency of three alternative methods used to retrieve geolocated tweets over an entire country (United Kingdom). One method proved to be the best compromise in terms of both the effort needed to set it up and the quantity and quality of data collected. In this work we further check, in terms of content, whether the three compared methods are able to produce “similar information”. In particular, we aim at checking whether there are differences in the level of sentiment estimated using tweets coming from the three methods. In doing so, we take into account both a cross-sectional and a longitudinal perspective. Our results confirm that our current best option does not show any significant difference in sentiment, producing overall scores that lie between the scores obtained using the two alternative methods. Thus, such a flexible and reliable method can be implemented in the data collection of geolocated tweets in other countries and for other studies based on sentiment analysis. Schlosser, S.; Toninelli, D.; Cameletti, M. (2020). Comparing Methods to Retrieve Tweets: a Sentiment Approach. Editorial Universitat Politècnica de València. 299-306. https://doi.org/10.4995/CARMA2020.2020.11653

    A space-time model for analyzing contagious people based on geolocation data using inverse graphs

    Mobile devices provide an important source of data that capture the spatial movements of individuals and allow us to derive general mobility patterns for a population over time. In this article, we present a mathematical foundation that allows us to harmonize mobile geolocation data using differential geometry and graph theory to identify patterns of spatial behavior. In particular, we focus on models programmed using Computer Algebra Systems and based on a space-time model that makes it possible to describe contagion patterns through spatial movement patterns. In addition, we show how the approach can be used to develop algorithms for finding "patient zero" or, respectively, for identifying the selection of candidates who are most likely to be contagious.

    How Fair Is IS Research?

    While neither information systems nor machine learning is neutral, identifying discrimination is more difficult when a system learns from data, because discrimination can be introduced at several stages. Therefore, this article investigates whether IS research has taken up this topic. A literature analysis is conducted, and its discussion shows that technological, organizational, and human aspects have to be considered, making it a topic not only for data scientists or computer scientists, but for information systems researchers as well.

    Twitter alloy steel disambiguation and user relevance via one-class and two-class news titles classifiers

    This paper addresses the nontrivial task of Twitter financial disambiguation (TFD), which is relevant to filter financial domain tweets (e.g., alloy steel or coffee prices) when no unique identifiers (e.g., cashtags) are adopted. To automate TFD, we propose a transfer learning approach that uses freely labeled news titles to train diverse one-class and two-class classification methods. These include different text handling transforms, adaptations of statistical measures and modern machine learning methods, including support vector machines (SVM), deep autoencoders and multilayer perceptrons. As a case study, we analyzed the domain of alloy steel prices, collecting a recent Twitter dataset. Overall, the best results were achieved by a two-class SVM fed with TFD statistical measures and topic model features, obtaining discrimination levels of 80% and 71% when tested with 11,081 and 3,000 manually labeled tweets, respectively. The best one-class performance (78% and 69% for the same test tweets) was obtained by a term frequency-inverse document frequency classifier (TF-IDFC). These models were further used to generate a Financial User Relevance rank (FUR) score, aiming to filter relevant users. The SVM and TF-IDFC FUR models obtained predictive user discrimination levels of 80% and 75%, respectively, when tested with a manually labeled test sample of 418 users. These results confirm the proposed joint TFD-FUR approach as a valuable tool for the selection of Twitter texts and users for financial social media analytics (e.g., sentiment analysis, detection of influential users). Research carried out with the support of resources of Big and Open Data Innovation Laboratory (BODaI-Lab), University of Brescia, granted by Fondazione Cariplo and Regione Lombardia.

    3rd International Conference on Advanced Research Methods and Analytics (CARMA 2020)

    Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information. As these sources, methods, and applications become more interdisciplinary, the 3rd International Conference on Advanced Research Methods and Analytics (CARMA) is an excellent forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences as well as to discuss current and future challenges. Doménech I De Soria, J.; Vicente Cuervo, MR. (2020). 3rd International Conference on Advanced Research Methods and Analytics (CARMA 2020). Editorial Universitat Politècnica de València. http://hdl.handle.net/10251/149510