    Language influences on tweeter geolocation

    We investigate the influence of language on the accuracy of geolocating Twitter users. Our analysis, using a large corpus of tweets written in thirteen languages, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance has a greater impact on accuracy than geographical coverage. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. Our results suggest both averaging approaches should be used to effectively evaluate geolocation.
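The difference between micro and macro averaging can be made concrete with a small sketch. It assumes each user has exactly one gold location and one predicted location at a fixed granularity (e.g. city), and that accuracy is the metric being averaged; the function names and toy data are hypothetical, not taken from the paper. Micro averaging pools all users, so populous locations dominate; macro averaging first computes accuracy per location and then weights every location equally.

```python
from collections import defaultdict


def micro_accuracy(gold, pred):
    """Fraction of all users whose predicted location matches the gold one."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return correct / len(gold)


def macro_accuracy(gold, pred):
    """Mean of per-location accuracies, so every gold location counts equally."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for g, p in zip(gold, pred):
        totals[g] += 1
        hits[g] += g == p  # bool adds as 0 or 1
    return sum(hits[loc] / totals[loc] for loc in totals) / len(totals)


if __name__ == "__main__":
    # Toy, made-up data: one populous city and two small ones.
    gold = ["NYC"] * 8 + ["Oslo", "Lima"]
    pred = ["NYC"] * 10  # a predictor that always guesses the populous city
    print(micro_accuracy(gold, pred))  # 0.8
    print(macro_accuracy(gold, pred))  # 0.3333333333333333
```

On imbalanced data the two numbers diverge sharply, which is why reporting only one of them can hide how a model performs on sparsely populated locations.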

    A Modified Boosted Ensemble Classifier on Location Based Social Networking

    Unbalanced data classification is an active research topic. Boosting approaches such as Wang's Boosting and Modified Boosted SVM (MBSVM) have been shown to be more effective for unbalanced data. Our proposed Modified Boosted Random Forest (MBRF) classifier is a Random Forest classifier that uses the Boosting approach. The main motivation of the study is to analyze the sentiment of geotagged tweets in the FIFA and Olympics datasets, in order to understand the state of mind of people at these events. The tree-based Random Forest algorithm, combined with boosting, classifies the tweets to build a recommendation system that provides commercial suggestions to participants, such as local places to visit or activities to do. MBRF employs two strategies: (i) a distance-based weight-update method based on K-Medoids and (ii) a sign-based classifier elimination technique. Each dataset was partitioned into 70% training data and 30% test data. The imbalance ratios were 3.1666 for the FIFA dataset and 4.6 for the Olympics dataset. We examined accuracy, precision, recall and ROC curves for each event. MBRF achieved an average AUC of 0.96 on the FIFA dataset and 0.97 on the Olympics dataset. A comparison with a Decision Tree model using the 'Entropy' criterion showed MBRF to perform better.
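The abstract reports an imbalance ratio per dataset and an AUC per event. A minimal sketch of both quantities follows, assuming the imbalance ratio is the majority-class count divided by the minority-class count (the abstract does not define it) and a binary task with labels 0 and 1. This is generic evaluation code, not MBRF itself: the K-Medoids weight update and classifier elimination are not described in enough detail here to reproduce.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Majority-class count divided by minority-class count (assumed definition)."""
    counts = Counter(labels).values()
    return max(counts) / min(counts)


def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win, which is the Mann-Whitney U formulation of AUC.
    Labels are assumed to be 1 (positive) and 0 (negative).
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    print(imbalance_ratio(["pos"] * 19 + ["neg"] * 6))
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

The pairwise loop is quadratic and meant only for illustration; for large test sets a sort-based rank computation gives the same value faster.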

    Where are my followers? Understanding the Locality Effect in Twitter

    Twitter is one of the most used applications on the current Internet, with more than 200M accounts created so far. Like other large-scale systems, Twitter can benefit from exploiting the Locality effect among its users. In this paper we perform the first comprehensive study of the Locality effect in Twitter. For this purpose we collected the geographical locations of around 1M Twitter users and 16M of their followers. Our results demonstrate that language and cultural characteristics determine the level of Locality expected for different countries. Countries whose language is not English, such as Brazil, typically show high intra-country Locality, whereas those where English is an official or co-official language suffer from an external Locality effect: their users have more followers in the US than within their own country. This has two causes: first, the US is the dominant country in Twitter, accounting for around half of the users; second, these countries share a common language and cultural characteristics with the US.
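One simple way to quantify intra-country versus external Locality is sketched below. It assumes the data is a list of (user country, follower country) pairs, one per follow edge, and measures for each country the share of followers at home and the share in the US. The exact metric used in the paper is not given in the abstract, so both the functions and the country codes are hypothetical illustrations.

```python
from collections import defaultdict


def locality_by_country(edges):
    """edges: iterable of (user_country, follower_country) pairs.

    Returns {country: (share of followers in the same country,
                       share of followers in the US)}.
    """
    total = defaultdict(int)
    same = defaultdict(int)
    in_us = defaultdict(int)
    for user_c, follower_c in edges:
        total[user_c] += 1
        same[user_c] += user_c == follower_c
        in_us[user_c] += follower_c == "US"
    return {c: (same[c] / total[c], in_us[c] / total[c]) for c in total}


def external_locality_countries(shares, dominant="US"):
    """Countries whose users have more followers in `dominant` than at home."""
    return sorted(
        c for c, (home, dom) in shares.items() if c != dominant and dom > home
    )


if __name__ == "__main__":
    # Made-up follow edges: a Brazilian-like pattern and a British-like one.
    edges = [("BR", "BR")] * 3 + [("BR", "US")] + [("GB", "GB")] + [("GB", "US")] * 3
    shares = locality_by_country(edges)
    print(shares)  # {'BR': (0.75, 0.25), 'GB': (0.25, 0.75)}
    print(external_locality_countries(shares))  # ['GB']
```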

    Report on the Information Retrieval Festival (IRFest2017)

    The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond, in order to foster awareness, interaction and reflection on the status of the field and its future. The program included an industry session, research talks, demos and posters, as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekäläinen, who provided a historical, critical reflection on realism in Interactive Information Retrieval experimentation, while the second was delivered by Prof. Maarten de Rijke, who argued for more use of Artificial Intelligence in IR solutions and deployments. The workshop was followed by a "Tour de Scotland", in which delegates were taken from Glasgow to Aberdeen for the European Conference on Information Retrieval (ECIR 2017).

    Geospatial data analysis in Russia’s geoweb

    The chapter examines the role of geospatial data in Russia’s online ecosystem. Facilitated by the rise of geographic information systems and user-generated content, the distribution of geospatial data has blurred the line between physical spaces and their virtual representations. The chapter discusses different sources of these data available for Digital Russian Studies (e.g., social data and crowdsourced databases), together with novel techniques for extracting geolocation from various data formats (e.g., textual documents and images). It also scrutinizes different ways of using these data, ranging from mapping the spatial distribution of social and political phenomena, to investigating the use of geotag data for digitizing cultural practices, to exploring the use of the geoweb for narrating individual and collective identities online.

    Analysis of Twitter activity by country during competitive events

    This work analyzes Twitter data. Twitter was chosen because of its popularity, the availability of its data, and the ability it gives users to comment on public events. Its data are particularly interesting because Twitter users span a broad social and age cross-section, so they provide a fairly good statistical sample for a given location. The aim was to build a solution that presents the number of tweets posted by users from different countries during media events. Because geolocation is usually not included in tweets, the location provided by the user was used instead [1]. The main tools used in the work are the Twitter API, the non-relational database management system MongoDB, and the high-level programming language Python. External databases were also needed to convert locations into coordinates. With the solutions presented in the work, the activity of Twitter users from different countries can be presented through several visualization methods. To analyze the data as effectively as possible, three were chosen: a chart of tweets over time, maps showing the total tweet intensity for the 20 most popular locations over the entire event, and a movie built from maps at one-minute intervals. It was important for the results to be legible, visually appealing, and understandable to people not involved in the project. For the analysis, popular international events were chosen whose course could be reconstructed afterwards and in which certain culminating points could be observed. The project was tested on two events, the Eurovision Song Contest and the UEFA Champions League match between Real Madrid and Paris Saint-Germain; the data obtained were analyzed and conclusions were drawn from them.
The proposed methods yielded interesting results, for example showing how a culminating moment such as a goal in a football match influences user activity on Twitter, and for which nations a given event is most important.
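The aggregation behind the per-minute maps and the top-20 ranking can be sketched as follows. This is a minimal illustration, assuming tweets arrive as (ISO 8601 timestamp, user-provided location) pairs; collection through the Twitter API and storage in MongoDB are omitted, and the small `LOCATION_TO_COUNTRY` dictionary is a hypothetical stand-in for the external geocoding databases the work relied on.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical stand-in for an external geocoding database:
# maps a free-text, user-provided location to a country code.
LOCATION_TO_COUNTRY = {
    "madrid": "ES",
    "paris": "FR",
    "lisboa": "PT",
}


def country_of(user_location):
    """Return a country code for a free-text location, or None if unknown."""
    return LOCATION_TO_COUNTRY.get(user_location.strip().lower())


def counts_per_minute(tweets):
    """tweets: iterable of (iso_timestamp, user_location) pairs.

    Returns {minute: Counter(country -> tweet count)}; tweets whose
    location cannot be resolved are skipped.
    """
    buckets = defaultdict(Counter)
    for ts, loc in tweets:
        country = country_of(loc)
        if country is None:
            continue
        # Truncate the timestamp to the start of its minute.
        minute = datetime.fromisoformat(ts).replace(second=0, microsecond=0)
        buckets[minute][country] += 1
    return dict(buckets)


def top_locations(tweets, n=20):
    """The n most frequent resolved countries over the whole event."""
    total = Counter()
    for _, loc in tweets:
        country = country_of(loc)
        if country is not None:
            total[country] += 1
    return total.most_common(n)


if __name__ == "__main__":
    tweets = [
        ("2018-03-06T20:45:12", "Madrid"),
        ("2018-03-06T20:45:50", " paris "),
        ("2018-03-06T20:46:01", "Madrid"),
        ("2018-03-06T20:46:30", "Atlantis"),  # unresolvable, skipped
    ]
    for minute, counts in sorted(counts_per_minute(tweets).items()):
        print(minute, dict(counts))
    print(top_locations(tweets))  # [('ES', 2), ('FR', 1)]
```

Each per-minute `Counter` corresponds to one frame of the map movie; a real pipeline would also need to handle the many user-provided locations that are ambiguous or not places at all.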

    Influence of geographic biases on geolocation prediction in Twitter

    Geolocating Twitter users, the task of identifying their home locations, serves a wide range of community and business applications such as managing natural crises, journalism, and public health. While users can record their location on their profiles, more than 34% record fake or sarcastic locations. Twitter allows users to attach GPS locations to their content; however, less than 1% of tweets are geotagged. Inferring user location has therefore been an important field of investigation since 2010. This thesis investigates two of the most important factors that can affect the quality of inferred user locations: (i) the influence of tweet language; and (ii) the effectiveness of the evaluation process. Previous research observed that Twitter users writing in some languages appeared to be easier to locate than those writing in others, and speculated that the geographic coverage of a language (language bias), represented by the number of locations from which tweets in that language originate, played an important role in determining location accuracy. So important was this role that accuracy might be largely predictable by considering language alone. In this thesis, I investigate the influence of language bias on the accuracy of geolocating Twitter users. The analysis, using a large corpus of tweets written in thirteen languages and a re-implementation of a geolocation model that was state of the art at the time, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance in the distribution of Twitter users over locations (population bias) has a greater impact on accuracy than language bias. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. The results suggest both averaging approaches should be used to effectively evaluate geolocation.
Many approaches have been proposed for automatically geolocating users; at the same time, various evaluation metrics have been proposed to measure their effectiveness, making it challenging to understand which metric is most suitable for this task. In this thesis, I provide a standardized evaluation framework for geolocation systems. The framework is employed to analyze fifteen Twitter user geolocation models and two baselines in a controlled experimental setting. The models comprise the re-implemented model and a variation of it, two locally retrained open-source models, and the results of eleven models submitted to a shared task. Models are evaluated using ten metrics (out of fourteen employed in previous research) over four geographic granularities. Rank correlations and thorough statistical analysis are used to assess the effectiveness of these metrics. The results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. For general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Although many complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. A suite of statistical analysis tests, chosen according to the employed metric, is proposed to ensure that the results are not coincidental.
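Rank correlation between metrics asks whether two metrics order the same set of models the same way. The sketch below uses Kendall's tau-a as one possible rank correlation (the text does not say which coefficient or tie handling was used). It assumes one score per model per metric and negates "lower is better" metrics such as error distance so that both lists point in the same direction. The scores are invented for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a between two score lists over the same models.

    Pairs tied in either list count as neither concordant nor discordant.
    Both lists must be oriented so that higher means better.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 models")
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Invented scores for four models A-D under two metrics.
    accuracy = [0.60, 0.55, 0.40, 0.30]       # higher is better
    median_error_km = [500, 300, 900, 1200]   # lower is better
    tau = kendall_tau_a(accuracy, [-e for e in median_error_km])
    print(round(tau, 3))  # 0.667: the metrics disagree on the A/B ordering
```

A tau well below 1 means the two metrics would lead an experimenter to different conclusions about which models are better, which is the effect described above.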

    Mobagogy - mobile learning for a higher education community

    This paper reports on a project in which a learning community of higher educators was formed to investigate how best to use mobile technologies in their own learning and teaching. Activities of this group included investigating best practice approaches by interviewing experts in the field, exploring the literature on mobile learning and then initiating and testing some mobile learning pedagogies in the context of their own higher education subjects. The community met regularly to discuss emerging issues and applications. The paper shares some of the findings gained both from the expert interviews and from the experiences of members of the community, and discusses the challenges and constraints that were experienced. We conclude with recommendations for promoting mobile learning communities in higher education. © 2010 IADIS

    Beyond the culture effect on credibility perception on microblogs

    We investigated how tweet readers from the USA and from eight Arab countries perceive credibility; our aim was to understand whether credibility perception is affected by country and/or by culture. Results from a crowdsourcing experiment showed that a wide variety of factors affected credibility perception, including a tweet author's gender, profile image, username style, location, and social network overlap with the reader. We found that culture determines readers' credibility perception, but country has no effect. We discuss the implications of our findings for user interface design and social media systems.

    A Practical Guide for the Effective Evaluation of Twitter User Geolocation

    Geolocating Twitter users, the task of identifying their home locations, serves a wide range of community and business applications such as managing natural crises, journalism, and public health. Many approaches have been proposed for automatically geolocating users based on their tweets; at the same time, various evaluation metrics have been proposed to measure their effectiveness, making it challenging to understand which metric is most suitable for this task. In this paper, we propose a guide for a standardized evaluation of Twitter user geolocation by analyzing fifteen models and two baselines in a controlled experimental setting. Models are evaluated using ten metrics over four geographic granularities. We use rank correlations to assess the effectiveness of these metrics. Our results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. We show that for general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Given the global geographic coverage of this task, we specifically recommend evaluation at both micro and macro levels to measure the impact of the bias in the distribution of users over locations. Although many complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. We propose a suite of statistical analysis tests, chosen according to the employed metric, to ensure that the results are not coincidental. Comment: Accepted in the journal ACM Transactions on Social Computing (TSC). Extended version of the ASONAM 2018 short paper. Please cite the TSC/ASONAM version and not the arXiv version.
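Why a majority class baseline stays competitive at coarse granularity can be seen in a small sketch. It assumes the baseline assigns every test user the most frequent location in the training data and is scored with plain accuracy. `CITY_TO_COUNTRY` is a hypothetical mapping used to move from city to country granularity, and the data are invented.

```python
from collections import Counter

# Hypothetical mapping used to coarsen city labels to country labels.
CITY_TO_COUNTRY = {
    "new york": "US",
    "chicago": "US",
    "london": "GB",
    "sao paulo": "BR",
}


def coarsen(cities):
    """Map city-level labels to country-level labels."""
    return [CITY_TO_COUNTRY[c] for c in cities]


def majority_baseline(train_locations, n_test):
    """Predict the most frequent training location for every test user."""
    majority, _ = Counter(train_locations).most_common(1)[0]
    return [majority] * n_test


def accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


if __name__ == "__main__":
    train = ["new york"] * 4 + ["chicago"] * 2 + ["london"] * 2 + ["sao paulo"] * 2
    test_gold = ["chicago", "chicago", "new york", "london", "sao paulo"]

    city_pred = majority_baseline(train, len(test_gold))
    print(accuracy(test_gold, city_pred))  # 0.2 at city level

    country_pred = majority_baseline(coarsen(train), len(test_gold))
    print(accuracy(coarsen(test_gold), country_pred))  # 0.6 at country level
```

Merging the two US cities makes the majority label cover most users, so the same trivial predictor triples its accuracy; micro-averaged accuracy in particular rewards this, which macro averaging would expose.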