Language influences on tweeter geolocation
We investigate the influence of language on the accuracy of geolocating Twitter users. Our analysis, using a large corpus of tweets written in thirteen languages, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance has a greater impact on accuracy than geographical coverage. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. Our results suggest both averaging approaches should be used to effectively evaluate geolocation
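The difference between micro and macro averaging mentioned above can be illustrated with a minimal sketch. The grouping key (here a generic label such as a language or a location), the function name and the toy data are assumptions for illustration and do not come from the paper.

```python
from collections import defaultdict

def micro_macro_accuracy(records):
    """records: iterable of (group, is_correct) pairs.

    Micro accuracy pools all users; macro accuracy averages the
    per-group accuracies so that every group counts equally.
    """
    per_group = defaultdict(lambda: [0, 0])  # group -> [correct, total]
    for group, is_correct in records:
        per_group[group][0] += int(is_correct)
        per_group[group][1] += 1
    total_correct = sum(c for c, _ in per_group.values())
    total = sum(n for _, n in per_group.values())
    micro = total_correct / total
    macro = sum(c / n for c, n in per_group.values()) / len(per_group)
    return micro, macro

# Toy data, invented for illustration: one large group, one small group.
records = (
    [("A", True)] * 81 + [("A", False)] * 9
    + [("B", True)] * 2 + [("B", False)] * 8
)
micro, macro = micro_macro_accuracy(records)
print(micro, macro)
```

On imbalanced data the two figures can diverge sharply, because the large group dominates the micro average, which is one reason to report both.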
A Modified Boosted Ensemble Classifier on Location Based Social Networking
Classification of unbalanced data is a research issue of ongoing interest. Boosting approaches like Wang's Boosting and Modified Boosted SVM (MBSVM) have been demonstrated to be more effective for unbalanced data. Our proposal, the Modified Boosted Random Forest (MBRF) classifier, is a Random Forest classifier that uses the Boosting approach. The main motivation of the study is to analyze the sentiment of geotagged tweets in order to understand the state of mind of people in the FIFA and Olympics datasets. The tree-based Random Forest algorithm, using the boosting approach, classifies the tweets to build a recommendation system with the idea of providing commercial suggestions to participants and recommending local places to visit or activities to perform. MBRF employs various strategies: (i) a distance-based weight-update method based on K-Medoids; (ii) a sign-based classifier elimination technique. We partitioned each dataset in the same way, with 70% of the data allocated for training and the remaining 30% for testing. The imbalance ratios measured 3.1666 and 4.6 for the FIFA and Olympics datasets, respectively. We looked at accuracy, precision, recall and ROC curves for each event. The average AUC achieved by MBRF is 0.96 on the FIFA dataset and 0.97 on the Olympics dataset. A comparison of MBRF with a Decision Tree model using 'Entropy' showed MBRF to be better.
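The 70/30 partition and the imbalance ratio reported above can be sketched as follows. The abstract does not define the ratio, so the majority-to-minority definition used here is an assumption, as are the function names and the toy class counts (chosen only so that the ratio matches 19/6, roughly 3.1666).

```python
import random
from collections import Counter

def imbalance_ratio(labels):
    """Assumed definition: size of the largest class over the smallest."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())

def split_70_30(items, seed=0):
    """Shuffle a copy of the items and cut it 70% train / 30% test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = (7 * len(shuffled)) // 10  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]

# Toy labels, invented for illustration.
labels = ["positive"] * 19 + ["negative"] * 6
ratio = imbalance_ratio(labels)
train, test = split_70_30(labels)
print(round(ratio, 4), len(train), len(test))
```

A plain random split like this one does not preserve class proportions; a stratified split would be a reasonable alternative for strongly imbalanced data.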
Where are my followers? Understanding the Locality Effect in Twitter
Twitter is one of the most used applications in the current Internet with
more than 200M accounts created so far. Like other large-scale systems, Twitter
can benefit from exploiting the Locality effect that exists among its users.
In this paper we perform the first comprehensive study of the Locality effect
of Twitter. For this purpose we have collected the geographical location of
around 1M Twitter users and 16M of their followers. Our results demonstrate
that language and cultural characteristics determine the level of Locality
expected for different countries. Countries whose language is not English,
such as Brazil, typically show high intra-country Locality, whereas countries
where English is an official or co-official language exhibit an external
Locality effect. That is, their users have more followers in the US than within
their own country. There are two reasons for this: first, the US is the
dominant country on Twitter, accounting for around half of the users; and
second, these countries share a common language and cultural characteristics
with the US.
Report on the Information Retrieval Festival (IRFest2017)
The Information Retrieval Festival took place in April 2017 in Glasgow. The focus of the workshop was to bring together IR researchers from the various Scottish universities and beyond in order to facilitate more awareness, increased interaction and reflection on the status of the field and its future. The program included an industry session, research talks, demos and posters as well as two keynotes. The first keynote was delivered by Prof. Jaana Kekäläinen, who provided a historical, critical reflection on realism in Interactive Information Retrieval Experimentation, while the second keynote was delivered by Prof. Maarten de Rijke, who argued for more Artificial Intelligence usage in IR solutions and deployments. The workshop was followed by a "Tour de Scotland" where delegates were taken from Glasgow to Aberdeen for the European Conference on Information Retrieval (ECIR 2017).
Geospatial data analysis in Russia’s geoweb
The chapter examines the role of geospatial data in Russia’s online ecosystem. Facilitated by the rise of geographic information systems and user-generated content, the distribution of geospatial data has blurred the line between physical spaces and their virtual representations. The chapter discusses different sources of these data available for Digital Russian Studies (e.g., social data and crowdsourced databases) together with the novel techniques for extracting geolocation from various data formats (e.g., textual documents and images). It also scrutinizes different ways of using these data, varying from mapping the spatial distribution of social and political phenomena to investigating the use of geotag data for cultural practices’ digitization to exploring the use of geoweb for narrating individual and collective identities online
Analysis of Twitter activity by country during competitive events
This work attempts to analyze Twitter data, which was chosen for its popularity, its availability, and the ability it gives users to comment on public events. Data obtained in this way are particularly interesting because of the large social and age cross-section that can be observed among Twitter users, so they give a fairly good statistical sample for a given location. The aim was to create a solution that presents the number of tweets posted by users from different countries during media events. Tweets usually do not include geolocation, so the location given by the user was used instead [1]. The main tools used in the work are the Twitter API, the non-relational database management system MongoDB and the Python high-level programming language. It was also necessary to use external databases to convert locations to coordinates. With the solutions presented in the work, the activity of Twitter users from different countries can be shown through different visualization methods. To analyze the data, three visualizations were used: a chart of tweets over time, maps showing the total intensity of tweets for the 20 most popular locations over the entire event, and a movie built from maps at minute intervals. It was important for the results to be legible, visually appealing, and understandable to people not involved in the project. For the analysis, internationally popular events were chosen whose course can be reconstructed afterwards and during which certain culminating points can be observed. The tests checked how the project deals with two events, the Eurovision Song Contest and the UEFA Champions League match between Real Madrid and Paris Saint Germain; the data obtained were also analyzed and attempts were made to draw conclusions from them.
The proposed methods produced very interesting results, including the observation of how a culminating point such as a goal in a football match influences the activity of users on Twitter, and of the nations for which a given event is most important.
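The counting step behind the minute-interval maps can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' implementation: the lookup table is a hypothetical stand-in for the external databases the work mentions, it maps a user-given location straight to a country code rather than to coordinates, and the timestamps and function names are invented.

```python
from collections import Counter
from datetime import datetime

# Hypothetical lookup standing in for an external location database.
LOCATION_TO_COUNTRY = {"madrid": "ES", "paris": "FR", "warszawa": "PL"}

def country_of(user_location):
    """Resolve a free-text user location to a country code, or None."""
    return LOCATION_TO_COUNTRY.get(user_location.strip().lower())

def tweets_per_minute(tweets):
    """Count tweets per (minute, country); unresolved locations are skipped."""
    counts = Counter()
    for created_at, user_location in tweets:
        country = country_of(user_location)
        if country is None:
            continue
        minute = created_at.replace(second=0, microsecond=0)
        counts[(minute, country)] += 1
    return counts

# Toy tweets, invented for illustration.
tweets = [
    (datetime(2020, 1, 1, 20, 45, 12), "Madrid"),
    (datetime(2020, 1, 1, 20, 45, 50), " madrid "),
    (datetime(2020, 1, 1, 20, 46, 3), "Paris"),
    (datetime(2020, 1, 1, 20, 46, 9), "somewhere over the rainbow"),
]
counts = tweets_per_minute(tweets)
for (minute, country), n in sorted(counts.items()):
    print(minute.isoformat(), country, n)
```

Each minute bucket can then feed one frame of a map animation, and summing over minutes gives the totals for the whole event.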
Influence of geographic biases on geolocation prediction in Twitter
Geolocating Twitter users --- the task of identifying their home locations --- serves a wide range of community and business applications such as managing natural crises, journalism, and public health. While users can record their location on their profiles, more than 34% record fake or sarcastic locations. Twitter allows users to GPS-locate their content; however, fewer than 1% of tweets are geotagged. Therefore, inferring user location has been an important field of investigation since 2010. This thesis investigates two of the most important factors which can affect the quality of inferring user location: (i) the influence of tweet-language; and (ii) the effectiveness of the evaluation process. Previous research observed that Twitter users writing in some languages appeared to be easier to locate than those writing in others. The authors speculated that the geographic coverage of a language (language bias) --- represented by the number of locations where the tweets of a specific language come from --- played an important role in determining location accuracy. So important was this role that accuracy might be largely predictable by considering language alone. In this thesis, I investigate the influence of language bias on the accuracy of geolocating Twitter users. The analysis, using a large corpus of tweets written in thirteen languages and a re-implementation of a geolocation model that was state of the art at the time, provides a new understanding of the reasons behind reported performance disparities between languages. The results show that data imbalance in the distribution of Twitter users over locations (population bias) has a greater impact on accuracy than language bias. A comparison between micro and macro averaging demonstrates that existing evaluation approaches are less appropriate than previously thought. The results suggest both averaging approaches should be used to effectively evaluate geolocation.
Many approaches have been proposed for automatically geolocating users; at the same time, various evaluation metrics have been proposed to measure the effectiveness of these approaches, making it challenging to understand which of these metrics is the most suitable for this task. In this thesis, I provide a standardized evaluation framework for geolocation systems. The framework is employed to analyze fifteen Twitter user geolocation models and two baselines in a controlled experimental setting. The models are composed of the re-implemented model and a variation of it, two locally retrained open source models and the results of eleven models submitted to a shared task. Models are evaluated using ten metrics --- out of fourteen employed in previous research --- over four geographic granularities. Rank correlations and thorough statistical analysis are used to assess the effectiveness of these metrics. The results demonstrate that the choice of effectiveness metric can have a substantial impact on the conclusions drawn from a geolocation system experiment, potentially leading experimenters to contradictory results about relative effectiveness. For general evaluations, a range of performance metrics should be reported, to ensure that a complete picture of system effectiveness is conveyed. Although a lot of complex geolocation algorithms have been applied in recent years, a majority class baseline is still competitive at coarse geographic granularity. A suite of statistical analysis tests is proposed, based on the employed metric, to ensure that the results are not coincidental
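How a rank correlation can expose disagreement between two effectiveness metrics is sketched below. The abstract does not say which rank correlation coefficient was used, so Kendall's tau is an assumption here, and the scores are invented toy values rather than results from the thesis.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a for two equal-length score lists without ties.

    Each pair of systems is concordant if both metrics order it the
    same way and discordant if they order it in opposite ways.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Toy scores for four hypothetical systems under two metrics
# (higher is better for both); invented for illustration.
metric_a = [0.40, 0.35, 0.30, 0.20]
metric_b = [0.50, 0.30, 0.45, 0.10]
tau = kendall_tau(metric_a, metric_b)
print(round(tau, 4))
```

A value of 1 means both metrics rank the systems identically; lower values mean that the choice of metric changes which system appears better.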
Mobagogy- mobile learning for a higher education community
This paper reports on a project in which a learning community of higher educators was formed to investigate how best to use mobile technologies in their own learning and teaching. Activities of this group included investigating best practice approaches by interviewing experts in the field, exploring the literature on mobile learning and then initiating and testing some mobile learning pedagogies in the context of their own higher education subjects. The community met regularly to discuss emerging issues and applications. The paper shares some of the findings gained both from the expert interviews and from the experiences of members of the community, and discusses the challenges and constraints that were experienced. We conclude with recommendations for promoting mobile learning communities in higher education. © 2010 IADIS
Beyond the culture effect on credibility perception on microblogs
We investigated the credibility perception of tweet readers from the USA and of readers from eight Arab countries; our aim was to understand whether credibility was affected by country and/or by culture. Results from a crowd-sourcing experiment showed that a wide variety of factors affected credibility perception, including a tweet author's gender, profile image, username style, location, and social network overlap with the reader. We found that culture determines readers' credibility perception, but country has no effect. We discuss the implications of our findings for user interface design and social media systems.
A Practical Guide for the Effective Evaluation of Twitter User Geolocation
Geolocating Twitter users---the task of identifying their home
locations---serves a wide range of community and business applications such as
managing natural crises, journalism, and public health. Many approaches have
been proposed for automatically geolocating users based on their tweets; at the
same time, various evaluation metrics have been proposed to measure the
effectiveness of these approaches, making it challenging to understand which of
these metrics is the most suitable for this task. In this paper, we propose a
guide for a standardized evaluation of Twitter user geolocation by analyzing
fifteen models and two baselines in a controlled experimental setting. Models
are evaluated using ten metrics over four geographic granularities. We use rank
correlations to assess the effectiveness of these metrics.
Our results demonstrate that the choice of effectiveness metric can have a
substantial impact on the conclusions drawn from a geolocation system
experiment, potentially leading experimenters to contradictory results about
relative effectiveness. We show that for general evaluations, a range of
performance metrics should be reported, to ensure that a complete picture of
system effectiveness is conveyed. Given the global geographic coverage of this
task, we specifically recommend evaluation at micro versus macro levels to
measure the impact of the bias in distribution over locations. Although a lot
of complex geolocation algorithms have been applied in recent years, a majority
class baseline is still competitive at coarse geographic granularity. We
propose a suite of statistical analysis tests, based on the employed metric, to
ensure that the results are not coincidental.
Comment: Accepted in the journal ACM Transactions on Social Computing (TSC).
Extended version of the ASONAM 2018 short paper. Please cite the TSC/ASONAM
version and not the arXiv version.
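The majority class baseline mentioned in the abstract can be sketched in a few lines. This is a minimal illustration assuming country-level labels; the function name and the toy data are invented and do not reproduce the paper's experimental setting.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training location for every test user."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for label in test_labels if label == majority)
    return majority, hits / len(test_labels)

# Toy country-level labels, invented for illustration.
train_labels = ["US"] * 6 + ["BR"] * 3 + ["GB"] * 1
test_labels = ["US"] * 5 + ["BR"] * 3 + ["GB"] * 2
majority, accuracy = majority_baseline_accuracy(train_labels, test_labels)
print(majority, accuracy)
```

At coarse granularity a single location can hold a large share of users, which is why such a trivial predictor can remain competitive on micro-averaged accuracy.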