12,758 research outputs found

    Finding Street Gang Members on Twitter

    Most street gang members use Twitter to intimidate others, to present outrageous images and statements to the world, and to share recent illegal activities. Their tweets may thus be useful to law enforcement agencies to discover clues about recent crimes or to anticipate ones that may occur. Finding these posts, however, requires a method to discover gang member Twitter profiles. This is a challenging task since gang members represent a very small population of the 320 million Twitter users. This paper studies the problem of automatically finding gang members on Twitter. It outlines a process to curate one of the largest sets of verifiable gang member profiles that have ever been studied. A review of these profiles establishes differences in the language, images, YouTube links, and emojis gang members use compared to the rest of the Twitter population. Features from this review are used to train a series of supervised classifiers. Our classifier achieves a promising F1 score with a low false positive rate. Comment: 8 pages, 9 figures, 2 tables, Published as a full paper at 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016
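
    The two metrics named in this abstract can be computed from confusion-matrix counts. The sketch below is only an illustration of the standard definitions; the function name `f1_and_fpr` and the counts are hypothetical and are not taken from the paper.

```python
def f1_and_fpr(tp, fp, fn, tn):
    """Return (F1 score, false positive rate) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    # False positive rate: share of true negatives-class items flagged as positive.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return f1, fpr


# Hypothetical counts for a rare positive class (illustrative only).
f1, fpr = f1_and_fpr(tp=80, fp=20, fn=20, tn=9880)
print(f"F1={f1:.3f}  FPR={fpr:.4f}")
```

    With a very small positive class, as in the setting described above, the false positive rate can stay low even when a noticeable share of flagged profiles is wrong, which is why the two numbers are usually reported together.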

    Fact Checking in Community Forums

    Community Question Answering (cQA) forums are very popular nowadays, as they represent effective means for communities around particular topics to share information. Unfortunately, this information is not always factual. Thus, here we explore a new dimension in the context of cQA, which has been ignored so far: checking the veracity of answers to particular questions in cQA forums. As this is a new problem, we create a specialized dataset for it. We further propose a novel multi-faceted model, which captures information from the answer content (what is said and how), from the author profile (who says it), from the rest of the community forum (where it is said), and from external authoritative sources of information (external support). Evaluation results show a MAP value of 86.54, which is 21 points absolute above the baseline. Comment: AAAI-2018; Fact-Checking; Veracity; Community-Question Answering; Neural Networks; Distributed Representation
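
    The MAP figure quoted above is Mean Average Precision. A minimal sketch of the standard computation follows, assuming each question contributes one ranked list of answers marked relevant or not; the abstract does not spell out the paper's exact evaluation protocol, and the function names and toy rankings are hypothetical.

```python
def average_precision(ranked_labels):
    """Average precision for one ranked list of booleans (True = relevant)."""
    hits = 0
    total = 0.0
    for rank, is_relevant in enumerate(ranked_labels, start=1):
        if is_relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the per-list average precisions."""
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two toy ranked lists (illustrative only).
rankings = [[True, False, True], [False, True]]
map_score = mean_average_precision(rankings)
print(f"MAP = {100 * map_score:.2f}")
```

    The final line scales the score to a 0 to 100 range, which matches how the abstract reports its value.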

    Niche as a determinant of word fate in online groups

    Patterns of word use both reflect and influence a myriad of human activities and interactions. Like other entities that are reproduced and evolve, words rise or decline depending upon a complex interplay between their intrinsic properties and the environments in which they function. Using Internet discussion communities as model systems, we define the concept of a word niche as the relationship between the word and the characteristic features of the environments in which it is used. We develop a method to quantify two important aspects of the size of the word niche: the range of individuals using the word and the range of topics it is used to discuss. Controlling for word frequency, we show that these aspects of the word niche are strong determinants of changes in word frequency. Previous studies have already indicated that word frequency itself is a correlate of word success at historical time scales. Our analysis of changes in word frequencies over time reveals that the relative sizes of word niches are far more important than word frequencies in the dynamics of the entire vocabulary at shorter time scales, as the language adapts to new concepts and social groupings. We also distinguish endogenous versus exogenous factors as additional contributors to the fates of words, and demonstrate the force of this distinction in the rise of novel words. Our results indicate that short-term nonstationarity in word statistics is strongly driven by individual proclivities, including inclinations to provide novel information and to project a distinctive social identity. Comment: Supporting Information is available here: http://www.plosone.org/article/fetchSingleRepresentation.action?uri=info:doi/10.1371/journal.pone.0019009.s00
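
    The two niche aspects described above (range of individuals and range of topics) can be illustrated with raw counts. This is a simplified sketch, not the paper's measure: it uses discussion threads as a stand-in for topics, reports the frequency alongside the counts rather than controlling for it, and the function name and toy posts are hypothetical.

```python
from collections import defaultdict


def niche_ranges(posts):
    """Map each word to (frequency, distinct users, distinct threads).

    posts: iterable of (user, thread, text) tuples.
    """
    users = defaultdict(set)
    threads = defaultdict(set)
    freq = defaultdict(int)
    for user, thread, text in posts:
        for word in text.lower().split():
            users[word].add(user)
            threads[word].add(thread)
            freq[word] += 1
    return {w: (freq[w], len(users[w]), len(threads[w])) for w in freq}


# Toy posts (illustrative only).
posts = [
    ("ann", "t1", "lol that game"),
    ("bob", "t1", "lol"),
    ("ann", "t2", "that game again"),
]
stats = niche_ranges(posts)
print(stats["lol"], stats["game"])
```

    In the toy data, "lol" and "game" occur equally often, yet one is spread across more users and the other across more threads, which is the kind of difference a niche measure is meant to capture independently of frequency.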

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Overview: This paper is a review of how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, how it is used by terrorist groups, and the methods being developed to make sense of it. The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. This includes new sections on trends in social media platforms and on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique, a series of capabilities and insights is considered, the validity and reliability of the method are assessed, and possible applications to counter-terrorism work are explored. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work.

    Site Selection Using Geo-Social Media: A Study For Eateries In Lisbon

    Dissertation submitted in partial fulfilment of the requirements for the Degree of Master of Science in Geospatial Technologies. The influx of multicultural societies, studentification, and overall population growth have positively impacted the local economy of eateries in Lisbon, Portugal. However, this has also increased retail competition, especially in tourism. The overall increase in multicultural societies has also led to an increase in multiple smaller hotspots of human-urban attraction, making the concept of just one downtown in the city a little vague. These transformations of urban cities pose a big challenge for prospective retail and eatery owners in finding the optimal location to set up their shops. An optimal site selection strategy should recommend new locations that can maximize the revenues of a business. Unfortunately, with dynamically changing human-urban interactions, traditional methods like relying on census data or surveys to understand neighborhoods and their impact on businesses are no longer reliable or scalable. This study aims to address this gap by using geo-social data extracted from social media platforms like Twitter, Flickr, Instagram, and Google Maps, which then acts as a proxy for the real population. Seven variables are engineered at a neighborhood level using this data: business interest, age, gender, spatial competition, spatial proximity to stores, homogeneous neighborhoods, and percentage of the native population. A Random Forest based binary classification method is then used to predict whether a Point of Interest (POI) can be a part of any neighborhood n. The results show that using only these 7 variables, an F1-Score of 83% can be achieved in classifying whether a neighborhood is good for an “eateries” POI. The methodology used in this research is made to work with open data and to be generic and reproducible for any city worldwide.

    Native language identification of fluent and advanced non-native writers

    This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online: https://doi.org/10.1145/3383202 The accepted version of the publication may differ from the final published version. Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require learner corpora. This article performs NLI in the challenging context of user-generated content (UGC), where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on content-specific or social-network features and may not be generalizable to other domains and datasets, (ii) are unable to capture the variations of the language-usage patterns within a text sample, and (iii) are not associated with any outlier-handling mechanism. Moreover, since a sizable number of people have acquired non-English second languages due to economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on our feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of the language-usage patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply the probabilistic k-nearest-neighbors classifier on the identified top-k SSTs to predict the native languages of the authors. To conduct experiments, we create three new corpora where each corpus is written in a different language, namely, English, French, and German. Our experimental studies show that our solution outperforms competitive methods and reports more than 80% accuracy across languages. Research funded by Higher Education Commission, and Grants for Development of New Faculty Staff at Chulalongkorn University | Digital Economy Promotion Agency (# MP-62-0003) | Thailand Research Funds (MRG6180266 and MRG6280175). Published versio
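
    The retrieve-then-classify idea in this abstract (find the top-k most similar samples, then turn their labels into class probabilities) can be sketched as follows. This is a simplification under stated assumptions, not the authors' method: each text is reduced to a single feature vector rather than a point set, cosine similarity stands in for the paper's stylistic similarity, and all names and data are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_probabilities(query, corpus, k):
    """Similarity-weighted label probabilities over the top-k neighbours.

    corpus: list of (feature_vector, label) pairs.
    """
    scored = sorted(
        ((cosine(query, vec), label) for vec, label in corpus),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    weights = defaultdict(float)
    for sim, label in scored:
        weights[label] += max(sim, 0.0)  # ignore negative similarity
    total = sum(weights.values())
    if total == 0:
        return {}
    return {label: w / total for label, w in weights.items()}


# Toy two-dimensional "style" vectors (illustrative only).
corpus = [
    ([1.0, 0.0], "French"),
    ([0.9, 0.1], "French"),
    ([0.0, 1.0], "German"),
    ([0.1, 0.9], "German"),
]
probs = knn_probabilities([1.0, 0.2], corpus, k=3)
print(max(probs, key=probs.get), probs)
```

    Restricting the vote to the top-k neighbours means that distant samples never contribute, which is one simple way to limit the influence of outliers.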

    The Rhetorical Impact of Polylingualism Employed by Lithuanian Politicians on Facebook

    The research presented in this article aims to determine the impact of polylingualism on the effectiveness of political rhetoric in Lithuania. The study focuses on elements borrowed from other languages and used by Lithuanian politicians in their Facebook posts. In addition, the motivation behind such use is explored, aiming to establish whether polylingualism is part of a conscious effort of political communication in order to build a positive image. Within the scope of this research are Facebook posts containing cases of polylingualism, specifically, English-language inserts. The authors of these posts are prominent politicians who are native Lithuanian speakers engaged in active communication on social media. Collected during the period of 2018–2021, the research material was examined using the method of rhetorical discourse analysis, resulting in the identification of characteristic instruments of persuasion, i.e. the tools which help enhance the effectiveness of certain discourse. The researchers aimed to determine the general patterns and dominant tendencies of mixed speech within the political discourse on social media. The research reveals the use of polylingualism as a stylistic tool imitating informal speaking and creating contextual discourse.