4,240 research outputs found

    State of the art 2015: a literature review of social media intelligence capabilities for counter-terrorism

    Overview: This paper reviews how information and insight can be drawn from open social media sources. It focuses on the specific research techniques that have emerged, the capabilities they provide, the possible insights they offer, and the ethical and legal questions they raise. These techniques are considered relevant and valuable in so far as they can help to maintain public safety by preventing terrorism, preparing for it, protecting the public from it and pursuing its perpetrators. The report also considers how far this can be achieved against the backdrop of radically changing technology and public attitudes towards surveillance. This is an updated version of a 2013 paper on the same subject, State of the Art. Since 2013, there have been significant changes in social media, in how it is used by terrorist groups, and in the methods being developed to make sense of it. The paper is structured as follows: Part 1 is an overview of social media use, focused on how it is used by groups of interest to those involved in counter-terrorism. It includes new sections on trends in social media platforms and a new section on Islamic State (IS). Part 2 provides an introduction to the key approaches of social media intelligence (henceforth ‘SOCMINT’) for counter-terrorism. Part 3 sets out a series of SOCMINT techniques. For each technique, the paper considers the capabilities and insights it offers, assesses the validity and reliability of the method, and explores how it might be applied to counter-terrorism work. Part 4 outlines a number of important legal, ethical and practical considerations when undertaking SOCMINT work.

    Creating extended gender labelled datasets of Twitter users

    The gender of a Twitter user is not known a priori when analysing Twitter data, because user registration does not include gender information. This paper proposes an approach for creating extended gender labelled datasets of Twitter users. The process begins by creating a smaller database of active Twitter users and manually labelling their gender. It continues by extracting features from the unstructured information found on each user profile and by creating a gender classification model. The model is then applied to a larger dataset, providing automatic labels and corresponding confidence scores, which can be used to identify the most accurately labelled users. The resulting databases can be further enriched with additional information extracted, for example, from the profile picture and from the user location. The proposed approach was successfully applied to English and Portuguese users, leading to two large datasets containing more than 57K labelled users each.
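
    The labelling loop described in this abstract (train on a small hand-labelled set, apply the model to a larger set, keep only confident predictions) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the naive Bayes classifier, the whitespace tokenizer, the 0.75 threshold, and the toy profile strings are all assumptions introduced here.

```python
import math
from collections import Counter, defaultdict

def tokenize(profile_text):
    """Lowercase whitespace tokenizer; a stand-in for real profile feature extraction."""
    return profile_text.lower().split()

def train(labelled):
    """Fit a multinomial naive Bayes model on (profile_text, label) pairs."""
    token_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in labelled:
        label_counts[label] += 1
        for tok in tokenize(text):
            token_counts[label][tok] += 1
            vocab.add(tok)
    return token_counts, label_counts, vocab

def predict(model, text):
    """Return (label, confidence); confidence is the posterior of the best label."""
    token_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label, n in label_counts.items():
        # Laplace smoothing: add one to every token count.
        denom = sum(token_counts[label].values()) + len(vocab)
        score = math.log(n / total)
        for tok in tokenize(text):
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((token_counts[label][tok] + 1) / denom)
        log_scores[label] = score
    peak = max(log_scores.values())
    exp_scores = {lab: math.exp(s - peak) for lab, s in log_scores.items()}
    norm = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / norm

def extend_labels(model, unlabelled, threshold=0.75):
    """Keep only automatic labels whose confidence reaches the (assumed) threshold."""
    kept = []
    for text in unlabelled:
        label, confidence = predict(model, text)
        if confidence >= threshold:
            kept.append((text, label, confidence))
    return kept

# Hypothetical hand-labelled seed set and unlabelled profiles.
seed = [
    ("maria proud mom", "female"),
    ("ana mother of two", "female"),
    ("john proud dad", "male"),
    ("pedro father and husband", "male"),
]
model = train(seed)
extended = extend_labels(model, ["proud dad and runner", "coffee lover"])
print(extended)
```

    In this toy run the second profile shares no tokens with the seed set, so its confidence stays at the prior and it is left unlabelled rather than guessed.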

    Personality Based Recommendation System Using Social Media

    Recommendation systems are a key reason for the success of most social media companies and e-commerce sites. Giving recommendations to users is an interesting and challenging task: it helps to generate revenue, to increase the number of users, and to reduce the time spent searching for a particular item. A recommendation system builds user interest and eventually increases the popularity of a site. The huge number of items (products, users, movies, songs, hotels, etc.) and their feature sets makes it hard to predict accurately which items suit a user. Generating recommendations requires keeping the full history of the user as well as all information about the items. In this paper, the personality of the user is combined with the most popular recommendation techniques, collaborative filtering (CF) and content-based filtering (CB), and applied to the Amazon review data set. In the first module, the personality of the user is calculated by applying the Big Five model to the user's Twitter account. In the second module, collaborative filtering is used to generate recommendations based on the historic information of the user, whereas in the third module, content-based filtering is used to generate recommendations based on the feature set of the item. The Pearson correlation algorithm is applied in both modules and rankings are generated. Finally, the union of the two vector spaces is taken as the final recommendation.
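
    The Pearson-correlation ranking and the final union step can be illustrated with a small sketch. It is an assumption-laden example, not the paper's system: the item names and ratings are invented, the personality module is omitted, and the content-based ranking is simply given as a list rather than computed.

```python
import math

def pearson(a, b):
    """Pearson correlation between two rating dicts over their co-rated items."""
    common = sorted(set(a) & set(b))
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)

def rank_unseen(target, others, top_n=3):
    """Rank items the target has not rated, using the similarity-weighted
    ratings of positively correlated users (user-based CF)."""
    scores, weights = {}, {}
    for ratings in others:
        sim = pearson(target, ratings)
        if sim <= 0:
            continue  # ignore uncorrelated and negatively correlated users
        for item, rating in ratings.items():
            if item not in target:
                scores[item] = scores.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda i: scores[i] / weights[i], reverse=True)
    return ranked[:top_n]

def union_ranked(first, second):
    """Order-preserving union of two ranked lists."""
    seen, merged = set(), []
    for item in first + second:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged

# Hypothetical ratings on a 1-5 scale.
target = {"book": 5, "lamp": 1, "pen": 4}
others = [
    {"book": 4, "lamp": 2, "pen": 5, "desk": 5, "mug": 2},
    {"book": 1, "lamp": 5, "pen": 2, "desk": 1, "chair": 4},
]
cf_ranking = rank_unseen(target, others)
cb_ranking = ["mug", "notebook"]  # assumed output of a content-based module
final = union_ranked(cf_ranking, cb_ranking)
print(final)
```

    The second user's tastes run opposite to the target's, so the correlation is negative and that user's ratings do not contribute to the ranking.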

    Analyzing Social and Stylometric Features to Identify Spear phishing Emails

    Spear phishing is a complex targeted attack in which an attacker harvests information about the victim prior to the attack. This information is then used to create sophisticated, genuine-looking attack vectors that draw the victim into compromising confidential information. What makes spear phishing different from, and more powerful than, normal phishing is this contextual information about the victim. Online social media services can be one such source for gathering vital information about an individual. In this paper, we characterize and examine a true positive dataset of spear phishing, spam, and normal phishing emails from Symantec's enterprise email scanning service. We then present a model to detect spear phishing emails sent to employees of 14 international organizations, using social features extracted from LinkedIn. Our dataset consists of 4,742 targeted attack emails sent to 2,434 victims, 9,353 non-targeted attack emails sent to 5,912 non-victims, and publicly available information from their LinkedIn profiles. We applied various machine learning algorithms to this labeled data and achieved an overall maximum accuracy of 97.76% in identifying spear phishing emails, using a combination of social features from LinkedIn profiles and stylometric features extracted from email subjects, bodies, and attachments. However, we achieved a slightly better accuracy of 98.28% without the social features. Our analysis revealed that social features extracted from LinkedIn do not help in identifying spear phishing emails. To the best of our knowledge, this is one of the first attempts to use a combination of stylometric features extracted from emails and social features extracted from an online social network to detect targeted spear phishing emails.
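
    To make "stylometric features" concrete, the sketch below computes a few simple surface statistics from an email subject and body. The abstract does not list the features the authors used, so the feature names and the sample email here are purely illustrative assumptions.

```python
import string

def stylometric_features(subject, body):
    """Compute a handful of illustrative (assumed) stylometric features."""
    words = body.split()
    n_chars = len(body)
    return {
        "subject_length": len(subject),
        "word_count": len(words),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "uppercase_ratio": sum(c.isupper() for c in body) / n_chars if n_chars else 0.0,
        "punctuation_ratio": (
            sum(c in string.punctuation for c in body) / n_chars if n_chars else 0.0
        ),
    }

# Hypothetical email; word lengths include attached punctuation.
features = stylometric_features("Hi", "Pay NOW!")
print(features)
```

    A vector like this would be one row of the input to whichever classifier is trained on the labeled emails.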

    A survey of location inference techniques on Twitter

    The increasing popularity of the social networking service Twitter has made it more involved in day-to-day communications, strengthening social relationships and information dissemination. Conversations on Twitter are now being explored as indicators within early warning systems to alert of imminent natural disasters such as earthquakes and to aid prompt emergency responses to crime. Producers are privileged to have limitless access to market perception from consumer comments on social media and microblogs. Targeted advertising can be made more effective based on user profile information such as demography, interests and location. While these applications have proven beneficial, the ability to effectively infer the location of Twitter users has even greater value. However, accurately identifying where a message originated or where its author is located remains a challenge, which drives research in this area. In this paper, we survey a range of techniques applied to infer the location of Twitter users, from inception to the state of the art. We find significant improvements over time in granularity and accuracy, driven by refinements to algorithms and the inclusion of more spatial features.

    Detecting Portuguese and English Twitter users’ gender

    Existing social networking services provide an easy way for people to communicate and express their feelings. Such user-generated content contains clues to users’ behaviors and preferences, as well as other metadata that is now available for scientific research. Twitter, in particular, has become a relevant source for social networking studies, mainly because it provides a simple way for users to express their feelings, ideas, and opinions; makes the user-generated content and associated metadata available to the community; and provides easy-to-use web interfaces and application programming interfaces (APIs) to access data. For many studies, the available information about a user is relevant. However, the gender attribute is not provided when creating a Twitter account. The main focus of this study is to infer users’ gender from other available information. We propose a methodology for gender detection of Twitter users, using unstructured information found on the Twitter profile, user-generated content, and later the user’s profile picture. In previous studies, one of the challenges presented was the labor-intensive task of manually labelling datasets. In this study, we propose a method for creating extended labelled datasets in a semi-automatic fashion, using a set of attributes based on unstructured profile information. With the extended labelled datasets, we associate the users’ textual content with their gender and create gender models based on the users’ generated content and profile information. We explore supervised and unsupervised classifiers and evaluate the results on both Portuguese and English Twitter user datasets. We obtained an accuracy of 93.2% with English users and an accuracy of 96.9% with Portuguese users. The proposed methodology is language independent, but our focus was on Portuguese and English users.