150,745 research outputs found

    Organised crime and social media: detecting and corroborating weak signals of human trafficking online

    This paper describes an approach for detecting the presence or emergence of Organised Crime (OC) signals on Social Media. It shows how words and phrases, used by members of the public in Social Media, can be treated as weak signals of OC, enabling information to be classified according to a taxonomy of OC. Formal Concept Analysis is used to group information sources, according to Crime and Location, thus providing a means of corroboration and creating OC Concepts that can be used to alert police analysts to the possible presence of OC. The analyst is able to 'drill down' into an OC Concept of interest, discovering additional information that may be pertinent to the crime. The paper describes the implementation of this approach into a fully-functional prototype software system, incorporating a Social Media Scanning System and a map-based user interface. The approach and system are illustrated using the Trafficking of Human Beings as an example. Real data is used to obtain results that show that weak signals of OC have been detected and corroborated, thus alerting to the possible presence of OC. Keywords: organised crime, social media, formal concept analysis

    Organised crime and social media; a system for detecting, corroborating and visualising weak signals of organised crime online

    This paper describes an approach for detecting the presence or emergence of Organised Crime (OC) signals on Social Media. It shows how words and phrases, used by members of the public in Social Media posts, can be treated as weak signals of OC, enabling information to be classified according to a taxonomy. Formal Concept Analysis (FCA) is used to group information sources, according to Crime-type and Location, thus providing a means of corroboration and creating OC Concepts that can be used to alert police analysts to the possible presence of OC. The analyst is able to 'drill down' into an OC Concept of interest, discovering additional information that may be pertinent to the crime. The paper describes the implementation of this approach into a fully-functional prototype software system, incorporating a Social Media scanning system and a map-based user interface. The approach and system are illustrated using Human Trafficking and Modern Slavery as an example. Real data is used to obtain results that show that weak signals of OC have been detected and corroborated, thus alerting to the possible presence of OC.
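
    The grouping step described in this abstract can be illustrated with a minimal sketch of Formal Concept Analysis. This is not the authors' system: the toy context, the "crime:" and "loc:" attribute prefixes, and the function names are all assumptions made for illustration. Each formal concept pairs a set of sources (the extent) with the attributes they all share (the intent); a concept backed by two or more sources that names both a crime type and a location is treated here as corroborated.

```python
from itertools import combinations

# Hypothetical toy context: source id -> attributes detected in that source.
CONTEXT = {
    "post_1": {"crime:trafficking", "loc:CityA"},
    "post_2": {"crime:trafficking", "loc:CityA"},
    "post_3": {"crime:trafficking", "loc:CityB"},
    "post_4": {"crime:drugs", "loc:CityB"},
}

def extent(context, attrs):
    """Sources that carry every attribute in attrs."""
    return frozenset(o for o, a in context.items() if attrs <= a)

def intent(context, objs):
    """Attributes shared by every source in objs (all attributes if objs is empty)."""
    shared = set().union(*context.values())
    for o in objs:
        shared &= context[o]
    return frozenset(shared)

def formal_concepts(context):
    """Brute-force enumeration: close every attribute subset (fine for toy data only)."""
    all_attrs = sorted(set().union(*context.values()))
    concepts = set()
    for r in range(len(all_attrs) + 1):
        for combo in combinations(all_attrs, r):
            e = extent(context, frozenset(combo))
            concepts.add((e, intent(context, e)))
    return concepts

def corroborated(context, min_sources=2):
    """Concepts naming both a crime type and a location, backed by several sources."""
    result = []
    for e, i in formal_concepts(context):
        has_crime = any(a.startswith("crime:") for a in i)
        has_loc = any(a.startswith("loc:") for a in i)
        if len(e) >= min_sources and has_crime and has_loc:
            result.append((e, i))
    return result
```

    In the toy context only post_1 and post_2 agree on both crime type and location, so they form the single corroborated concept. The brute-force enumeration is exponential in the number of attributes; a real system would use a dedicated FCA algorithm.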

    Social User Mining: User Profiling of Social Media Network Based on Multimedia Data Mining

    In recent years, the pervasive use of social media has generated extraordinary amounts of data, which have attracted increasing attention. Each social media source utilizes different data types such as textual and visual. For example, Twitter is used to transmit short text messages, whereas Flickr is used to convey images and videos. Moreover, Facebook uses all of these data types. From the social media users' standpoint, it is highly desirable to find patterns from different data formats. The huge amount of data from different sources and of different types has provided many opportunities for researchers in the fields of data mining and data analytics. Not only have methods and tools to organize and manage such data become extremely important, but so have methods and tools to discover hidden knowledge from such data, which can be used for a variety of applications. For example, the mining of a user's profile on social media could help to discover any missing information, including the user's location or gender information. However, the task of developing such methods and tools is very challenging. Social media data is unstructured and differs from traditional data because of its privacy settings, data noise, and large volume. Moreover, combining image features and text information annotated by users reveals interesting properties of social user mining, and serves as a useful tool for discovering unknown information about the users. Minimal research has been conducted on the combination of image and text data for social user mining. To address these challenges and to discover unknown information about users, we proposed a novel mining framework for social user mining that includes: 1) a data assemble module for different media sources, 2) a data integration module, and 3) mining applications.
First, we introduced a data assemble module in order to process both the textual and the visual information from different media sources, and evaluated the appropriate multimedia features for social user mining. Then, we proposed a new data integration method in order to integrate the textual and the visual data. Unlike previous approaches that used a content-based approach to merge multiple types of features, our main approach is based on image semantics through a semi-automatic image tagging system. Lastly, we presented two applications as examples of social user mining: gender classification and user location.

    Confounds and Consequences in Geotagged Twitter Data

    Twitter is often used in quantitative studies that identify geographically-preferred topics, writing styles, and entities. These studies rely on either GPS coordinates attached to individual messages, or on the user-supplied location field in each profile. In this paper, we compare these data acquisition techniques and quantify the biases that they introduce; we also measure their effects on linguistic analysis and text-based geolocation. GPS-tagging and self-reported locations yield measurably different corpora, and these linguistic differences are partially attributable to differences in dataset composition by age and gender. Using a latent variable model to induce age and gender, we show how these demographic variables interact with geography to affect language use. We also show that the accuracy of text-based geolocation varies with population demographics, giving the best results for men above the age of 40. Comment: final version for EMNLP 201

    A Local-Global LDA Model for Discovering Geographical Topics from Social Media

    Micro-blogging services can track users' geo-locations when users check in at places or use geo-tagging, which implicitly reveals locations. This "geo tracking" can help to find topics triggered by events in certain regions. However, discovering such topics is very challenging because of the large amount of noisy messages (e.g. daily conversations). This paper proposes a method to model geographical topics, which can filter out irrelevant words by assigning different weights in the local and global contexts. Our method is based on the Latent Dirichlet Allocation (LDA) model, but each word is generated from either a local or a global topic distribution according to its generation probabilities. We evaluated our model with data collected from Weibo, which is currently the most popular micro-blogging service for Chinese users. The evaluation results demonstrate that our method outperforms other baseline methods on several metrics such as model perplexity, two kinds of entropies and KL-divergence of discovered topics.
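
    The idea that each word comes from either a local or a global topic distribution can be sketched as a small generative process. This is only an illustration under stated assumptions, not the paper's model: the single switch probability `p_local`, the toy topic tables, and all names below are hypothetical, and no inference step is shown.

```python
import random

# Hypothetical toy parameters: topic mixtures and per-topic word distributions.
LOCAL_THETA = {"flood": 0.7, "festival": 0.3}
GLOBAL_THETA = {"chat": 1.0}
LOCAL_TOPICS = {
    "flood": {"river": 0.5, "rain": 0.5},
    "festival": {"parade": 0.6, "music": 0.4},
}
GLOBAL_TOPICS = {"chat": {"hello": 0.5, "lunch": 0.5}}

def sample(dist, rng):
    """Draw one key from a {key: probability} dictionary."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]

def generate_message(n_words, p_local, rng):
    """Generate (source, word) pairs; a per-word switch picks local or global."""
    words = []
    for _ in range(n_words):
        if rng.random() < p_local:
            topic = sample(LOCAL_THETA, rng)       # region-specific topic
            words.append(("local", sample(LOCAL_TOPICS[topic], rng)))
        else:
            topic = sample(GLOBAL_THETA, rng)      # background topic
            words.append(("global", sample(GLOBAL_TOPICS[topic], rng)))
    return words
```

    Calling `generate_message(10, 0.3, random.Random(0))` yields a message in which roughly a third of the words are drawn from region-specific topics and the rest from background chatter, which is the kind of noise the abstract says the model is meant to separate out.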