524 research outputs found

    Detecting Illicit Drug Ads in Google+ Using Machine Learning

    The opioid abuse epidemic is a major public health emergency in the US. Social media platforms have facilitated illicit drug trading, with a significant amount of drug advertising and selling being carried out online. In order to understand the dynamics of drug abuse epidemics and design efficient public health interventions, it is essential to extract and analyze data from online drug markets. In this paper, we present a computational framework for automatic detection of illicit drug ads in social media, with Google+ being used as a proof of concept. The proposed SVM- and CNN-based methods have been extensively validated on a large dataset containing millions of posts collected using the Google+ API. Experimental results demonstrate that both methods can efficiently identify illicit drug ads with high accuracy

    Understanding the difference in malicious activity between Surface Web and Dark Web

    The world has seen a dramatic increase in illegal activities on the Internet. Prior research has investigated different types of cybercrime, especially in the Surface Web, which is the portion of the content on the World Wide Web that popular engines may index. At the same time, evidence suggests cybercriminals are moving their operations to the Dark Web. This portion is not indexed by conventional search engines and is accessed through network overlays such as The Onion Router network. Since the Dark Web provides anonymity, cybercriminals use this environment to avoid getting caught or blocked, which represents a significant challenge for researchers. This research project investigates the modus operandi of cybercriminals on the Surface Web and the Dark Web to understand how cybercrime unfolds in different layers of the Web. Honeypots, specialised crawlers and extraction tools are used to analyse different types of online crimes. In addition, quantitative analysis is performed to establish comparisons between the two Web environments. This thesis is comprised of three studies. The first examines the use of stolen account credentials leaked in different outlets on the Surface and Dark Web to understand how cybercriminals interact with stolen credentials in the wild. In the second study, malvertising is analysed from the user's perspective to understand whether using different technologies to access the Web could influence the probability of malware infection. In the final study, underground forums on the Surface and Dark Web are analysed to observe differences in trading patterns in both environments. Understanding how criminals operate in different Web layers is essential to developing policies and countermeasures to prevent cybercrime more efficiently

    Emergent Medical Data: Health Information Inferred by Artificial Intelligence

    Artificial intelligence (AI) can infer health data from people’s behavior even when their behavior has no apparent connection to their health. AI can monitor one’s location to track the spread of infectious disease, scrutinize retail purchases to identify pregnant customers, and analyze social media to predict who might attempt suicide. These feats are possible because, in modern societies, people continuously interact with internet-enabled software and devices. Smartphones, wearables, and online platforms monitor people’s actions and produce digital traces, the electronic remnants of their behavior. In their raw form, digital traces might not be very interesting or useful; one’s location, retail purchases, and internet browsing habits are relatively mundane data points. However, AI can enhance the value of digital traces by transforming them into something more useful—emergent medical data (EMD). EMD is health information inferred by artificial intelligence from otherwise trivial digital traces. This Article describes how EMD-based profiling is increasingly promoted as a solution to public health crises such as the COVID-19 pandemic, gun violence, and the opioid crisis. However, there is little evidence to show that EMD-based profiling works. Even worse, it can cause significant harm, and current privacy and data protection laws contain loopholes that allow public and private entities to mine EMD without people’s knowledge or consent. After describing the risks and benefits of EMD mining and profiling, the Article proposes six different ways of conceptualizing these practices. It concludes with preliminary recommendations for effective regulation. Potential options include banning or restricting the collection of digital traces, regulating EMD mining algorithms, and restricting how EMD can be used once it is produced

    On the Social and Technical Challenges of Web Search Autosuggestion Moderation

    Past research shows that users benefit from systems that support them in their writing and exploration tasks. The autosuggestion feature of Web search engines is an example of such a system: it helps users in formulating their queries by offering a list of suggestions as they type. Autosuggestions are typically generated by machine learning (ML) systems trained on a corpus of search logs and document representations. Such automated methods can become prone to issues that result in problematic suggestions that are biased, racist, sexist or in other ways inappropriate. While current search engines have become increasingly proficient at suppressing such problematic suggestions, persistent issues remain. In this paper, we reflect on past efforts and on why certain issues still linger by covering explored solutions along a prototypical pipeline for identifying, detecting, and addressing problematic autosuggestions. To showcase their complexity, we discuss several dimensions of problematic suggestions, difficult issues along the pipeline, and why our discussion applies to the increasing number of applications beyond web search that implement similar textual suggestion features. By outlining persistent social and technical challenges in moderating web search suggestions, we provide a renewed call for action. Comment: 17 pages, 4 images displayed within 3 LaTeX figures

    Organised crime and social media; a system for detecting, corroborating and visualising weak signals of organised crime online

    This paper describes an approach for detecting the presence or emergence of Organised Crime (OC) signals on Social Media. It shows how words and phrases, used by members of the public in Social Media posts, can be treated as weak signals of OC, enabling information to be classified according to a taxonomy. Formal Concept Analysis (FCA) is used to group information sources, according to Crime-type and Location, thus providing a means of corroboration and creating OC Concepts that can be used to alert police analysts to the possible presence of OC. The analyst is able to 'drill down' into an OC Concept of interest, discovering additional information that may be pertinent to the crime. The paper describes the implementation of this approach into a fully-functional prototype software system, incorporating a Social Media scanning system and a map-based user interface. The approach and system are illustrated using Human Trafficking and Modern Slavery as an example. Real data is used to obtain results that show that weak signals of OC have been detected and corroborated, thus alerting to the possible presence of OC

    The Dark Web and Human Trafficking

    This is a quantitative-comparative analysis that focuses on Artificial Intelligence (AI) platforms that assist law enforcement agencies as they combat human trafficking. Human trafficking is a Transnational Organized Crime (TOC), which means it can impact every country in the world and, in doing so, every person in the world. AI uses machine-learning capabilities to identify clusters, odd and/or unusual fonts, words, numbers, and other markers in advertisements that promote the sale of human beings. Human trafficking affects males, females, and children of all ages and can include different types of trafficking such as sex and labor trafficking. By using these AI platforms, law enforcement officers are able to identify and help more human beings than ever before in a shorter timeframe. This quantitative-comparative analysis compared Spotlight, Traffick Jam, Traffick Cam, and Domain Insight Graph (DIG) to determine if these platforms were helping law enforcement. The study revolved around the questions of accuracy, consistency, and effectiveness with each platform and found that the majority of AI platforms led the way to promote better, more efficient platforms by the same companies that learned how changes could assist law enforcement more in the future. While each platform assisted in its own way, there were deltas in each area that point to the need for future research on AI and how it can be used to help victims of human trafficking and convict more human traffickers in later years

    Applying Data Mining Algorithms on Open Source Intelligence to Combat Cyber Crime

    In this dissertation, we investigate the applications of data mining algorithms on online criminal information. Since the start of the information era, the development of the World Wide Web has brought the convenience of people's lives to the next level. However, at the same time, the web is utilized by criminals for illegal activities like drug smuggling and online fraud. Cryptomarkets and instant messaging software are the two most popular online platforms for criminal activities. Here, we try to extract useful information from related open source intelligence in these two platforms with data mining algorithms. Cryptomarkets (or darknet markets) are commercial hidden-service websites that operate on The Onion Router (Tor) anonymity network, and they have grown rapidly in recent years. In this dissertation, we discover interesting characteristics of Bitcoin transaction patterns in cryptomarkets. We present a method to identify vendors' Bitcoin addresses by matching vendors' feedback reviews with Bitcoin transactions in the public ledger. We further propose a cost-effective algorithm to accelerate both steps effectively. Comprehensive experimental results have demonstrated the effectiveness and efficiency of the proposed method. Instant messaging (IM) software is another base for these criminal activities. Users of IM applications can easily hide their identities while interacting with strangers online. In this dissertation, we propose an effective model to discover hidden networks of influence between members in a group chat. By transferring the whole chat history to sequential events, we can model message sequences as a multi-dimensional Hawkes process and learn the Granger causality between different individuals. We learn the influence graph by applying an expectation-maximization (EM) algorithm on our text-biased multi-dimensional Hawkes process. Users of IM software normally maintain multiple accounts. We propose a model to cluster the accounts that belong to the same user