51 research outputs found

    Mining criminal networks from chat log

    Cyber criminals exploit opportunities for anonymity and masquerade in web-based communication to conduct illegal activities such as phishing, spamming, cyber predation, cyber threatening, blackmail, and drug trafficking. One way to fight cyber crime is to collect digital evidence from online documents and to prosecute cyber criminals in a court of law. In this paper, we propose a unified framework using data mining and natural language processing techniques to analyze online messages for the purpose of crime investigation. Our framework takes the chat log from a confiscated computer as input, extracts the social networks from the log, summarizes chat conversations into topics, identifies the information relevant to crime investigation, and visualizes the knowledge for an investigator. To ensure that the implemented framework meets the needs of law enforcement officers in real-life investigation, we closely collaborate with the cyber crime unit of a law enforcement agency in Canada. Both the feedback from the law enforcement officers and experimental results suggest that the proposed chat log mining framework is effective for crime investigation. © 2012 IEEE

    Mining known attack patterns from security-related events

    Managed Security Services (MSS) have become essential for companies seeking to protect their infrastructure from hacking attempts such as unauthorized behaviour, denial of service (DoS), malware propagation, and anomalies. A proliferation of attacks has created the need to install more network probes and collect more security-related events to assure the best coverage, which is necessary for generating incident responses. The increase in the volume of data to analyse has created a demand for specific tools that automatically correlate events and group them into pre-defined attack scenarios. Motivated by Above Security, a company specialized in the sector, and by National Research Council Canada (NRC), we propose a new data mining system that employs text mining techniques to dynamically relate security-related events in order to reduce analysis time, increase the quality of the reports, and automatically build correlated scenarios.

    E-mail authorship attribution using customized associative classification

    E-mail communication is often abused for conducting social engineering attacks including spamming, phishing, identity theft and for distributing malware. This is largely attributed to the problem of anonymity inherent in the standard electronic mail protocol. In the literature, authorship attribution is studied as a text categorization problem where the writing styles of individuals are modeled based on their previously written sample documents. The developed model is employed to identify the most plausible writer of the text. Unfortunately, most existing studies focus solely on improving predictive accuracy and not on the inherent value of the evidence collected. In this study, we propose a customized associative classification technique, a popular data mining method, to address the authorship attribution problem. Our approach models the unique writing style features of a person, measures the associativity of these features and produces an intuitive classifier. The results obtained by conducting experiments on a real dataset reveal that the presented method is very effective
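
    The abstract above does not spell out the customized technique, so the following is only a minimal sketch of the two standard measures that associative classification in general relies on: the support and confidence of a rule "writing-style features → author". The feature names, the toy samples, and the function `rule_support_confidence` are hypothetical and are not taken from the paper.

```python
from typing import FrozenSet, Iterable, List, Tuple

Sample = Tuple[FrozenSet[str], str]  # (stylometric features present, author label)


def rule_support_confidence(
    samples: List[Sample], features: Iterable[str], author: str
) -> Tuple[float, float]:
    """Support and confidence of the class association rule features -> author."""
    wanted = frozenset(features)
    # Authors of every sample whose feature set contains all rule features.
    matching = [label for feats, label in samples if wanted <= feats]
    hits = sum(1 for label in matching if label == author)
    support = hits / len(samples) if samples else 0.0
    confidence = hits / len(matching) if matching else 0.0
    return support, confidence


# Hypothetical toy data: each e-mail is reduced to a set of style features.
samples: List[Sample] = [
    (frozenset({"uses_ellipsis", "short_sentences"}), "alice"),
    (frozenset({"uses_ellipsis", "short_sentences"}), "alice"),
    (frozenset({"uses_ellipsis"}), "bob"),
    (frozenset({"short_sentences", "all_caps"}), "bob"),
]

support, confidence = rule_support_confidence(samples, {"uses_ellipsis"}, "alice")
print(f"support={support:.2f} confidence={confidence:.2f}")
```

    Rules that pass chosen support and confidence thresholds stay readable as "feature combination → author" statements, which is one reason a rule-based classifier can be easier to explain than an opaque model.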

    Security and privacy challenges in smart cities

    © 2018 Elsevier Ltd. The construction of smart cities will bring about a higher quality of life to the masses through digital interconnectivity, leading to increased efficiency and accessibility in cities. Smart cities must protect individual privacy and security to ensure that their citizens will participate. If citizens are reluctant to participate, the core advantages of a smart city will dissolve. This article identifies five smart city challenges and offers possible solutions, in the hope of anticipating destabilizing and costly disruptions. The challenges include privacy preservation with high-dimensional data, securing a network with a large attack surface, establishing trustworthy data sharing practices, properly utilizing artificial intelligence, and mitigating failures cascading through the smart network. Finally, further research directions are provided to encourage exploration of smart city challenges before such cities are built.

    Enabling Secure Trustworthiness Assessment and Privacy Protection in Integrating Data for Trading Person-Specific Information

    © IEEE. With increasing adoption of cloud services in the e-market, collaboration between stakeholders is easier than ever. Consumer stakeholders demand data from various sources to analyze trends and improve customer services. Data-as-a-service enables data integration to serve the demands of data consumers. However, the data must be of good quality and trustworthy for accurate analysis and effective decision making. In addition, a data custodian or provider must conform to privacy policies to avoid potential penalties for privacy breaches. To address these challenges, we propose a twofold solution: 1) we present the first information entropy-based trust computation algorithm, IEB_Trust, that allows a semitrusted arbitrator to detect the covert behavior of a dishonest data provider and choose the qualified providers for a data mashup and 2) we incorporate the Vickrey–Clarke–Groves (VCG) auction mechanism for the valuation of data providers’ attributes into the data mashup process. Experiments on real-life data demonstrate the robustness of our approach in restricting dishonest providers from participation in the data mashup and improving the efficiency in comparison to provenance-based approaches. Furthermore, we derive the monetary shares for the chosen providers from their information utility and trust scores over the differentially private release of the integrated dataset under their joint privacy requirements.

    Of Stances, Themes, and Anomalies in COVID-19 Mask-Wearing Tweets

    COVID-19 is an opportunity to study public acceptance of a "new" healthcare intervention, universal masking, which, unlike vaccination, is mostly alien to the Anglosphere public despite being practiced in ages past. Using a collection of over two million tweets, we studied the ways in which proponents and opponents of masking vied for influence as well as the themes driving the discourse. Pro-mask tweets encouraging others to mask up dominated Twitter early in the pandemic, though their dominance has been eroded by anti-mask tweets criticizing others for their masking behavior. Engagement, represented by the counts of likes, retweets, and replies, and controversiality and disagreeableness, represented by ratios of those counts, initially favored pro-mask tweets, with anti-mask tweets slowly gaining ground. Additional analysis raised the possibility that the platform owners suppressed certain parts of the mask-wearing discussion.

    Fusion: Privacy-preserving distributed protocol for high-dimensional data mashup

    © 2015 IEEE. In the last decade, several approaches concerning private data release for data mining have been proposed. Data mashup, on the other hand, has recently emerged as a mechanism for integrating data from several data providers. Fusing both techniques to generate mashup data in a distributed environment while providing privacy and utility guarantees on the output involves several challenges. That is, how to ensure that no unnecessary information is leaked to the other parties during the mashup process, how to ensure the mashup data is protected against certain privacy threats, and how to handle the high-dimensional nature of the mashup data while guaranteeing high data utility. In this paper, we present Fusion, a privacy-preserving multi-party protocol for data mashup with guaranteed LKC-privacy for the purpose of data mining. Experiments on real-life data demonstrate that the anonymous mashup data provide better data utility, the approach can handle high dimensional data, and it is scalable with respect to the data size

    Distinguishing between fake news and satire with transformers

    Indiscriminate elimination of harmful fake news risks destroying satirical news, which can be benign or even beneficial, because both types of news share highly similar textual cues. In this work we applied a recent development in neural network architecture, transformers, to the task of separating satirical news from fake news. Transformers have hitherto not been applied to this specific problem. Our evaluation results on a publicly available and carefully curated dataset show that a classifier framework built around a DistilBERT architecture performed better than existing machine-learning approaches. Additional improvement over baseline DistilBERT was achieved through the use of non-standard tokenization schemes as well as varying the pre-training and text pre-processing strategies. The improvement over existing approaches stands at 0.0429 (5.2%) in F1 and 0.0522 (6.4%) in accuracy. Further evaluation on two additional datasets shows our framework's ability to generalize across datasets without diminished performance.

    Anonymity meets game theory: secure data integration with malicious participants

    Data integration methods enable different data providers to flexibly integrate their expertise and deliver highly customizable services to their customers. Nonetheless, combining data from different sources could potentially reveal person-specific sensitive information. In VLDBJ 2006, Jiang and Clifton (Very Large Data Bases J (VLDBJ) 15(4):316–333, 2006) propose a secure Distributed k-Anonymity (DkA) framework for integrating two private data tables to a k-anonymous table in which each private table is a vertical partition on the same set of records. Their proposed DkA framework is not scalable to large data sets. Moreover, DkA is limited to a two-party scenario and the parties are assumed to be semi-honest. In this paper, we propose two algorithms to securely integrate private data from multiple parties (data providers). Our first algorithm achieves the k-anonymity privacy model in a semi-honest adversary model. Our second algorithm employs a game-theoretic approach to thwart malicious participants and to ensure fair and honest participation of multiple data providers in the data integration process. Moreover, we study and resolve a real-life privacy problem in data sharing for the financial industry in Sweden. Experiments on the real-life data demonstrate that our proposed algorithms can effectively retain the essential information in anonymous data for data analysis and are scalable for anonymizing large data sets
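
    The privacy model named above, k-anonymity, requires that every combination of quasi-identifier values in a released table be shared by at least k records. The sketch below only checks that property on a single, already integrated table; it is not the DkA framework or either of the proposed algorithms. The toy records, the column names, and the function `is_k_anonymous` are hypothetical.

```python
from collections import Counter
from typing import Mapping, Sequence


def is_k_anonymous(
    rows: Sequence[Mapping[str, str]], quasi_identifiers: Sequence[str], k: int
) -> bool:
    """Return True if every quasi-identifier combination occurs in at least k rows."""
    # Group rows by their projection onto the quasi-identifier columns.
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())


# Hypothetical generalized table: age and postal code are quasi-identifiers,
# balance is the sensitive attribute and is not used for grouping.
records = [
    {"age": "30-39", "zip": "H3G*", "balance": "high"},
    {"age": "30-39", "zip": "H3G*", "balance": "low"},
    {"age": "40-49", "zip": "H4B*", "balance": "low"},
    {"age": "40-49", "zip": "H4B*", "balance": "high"},
]

print(is_k_anonymous(records, ["age", "zip"], 2))  # each group has 2 rows
print(is_k_anonymous(records, ["age", "zip"], 3))  # no group reaches 3 rows
```

    In the multi-party setting the abstract describes, each provider holds only some of the columns, so the grouping step cannot simply be run on a shared table as it is here; that is the part the secure integration protocols address.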

    Privacy-preserving data mashup model for trading person-specific information

    © 2016 Elsevier B.V. All rights reserved. Business enterprises adopt cloud integration services to improve collaboration with their trading partners and to deliver quality data mining services. Data-as-a-Service (DaaS) mashup allows multiple enterprises to integrate their data upon the demand of consumers. Business enterprises face challenges not only to protect private data over the cloud but also to legally adhere to privacy compliance rules when trading person-specific data. They need an effective privacy-preserving business model to deal with the challenges in emerging markets. We propose a model that allows the collaboration of multiple enterprises for integrating their data and derives the contribution of each data provider by valuating the incorporated cost factors. This model serves as a guide for business decision-making, such as estimating the potential risk and finding the optimal value for publishing mashup data. Experiments on real-life data demonstrate that our approach can identify the optimal value in data mashup for different privacy models, including K-anonymity, LKC-privacy, and ε-differential privacy, with various anonymization algorithms and privacy parameters.