287 research outputs found

    Deficient data classification with fuzzy learning

    Full text link
    This thesis first proposes a novel algorithm for handling both missing values and imbalanced data classification problems. Then, algorithms for addressing the class imbalance problem in Twitter spam detection (Network Security Problem) have been proposed. Finally, the security profile of SVM against deliberate attacks has been simulated and analysed.<br /

    Support Directional Shifting Vector: A Direction Based Machine Learning Classifier

    Get PDF
    Machine learning models have been very popular nowadays for providing rigorous solutions to complicated real-life problems. There are three main domains named supervised, unsupervised, and reinforcement. Supervised learning mainly deals with regression and classification. There exist several types of classification algorithms, and these are based on various bases. The classification performance varies based on the dataset velocity and the algorithm selection. In this article, we have focused on developing a model of angular nature that performs supervised classification. Here, we have used two shifting vectors named Support Direction Vector (SDV) and Support Origin Vector (SOV) to form a linear function. These vectors form a linear function to measure cosine-angle with both the target class data and the non-target class data. Considering target data points, the linear function takes such a position that minimizes its angle with target class data and maximizes its angle with non-target class data. The positional error of the linear function has been modelled as a loss function which is iteratively optimized using the gradient descent algorithm. In order to justify the acceptability of this method, we have implemented this model on three different standard datasets. The model showed comparable accuracy with the existing standard supervised classification algorithm. Doi: 10.28991/esj-2021-01306 Full Text: PD

    Evidence-based Cybersecurity: Data-driven and Abstract Models

    Get PDF
    Achieving computer security requires both rigorous empirical measurement and models to understand cybersecurity phenomena and the effectiveness of defenses and interventions. To address the growing scale of cyber-insecurity, my approach to protecting users employs principled and rigorous measurements and models. In this dissertation, I examine four cybersecurity phenomena. I show that data-driven and abstract modeling can reveal surprising conclusions about longterm, persistent problems, like spam and malware, and growing threats like data-breaches and cyber conflict. I present two data-driven statistical models and two abstract models. Both of the data-driven models show that the presence of heavy-tailed distributions can make naive analysis of trends and interventions misleading. First, I examine ten years of publicly reported data breaches and find that there has been no increase in size or frequency. I also find that reported and perceived increases can be explained by the heavy-tailed nature of breaches. In the second data-driven model, I examine a large spam dataset, analyzing spam concentrations across Internet Service Providers. Again, I find that the heavy-tailed nature of spam concentrations complicates analysis. Using appropriate statistical methods, I identify unique risk factors with significant impact on local spam levels. I then use the model to estimate the effect of historical botnet takedowns and find they are frequently ineffective at reducing global spam concentrations and have highly variable local effects. Abstract models are an important tool when data are unavailable. Even without data, I evaluate both known and hypothesized interventions used by search providers to protect users from malicious websites. I present a Markov model of malware spread and study the effect of two potential interventions: blacklisting and depreferencing. I find that heavy-tailed traffic distributions obscure the effects of interventions, but with my abstract model, I showed that lowering search rankings is a viable alternative to blacklisting infected pages. Finally, I study how game-theoretic models can help clarify strategic decisions in cyber-conflict. I find that, in some circumstances, improving the attribution ability of adversaries may decrease the likelihood of escalating cyber conflict

    A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces

    Get PDF
    Mutemi, A., & Bacao, F. (2023). A numeric-based machine learning design for detecting organized retail fraud in digital marketplaces. Scientific Reports, 13(1), 1-16. [12499]. https://doi.org/10.1038/s41598-023-38304-5Organized retail crime (ORC) is a significant issue for retailers, marketplace platforms, and consumers. Its prevalence and influence have increased fast in lockstep with the expansion of online commerce, digital devices, and communication platforms. Today, it is a costly affair, wreaking havoc on enterprises’ overall revenues and continually jeopardizing community security. These negative consequences are set to rocket to unprecedented heights as more people and devices connect to the Internet. Detecting and responding to these terrible acts as early as possible is critical for protecting consumers and businesses while also keeping an eye on rising patterns and fraud. The issue of detecting fraud in general has been studied widely, especially in financial services, but studies focusing on organized retail crimes are extremely rare in literature. To contribute to the knowledge base in this area, we present a scalable machine learning strategy for detecting and isolating ORC listings on a prominent marketplace platform by merchants committing organized retail crimes or fraud. We employ a supervised learning approach to classify postings as fraudulent or real based on past data from buyer and seller behaviors and transactions on the platform. The proposed framework combines bespoke data preprocessing procedures, feature selection methods, and state-of-the-art class asymmetry resolution techniques to search for aligned classification algorithms capable of discriminating between fraudulent and legitimate listings in this context. Our best detection model obtains a recall score of 0.97 on the holdout set and 0.94 on the out-of-sample testing data set. We achieve these results based on a select set of 45 features out of 58.publishersversionpublishe
    corecore