
    Deep learning in phishing mitigation: a uniform resource locator-based predictive model

    To mitigate the evolution of phishing websites, various phishing prediction schemes are continually being optimized. However, these optimized methods incur gratuitous performance overhead because they explore advanced phishing cues only to a limited extent. This work therefore enhances a phishing uniform resource locator (URL)-based predictive model using deep learning algorithms to address this deficiency. The model's architecture comprises pre-processing of an effective feature space of 60 mutual URL phishing features and a dual deep learning model combining a convolutional neural network with bi-directional long short-term memory (CNN-BiLSTM). The proposed predictive model is trained and tested on a dataset of 14,000 phishing URLs and 28,074 legitimate URLs. Experimentally, it achieves a 0.01% false positive rate (FPR) and 99.27% testing accuracy.
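
    The abstract names the architecture (a CNN followed by a BiLSTM over 60 URL features) but not its layer sizes or pre-processing. The sketch below is a minimal, assumed configuration in Keras that treats the 60 features as a length-60 single-channel sequence; all layer widths, the dropout rate, and the random data are illustrative, not the paper's settings.

```python
# Minimal sketch of a CNN-BiLSTM binary classifier over a 60-dimensional
# URL feature vector (assumed layout: length-60 sequence, one channel).
import numpy as np
from tensorflow.keras import layers, models

NUM_FEATURES = 60  # per the abstract: 60 URL phishing features

model = models.Sequential([
    layers.Input(shape=(NUM_FEATURES, 1)),
    layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # phishing vs. legitimate
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Illustrative call with random data shaped like the described inputs.
X = np.random.rand(128, NUM_FEATURES, 1).astype("float32")
y = np.random.randint(0, 2, size=(128,))
model.fit(X, y, epochs=1, batch_size=32, verbose=0)
```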

    A Proposal for a European Cybersecurity Taxonomy

    The Commission made a commitment in the Communication adopted in September 2018 (COM(2018) 630 final) to launch a pilot phase under Horizon 2020 to help bring national cybersecurity centres together into a network. In this context, the goal of this document is to align cybersecurity terminologies, definitions, and domains into a coherent and comprehensive taxonomy to facilitate the categorisation of EU cybersecurity competencies. JRC.E.3 - Cyber and Digital Citizens' Security

    A Superficial Analysis Approach for Identifying Malicious Domain Names Generated by DGA Malware

    Some of the most serious security threats facing computer networks involve malware. To prevent malware-related damage, administrators must swiftly identify and remove the infected machines that may reside in their networks. However, many malware families have domain generation algorithms (DGAs) to avoid detection. A DGA is a technique in which the domain name is changed frequently to hide the callback communication from the infected machine to the command-and-control server. In this article, we propose an approach for estimating the randomness of domain names by superficially analyzing their character strings. This approach is based on the following observations: human-generated benign domain names tend to reflect the intent of their domain registrants, such as an organization, product, or content. In contrast, dynamically generated malicious domain names consist of meaningless character strings because conflicts with already registered domain names must be avoided; hence, there are discernible differences in the strings of dynamically generated and human-generated domain names. Notably, our approach does not require any prior knowledge about DGAs. Our evaluation indicates that the proposed approach is capable of achieving recall and precision as high as 0.9960 and 0.9029, respectively, when used with labeled datasets. Additionally, this approach has proven to be highly effective for datasets collected via a campus network. Thus, these results suggest that malware-infected machines can be swiftly identified and removed from networks using DNS queries for detected malicious domains as triggers.
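
    The abstract describes scoring the "randomness" of a domain's character string without prior knowledge of any DGA, but does not reproduce the scoring function. The sketch below is an illustrative stand-in, assuming a character-bigram statistic fitted on known-benign names; the paper's actual measure and thresholds may differ.

```python
# Minimal sketch of superficial string analysis for domain names: score a
# domain by how unlikely its character bigrams are under a model fitted on
# known-benign names. Higher score = less like human-registered names.
import math
from collections import Counter

def fit_bigram_model(benign_domains):
    counts = Counter()
    for name in benign_domains:
        label = name.lower().split(".")[0]        # drop the TLD, keep the label
        counts.update(label[i:i + 2] for i in range(len(label) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def randomness_score(domain, model, floor=1e-6):
    label = domain.lower().split(".")[0]
    bigrams = [label[i:i + 2] for i in range(len(label) - 1)]
    if not bigrams:
        return 0.0
    # Average negative log-probability of the observed bigrams.
    return -sum(math.log(model.get(bg, floor)) for bg in bigrams) / len(bigrams)

model = fit_bigram_model(["google.com", "wikipedia.org", "example.net"])
print(randomness_score("xkqvpzjw.com", model))   # high: meaningless string
print(randomness_score("wikipedia.com", model))  # low: human-readable string
```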

    Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques

    Cybercrime is a growing threat to organizations and individuals worldwide, with criminals using increasingly sophisticated techniques to breach security systems and steal sensitive data. In recent years, machine learning, deep learning, and transfer learning techniques have emerged as promising tools for predicting cybercrime and preventing it before it occurs. This paper provides a comprehensive survey of the latest advancements in cybercrime prediction using these techniques, highlighting the latest research related to each approach. For this purpose, we reviewed more than 150 research articles and discuss around 50 of the most recent and relevant ones. We start the review by discussing some common methods used by cybercriminals and then focus on the latest machine learning and deep learning techniques, such as recurrent and convolutional neural networks, which have been effective in detecting anomalous behavior and identifying potential threats. We also discuss transfer learning, which allows models trained on one dataset to be adapted for use on another, and then focus on active and reinforcement learning as part of early-stage algorithmic research in cybercrime prediction. Finally, we discuss critical innovations, research gaps, and future research opportunities in cybercrime prediction. Overall, this paper presents a holistic view of cutting-edge developments in cybercrime prediction, shedding light on the strengths and limitations of each method and equipping researchers and practitioners with essential insights, publicly available datasets, and resources necessary to develop efficient cybercrime prediction systems.
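
    The survey's one-sentence description of transfer learning (adapting a model trained on one dataset for use on another) maps onto a common freeze-and-fine-tune pattern. The sketch below is a generic illustration of that pattern, not a method from the surveyed papers; the layer sizes, the frozen base, and the binary target are all assumptions.

```python
# Generic transfer-learning sketch: reuse a feature extractor trained on a
# source dataset, freeze it, and train only a new head on the target dataset.
import numpy as np
from tensorflow.keras import layers, models

# Stand-in for a network pre-trained on a source dataset.
base = models.Sequential([
    layers.Input(shape=(32,)),
    layers.Dense(64, activation="relu"),
    layers.Dense(64, activation="relu"),
])
base.trainable = False  # freeze the transferred weights

# New task-specific head trained on the target dataset only.
model = models.Sequential([base, layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

X_target = np.random.rand(64, 32).astype("float32")
y_target = np.random.randint(0, 2, size=(64,))
model.fit(X_target, y_target, epochs=1, verbose=0)
```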

    Explainable, Domain-Adaptive, and Federated Artificial Intelligence in Medicine

    Artificial intelligence (AI) continues to transform data analysis in many domains. Progress in each domain is driven by a growing body of annotated data, increased computational resources, and technological innovations. In medicine, the sensitivity of the data, the complexity of the tasks, the potentially high stakes, and the requirement of accountability give rise to a particular set of challenges. In this review, we focus on three key methodological approaches that address some of the particular challenges of AI-driven medical decision making. (1) Explainable AI aims to produce a human-interpretable justification for each output. Such models increase confidence when the results appear plausible and match the clinicians' expectations. However, the absence of a plausible explanation does not imply an inaccurate model. Especially in highly non-linear, complex models that are tuned to maximize accuracy, such interpretable representations only reflect a small portion of the justification. (2) Domain adaptation and transfer learning enable AI models to be trained and applied across multiple domains, for example, a classification task based on images acquired with different acquisition hardware. (3) Federated learning enables learning large-scale models without exposing sensitive personal health information. Unlike centralized AI learning, where the centralized learning machine has access to the entire training data, the federated learning process iteratively updates models across multiple sites by exchanging only parameter updates, not personal health data. This narrative review covers the basic concepts, highlights relevant cornerstone and state-of-the-art research in the field, and discusses perspectives.
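
    The federated learning loop described above (local training at each site, with only parameter updates crossing site boundaries) can be sketched in a few lines. The example below is a minimal, assumed illustration using a linear least-squares model and unweighted parameter averaging; production systems add secure aggregation, weighting by site size, and more elaborate models.

```python
# Minimal federated-averaging sketch: each site trains locally on its own
# data; only parameter vectors (never the data) are averaged by a coordinator.
import numpy as np

def local_update(weights, X, y, lr=0.1, steps=10):
    w = weights.copy()
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)   # least-squares gradient on local data
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
sites = [(rng.normal(size=(50, 5)), rng.normal(size=50)) for _ in range(3)]
global_w = np.zeros(5)

for _ in range(20):  # communication rounds
    site_weights = [local_update(global_w, X, y) for X, y in sites]
    global_w = np.mean(site_weights, axis=0)   # only parameters cross site boundaries
```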

    ALMA: ALgorithm Modeling Application

    As of today, the most recent trend in information technology is the employment of large-scale data analytic methods powered by Artificial Intelligence (AI), influencing the priorities of businesses and research centers all over the world. However, due to both the lack of specialized talent and the need for greater compute, less established businesses struggle to adopt such endeavors, with major technological mega-corporations such as Microsoft, Facebook, and Google taking the upper hand in this uneven playing field. Therefore, in an attempt to promote the democratization of AI and increase the efficiency of data scientists, this work proposes a novel no-code/low-code AI platform: the ALgorithm Modeling Application (ALMA). Moreover, as the state of the art of such platforms is still gradually maturing, current solutions often fail to encompass security/safety aspects directly in their process. In that respect, the solution proposed in this thesis aims not only to achieve greater development and deployment efficiency while building machine learning applications but also to build upon others by addressing the inherent pitfalls of AI through a "secure by design" philosophy.

    Interest-disclosing Mechanisms for Advertising are Privacy-Exposing (not Preserving)

    Today, targeted online advertising relies on unique identifiers assigned to users through third-party cookies--a practice at odds with user privacy. While the web and advertising communities have proposed interest-disclosing mechanisms, including Google's Topics API, as solutions, an independent analysis of these proposals in realistic scenarios has yet to be performed. In this paper, we attempt to validate the privacy (i.e., preventing unique identification) and utility (i.e., enabling ad targeting) claims of Google's Topics proposal in the context of realistic user behavior. Through new statistical models of the distribution of user behaviors and resulting targeting topics, we analyze the capabilities of malicious advertisers observing users over time and colluding with other third parties. Our analysis shows that even in the best case, individual users' identification across sites is possible, as 0.4% of the 250k users we simulate are re-identified. These guarantees weaken further over time and when advertisers collude: 57% of users are uniquely re-identified after 15 weeks of browsing, increasing to 75% after 30 weeks. While we find that the Topics API provides moderate utility, we also find that advertisers and publishers can abuse it to potentially assign unique identifiers to users, defeating the desired privacy guarantees. As a result, the inherent diversity of users' interests on the web is directly at odds with the privacy objectives of interest-disclosing mechanisms; we discuss how any replacement of third-party cookies may have to seek other avenues to achieve privacy for the web.
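
    The re-identification risk described above comes from accumulating the small set of topics an interest-disclosing API reveals each week until the accumulated set becomes a unique fingerprint. The toy simulation below illustrates that mechanism only; the population size, interest-set size, topics-per-week, and horizon are illustrative assumptions, not the paper's statistical models or parameters.

```python
# Toy fingerprinting simulation: users hold stable interest sets, a site sees
# a few topics per week, and we count users whose accumulated topic set is
# unique across the simulated population.
import random

NUM_TOPICS, NUM_USERS, WEEKS, PER_WEEK = 300, 5000, 15, 3
random.seed(0)

users = [frozenset(random.sample(range(NUM_TOPICS), 8)) for _ in range(NUM_USERS)]

def observe(interests, weeks):
    seen = set()
    for _ in range(weeks):
        seen.update(random.sample(sorted(interests), PER_WEEK))  # topics exposed this week
    return frozenset(seen)

fingerprints = [observe(u, WEEKS) for u in users]
counts = {}
for fp in fingerprints:
    counts[fp] = counts.get(fp, 0) + 1
unique = sum(1 for fp in fingerprints if counts[fp] == 1)
print(f"{unique / NUM_USERS:.1%} of simulated users have a unique topic fingerprint")
```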

    Few-shot Class-incremental Learning: A Survey

    Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in machine learning, as it necessitates the continuous learning of new classes from sparse labeled training samples without forgetting previous knowledge. While this field has seen recent progress, it remains an active area of exploration. This paper aims to provide a comprehensive and systematic review of FSCIL. In our in-depth examination, we delve into various facets of FSCIL, encompassing the problem definition, the primary challenges of unreliable empirical risk minimization and the stability-plasticity dilemma, general schemes, and the related problems of incremental learning and few-shot learning. We also offer an overview of benchmark datasets and evaluation metrics. Furthermore, we categorize classification methods in FSCIL into data-based, structure-based, and optimization-based approaches, and object detection methods in FSCIL into anchor-free and anchor-based approaches. Beyond these, we highlight several promising research directions within FSCIL that merit further investigation.