103 research outputs found
Deep learning in phishing mitigation: a uniform resource locator-based predictive model
To keep pace with evolving phishing websites, phishing prediction schemes are continually being optimized. However, existing methods incur unnecessary performance overhead because they explore only a limited set of advanced phishing cues. This work therefore proposes an enhanced phishing uniform resource locator (URL)-based predictive model that addresses this deficiency using deep learning algorithms. The model's architecture comprises pre-processing of an effective feature space made up of 60 common URL phishing features, and a dual deep learning model combining a convolutional neural network with bi-directional long short-term memory (CNN-BiLSTM). The proposed predictive model is trained and tested on a dataset of 14,000 phishing URLs and 28,074 legitimate URLs. Experimentally, it achieves a 0.01% false positive rate (FPR) and 99.27% testing accuracy.
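A URL-based predictive pipeline like the one described starts from lexical URL features. The sketch below is purely illustrative: these few features are common phishing cues, not the paper's actual 60-feature set, and the function name is hypothetical.

```python
import math
from urllib.parse import urlparse

def url_features(url):
    """Extract a few illustrative lexical features from a URL.

    Length, digit ratio, special characters, subdomain depth, and
    character entropy are common phishing cues; the paper's full
    60-feature set is not reproduced here.
    """
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.netloc
    # Character-frequency entropy over the whole URL string.
    counts = {}
    for ch in url:
        counts[ch] = counts.get(ch, 0) + 1
    entropy = -sum(c / len(url) * math.log2(c / len(url))
                   for c in counts.values())
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_special": sum(not ch.isalnum() for ch in url),
        "subdomain_depth": host.count("."),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        "char_entropy": round(entropy, 3),
    }
```

A feature vector like this would then be fed to the CNN-BiLSTM classifier; deep models can also consume the raw character sequence directly.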
A Proposal for a European Cybersecurity Taxonomy
The Commission made a commitment in the Communication adopted in September 2018 (COM(2018) 630 final) to launch a pilot phase under Horizon 2020 to help bring national cybersecurity centres together into a network. In this context, the goal of this document is to align cybersecurity terminologies, definitions and domains into a coherent and comprehensive taxonomy to facilitate the categorisation of EU cybersecurity competencies. JRC.E.3 - Cyber and Digital Citizens' Security
A Superficial Analysis Approach for Identifying Malicious Domain Names Generated by DGA Malware
Some of the most serious security threats facing computer networks involve malware. To prevent malware-related damage, administrators must swiftly identify and remove the infected machines that may reside in their networks. However, many malware families have domain generation algorithms (DGAs) to avoid detection. A DGA is a technique in which the domain name is changed frequently to hide the callback communication from the infected machine to the command-and-control server. In this article, we propose an approach for estimating the randomness of domain names by superficially analyzing their character strings. This approach is based on the following observations: human-generated benign domain names tend to reflect the intent of their domain registrants, such as an organization, product, or content. In contrast, dynamically generated malicious domain names consist of meaningless character strings because conflicts with already registered domain names must be avoided; hence, there are discernible differences in the strings of dynamically generated and human-generated domain names. Notably, our approach does not require any prior knowledge about DGAs. Our evaluation indicates that the proposed approach is capable of achieving recall and precision as high as 0.9960 and 0.9029, respectively, when used with labeled datasets. Additionally, this approach has proven to be highly effective for datasets collected via a campus network. Thus, these results suggest that malware-infected machines can be swiftly identified and removed from networks using DNS queries for detected malicious domains as triggers.
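The randomness estimation described above can be approximated with a simple character-bigram model: score a domain by how unlikely its bigrams are under a benign corpus. This is a hedged illustration of the general idea, not the authors' exact method, and the function names are hypothetical.

```python
import math
from collections import Counter

def bigram_model(benign_domains):
    """Build a bigram probability table from known benign domain names."""
    counts = Counter()
    for name in benign_domains:
        name = name.lower()
        counts.update(name[i:i + 2] for i in range(len(name) - 1))
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def randomness_score(domain, model, floor=1e-6):
    """Average negative log-likelihood per bigram of the leftmost label.

    Human-generated names reuse common bigrams and score low;
    DGA-style strings hit many unseen bigrams and score high.
    """
    name = domain.lower().split(".")[0]
    bigrams = [name[i:i + 2] for i in range(len(name) - 1)]
    if not bigrams:
        return 0.0
    return -sum(math.log(model.get(bg, floor)) for bg in bigrams) / len(bigrams)
```

Thresholding this score gives a DGA detector that, like the paper's approach, needs no prior knowledge of any specific DGA.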
Advances in Cybercrime Prediction: A Survey of Machine, Deep, Transfer, and Adaptive Learning Techniques
Cybercrime is a growing threat to organizations and individuals worldwide,
with criminals using increasingly sophisticated techniques to breach security
systems and steal sensitive data. In recent years, machine learning, deep
learning, and transfer learning techniques have emerged as promising tools for
predicting cybercrime and preventing it before it occurs. This paper aims to
provide a comprehensive survey of the latest advancements in cybercrime
prediction using the aforementioned techniques, highlighting the latest research
related to each approach. For this purpose, we reviewed more than 150 research
articles and discussed around 50 of the most recent and relevant ones. We
start the review by discussing some common methods used by cyber criminals and
then focus on the latest machine learning techniques and deep learning
techniques, such as recurrent and convolutional neural networks, which were
effective in detecting anomalous behavior and identifying potential threats. We
also discuss transfer learning, which allows models trained on one dataset to
be adapted for use on another dataset, and then focus on active and
reinforcement learning as part of early-stage algorithmic research in
cybercrime prediction. Finally, we discuss critical innovations, research gaps,
and future research opportunities in cybercrime prediction. Overall, this paper
presents a holistic view of cutting-edge developments in cybercrime prediction,
shedding light on the strengths and limitations of each method and equipping
researchers and practitioners with essential insights, publicly available
datasets, and resources necessary to develop efficient cybercrime prediction
systems. Comment: 27 pages, 6 figures, 4 tables.
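The transfer-learning idea mentioned above, reusing parameters learned on one dataset to warm-start training on another, can be sketched with a toy logistic-regression example. This is entirely illustrative; the surveyed systems use far richer models, and all names here are hypothetical.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def train_logreg(data, w=None, b=0.0, epochs=100, lr=0.5):
    """Stochastic gradient descent for logistic regression.

    Passing `w`/`b` warm-starts from parameters learned on another
    (source) task: the essence of parameter-transfer learning.
    """
    dim = len(data[0][0])
    w = list(w) if w is not None else [0.0] * dim
    for _ in range(epochs):
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = sigmoid(z) - y  # gradient of the log loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def accuracy(data, w, b):
    correct = 0
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        correct += (z > 0) == (y == 1)
    return correct / len(data)
```

Pretraining on a large source set and fine-tuning for only a few epochs on a small target set is often enough when the two tasks share structure, which is exactly why transfer learning helps in data-scarce cybercrime settings.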
Explainable, Domain-Adaptive, and Federated Artificial Intelligence in Medicine
Artificial intelligence (AI) continues to transform data analysis in many
domains. Progress in each domain is driven by a growing body of annotated data,
increased computational resources, and technological innovations. In medicine,
the sensitivity of the data, the complexity of the tasks, the potentially high
stakes, and a requirement of accountability give rise to a particular set of
challenges. In this review, we focus on three key methodological approaches
that address some of the particular challenges in AI-driven medical decision
making. (1) Explainable AI aims to produce a human-interpretable justification
for each output. Such models increase confidence if the results appear
plausible and match the clinicians' expectations. However, the absence of a
plausible explanation does not imply an inaccurate model. Especially in highly
non-linear, complex models that are tuned to maximize accuracy, such
interpretable representations only reflect a small portion of the
justification. (2) Domain adaptation and transfer learning enable AI models to
be trained and applied across multiple domains, for example a classification
task based on images acquired on different hardware. (3) Federated
learning enables learning large-scale models without exposing sensitive
personal health information. Unlike centralized AI learning, where the
centralized learning machine has access to the entire training data, the
federated learning process iteratively updates models across multiple sites by
exchanging only parameter updates, not personal health data. This narrative
review covers the basic concepts, highlights relevant cornerstone and
state-of-the-art research in the field, and discusses perspectives. Comment: This paper is accepted in IEEE/CAA Journal of Automatica Sinica, Nov. 10 202
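The federated process described above, exchanging parameter updates rather than patient data, can be illustrated with a toy federated-averaging (FedAvg-style) round on a one-parameter model. This is a conceptual sketch under simplifying assumptions (scalar model, noise-free data), not a medical-grade implementation.

```python
def local_update(w, data, lr=0.1, epochs=20):
    """One client's local training on y = w * x (least squares).

    Only the updated parameter leaves the site; the raw (x, y)
    records, standing in for personal health data, never do.
    """
    for _ in range(epochs):
        for x, y in data:
            w -= lr * (w * x - y) * x  # gradient step on squared error
    return w

def federated_average(client_params, client_sizes):
    """Server-side FedAvg: average client parameters weighted by
    each client's local dataset size."""
    total = sum(client_sizes)
    return sum(w * n for w, n in zip(client_params, client_sizes)) / total
```

Each round, the server broadcasts the global parameter, every site trains locally, and the server averages the returned parameters; with consistent data (here, y = 2x at both sites) the global model converges to the same solution centralized training would find.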
ALMA: ALgorithm Modeling Application
As of today, the most recent trend in information technology is the employment of large-scale data analytics powered by Artificial Intelligence (AI), influencing the priorities of businesses and research centres all over the world. However, due to both a lack of specialized talent and the need for greater compute, less established businesses struggle to adopt such endeavors, with major technological mega-corporations such as Microsoft, Facebook and Google taking the upper hand in this uneven playing field. Therefore, in an attempt to promote the democratization of AI and increase the efficiency of data scientists, this work proposes a novel no-code/low-code AI platform: the ALgorithm Modeling Application (ALMA). Moreover, as the state of the art of such platforms is still maturing, current solutions often fail to incorporate security/safety aspects directly into their process. In that respect, the solution proposed in this thesis aims not only to achieve greater development and deployment efficiency while building machine learning applications, but also to improve upon others by addressing the inherent pitfalls of AI through a "secure by design" philosophy.
Interest-disclosing Mechanisms for Advertising are Privacy-Exposing (not Preserving)
Today, targeted online advertising relies on unique identifiers assigned to
users through third-party cookies--a practice at odds with user privacy. While
the web and advertising communities have proposed interest-disclosing
mechanisms, including Google's Topics API, as solutions, an independent
analysis of these proposals in realistic scenarios has yet to be performed. In
this paper, we attempt to validate the privacy (i.e., preventing unique
identification) and utility (i.e., enabling ad targeting) claims of Google's
Topics proposal in the context of realistic user behavior. Through new
statistical models of the distribution of user behaviors and resulting
targeting topics, we analyze the capabilities of malicious advertisers
observing users over time and colluding with other third parties. Our analysis
shows that even in the best case, individual users' identification across sites
is possible, as 0.4% of the 250k users we simulate are re-identified. These
guarantees weaken further over time and when advertisers collude: 57% of users
are uniquely re-identified after 15 weeks of browsing, increasing to 75% after
30 weeks. While we measure that the Topics API provides moderate utility, we
also find that advertisers and publishers can abuse the Topics API to
potentially assign unique identifiers to users, defeating the desired privacy
guarantees. As a result, the inherent diversity of users' interests on the web
is directly at odds with the privacy objectives of interest-disclosing
mechanisms; we discuss how any replacement of third-party cookies may have to
seek other avenues to achieve privacy for the web.
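The colluding-sites attack analyzed above can be sketched abstractly: each site accumulates the topics it observes for a pseudonymous profile, and profiles whose accumulated topic sets are unique and identical are matched across sites. The simulation below is a simplified stand-in for the Topics API (one topic disclosed per week, no noise topics), not the authors' statistical model.

```python
import random

def observe(user_topics, weeks, rng):
    """Topics-like disclosure: each week the site learns one topic
    drawn from the user's stable interest set (a simplification of
    the real API, which adds noise and rotates epochs)."""
    return frozenset(rng.choice(sorted(user_topics)) for _ in range(weeks))

def reidentify(site_a, site_b):
    """Cross-site matching by two colluding observers: link a profile
    on site A to a profile on site B when exactly one profile on B
    has the same accumulated topic set."""
    matches = {}
    for ida, topics_a in site_a.items():
        candidates = [idb for idb, tb in site_b.items() if tb == topics_a]
        if len(candidates) == 1:
            matches[ida] = candidates[0]
    return matches
```

As observation windows grow, each site's accumulated set converges to the user's full interest set, so users with distinctive interests become linkable, mirroring the paper's finding that re-identification rates rise with browsing time.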
Few-shot Class-incremental Learning: A Survey
Few-shot Class-Incremental Learning (FSCIL) presents a unique challenge in
machine learning, as it necessitates the continuous learning of new classes
from sparse labeled training samples without forgetting previous knowledge.
While this field has seen recent progress, it remains an active area of
exploration. This paper aims to provide a comprehensive and systematic review
of FSCIL. In our in-depth examination, we delve into various facets of FSCIL,
encompassing the problem definition, the discussion of primary challenges of
unreliable empirical risk minimization and the stability-plasticity dilemma,
general schemes, and relevant problems of incremental learning and few-shot
learning. In addition, we offer an overview of benchmark datasets and evaluation
metrics. Furthermore, we introduce the classification methods in FSCIL, grouped
into data-based, structure-based, and optimization-based approaches, and the
object detection methods in FSCIL, grouped into anchor-free and anchor-based
approaches. Beyond
these, we illuminate several promising research directions within FSCIL that
merit further investigation.
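As a concrete taste of the incremental setting, one of the simplest FSCIL baselines is a nearest-class-mean (prototype) classifier: each new class is added from its few labeled samples as a mean vector, and old prototypes are never modified, so nothing is forgotten. This sketch is illustrative only and is not drawn from the survey itself.

```python
def class_mean(samples):
    """Prototype = element-wise mean of the few labeled vectors."""
    dim = len(samples[0])
    return [sum(s[i] for s in samples) / len(samples) for i in range(dim)]

class PrototypeFSCIL:
    """Few-shot class-incremental classification by nearest prototype.

    add_class() consumes the few-shot session for a new class;
    predict() returns the label of the closest stored prototype.
    """
    def __init__(self):
        self.prototypes = {}

    def add_class(self, label, few_shot_samples):
        self.prototypes[label] = class_mean(few_shot_samples)

    def predict(self, x):
        def sq_dist(p):
            return sum((a - b) ** 2 for a, b in zip(p, x))
        return min(self.prototypes, key=lambda lb: sq_dist(self.prototypes[lb]))
```

Real FSCIL methods learn a strong feature extractor in the base session and refine it across incremental sessions, but this prototype scheme already exhibits the stability side of the stability-plasticity dilemma: old classes remain intact while new ones are absorbed from sparse data.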