Identification of potential malicious web pages
Malicious web pages are an emerging security concern on the Internet due to their popularity and their potentially serious impact. Detecting and analysing them is costly because of their diversity and complexity. In this paper, we present a lightweight scoring mechanism that uses static features to identify potentially malicious pages. This mechanism is intended as a filter that reduces the number of suspicious web pages requiring more expensive analysis by mechanisms that must load and interpret a page to determine whether it is malicious or benign. Given its role as a filter, our main aim is to reduce false positives while minimising false negatives. The scoring mechanism has been developed by identifying candidate static features of malicious web pages, which are evaluated using a feature selection algorithm. This identifies the most appropriate set of features for efficiently distinguishing between benign and malicious web pages. These features are used to construct a scoring algorithm that calculates a score for a web page's potential maliciousness. The main advantage of this scoring mechanism over a binary classifier is the ability to trade off accuracy against performance: we can adjust the number of web pages passed to the more expensive analysis mechanism in order to tune overall performance.
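The filtering idea can be illustrated with a minimal sketch. The feature names, weights, and threshold below are hypothetical placeholders, not the features selected in the paper:

```python
# Hypothetical sketch of a lightweight static-feature scoring filter.
# Feature names and weights are illustrative, not taken from the paper.

STATIC_FEATURE_WEIGHTS = {
    "num_iframes": 0.30,
    "num_script_tags": 0.15,
    "uses_obfuscated_js": 0.35,
    "url_length_excessive": 0.10,
    "has_suspicious_tld": 0.10,
}

def maliciousness_score(features: dict) -> float:
    """Weighted sum of binary/normalised static features, in [0, 1]."""
    return sum(STATIC_FEATURE_WEIGHTS[name] * float(value)
               for name, value in features.items()
               if name in STATIC_FEATURE_WEIGHTS)

def needs_deep_analysis(features: dict, threshold: float = 0.5) -> bool:
    """Pages scoring at or above the threshold go on to expensive
    dynamic analysis; the rest are discarded as likely benign."""
    return maliciousness_score(features) >= threshold

# Example page with a few suspicious static features set.
page = {"num_iframes": 1, "num_script_tags": 1, "uses_obfuscated_js": 0,
        "url_length_excessive": 0, "has_suspicious_tld": 1}
```

Lowering the threshold reduces false negatives but sends more pages to the expensive analysis stage; raising it does the opposite. That single knob is the accuracy/performance trade-off the abstract describes.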
Hybrid features-based prediction for novel phish websites
Phishers frequently craft novel deceptions on their websites and circumvent existing anti-phishing techniques in order to intrude, steal users' digital identities, and ultimately make illegal profits. This raises the need to incorporate new features for detecting novel phish websites and to optimize existing anti-phishing techniques. In this light, 58 new hybrid features are proposed in this paper, and their predictive power is evaluated using a feature co-occurrence criterion and a baseline machine learning algorithm. Empirical tests and analysis show the significant effect of the proposed features on detection performance. As a result, the most influential features are identified, and new insights are offered for further detection improvement.
xFraud: Explainable Fraud Transaction Detection
At online retail platforms, it is crucial to actively detect the risks of transactions to improve customer experience and minimize financial loss. In this work, we propose xFraud, an explainable fraud transaction prediction framework which is mainly composed of a detector and an explainer. The xFraud detector can effectively and efficiently predict the legitimacy of incoming transactions. Specifically, it utilizes a heterogeneous graph neural network to learn expressive representations from the informative heterogeneously typed entities in the transaction logs. The explainer in xFraud can generate meaningful and human-understandable explanations from graphs to facilitate further processes in the business unit. In our experiments with xFraud on real transaction networks with up to 1.1 billion nodes and 3.7 billion edges, xFraud is able to outperform various baseline models in many evaluation metrics while remaining scalable in distributed settings. In addition, we show that the xFraud explainer can generate reasonable explanations to significantly assist business analysis via both quantitative and qualitative evaluations.
Comment: This is the extended version of a full paper to appear in PVLDB 15(3) (VLDB 2022).
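The core idea of a heterogeneous graph neural network, aggregating neighbour information separately per node type rather than uniformly, can be sketched in plain Python. The node types, features, and averaging scheme below are illustrative assumptions, not xFraud's actual architecture:

```python
# Minimal, illustrative sketch of one message-passing step over a
# heterogeneous transaction graph (node types: "txn", "user", "device").
# All names and the averaging scheme are assumptions for illustration.

from collections import defaultdict

def hetero_aggregate(node_feats, edges, node_type):
    """For each node, average neighbour features separately per neighbour
    type, then concatenate those per-type averages after the node's own
    features. Separating aggregation by type is what distinguishes a
    heterogeneous GNN layer from a homogeneous one.
    Assumes all nodes share the same feature dimension."""
    types = sorted(set(node_type.values()))
    out = {}
    for node, feats in node_feats.items():
        by_type = defaultdict(list)
        for u, v in edges:  # undirected: gather neighbours on both ends
            if u == node:
                by_type[node_type[v]].append(node_feats[v])
            elif v == node:
                by_type[node_type[u]].append(node_feats[u])
        parts = list(feats)
        for t in types:
            neigh = by_type.get(t, [])
            if neigh:
                dim = len(neigh[0])
                parts += [sum(x[i] for x in neigh) / len(neigh)
                          for i in range(dim)]
            else:
                parts += [0.0] * len(feats)  # no neighbours of this type
        out[node] = parts
    return out

# Tiny example: one transaction linked to a user and a device.
node_feats = {"t1": [1.0], "u1": [2.0], "d1": [4.0]}
node_type = {"t1": "txn", "u1": "user", "d1": "device"}
edges = [("t1", "u1"), ("t1", "d1")]
embeddings = hetero_aggregate(node_feats, edges, node_type)
```

In a real model the per-type aggregates would pass through learned weight matrices and nonlinearities, and the step would be stacked over several layers; the sketch only shows the type-aware neighbourhood aggregation.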
Misinformation Containment Using NLP and Machine Learning: Why the Problem Is Still Unsolved
Despite increased attention and substantial research claiming outstanding successes, the problem of misinformation containment has only grown in recent years, with few signs of respite. Misinformation rapidly changes its latent characteristics and spreads vigorously in a multi-modal fashion, sometimes in a more damaging manner than viruses and other malicious programs on the internet. This chapter examines the existing research in natural language processing and machine learning aimed at stopping the spread of misinformation, analyzes why that research has not been practical enough to be incorporated into social media platforms, and provides future research directions. The state-of-the-art feature engineering, approaches, and algorithms used for the problem are expounded in the process.
Big Data and the Internet of Things
Advances in sensing and computing capabilities are making it possible to embed increasing computing power in small devices. This has enabled sensing devices not just to passively capture data at very high resolution but also to take sophisticated actions in response. Combined with advances in communication, this is resulting in an ecosystem of highly interconnected devices referred to as the Internet of Things (IoT). In conjunction, advances in machine learning have allowed building models on these ever-increasing amounts of data. Consequently, devices all the way from heavy assets such as aircraft engines to wearables such as health monitors can now not only generate massive amounts of data but also draw on aggregate analytics to "improve" their performance over time. Big data analytics has been identified as a key enabler for the IoT. In this chapter, we discuss various avenues of the IoT where big data analytics either is already making a significant impact or is on the cusp of doing so. We also discuss social implications and areas of concern.
Comment: 33 pages; draft of an upcoming book chapter in Japkowicz and Stefanowski (eds.), Big Data Analysis: New Algorithms for a New Society, Springer Series on Studies in Big Data, to appear.
Privacy, security, and trust issues in smart environments
Recent advances in networking, handheld computing and sensor technologies have driven research towards the realisation of Mark Weiser's dream of calm and ubiquitous computing (variously called pervasive computing, ambient computing, active spaces, the disappearing computer or context-aware computing). In turn, this has led to the emergence of smart environments as one significant facet of research in this domain. A smart environment, or space, is a region of the real world that is extensively equipped with sensors, actuators and computing components [1]. In effect, the smart space becomes part of a larger information system, with all actions within the space potentially affecting the underlying computer applications, which may themselves affect the space through the actuators. Such smart environments have tremendous potential within many application areas to improve the utility of a space. Consider the potential offered by a smart environment that prolongs the time an elderly or infirm person can live an independent life, or one that supports vicarious learning.