116 research outputs found
Recommended from our members
Using Diverse Detectors for Detecting Malicious Web Scraping Activity
We present ongoing work on how the use of diverse tools may help with detecting malicious web scraping behavior. We use a real dataset of Apache HTTP Access logs for an e-commerce application provided by Amadeus, a large multinational IT provider for the global travel and tourism industry. Two tools have been used to detect scraping activities based on the HTTP requests: a commercial tool, and an in-house tool called Arcane. Preliminary results suggest there is considerable diversity in the alerting behavior of these tools.
Detecting Malicious Web Scraping Activity: a Study with Diverse Detectors
We present results on the use of diverse monitoring tools for the detection of malicious web scraping activity. We have carried out an analysis of a real dataset of Apache HTTP Access logs for an e-commerce application provided by a large multinational IT provider for the global travel and tourism industry. Two tools have been used to detect scraping activities based on the HTTP requests: a commercial tool, and an in-house tool called Arcane. We show the benefits that can be achieved through the use of both systems, in terms of overall sensitivity and specificity, and we discuss the potential sources of diversity between the tools’ alert patterns.
Using design diversity and optimal adjudication for detecting malicious web scraping and malware samples
Due to the constantly evolving nature of cyber threats and attacks, organisations see an ever-growing requirement to develop more sophisticated defence systems to protect their networks and information. In an arms race such as this, employing as many techniques as possible is crucial for companies to stay ahead of would-be attackers.
Design diversity is a technique with a significant history behind it, which has become more widely used as the availability of off-the-shelf defence software has become more commonplace. The simple concept behind design diversity is the age-old saying that “two minds think better than one”. When combining multiple tools for cyber defence, it’s reasonable to expect that when these tools use different techniques, or work under different assumptions and configurations, they would also likely detect different threats. Hence, the security events that one tool misses or misclassifies, the other could correctly handle, and vice-versa. We would expect design diversity to remain an important design paradigm for as long as building a completely foolproof security system stays within the realms of impossibility.
While design diversity is appealing, and it has been used to great success in the past, it is important to realise that any possible gains from using this technique are entirely dependent on how diverse the various tools are, and on the context in which it is applied. Applying design diversity will yield different results in different environments, so it is important that empirical results are provided in as many contexts as possible.
In this work, we have looked at the use of design diversity in two major contexts. The first context deals with the question of detecting malicious web scraping activity. We have analysed three separate datasets provided to us by Amadeus, a global provider for the travel and tourism industry. The datasets contain the HTTP traffic observed within their network, as well as the alerts raised by two of their web scraping detectors. We studied how the combined performance potential of the two tools compares to their individual performances under 1-out-of-2 (1oo2) and 2-out-of-2 (2oo2) adjudication schemes: in a 1oo2 scheme the combined system raises an alert if either internal tool does so, whereas in a 2oo2 scheme it raises an alert only if both internal tools do so. We’ve also identified several aspects that highlight the different alert patterns of both tools, which we use to explain the inherent diversity between the two.
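The 1oo2 and 2oo2 schemes described above can be illustrated with a minimal Python sketch. The alert vectors and ground-truth labels below are made-up illustrative values, not data from the thesis, and the function names `k_out_of_n` and `sensitivity_specificity` are hypothetical helpers introduced only for this example.

```python
from typing import Sequence, Tuple


def k_out_of_n(alerts: Sequence[bool], k: int) -> bool:
    """Combined alert: True when at least k of the tools raised an alert."""
    return sum(alerts) >= k


def sensitivity_specificity(
    predictions: Sequence[bool], labels: Sequence[bool]
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of predictions against labels."""
    tp = sum(p and y for p, y in zip(predictions, labels))
    tn = sum((not p) and (not y) for p, y in zip(predictions, labels))
    fp = sum(p and (not y) for p, y in zip(predictions, labels))
    fn = sum((not p) and y for p, y in zip(predictions, labels))
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical per-request alerts from two detectors (illustrative only).
tool_a = [True, True, False, False, True, False]
tool_b = [True, False, True, False, False, False]
# Hypothetical ground truth: True means the request was malicious scraping.
labels = [True, True, True, False, False, False]

one_oo_two = [k_out_of_n((a, b), 1) for a, b in zip(tool_a, tool_b)]
two_oo_two = [k_out_of_n((a, b), 2) for a, b in zip(tool_a, tool_b)]

print("1oo2:", sensitivity_specificity(one_oo_two, labels))
print("2oo2:", sensitivity_specificity(two_oo_two, labels))
```

On this toy data the 1oo2 scheme catches every malicious request at the cost of one false alarm, while the 2oo2 scheme raises no false alarms but misses two malicious requests, which is the usual trade-off between the two schemes.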
The second context in which we have studied the use of design diversity is in the use of machine learning models for the classification of malware and benign software samples. We’ve done this with the use of a dataset that looked at the performance of 37 different RNN machine learning models used to classify a pool of over 4000 software samples, which originated from a previously published paper whose authors we have collaborated with. With the higher number and degree of diversity of the detection tools (the machine learning models) in this study, we were able to expand our results with additional adjudication schemes, anywhere between 1oo10 and 10oo10, as well as more interesting schemes, such as simple majority schemes, e.g., 3oo5. Similarly to the first body of work, we studied and summarised the different aspects that led to diversity in the behaviour of machine learning models.
When utilising multiple diverse systems, each producing a result, a voting or adjudication system is needed to decide on the overall system output. Conventional adjudication schemes (e.g., 1-out-of-2) provide a useful starting point for the use of design diversity, but they may be deficient in comparison with others, such as those that use optimal adjudication. Conventional schemes do not take into account which internal tool produced which output: a 1oo2 scheme does not care which one of its two internal tools raised an alert, only that one of them did. Optimal adjudication does. In optimal adjudication, the combined outputs of all the internal tools are called syndromes, and the output of the overall system depends on which syndrome was generated for a given classification sample. This affords us several benefits, which we will detail in depth, primarily that specific tools can be given higher confidence over others, and that the error cost of generating false positive or false negative outputs can be taken into account when deciding the output of the overall system, such that we can optimise for the lowest error cost overall.
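One way to picture syndrome-based adjudication is the sketch below. It is an assumption-laden illustration, not the thesis's actual procedure: it learns, for each observed syndrome, whether alerting or staying silent has the lower total error cost on labelled data. The function name `fit_optimal_adjudicator`, the cost parameters, the tie-breaking choice, and the toy data are all hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Syndrome = Tuple[bool, ...]


def fit_optimal_adjudicator(
    syndromes: Sequence[Syndrome],
    labels: Sequence[bool],
    cost_fp: float = 1.0,
    cost_fn: float = 1.0,
) -> Dict[Syndrome, bool]:
    """Map each observed syndrome to the decision with the lower error cost.

    Alerting on a syndrome costs cost_fp per benign sample that shows it;
    staying silent costs cost_fn per malicious sample that shows it.
    Ties resolve to "no alert" (an arbitrary choice made for this sketch).
    """
    malicious: Counter = Counter()
    benign: Counter = Counter()
    for syndrome, is_malicious in zip(syndromes, labels):
        (malicious if is_malicious else benign)[tuple(syndrome)] += 1

    rule: Dict[Syndrome, bool] = {}
    for syndrome in set(malicious) | set(benign):
        cost_if_alert = cost_fp * benign[syndrome]
        cost_if_silent = cost_fn * malicious[syndrome]
        rule[syndrome] = cost_if_alert < cost_if_silent
    return rule


# Hypothetical labelled outputs of two tools: (tool A alert, tool B alert).
syndromes = [
    (True, True), (True, True),
    (True, False), (True, False), (True, False),
    (False, True), (False, True),
    (False, False), (False, False),
]
labels = [True, True, False, False, True, True, True, False, False]

equal_cost_rule = fit_optimal_adjudicator(syndromes, labels)
costly_miss_rule = fit_optimal_adjudicator(syndromes, labels, cost_fn=5.0)
print(equal_cost_rule)
print(costly_miss_rule)
```

With equal costs, the learned rule on this toy data alerts exactly when tool B alerts, so one tool is effectively given higher confidence than the other. Raising the cost of a missed detection flips the decision for the syndrome where only tool A alerts, and the rule then coincides with 1oo2. A real deployment would also need a policy for syndromes never seen in the labelled data, which this sketch leaves out.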
We have looked at the use of optimal adjudication in particular with our second dataset concerning the use of machine learning models for the classification of software samples. We expand on the benefits afforded over using conventional adjudication schemes, and delve into the aspects that make the various machine learning models diverse from one another.
We expect the results from this thesis will provide insight into the use of different adjudication schemes in the contexts we highlight (contexts in which, to the best of our knowledge, previous research has not been published), as well as provide guidance on the creation of such combined systems for use in security deployments beyond the contexts we have studied
Identifying and Mitigating the Security Risks of Generative AI
Every major technical invention resurfaces the dual-use dilemma -- the new
technology has the potential to be used for good as well as for harm.
Generative AI (GenAI) techniques, such as large language models (LLMs) and
diffusion models, have shown remarkable capabilities (e.g., in-context
learning, code-completion, and text-to-image generation and editing). However,
GenAI can be used just as well by attackers to generate new attacks and
increase the velocity and efficacy of existing attacks.
This paper reports the findings of a workshop held at Google (co-organized by
Stanford University and the University of Wisconsin-Madison) on the dual-use
dilemma posed by GenAI. This paper is not meant to be comprehensive, but is
rather an attempt to synthesize some of the interesting findings from the
workshop. We discuss short-term and long-term goals for the community on this
topic. We hope this paper provides both a launching point for a discussion on
this important topic as well as interesting problems that the research
community can work to address
AI Knowledge Transfer from the University to Society
AI Knowledge Transfer from the University to Society: Applications in High-Impact Sectors brings together examples from the "Innovative Ecosystem with Artificial Intelligence for Andalusia 2025" project at the University of Seville, a series of sub-projects composed of research groups and different institutions or companies that explore the use of Artificial Intelligence in a variety of high-impact sectors to lead innovation and assist in decision-making. Key features: includes chapters on health and social welfare, transportation, digital economy, energy efficiency and sustainability, agro-industry, and tourism; brings together a great diversity of authors, experts in varied sectors, belonging to powerful research groups from the University of Seville with proven experience in the transfer of knowledge to the productive sector and agents attached to the Andalucía TECH Campus
Understanding the Evolution of Android App Vulnerabilities
The Android ecosystem today is a growing universe of a few billion devices, hundreds of millions of users and millions of applications targeting a wide range of activities where sensitive information is collected and processed. Security of communication and privacy of data are thus of utmost importance in application development. Yet, regularly, there are reports of successful attacks targeting Android users. While some of those attacks exploit vulnerabilities in the Android OS, others directly concern application-level code written by a large pool of developers with varying experience. Recently, a number of studies have investigated this phenomenon, each focusing, however, on only one specific vulnerability type appearing in apps, and based on only a snapshot of the situation at a given time. Thus, the community is still lacking comprehensive studies exploring how vulnerabilities have evolved over time, and how they evolve in a single app across developer updates. Our work fills this gap by leveraging a data stream of 5 million app packages to reconstruct versioned lineages of Android apps, yielding 28,564 app lineages (i.e., successive releases of the same Android apps) with more than 10 app versions each, corresponding to a total of 465,037 APKs. Based on these app lineages, we apply state-of-the-art vulnerability-finding tools and investigate systematically the reports produced by each tool. In particular, we study which types of vulnerabilities are found, how they are introduced in the app code, where they are located, and whether they foreshadow malware. We provide insights based on the quantitative data as reported by the tools, but we further discuss the potential false positives. Our findings and study artifacts constitute tangible knowledge for the community. They could be leveraged by developers to focus verification tasks, and by researchers to drive vulnerability discovery and repair research efforts.
Backup To The Rescue: Automated Forensic Techniques For Advanced Website-Targeting Cyber Attacks
The last decade has seen a significant rise in non-technical users gaining a web presence, often via the easy-to-use functionalities of Content Management Systems (CMS). In fact, over 60% of the world’s websites run on CMSs. Unfortunately, this huge user population has made CMS-based websites a high-profile target for hackers. Worse still, the vast majority of the website hosting industry has shifted to a “backup and restore” model of security, which relies on error-prone AV scanners to prompt non-technical users to roll back to a pre-infection nightly snapshot. My cyber forensics research directly addresses this emergent problem by developing next-generation techniques for the investigation of advanced cyber crimes.
Driven by economic incentives, attackers abuse the trust in this economy: selling malware on legitimate marketplaces, pirating popular website plugins, and infecting websites post-deployment. Furthermore, attackers are exploiting these websites at scale by carelessly dropping thousands of obfuscated and packed malicious files on the webserver. This is counter-intuitive since attackers are assumed to be stealthy. Despite the rise in web attacks, efficiently locating and accurately analyzing the malware dropped on compromised webservers has remained an open research challenge.
This dissertation posits that the already collected webserver nightly backup snapshots contain all required information to enable automated and scalable detection of website compromises. This dissertation presents a web attack forensics framework that leverages program analysis to automatically understand the webserver’s nightly backup snapshots. This will enable the recovery of temporal phases of a webserver compromise and its origin within the website supply chain. Ph.D.
An Approach For Detecting Online Dating Scams
Online dating scams have been increasing rapidly, in step with the internet’s rapid growth. However, no tool is available for the public to use to prevent online dating scams. In this paper, techniques for scam detection in online dating website profiles are described. A tool for automatically identifying fake profiles on dating websites such as e-Harmony, OkCupid, and match.com is used in this paper. The web application generates a scam likelihood for the input profile’s description by using the scam action components.
According to recent National Public Radio reporting, online dating scams had an impact of 143 million last year (“… 143 Million In Online Relationship Scams Last Year,” 2019). This number indicates the link between the number of users that use online dating websites and the number of scams on these websites. The primary purpose of this paper is to create public awareness and to alert users about whom they might be contacting on online dating websites.