Geographic information extraction from texts
A large volume of unstructured texts, containing valuable geographic information, is available online. This information – provided implicitly or explicitly – is useful not only for scientific studies (e.g., spatial humanities) but also for many practical applications (e.g., geographic information retrieval). Although large progress has been achieved in geographic information extraction from texts, there are still unsolved challenges and issues, ranging from methods, systems, and data, to applications and privacy. Therefore, this workshop will provide a timely opportunity to discuss the recent advances, new ideas, and concepts but also identify research gaps in geographic information extraction
Jornadas Nacionales de Investigación en Ciberseguridad: proceedings of the 8th Jornadas Nacionales de Investigación en Ciberseguridad, Vigo, June 21–23, 2023
Jornadas Nacionales de Investigación en Ciberseguridad (8th, 2023, Vigo). atlanTTic. AMTEGA: Axencia para a modernización tecnolóxica de Galicia. INCIBE: Instituto Nacional de Ciberseguridad
Modeling Deception for Cyber Security
In the era of software-intensive, smart, and connected systems, the growing power and sophistication of cyber attacks pose increasing challenges to software security. The reactive posture of traditional security mechanisms, such as anti-virus and intrusion detection systems, has not been sufficient to combat the wide range of advanced persistent threats that currently jeopardize systems operation. To mitigate these threats, more active defensive approaches are necessary. Such approaches rely on the concept of actively hindering and deceiving attackers. Deceptive techniques provide an additional line of defense by thwarting attackers' advances through the manipulation of their perceptions. Manipulation is achieved through the use of deceitful responses, feints, misdirection, and other falsehoods in a system. Of course, such deception mechanisms may result in side effects that must be handled. Current methods for planning deception chiefly attempt to transfer military deception to cyber deception, providing only high-level instructions that largely ignore deception as part of the software security development life cycle. Consequently, little practical guidance is available on how to engineer deception-based techniques for defense. This PhD thesis contributes a systematic approach to specify and design cyber deception requirements, tactics, and strategies. This deception approach consists of (i) a multi-paradigm modeling approach for representing deception requirements, tactics, and strategies, (ii) a reference architecture to support the integration of deception strategies into system operation, and (iii) a method to guide engineers in deception modeling. A tool prototype, a case study, and an experimental evaluation show encouraging results for the application of the approach in practice. Finally, a conceptual coverage mapping was developed to assess the expressivity of the deception modeling language created.
Three Decades of Deception Techniques in Active Cyber Defense -- Retrospect and Outlook
Deception techniques have been widely seen as a game changer in cyber defense. In this paper, we review representative techniques in honeypots, honeytokens, and moving target defense, spanning from the late 1980s to the year 2021. Techniques from these three domains complement each other and may be leveraged to build a holistic deception-based defense. However, to the best of our knowledge, no prior work provides a systematic retrospect of these three domains together and investigates their integrated usage for orchestrated deceptions. Our paper aims to fill this gap. By utilizing a tailored cyber kill chain model that reflects the current threat landscape, together with a four-layer deception stack, a two-dimensional taxonomy is developed, based on which the deception techniques are classified. The taxonomy answers which phases of a cyber attack campaign the techniques can disrupt and which layers of the deception stack they belong to. Cyber defenders may use the taxonomy as a reference to design an organized and comprehensive deception plan, or to prioritize deception efforts for a budget-conscious solution. We also discuss two important points for achieving active and resilient cyber defense, namely deception in depth and deception lifecycle, where several notable proposals are illustrated. Finally, some outlooks on future research directions are presented, including dynamic integration of different deception techniques, quantified deception effects and deception operation cost, hardware-supported deception techniques, as well as techniques developed based on a better understanding of the human element.
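The two-dimensional taxonomy described above can be sketched as a simple lookup structure. This is only an illustrative skeleton: the phase and layer names below are invented stand-ins, not the paper's actual kill-chain phases or deception-stack layers.

```python
# Sketch: a two-dimensional taxonomy indexing deception techniques by the
# kill-chain phase they disrupt and the deception-stack layer they occupy.
# Phase and layer names are illustrative, not the paper's exact labels.

PHASES = ["reconnaissance", "delivery", "exploitation", "lateral_movement"]
LAYERS = ["network", "system", "application", "data"]

taxonomy = {}  # (phase, layer) -> list of technique names


def classify(technique, phase, layer):
    """Place a technique into the two-dimensional taxonomy."""
    if phase not in PHASES or layer not in LAYERS:
        raise ValueError("unknown phase or layer")
    taxonomy.setdefault((phase, layer), []).append(technique)


classify("honeypot", "reconnaissance", "network")
classify("honeytoken", "lateral_movement", "data")
classify("address shuffling (MTD)", "reconnaissance", "network")

# Which techniques disrupt reconnaissance at the network layer?
print(taxonomy[("reconnaissance", "network")])
```

A defender could walk the `(phase, layer)` grid to spot empty cells, i.e., attack phases with no deception coverage, which is exactly the planning use the taxonomy is meant to support.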
Determining Unique Agents by Evaluating Web Form Interaction
Because of the inherent risks in today's online activities, it is imperative to identify a malicious user masquerading as someone else. Incorporating biometric analysis enhances the confidence of authenticating valid users over the Internet while providing additional layers of security with no hindrance to the end user. Through the analysis of traffic patterns and HTTP headers, the detection and early refusal of robot agents play a significant role in reducing fraudulent login attempts
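HTTP-header screening of the kind mentioned above can be sketched in a few lines. The specific token list and rules here are illustrative assumptions, not the abstract's actual detection model; real browsers do, however, routinely send `Accept` and `Accept-Language` headers.

```python
# Minimal sketch of HTTP-header screening for likely robot agents.
# The token list and rules are illustrative assumptions only.

SUSPICIOUS_AGENT_TOKENS = ("curl", "wget", "python-requests", "bot", "spider")


def looks_like_bot(headers: dict) -> bool:
    ua = headers.get("User-Agent", "").lower()
    # No user agent, or a known automation tool, is immediately suspicious.
    if not ua or any(tok in ua for tok in SUSPICIOUS_AGENT_TOKENS):
        return True
    # Real browsers almost always send Accept and Accept-Language.
    if "Accept" not in headers or "Accept-Language" not in headers:
        return True
    return False


print(looks_like_bot({"User-Agent": "curl/8.0"}))        # True
print(looks_like_bot({"User-Agent": "Mozilla/5.0",
                      "Accept": "text/html",
                      "Accept-Language": "en-US"}))      # False
```

In practice such header checks would be one weak signal among many, combined with the traffic-pattern and biometric analysis the abstract describes.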
Denial of Service in Web-Domains: Building Defenses Against Next-Generation Attack Behavior
The existing state-of-the-art in the field of application layer Distributed Denial of Service (DDoS) protection is generally designed, and thus effective, only for static web domains. To the best of our knowledge, our work is the first that studies the problem of application layer DDoS defense in web domains of dynamic content and organization, and for next-generation bot behaviour. In the first part of this thesis, we focus on the following research tasks: 1) we identify the main weaknesses of the existing application-layer anti-DDoS solutions as proposed in research literature and in the industry, 2) we obtain a comprehensive picture of the current-day as well as the next-generation application-layer attack behaviour and 3) we propose novel techniques, based on a multidisciplinary approach that combines offline machine learning algorithms and statistical analysis, for detection of suspicious web visitors in static web domains. Then, in the second part of the thesis, we propose and evaluate a novel anti-DDoS system that detects a broad range of application-layer DDoS attacks, both in static and dynamic web domains, through the use of advanced techniques of data mining. The key advantage of our system relative to other systems that resort to the use of challenge-response tests (such as CAPTCHAs) in combating malicious bots is that our system minimizes the number of these tests that are presented to valid human visitors while succeeding in preventing most malicious attackers from accessing the web site. The results of the experimental evaluation of the proposed system demonstrate effective detection of current and future variants of application layer DDoS attacks
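The key design point above, minimizing challenge-response tests shown to valid humans, can be sketched as a score-then-challenge gate. The scoring function and threshold below are invented placeholders standing in for the thesis's machine learning and statistical detectors.

```python
# Sketch: show a challenge (e.g., a CAPTCHA) only to visitors whose
# behaviour scores as suspicious, so valid humans rarely see one.
# The scoring rules and threshold are illustrative placeholders.


def anomaly_score(requests_per_min, distinct_pages, avg_think_time_s):
    """Toy score: fast, shallow, machine-like browsing scores high."""
    score = 0.0
    if requests_per_min > 60:
        score += 0.5
    if distinct_pages < 2:
        score += 0.2
    if avg_think_time_s < 0.5:
        score += 0.4
    return score


def handle_visitor(stats, threshold=0.6):
    return "challenge" if anomaly_score(**stats) >= threshold else "serve"


human = {"requests_per_min": 8, "distinct_pages": 5, "avg_think_time_s": 4.0}
bot = {"requests_per_min": 300, "distinct_pages": 1, "avg_think_time_s": 0.05}
print(handle_visitor(human), handle_visitor(bot))  # serve challenge
```

The gate structure is what matters: the challenge cost falls only on the suspicious tail of the traffic, which is how the system keeps friction low for legitimate visitors while still blocking most attackers.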
Using design diversity and optimal adjudication for detecting malicious web scraping and malware samples
Due to the constantly evolving nature of cyber threats and attacks, organisations see an ever-growing requirement to develop more sophisticated defence systems to protect their networks and information. In an arms race such as this, employing as many techniques as possible is crucial for companies to stay ahead of would-be attackers.
Design diversity is a technique with a significant history behind it, which has become more widely used as off-the-shelf defence software has become more commonplace. The simple concept behind design diversity is the age-old saying that "two heads are better than one". When combining multiple tools for cyber defence, it is reasonable to expect that when these tools use different techniques, or work under different assumptions and configurations, they will also likely detect different threats. Hence, the security events that one tool misses or misclassifies, the other could correctly handle, and vice versa. We would expect design diversity to remain an important design paradigm for as long as building a completely foolproof security system remains impossible.
While design diversity is appealing, and it has been used to great success in the past, it is important to realise that any possible gains from using this technique are entirely dependent on how diverse the various tools are, and on the context in which it is applied. Applying design diversity will yield different results in different environments, so it is important that empirical results are provided in as many contexts as possible.
In this work, we have looked at the use of design diversity in two major contexts. The first context deals with the question of detecting malicious web scraping activity. We have analysed three separate datasets provided to us by Amadeus - a global provider for the travel and tourism industry - which contain the HTTP traffic they observed within their network, as well as the alerts raised by two of their web scraping detectors. We studied how the combined performance potential of the two tools compares to their individual performances, in 1-out-of-2 (1oo2) and 2-out-of-2 (2oo2) adjudication schemes, meaning that a combined system raises an alert if any one of the internal tools does so as well, or the combined system only raises an alert if both internal tools do so, respectively. We’ve also identified several aspects that highlight the different alert patterns of both tools, which we use to explain the inherent diversity between the two.
The second context in which we have studied the use of design diversity is in the use of machine learning models for the classification of malware and benign software samples. We’ve done this with the use of a dataset that looked at the performance of 37 different RNN machine learning models used to classify a pool of over 4000 software samples, which originated from a previously published paper whose authors we have collaborated with. With the higher number and degree of diversity of the detection tools (the machine learning models) in this study, we were able to expand our results with additional adjudication schemes, anywhere between 1oo10 and 10oo10, as well as more interesting schemes, such as simple majority schemes, e.g., 3oo5. Similarly to the first body of work, we studied and summarised the different aspects that led to diversity in the behaviour of machine learning models.
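The k-out-of-n adjudication schemes used in both studies (1oo2, 2oo2, and simple-majority schemes such as 3oo5) reduce to one small voting function, sketched here as a hedged illustration; the vote values below are made up.

```python
# Sketch of k-out-of-n (kooN) adjudication: a 1oo2 system alerts if either
# detector alerts, a 2oo2 system only if both do, and a simple-majority
# scheme such as 3oo5 alerts when most detectors do.


def k_out_of_n(alerts, k):
    """alerts: one boolean per detector; alert iff at least k are True."""
    return sum(alerts) >= k


votes = [True, False, True, True, False]  # five diverse detectors

print(k_out_of_n(votes[:2], 1))  # 1oo2 over the first two detectors: True
print(k_out_of_n(votes[:2], 2))  # 2oo2: False
print(k_out_of_n(votes, 3))      # 3oo5 simple majority: True
```

The trade-off is visible even in this toy: lowering k catches more threats but raises false alarms, while raising k does the opposite, which is why the choice of scheme depends on the deployment's error costs.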
When utilising multiple diverse systems, each producing a result, a voting or adjudication system is needed to decide the overall system output. Conventional adjudication schemes (e.g., 1-out-of-2) provide a useful starting point for the use of design diversity, but they may be deficient in comparison with schemes that use optimal adjudication. In conventional adjudication schemes, the individual outputs of each internal tool are not taken into account - i.e., a 1oo2 scheme does not care which one of its two internal tools raised an alert, only that one of them did. This is not the case for optimal adjudication. In optimal adjudication, the combined outputs of all the internal tools are called syndromes, and the output of the overall system depends on which unique syndrome was generated for any given classification sample. This affords several benefits, which we detail in depth: primarily, specific tools can be given higher confidence than others, and the error cost of generating false positive or false negative outputs can be taken into account when deciding the output of the overall system, so that we can optimise for the lowest overall error cost.
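Syndrome-based adjudication can be sketched as follows. This is a simplified illustration, not the thesis's actual optimisation: it assumes independent detectors and uses invented reliability figures, priors, and error costs, then picks the cheaper verdict per syndrome.

```python
# Sketch of syndrome-based optimal adjudication: each combined output vector
# (syndrome) gets its own verdict, chosen offline to minimise expected error
# cost. Detector reliabilities, priors, and costs below are invented, and
# detectors are assumed independent for simplicity.

from itertools import product


def optimal_policy(p_alert_given_attack, p_alert_given_benign,
                   p_attack, cost_fn, cost_fp):
    """Map every syndrome to the verdict with the lower expected cost."""
    n = len(p_alert_given_attack)
    policy = {}
    for syndrome in product([False, True], repeat=n):
        # Likelihood of this syndrome under each hypothesis.
        l_attack = l_benign = 1.0
        for i, a in enumerate(syndrome):
            l_attack *= p_alert_given_attack[i] if a else 1 - p_alert_given_attack[i]
            l_benign *= p_alert_given_benign[i] if a else 1 - p_alert_given_benign[i]
        # Expected cost of each verdict, weighted by the priors.
        cost_if_ignore = cost_fn * l_attack * p_attack       # missed attack
        cost_if_alert = cost_fp * l_benign * (1 - p_attack)  # false alarm
        policy[syndrome] = "alert" if cost_if_alert < cost_if_ignore else "ignore"
    return policy


# Two detectors: the first is much more reliable than the second.
policy = optimal_policy(p_alert_given_attack=[0.95, 0.60],
                        p_alert_given_benign=[0.02, 0.30],
                        p_attack=0.1, cost_fn=50.0, cost_fp=1.0)
print(policy[(True, False)])   # only the trusted detector fires
print(policy[(False, True)])   # only the weak detector fires
```

Note the contrast with 1oo2: a 1oo2 scheme treats both single-alert syndromes identically, whereas here the policy alerts when the reliable detector fires alone but ignores the noisy detector firing alone, which is precisely the benefit the paragraph above describes.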
We have looked at the use of optimal adjudication in particular with our second dataset concerning the use of machine learning models for the classification of software samples. We expand on the benefits afforded over using conventional adjudication schemes, and delve into the aspects that make the various machine learning models diverse from one another.
We expect the results of this thesis to provide insight into the use of different adjudication schemes in the contexts we highlight (contexts in which, to the best of our knowledge, no previous research has been published), as well as guidance on the creation of such combined systems for use in security deployments beyond the contexts we have studied.
Using Novel Image-based Interactional Proofs and Source Randomization for Prevention of Web Bots
This work presents our efforts to prevent web bots from illegitimately accessing web resources. As the first technique, we present SEMAGE (SEmantically MAtching imaGEs), a new image-based CAPTCHA that capitalizes on the human ability to define and comprehend image content and to establish semantic relationships between images. As the second technique, we present NOID, a "NOn-Intrusive web bot Defense system" that aims at creating a three-tiered defence system against web automation programs, or web bots. NOID is a server-side technique that prevents web bots from accessing web resources by inherently hiding the HTML elements of interest through randomization and obfuscation in the HTML responses.
A SEMAGE challenge asks a user to select semantically related images from a given image set. SEMAGE has a two-factor design: in order to pass a challenge, the user is required to figure out the content of each image and then understand and identify the semantic relationship between a subset of them. Most current state-of-the-art image-based systems, like Asirra, only require the user to solve the first level, i.e., image recognition. Utilizing the semantic correlation between images to create more secure and user-friendly challenges makes SEMAGE novel. SEMAGE does not suffer from limitations of traditional image-based approaches such as lacking customization and adaptability. SEMAGE, unlike current text-based systems, is also very user-friendly with a high fun factor. We conduct a first-of-its-kind large-scale user study involving 174 users to gauge and compare the accuracy and usability of SEMAGE with existing state-of-the-art CAPTCHA systems like reCAPTCHA (text-based) and Asirra (image-based). The user study further reinforces our points and shows that users achieve high accuracy using our system and consider it fun and easy.
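The two-factor challenge structure can be sketched as follows. The image pool, category labels, and pass rule here are invented for illustration; a real deployment would draw from a large labelled corpus and a richer notion of semantic relatedness.

```python
# Sketch of a SEMAGE-style challenge: the user must pick exactly the images
# that share a semantic relationship. Pool and categories are made up.

import random

IMAGE_POOL = {
    "dog.jpg": "animal", "cat.jpg": "animal", "parrot.jpg": "animal",
    "car.jpg": "vehicle", "bus.jpg": "vehicle",
    "rose.jpg": "plant", "oak.jpg": "plant",
}


def make_challenge(target_category, n_related=2, n_decoys=3, seed=None):
    rng = random.Random(seed)
    related = [i for i, c in IMAGE_POOL.items() if c == target_category]
    decoys = [i for i, c in IMAGE_POOL.items() if c != target_category]
    answer = set(rng.sample(related, n_related))
    shown = list(answer) + rng.sample(decoys, n_decoys)
    rng.shuffle(shown)        # the user only sees the shuffled image set
    return shown, answer


def check_answer(selected, answer):
    # Pass only if the user selects exactly the semantically related images:
    # this requires recognising every image AND spotting the relationship.
    return set(selected) == answer


shown, answer = make_challenge("animal", seed=42)
print(check_answer(answer, answer))     # True: correct selection passes
print(check_answer(shown[:1], answer))  # False: a partial selection fails
```

The exact-match rule is what makes the challenge two-factor: recognising individual images is not enough, since the solver must also infer which subset is semantically related.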
We also design a novel server-side and non-intrusive web bot defense system, NOID, to prevent web bots from accessing web resources by inherently hiding and randomizing HTML elements. Specifically, to prevent web bots uniquely identifying HTML elements for later automation, NOID randomizes name/id parameter values of essential HTML elements such as "input textbox", "textarea" and "submit button" in each HTTP form page. In addition, to prevent powerful web bots from identifying special user-action HTML elements by analyzing the content of their accompanied "label text" HTML tags, we enhance NOID by adding a component, Label Concealer, which hides label indicators by replacing "label text" HTML tags with randomized images. To further prevent more powerful web bots identifying HTML elements by recognizing their relative positions or surrounding elements in the web pages, we enhance NOID by adding another component, Element Trapper, which obfuscates important HTML elements' surroundings by adding decoy elements without compromising usability.
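The core randomization step can be sketched with a regex rewrite. This is a simplification for illustration only: the helper name and the regex approach are assumptions, and a production system would rewrite a parsed DOM rather than raw HTML strings.

```python
# Sketch of NOID-style element-name randomization: per response, essential
# form fields get fresh random name/id values, with a server-side mapping
# kept to decode the eventual form submission.

import re
import secrets


def randomize_form(html, fields):
    """Replace name="field" and id="field" with one-time random tokens."""
    mapping = {}
    for field in fields:
        token = "f_" + secrets.token_hex(8)
        mapping[token] = field
        html = re.sub(r'(name|id)="%s"' % re.escape(field),
                      lambda m: '%s="%s"' % (m.group(1), token), html)
    return html, mapping  # store mapping in the session to decode the POST


page = '<input type="text" name="username" id="username">'
rewritten, mapping = randomize_form(page, ["username"])
print("username" in rewritten)   # False: the stable handle is gone
print(list(mapping.values()))    # ['username']
```

Because the tokens change on every response, a bot cannot hard-code element names for later automation, which is the property the paragraph above describes; the label-concealing and decoy-element components then close off the remaining identification channels.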
We evaluate NOID against five powerful state-of-the-art web bots, including XRumer, SENuke, Magic Submitter, Comment Blaster, and UWCS, on several popular open-source web platforms including phpBB, Simple Machine Forums (SMF), and WordPress. According to our evaluation, NOID can prevent all these web bots from automatically sending spam on these platforms with reasonable overhead.