10 research outputs found
A Modular and Adaptive System for Business Email Compromise Detection
The growing sophistication of Business Email Compromise (BEC) and spear
phishing attacks poses significant challenges to organizations worldwide. The
techniques used in traditional spam and phishing detection are insufficient
because modern BEC attacks are tailored to their targets and often blend in with
regular benign traffic. Recent advances in machine learning, particularly
in Natural Language Understanding (NLU), offer a promising avenue for combating
such attacks, but in a practical system, constraints such as data
availability, operational costs, verdict explainability requirements, and the need
to evolve the system robustly make it essential to combine multiple
approaches. We present CAPE, a comprehensive and efficient system for BEC
detection that has been proven in a production environment for a period of over
two years. Rather than being a single model, CAPE is a system that combines
independent ML models and algorithms detecting BEC-related behaviors across
various email modalities such as text, images, metadata and the email's
communication context. This decomposition makes CAPE's verdicts naturally
explainable. In the paper, we describe the design principles and constraints
behind its architecture, as well as the challenges of model design, evaluation,
and continuous adaptation of the system through a Bayesian approach that combines
limited data with domain knowledge. Furthermore, we elaborate on several
specific behavioral detectors, such as those based on Transformer neural
architectures.
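The abstract only names the Bayesian adaptation mechanism; as a minimal illustrative sketch (assuming a Beta-Binomial model, which the abstract does not specify), domain knowledge about a detector's precision can be encoded as prior pseudo-counts and combined with a small number of labeled verdicts:

    # Illustrative sketch only: a Beta-Binomial update that combines a
    # domain-knowledge prior with limited labeled feedback. The prior
    # pseudo-counts and verdict counts below are hypothetical.

    def posterior_precision(prior_tp: float, prior_fp: float,
                            observed_tp: int, observed_fp: int) -> float:
        # Posterior mean of a detector's precision under a Beta prior:
        # the prior pseudo-counts encode the analyst's belief, the observed
        # counts come from the (limited) labeled verdicts.
        alpha = prior_tp + observed_tp
        beta = prior_fp + observed_fp
        return alpha / (alpha + beta)

    # Domain knowledge: the detector is believed precise (9 virtual true
    # positives vs. 1 false positive); only 20 labeled verdicts exist.
    print(posterior_precision(prior_tp=9.0, prior_fp=1.0,
                              observed_tp=15, observed_fp=5))  # ~0.80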
Fully Invisible Protean Signatures Schemes
Protean Signatures (PS), recently introduced by Krenn et al. (CANS '18), allow a semi-trusted third party, named the sanitizer, to modify a signed message in a controlled way.
The sanitizer can edit signer-chosen parts to arbitrary bitstrings and can also redact admissible parts, which are likewise chosen by the signer. Thus, PSs generalize both redactable signatures (RSS) and sanitizable signatures (SSS) into a single notion.
However, the current definition of invisibility does not prevent an outsider from deciding which parts of a message are redactable; only the parts that can be edited are hidden. This weakens the privacy guarantees provided by the state-of-the-art definition.
We extend PSs to be fully invisible.
This strengthened notion guarantees that an outsider can neither decide which parts of a message can be edited nor which
parts can be redacted. To achieve our goal, we introduce the new notions of Invisible RSSs and Invisible Non-Accountable SSSs (SSS'), along with a consolidated framework for aggregate signatures.
Using those building blocks, our resulting construction is significantly more efficient than the original scheme by Krenn et al., which we demonstrate in a prototypical implementation.
FIN-DM: A Data Mining Process Model for Financial Services
Data mining is a set of rules, processes, and algorithms that allow companies to increase revenues, reduce costs, optimize products and customer relationships, and achieve other business goals by extracting actionable insights from the data they collect on a day-to-day basis. Data mining and analytics projects require a well-defined methodology and processes. Several standard process models for conducting data mining and analytics projects are available. Among them, the most notable and widely adopted standard model is CRISP-DM. It is industry-agnostic and is often adapted to meet sector-specific requirements. Industry-specific adaptations of CRISP-DM have been proposed across several domains, including healthcare, education, industrial and software engineering, and logistics. However, until now there has been no adaptation of CRISP-DM for the financial services industry, which has its own set of domain-specific requirements.
This PhD thesis addresses this gap by designing, developing, and evaluating a sector-specific data mining process for financial services (FIN-DM). The thesis investigates how standard data mining processes are used across various industry sectors and in financial services. The examination identified a number of adaptation scenarios of traditional frameworks. It also suggested that these approaches do not pay sufficient attention to turning data mining models into software products integrated into organizations' IT architectures and business processes. In the financial services domain, the main adaptation scenarios discovered concerned technology-centric aspects (scalability), business-centric aspects (actionability), and human-centric aspects (mitigating discriminatory effects) of data mining. Next, an examination by means of a case study in an actual financial services organization revealed 18 perceived gaps in the CRISP-DM process.
Using the data and results from these studies, the PhD thesis outlines an adaptation of
CRISP-DM for the financial sector, named the Financial Industry Process for Data Mining
(FIN-DM). FIN-DM extends CRISP-DM to support privacy-compliant data mining, to tackle AI ethics risks, to fulfill risk management requirements, and to embed quality assurance as part of the data mining life-cycle.
https://www.ester.ee/record=b547227
Side-Channel Analysis and Cryptography Engineering: Getting OpenSSL Closer to Constant-Time
As side-channel attacks reached general-purpose PCs and became more practical for attackers to exploit, OpenSSL adopted a flagging mechanism in 2005 to protect against SCA. The opt-in mechanism allows flagging secret values, such as keys, with the BN_FLG_CONSTTIME flag. Whenever the flag is checked and found to be set, the library changes its execution flow to SCA-secure functions that are slower but safer, protecting these secret values from being leaked. This mechanism favors performance over security, is error-prone, and is obscure to most library developers, increasing the potential for side-channel vulnerabilities. This dissertation presents an extensive side-channel analysis of OpenSSL and criticizes its fragile flagging mechanism. This analysis reveals several flaws affecting the library, resulting in multiple side-channel attacks, improved cache-timing attack techniques, and a new side-channel vector. The first part of this dissertation introduces the main topic and the necessary related work, including the microarchitecture, the cache hierarchy, and attack techniques; it then presents a brief, troubled history of side-channel attacks and defenses in OpenSSL, setting the stage for the related publications. This dissertation includes seven original publications contributing to the areas of side-channel analysis, microarchitectural timing attacks, and applied cryptography. From an SCA perspective, the results identify several vulnerabilities and flaws enabling protocol-level attacks on RSA, DSA, and ECDSA, in addition to a full SCA of the SM2 cryptosystem. With respect to microarchitectural timing attacks, the dissertation presents a new side-channel vector due to port contention in the CPU execution units. Finally, on the applied cryptography front, OpenSSL now enjoys a revamped code base securing several cryptosystems against SCA, favoring secure-by-default protection against side-channel attacks instead of the insecure opt-in flagging mechanism provided by the fragile BN_FLG_CONSTTIME flag.
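The flag itself lives in OpenSSL's C code; as a language-agnostic illustration of the underlying problem the flag addresses (not OpenSSL's implementation), the sketch below contrasts an early-exit comparison, whose running time depends on secret data, with a constant-time one:

    # Illustrative sketch only -- this is not OpenSSL code and does not model
    # BN_FLG_CONSTTIME; it merely shows why variable-time code paths leak.
    import hmac

    def leaky_equal(secret: bytes, guess: bytes) -> bool:
        # Returns as soon as a byte differs, so the running time reveals how
        # long a prefix of the secret an attacker has already matched.
        if len(secret) != len(guess):
            return False
        for s, g in zip(secret, guess):
            if s != g:
                return False
        return True

    def constant_time_equal(secret: bytes, guess: bytes) -> bool:
        # hmac.compare_digest inspects every byte regardless of mismatches,
        # so the running time does not depend on the secret's contents.
        return hmac.compare_digest(secret, guess)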
Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation
Unpublished doctoral thesis, defended at the Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingenieria Informática. Date of defence: 08-07-2021.
Since the emergence of the Internet and the spread of digital communications
throughout the world, the amount of data stored on the Web has been
growing exponentially. In this new digital era, a large number of companies
have emerged with the purpose of filtering the information available on the
web and providing users with interesting items. The algorithms and models
used to recommend these items are called Recommender Systems. These
systems are applied to a large number of domains, from music, books, or
movies to dating or Point-of-Interest (POI), which is an increasingly popular
domain where users receive recommendations of different places when
they arrive in a city.
In this thesis, we focus on exploiting the use of contextual information, especially
temporal and sequential data, and apply it in novel ways in both
traditional and Point-of-Interest recommendation. We believe that this type
of information can be used not only for creating new recommendation models
but also for developing new metrics for analyzing the quality of these
recommendations. In one of our first contributions, we propose different
metrics, some of them derived from previously existing frameworks, using
this contextual information. In addition, we propose an intuitive algorithm
that is able to provide recommendations to a target user by exploiting the
last common interactions with other similar users of the system.
At the same time, we conduct a comprehensive review of the algorithms
that have been proposed in the area of POI recommendation between 2011
and 2019, identifying the common characteristics and methodologies used.
Once this classification of the algorithms proposed to date is completed, we
design a mechanism to recommend complete routes (not only independent
POIs) to users, making use of reranking techniques. In addition, due to the
great difficulty of making recommendations in the POI domain, we propose
the use of data aggregation techniques to exploit information from different
cities to generate POI recommendations in a given target city.
In the experimental work, we present our approaches on different datasets
belonging to both classical and POI recommendation. The results obtained
in these experiments confirm the usefulness of our recommendation proposals,
in terms of ranking accuracy and other dimensions like novelty, diversity,
and coverage, and the appropriateness of our metrics for analyzing temporal
information and biases in the recommendations produced.
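The abstract only sketches the "last common interactions" idea; as a purely illustrative reading (not the thesis's actual algorithm), a sequence-aware user-based neighbourhood recommender might weight neighbours by how strongly their most recent interactions overlap with the target user's:

    # Illustrative sketch only: a sequence-aware user-based neighbourhood
    # recommender in the spirit of "exploiting the last common interactions
    # with other similar users"; the thesis's actual algorithm may differ.
    from collections import defaultdict

    def recommend(target, interactions, k=10, last_n=5, top_n=10):
        # interactions: dict mapping each user to a chronologically ordered
        # list of consumed items.
        target_recent = set(interactions[target][-last_n:])
        # Score candidate neighbours by the overlap between their recent
        # items and the target user's recent items.
        overlaps = {
            user: len(target_recent & set(items[-last_n:]))
            for user, items in interactions.items() if user != target
        }
        neighbours = sorted(overlaps, key=overlaps.get, reverse=True)[:k]
        # Aggregate the neighbours' items that the target has not seen yet,
        # weighting each item by its neighbour's overlap score.
        seen = set(interactions[target])
        scores = defaultdict(float)
        for user in neighbours:
            for item in interactions[user]:
                if item not in seen:
                    scores[item] += overlaps[user]
        return sorted(scores, key=scores.get, reverse=True)[:top_n]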
Smart Sensor Technologies for IoT
The recent development of wireless networks and devices has led to novel services that utilize wireless communication on a new level. Much effort and many resources have been dedicated to establishing new communication networks that will support machine-to-machine communication and the Internet of Things (IoT). In these systems, various smart and sensory devices are deployed and connected, enabling large amounts of data to be streamed. Smart services represent new trends in mobile services, i.e., a completely new spectrum of context-aware, personalized, and intelligent services and applications. A variety of existing services utilize information about the position of the user or mobile device. The position of mobile devices is often obtained using the Global Navigation Satellite System (GNSS) chips that are integrated into all modern mobile devices (smartphones). However, GNSS is not always a reliable source of position estimates due to multipath propagation and signal blockage. Moreover, integrating GNSS chips into all devices might have a negative impact on the battery life of future IoT applications. Therefore, alternative solutions to position estimation should be investigated and implemented in IoT applications. This Special Issue, "Smart Sensor Technologies for IoT", aims to report on some of the recent research efforts on this increasingly important topic. The twelve accepted papers in this issue cover various aspects of Smart Sensor Technologies for IoT.
Machine Learning Based Detection and Evasion Techniques for Advanced Web Bots.
Web bots are programs that can be used to browse the web and perform different types of automated actions, both benign and malicious. Such web bots vary in sophistication based on their purpose, ranging from simple automated scripts to advanced web bots that have a browser fingerprint and exhibit humanlike behaviour. Advanced web bots are especially appealing to malicious web bot creators due to their browserlike fingerprint and humanlike behaviour, which reduce their detectability.
Several effective behaviour-based web bot detection techniques have been proposed in the literature. However, the performance of these detection techniques when targeting malicious web bots that try to evade detection has not been examined in depth. Such evasive web bot behaviour is achieved through different techniques, including simple heuristics and statistical distributions, or more advanced machine learning based techniques. Motivated by the above, in this thesis we research novel web bot detection techniques and how effective these are against evasive web bots that try to evade detection using, among others, recent advances in machine learning.
To this end, we initially evaluate state-of-the-art web bot detection techniques against web bots of different sophistication levels and show that, while the existing approaches achieve very high performance in general, they are not very effective when faced only with advanced web bots that try to remain undetected. Thus, we propose a novel web bot detection framework that can effectively detect bots of varying levels of sophistication, including advanced web bots. This framework comprises and combines two detection modules: (i) a detection module that extracts several features from web logs and uses them as input to several well-known machine learning algorithms, and (ii) a detection module that uses mouse trajectories as input to Convolutional Neural Networks (CNNs).
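As a hedged illustration of the first detection module only (log-derived features fed to a standard classifier), the feature set and the choice of a random forest below are assumptions rather than the thesis's exact setup:

    # Illustrative sketch of a log-feature detection module; the feature names
    # and the RandomForest choice are assumptions, not the thesis's setup.
    from sklearn.ensemble import RandomForestClassifier

    def session_features(session):
        # session: list of (timestamp, url, http_status) tuples from web logs.
        timestamps = [t for t, _, _ in session]
        duration = max(timestamps) - min(timestamps) if len(timestamps) > 1 else 0.0
        return [
            len(session),                               # number of requests
            duration,                                   # session duration
            len({url for _, url, _ in session}),        # distinct URLs visited
            sum(1 for _, _, s in session if s >= 400),  # error responses
        ]

    def train_detector(sessions, labels):
        # labels: 1 for bot sessions, 0 for human sessions.
        X = [session_features(s) for s in sessions]
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        clf.fit(X, labels)
        return clf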
Moreover, we examine the case where advanced web bots themselves utilise recent advances in machine learning to evade detection. Specifically, we propose two novel types of evasive advanced web bots: (i) web bots that use Reinforcement Learning (RL) to update their browsing behaviour based on whether they have been detected or not, and (ii) web bots that possess data from human behaviours and use them as input to Generative Adversarial Networks (GANs) to generate images of humanlike mouse trajectories. We show that both approaches increase the evasiveness of the web bots by reducing the performance of the detection framework utilised in each case.
We conclude that malicious web bots can exhibit high levels of sophistication and combine different techniques that increase their evasiveness. Even though web bot detection frameworks can combine different methods to effectively detect such bots, web bots can update their behaviours using, among others, recent advances in machine learning to increase their evasiveness. Thus, detection techniques should be continuously updated to keep up with new techniques introduced by malicious web bots to evade detection.
Social informatics
5th International Conference, SocInfo 2013, Kyoto, Japan, November 25-27, 2013, Proceedings