A Survey on Reinforcement Learning Security with Application to Autonomous Driving
Reinforcement learning allows machines to learn from their own experience.
Nowadays, it is used in safety-critical applications, such as autonomous
driving, despite being vulnerable to attacks carefully crafted either to
prevent the reinforcement learning algorithm from learning an effective and
reliable policy or to induce the trained agent to make a wrong decision. The
literature about the security of reinforcement learning is rapidly growing, and
some surveys have been proposed to shed light on this field. However, their
categorizations are insufficient for choosing an appropriate defense given the
kind of system at hand. In our survey, we not only overcome this limitation
by considering a different perspective, but also discuss the applicability
of state-of-the-art attacks and defenses when reinforcement learning algorithms
are used in the context of autonomous driving.
FLARE: Fingerprinting Deep Reinforcement Learning Agents using Universal Adversarial Masks
We propose FLARE, the first fingerprinting mechanism to verify whether a
suspected Deep Reinforcement Learning (DRL) policy is an illegitimate copy of
another (victim) policy. We first show that it is possible to find
non-transferable, universal adversarial masks, i.e., perturbations, to generate
adversarial examples that can successfully transfer from a victim policy to its
modified versions but not to independently trained policies. FLARE employs
these masks as fingerprints to verify the true ownership of stolen DRL policies
by measuring an action agreement value over states perturbed via such masks.
Our empirical evaluations show that FLARE is effective (100% action agreement
on stolen copies) and does not falsely accuse independent policies (no false
positives). FLARE is also robust to model modification attacks and cannot be
easily evaded by more informed adversaries without negatively impacting agent
performance. We also show that not all universal adversarial masks are suitable
candidates for fingerprints due to the inherent characteristics of DRL
policies. The spatio-temporal dynamics of DRL problems and the sequential
decision-making process make it more difficult to characterize the decision boundary of DRL
policies and to search for universal masks that capture its geometry.
Comment: Will appear in the proceedings of ACSAC 2023; 13 pages, 5 figures, 7
tables
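A minimal sketch of the action-agreement check, under a simplifying assumption: agreement is taken here as the fraction of mask-perturbed states on which the suspected policy chooses the same action as the victim. FLARE's actual mask search and its decision threshold are not reproduced; `linear_policy`, `apply_mask` and `action_agreement` are hypothetical helpers, and the toy linear policies only stand in for DRL agents.

```python
from typing import Callable, List, Sequence

State = List[float]
Policy = Callable[[State], int]


def linear_policy(weights: List[List[float]]) -> Policy:
    """Toy discrete-action policy (hypothetical): pick the action with the highest linear score."""
    def act(state: State) -> int:
        scores = [sum(w * s for w, s in zip(row, state)) for row in weights]
        return max(range(len(scores)), key=scores.__getitem__)
    return act


def apply_mask(state: State, mask: State) -> State:
    """Add a universal (state-independent) perturbation to an observation."""
    return [s + m for s, m in zip(state, mask)]


def action_agreement(victim: Policy, suspect: Policy,
                     states: Sequence[State], mask: State) -> float:
    """Fraction of masked states on which the suspect takes the same action as the victim."""
    if not states:
        raise ValueError("need at least one state")
    perturbed = [apply_mask(s, mask) for s in states]
    return sum(victim(p) == suspect(p) for p in perturbed) / len(perturbed)


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]
    victim = linear_policy(W)
    stolen = linear_policy([row[:] for row in W])  # an exact copy of the victim
    states = [[0.2, 0.1], [0.0, 0.5], [0.3, 0.3]]
    print(action_agreement(victim, stolen, states, [0.1, -0.1]))  # 1.0
```

A stolen copy agrees everywhere by construction; an independently trained policy would be expected to agree less often on the same masked states.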
OddAssist - An eSports betting recommendation system
It is widely accepted that sports betting has been around for as long as sport itself. Back in
the 1st century, circuses hosted chariot races and fans would bet on who they thought would
emerge victorious. As technology evolved, sports evolved too and, above all, so did
bookmakers. Owing to mass digitization, bookmakers are now available online, from
anywhere, which makes this market inherently more tempting. In fact, this transition has
propelled sports betting into a multi-billion-dollar industry that can rival the sports
industry itself.
Similarly, younger generations are increasingly attached to the digital world, including
electronic sports – eSports. In fact, young men are more likely to follow eSports than traditional
sports. Counter-Strike: Global Offensive, the videogame on which this dissertation focuses, is
one of the pillars of this industry and during 2022, 15 million dollars were distributed in
tournament prizes and there was a peak of 2 million concurrent viewers. This factor, combined
with the digitization of bookmakers, makes the eSports betting market extremely appealing for
exploring machine learning techniques, since young people who follow this type of sport also
find it easy to bet online.
In this dissertation, a betting recommendation system is proposed, implemented, tested, and
validated, which considers the match history of each team, the odds of several bookmakers and
the general feeling of fans in a discussion forum.
Individually, the machine learning models performed well. More specifically,
the match history model achieved an accuracy of 66.66% with an expected calibration error of
2.10%, and the bookmaker odds model an accuracy of 65.05% with an expected calibration error of
2.53%.
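The expected calibration error (ECE) quoted above measures how far predicted probabilities are from observed win frequencies. The abstract does not state which ECE variant was used, so the following is a minimal sketch of the common equal-width-binning definition for binary match outcomes, assuming 10 bins and taking confidence as the probability of the predicted winner. The function name and bin count are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(probs: Sequence[float], labels: Sequence[int],
                               n_bins: int = 10) -> float:
    """Equal-width-binned ECE for binary outcomes.

    probs[i]: predicted probability that outcome i is 1 (e.g. "team A wins").
    labels[i]: observed outcome, 0 or 1.
    Each prediction's confidence is the probability of the class it predicts;
    every non-empty bin contributes |accuracy - mean confidence|, weighted by its size.
    """
    if not probs or len(probs) != len(labels):
        raise ValueError("probs and labels must be non-empty and equally long")
    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        pred = 1 if p >= 0.5 else 0
        conf = p if pred == 1 else 1.0 - p
        idx = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the top bin
        bins[idx].append((conf, pred == y))
    n = len(probs)
    ece = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            ece += len(b) / n * abs(accuracy - mean_conf)
    return ece
```

A perfectly calibrated model scores 0; an ECE of 2.10% means that, averaged over bins, predicted win probabilities deviate from observed win rates by about 2 percentage points.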
Combining the models through stacking increased the accuracy to 67.62% but worsened the
expected calibration error to 5.19%. On the other hand, merging the datasets and training a
new, stronger model on that data improved the accuracy to 66.81% and had an expected
calibration error of 2.67%.
The solution is thoroughly tested in a betting simulation covering 2500 matches. The
system's final odds are compared with the bookmakers' odds and the expected long-term
return is computed; a bet is placed only if this expected return is above a certain threshold. This
strategy, called positive expected value betting, was run at multiple thresholds and the results
were compared.
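A minimal sketch of positive expected value betting, assuming decimal odds, flat one-unit stakes, and that "profit" means return on the total amount staked; the thesis may define profit differently, and `Bet`, `fair_odds`, `expected_value` and `simulate` are hypothetical names. With decimal odds o and model probability p, the expected return per unit is p·o - 1, so a bet clears a threshold t exactly when o > (1 + t)/p, i.e. when the bookmaker's odds exceed the system's fair odds 1/p scaled by 1 + t.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Bet:
    match_id: str
    prob: float   # model probability that the chosen team wins
    odds: float   # best decimal odds offered by any bookmaker for that team
    won: bool     # actual outcome


def fair_odds(prob: float) -> float:
    """The system's own ("final") decimal odds implied by its probability."""
    return 1.0 / prob


def expected_value(prob: float, odds: float) -> float:
    """Expected return per unit staked at decimal odds: p * odds - 1."""
    return prob * odds - 1.0


def simulate(bets: List[Bet], min_ev: float, stake: float = 1.0) -> float:
    """Flat-stake simulation: bet only when EV > min_ev; return profit / total staked."""
    staked = profit = 0.0
    for b in bets:
        if expected_value(b.prob, b.odds) > min_ev:
            staked += stake
            profit += stake * (b.odds - 1.0) if b.won else -stake
    return profit / staked if staked else 0.0
```

Sweeping `min_ev` over several values reproduces the idea of testing "multiple thresholds": higher thresholds place fewer, more selective bets.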
While the stacking solution did not perform well in the betting environment, the match history model
yielded profits from 8% to 90%; the odds model, from 13% to 211%;
and the dataset merging solution, from 11% to 77%, all depending on the minimum
expected value threshold.
Therefore, this work produced several machine learning approaches capable of profiting
long-term from Counter-Strike: Global Offensive bets.
It is widely accepted that sports betting has existed for as long as sport itself. Even in the
first century, circuses hosted chariot races and fans bet on who they thought would emerge
victorious, much like horse racing today. As technology evolved, sports evolved too and, above
all, so did bookmakers. Owing to the wave of mass digitization, bookmakers became available
online, from anywhere, which makes this market inherently more tempting. Indeed, this
transition propelled sports betting into a multi-billion-dollar industry that can now even be
compared with the sports industry itself.
Similarly, younger generations are increasingly connected to the digital world, including
electronic sports – eSports. Counter-Strike: Global Offensive, the video game on which this
dissertation focuses, is one of the main drivers of this industry: during 2022, 15 million dollars
were distributed in tournament prizes and viewership peaked at 2 million concurrent viewers.
Although this is less pronounced in Portugal, in several countries young adult males are more
likely to follow eSports than traditional sports. This factor, together with the digitization of
bookmakers, makes the eSports betting market very appealing for exploring machine learning
techniques, since the young people who follow this kind of sport find it easy to bet online.
This dissertation proposes, implements, tests, and validates a betting recommendation system
that considers the match history of each team, the odds of several bookmakers, and the general
sentiment of fans on a discussion forum, HLTV. To this end, three machine learning systems
were initially developed.
To evaluate these systems, the period from October 2020 to March 2023 was considered, which
corresponds to 2500 matches. However, because the test period is so long, team competitiveness
varies considerably over it. Therefore, to prevent the models from becoming outdated during
the test period, they were retrained at least once a month throughout it.
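This monthly retraining scheme amounts to a walk-forward evaluation. A minimal sketch of how the test period could be split into monthly windows, assuming retraining exactly on the first day of each month (the abstract only says "at least once a month"): the model is trained on all matches before each cutoff and evaluated on the following month. `month_starts` and `walk_forward_windows` are hypothetical helpers, and the training step itself is left out.

```python
from datetime import date
from typing import Iterator, List, Tuple


def month_starts(start: date, end: date) -> Iterator[date]:
    """Yield the first day of every month from start's month up to (excluding) end."""
    y, m = start.year, start.month
    while date(y, m, 1) < end:
        yield date(y, m, 1)
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)


def walk_forward_windows(test_start: date, test_end: date) -> Iterator[Tuple[date, date]]:
    """Yield (cutoff, next_cutoff) pairs: retrain on matches before cutoff,
    then predict the matches played in [cutoff, next_cutoff)."""
    if test_start.day != 1:
        raise ValueError("test_start is assumed to be the first day of a month")
    cutoffs: List[date] = list(month_starts(test_start, test_end)) + [test_end]
    for cutoff, nxt in zip(cutoffs, cutoffs[1:]):
        yield cutoff, nxt


if __name__ == "__main__":
    # October 2020 up to the end of March 2023 (end date exclusive)
    windows = list(walk_forward_windows(date(2020, 10, 1), date(2023, 4, 1)))
    print(len(windows))  # 30 monthly retraining points
```

Because each model only ever sees matches before its cutoff, no future results leak into the predictions being evaluated.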
The first machine learning system predicts outcomes from previous results, that is, the
teams' match history. The best solution incorporated the players into the prediction, together
with the team's ranking, and gave more weight to more recent matches. This approach, using
logistic regression, achieved an accuracy of 66.66% with an expected calibration error of 2.10%.
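The abstract does not say how more recent matches were weighted. One common option, shown here purely as an assumption, is exponential time decay with a half-life; the 90-day half-life and the `recency_weight` and `weighted_win_rate` helpers are hypothetical, and the thesis's logistic regression also used player and ranking features that are not modelled here.

```python
from datetime import date
from typing import List, Tuple


def recency_weight(match_day: date, today: date, half_life_days: float = 90.0) -> float:
    """Exponential time decay: a match half_life_days old counts half as much as one played today."""
    age_days = (today - match_day).days
    return 0.5 ** (age_days / half_life_days)


def weighted_win_rate(history: List[Tuple[date, bool]], today: date,
                      half_life_days: float = 90.0) -> float:
    """Recency-weighted share of wins over a team's (match_day, won) history."""
    if not history:
        raise ValueError("empty match history")
    weights = [recency_weight(d, today, half_life_days) for d, _ in history]
    won_weight = sum(w for w, (_, won) in zip(weights, history) if won)
    return won_weight / sum(weights)
```

A feature such as `weighted_win_rate` could then be passed to the classifier alongside ranking and roster information; the half-life is a tunable choice, not a value from the thesis.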
The second system compiles the odds of the various bookmakers and makes predictions based on
patterns in how they vary. Here, incorporating the individual bookmakers reached an accuracy of
65.88% using logistic regression; however, that model was worse calibrated than the model that
used the average odds with a gradient boosting machine, which achieved an accuracy of 65.06%
but better calibration metrics, with an expected calibration error of 2.53%.
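Averaging odds across bookmakers is straightforward, but raw decimal odds include each bookmaker's margin. A minimal sketch, assuming a two-way market (no draw) and proportional margin removal, of how averaged odds could be turned into margin-free probabilities that a model such as a gradient boosting machine might use as features; `implied_probabilities`, `average_odds_features` and the feature names are hypothetical.

```python
from statistics import mean
from typing import Dict, Tuple


def implied_probabilities(odds_a: float, odds_b: float) -> Tuple[float, float]:
    """Convert two-way decimal odds into probabilities, removing the
    bookmaker margin (overround) by proportional normalisation."""
    raw_a, raw_b = 1.0 / odds_a, 1.0 / odds_b
    overround = raw_a + raw_b  # > 1 when the bookmaker takes a margin
    return raw_a / overround, raw_b / overround


def average_odds_features(book_odds: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Average (odds_team_a, odds_team_b) across bookmakers and derive probabilities."""
    if not book_odds:
        raise ValueError("no bookmaker odds")
    avg_a = mean(a for a, _ in book_odds.values())
    avg_b = mean(b for _, b in book_odds.values())
    p_a, p_b = implied_probabilities(avg_a, avg_b)
    return {"avg_odds_a": avg_a, "avg_odds_b": avg_b, "p_a": p_a, "p_b": p_b}
```

Proportional normalisation is only one way to strip the margin; other schemes exist and may shift the probabilities slightly for heavy favourites.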
The third system is based on fan sentiment on the HLTV forum. First, GPT-3.5 is used to extract
the sentiment of each comment, with an overall accuracy of 84.28%; considering only the comments
classified as conclusive, the accuracy rises to 91.46%. Once classified, the comments are fed to
a support vector machine model that incorporates the commenter and their accuracy in previous
matches. This solution correctly predicted only 59.26% of the cases, with an expected
calibration error of 3.22%.
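The exact inputs of the support vector machine are not given. As a hypothetical illustration of "incorporating the commenter and their past accuracy", a single feature can weight each conclusive comment by its author's historical accuracy; this is a simple hand-built feature, not the thesis's SVM. Comments are assumed to be already labelled (e.g. by the GPT-3.5 step) as favouring team A, team B, or inconclusive, and `sentiment_feature` is a hypothetical name.

```python
from typing import Dict, List, Optional, Tuple

# (commenter, pick): pick is "A", "B", or None for an inconclusive comment.
Comment = Tuple[str, Optional[str]]


def sentiment_feature(comments: List[Comment],
                      commenter_accuracy: Dict[str, float],
                      default_accuracy: float = 0.5) -> float:
    """Accuracy-weighted share of conclusive comments favouring team A.

    Inconclusive comments are skipped; unknown commenters get default_accuracy.
    Returns 0.5 (no information) when there are no conclusive comments."""
    weight_a = weight_total = 0.0
    for user, pick in comments:
        if pick is None:
            continue
        w = commenter_accuracy.get(user, default_accuracy)
        weight_total += w
        if pick == "A":
            weight_a += w
    return weight_a / weight_total if weight_total else 0.5
```

Dropping inconclusive comments mirrors the observation that the sentiment labels are more reliable on conclusive comments.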
Two approaches were tested to aggregate the predictions of these three models. The first trains
a new model on the predictions of the others (stacking), reaching an accuracy of 67.62% but with
an expected calibration error of 5.19%. The second, by contrast, merges the data used to train
the three individual models and trains a new model on this richer dataset. This approach, using
a support vector machine, achieved a lower accuracy, 66.81%, but also a lower expected
calibration error, 2.67%.
Finally, the approaches are put to the test in a betting simulator, in which each system makes a
prediction and compares it with the odds offered by the bookmakers. The simulation is run for
several minimum expected-return thresholds, and a system bets only if the expected rate of
return at the offered odds is above the threshold.
The system's final odds are then compared with the bookmakers' odds and, if a bookmaker offers
higher odds, a bet is placed. This strategy is called positive expected value betting: bets whose
odds are too high relative to the probability of the outcome, and which therefore generate
long-term profits. In this simulation, for a minimum rate of 5%, the best results came from the
models built on the bookmakers' odds, with profits between 13% and 211%; followed by the
historical data model, which profited between 8% and 90%; and finally the combined model, with
profits between 11% and 77%.
Thus, this work produced several machine-learning-based systems capable of making a long-term
profit by betting on Counter-Strike: Global Offensive.
Dynamic physical activity recommendation on personalised mobile health information service: A deep reinforcement learning approach
Mobile health (mHealth) information services make healthcare management
easier for users who want to increase physical activity and improve health.
However, differences in activity preference among individuals, adherence
problems, and uncertainty about future health outcomes may reduce the effect of
the mHealth information service. Current health service systems usually
provide recommendations based on fixed exercise plans that do not satisfy
user-specific needs. This paper seeks an efficient way to make physical
activity recommendation decisions for physical activity promotion in
personalised mHealth information services by establishing a data-driven model. In
this study, we propose a real-time interaction model to select the optimal
exercise plan for the individual, considering time-varying characteristics,
so as to maximise the long-term health utility of the user. We construct a
framework for an mHealth information service system comprising a personalised AI
module, which draws on scientific knowledge about physical activity to
evaluate individual exercise performance and may increase the awareness
of the mHealth artificial intelligence system. The proposed deep reinforcement
learning (DRL) methodology combines two classes of approaches to improve the
learning capability of the mHealth information service system. A deep learning
method is introduced to construct a hybrid neural network combining long short-term
memory (LSTM) network and deep neural network (DNN) techniques to infer
individual exercise behavior from time series data. A reinforcement
learning method based on the asynchronous advantage actor-critic (A3C)
algorithm is applied to find the optimal policy through exploration and exploitation.
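In advantage actor-critic methods such as A3C, each worker collects a short rollout and computes n-step returns R_t and advantages A_t = R_t - V(s_t); the actor is updated along ∇ log π(a_t|s_t)·A_t and the critic regresses V(s_t) toward R_t. A minimal sketch of the return/advantage step only, assuming scalar rewards (for instance a hypothetical per-day health-utility signal) and a discount factor of 0.99; the paper's LSTM/DNN networks and asynchronous workers are not reproduced, and the function name is hypothetical.

```python
from typing import List, Tuple


def n_step_returns_and_advantages(rewards: List[float], values: List[float],
                                  bootstrap_value: float,
                                  gamma: float = 0.99) -> Tuple[List[float], List[float]]:
    """n-step returns R_t and advantages A_t = R_t - V(s_t) for one rollout segment.

    rewards[t] and values[t] = V(s_t) cover the segment; bootstrap_value is V(s_T)
    for the state after the segment (use 0.0 if the episode terminated)."""
    if len(rewards) != len(values):
        raise ValueError("rewards and values must have the same length")
    returns = [0.0] * len(rewards)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running  # R_t = r_t + gamma * R_{t+1}
        returns[t] = running
    advantages = [r - v for r, v in zip(returns, values)]
    return returns, advantages
```

Bootstrapping from the critic's estimate of the last state is what lets a worker update from short rollouts instead of waiting for a full episode, which suits long-horizon goals such as sustained activity.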
Explaining Deep Q-Learning Experience Replay with SHapley Additive exPlanations
Reinforcement Learning (RL) has shown promise in optimizing complex control and decision-making processes, but Deep Reinforcement Learning (DRL) lacks interpretability, limiting its adoption in regulated sectors such as manufacturing, finance, and healthcare. DRL's opaque decision-making also hinders efficiency and resource use, and this issue is amplified with every advancement. While many seek to move from Experience Replay to A3C, the latter demands more resources. Despite efforts to improve Experience Replay selection strategies, capacity tends to be kept high. This dissertation investigates training a Deep Convolutional Q-learning agent across 20 Atari games, as well as on a control task, a physics task, and a simulated addition task, while intentionally reducing Experience Replay capacity from 1×10⁶ to 5×10². Experience Replay size could be reduced by over 40% in 18 of the 23 simulations tested, offering a practical path to resource-efficient DRL. To illuminate agent decisions and align them with game mechanics, a novel method is employed: visualizing Experience Replay via the Deep SHAP Explainer. This approach fosters comprehension through transparent, interpretable explanations, although capacity must be reduced cautiously to avoid overfitting. This study demonstrates the feasibility of reducing Experience Replay capacity and advocates transparent, interpretable decision explanations using the Deep SHAP Explainer to help improve resource efficiency in Experience Replay.
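A minimal sketch of a fixed-capacity experience replay buffer of the kind whose size the study reduces, together with the standard DQN target y = r + γ·max_a' Q(s', a'). Capacity is simply the `maxlen` of a FIFO queue, so shrinking it from 1×10⁶ to 5×10² changes one argument; a 40% reduction of a 10⁶ buffer, for instance, leaves room for 6×10⁵ transitions. `ReplayBuffer`, `Transition` and `td_targets` are hypothetical names, uniform sampling is assumed, and the Deep SHAP visualisation is not covered.

```python
import random
from collections import deque
from typing import Callable, Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool


class ReplayBuffer:
    """FIFO experience replay: once full, the oldest transition is dropped."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.buffer: Deque[Transition] = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, t: Transition) -> None:
        self.buffer.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        """Uniform sample without replacement."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self) -> int:
        return len(self.buffer)


def td_targets(batch: Sequence[Transition],
               q_next: Callable[[tuple], List[float]],
               gamma: float = 0.99) -> List[float]:
    """DQN targets r + gamma * max_a' Q(s', a'); no bootstrap from terminal states."""
    return [t.reward + (0.0 if t.done else gamma * max(q_next(t.next_state)))
            for t in batch]
```

A smaller buffer saves memory but holds only the most recent experience, so sampled batches are more correlated with the current policy, which is consistent with the caution about overfitting.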
Gambling and (“Dark”) Flow. A holistic Study with Best Practice Cases on How to Minimize Harm
How can “minimize harm” be applied as a guiding principle for regulating, redesigning, and running the gambling industry/business? The article looks at gambling through the flow concept using a four-dimensional frame of reference. The aim is to analyze the problem of gambling from a scientific-technical, interpersonal, systemic, and spiritual-existential perspective. Two best-practice cases are used as illustrations: Norway’s national gambling monopoly operated by Norsk Tipping (NT), and the Italian city of Pavia, which was transformed from “Italy’s Las Vegas” back into a charming old city where gambling is strictly regulated. The third case illustrates the spiritual-existential dimension, mirrored by a young American lawyer who became addicted to gambling but eventually sued the casinos that had ruined her life. Implications for further research are discussed, suggesting a move from “dark” flow to “green” flow.
Deep Learning for Phishing Detection: Taxonomy, Current Challenges and Future Directions
This work was supported in part by the Ministry of Higher Education under the Fundamental Research Grant Scheme under Grant FRGS/1/2018/ICT04/UTM/01/1; and in part by the Faculty of Informatics and Management, University of Hradec Kralove, through SPEV project under Grant 2102/2022.
Phishing has become an increasing concern and has captured the attention of end-users as well
as security experts. Existing phishing detection techniques still suffer from deficient accuracy
and an inability to detect unknown attacks despite decades of development and improvement.
Motivated to solve these problems, many researchers in the cybersecurity domain have shifted their attention
to phishing detection that capitalizes on machine learning techniques. Deep learning, a branch
of machine learning, has emerged in recent years as a promising solution for phishing detection. As a result,
this study proposes a taxonomy of deep learning algorithms for phishing detection by examining 81 selected
papers using a systematic literature review approach. The paper first introduces the concepts of phishing and
deep learning in the context of cybersecurity. Then, taxonomies of phishing detection and deep learning
algorithms are provided to classify the existing literature into various categories. Next, taking the proposed
taxonomy as a baseline, this study comprehensively reviews the state-of-the-art deep learning techniques
and analyzes their advantages as well as disadvantages. Subsequently, the paper discusses various issues
that deep learning faces in phishing detection and proposes future research directions to overcome these
challenges. Finally, an empirical analysis is conducted to evaluate the performance of various deep learning
techniques in a practical context, and to highlight the related issues that motivate researchers in their future
work. The results of the empirical experiment showed that the common issues among most of
the state-of-the-art deep learning algorithms are manual parameter tuning, long training time, and deficient
detection accuracy.