Minimax Trees in Linear Time with Applications
A minimax tree is similar to a Huffman tree except that, instead of minimizing the weighted average of the leaves' depths, it minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976) introduced minimax trees and gave a Huffman-like, O(n log n)-time algorithm for building them. Drmota and Szpankowski (2002) gave another O(n log n)-time algorithm, which takes linear time when the weights are already sorted by their fractional parts. In this paper we give the first linear-time algorithm for building minimax trees for unsorted real weights.
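For context, Golumbic's Huffman-like procedure can be sketched as follows: repeatedly replace the two smallest values with their maximum plus one. This is a minimal illustration of the merge rule only, not the paper's linear-time algorithm, and it computes just the optimal cost rather than the tree itself:

```python
import heapq

def minimax_cost(weights):
    # Golumbic-style Huffman-like merging: repeatedly replace the two
    # smallest values v1 <= v2 with max(v1, v2) + 1.  The last remaining
    # value is the optimal minimax cost: min over trees of max_i (w_i + depth_i).
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        v1 = heapq.heappop(heap)
        v2 = heapq.heappop(heap)
        heapq.heappush(heap, max(v1, v2) + 1)
    return heap[0]
```

To recover the tree as well, one would pair the two popped nodes under a new parent at each merge; the heap-based loop above is what gives the O(n log n) bound that the paper's linear-time result improves on.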
Man-in-the-Middle Attacks on MQTT based IoT networks
The use of Internet-of-Things (IoT) devices has increased considerably in recent years due to the decreasing cost and increasing availability of transistors, semiconductors, and other components. Examples can be found in daily life through smart cities, consumer security cameras, agriculture sensors, and more. However, cybersecurity in these IoT devices is often an afterthought, making these devices susceptible to easy attacks. This can be due to multiple factors. An IoT device often comes in a smaller form factor and must be affordable to buy in large quantities; as a result, IoT devices have fewer resources than a typical computer. This includes less processing power, battery power, and random access memory (RAM). This limits the possibilities of traditional security in IoT devices.
To help evaluate the state of IoT devices and further harden them, we present an easy-to-use program that requires little to no prior knowledge of the target infrastructure. The process is a Man-in-the-Middle (MITM) attack that hijacks packets sent between IoT devices using the popular MQTT protocol. We do this using a WiFi Pineapple from Hak5, which, in its raw form, is a WiFi access point with specific offensive capabilities installed as software. We then pass these packets into a custom Generative Adversarial Network (GAN) that utilizes a Natural Language Processing (NLP) model to generate a malicious message. Once malicious messages are generated, they are passed back to the WiFi Pineapple and sent as legitimate packets among the network.
We then look at the efficiency of these malicious messages through different NLP algorithms. In this particular work, we analyze an array of BERT variants and GPT-2. --Abstract, page iv
Game theoretic and machine learning techniques for balancing games
Game balance is the problem of determining the fairness of actions or sets of actions in competitive, multiplayer games. This problem primarily arises in the context of designing board and video games. Traditionally, balance has been achieved through large amounts of play-testing and trial-and-error on the part of the designers. In this thesis, it is our intent to lay down the beginnings of a framework for a formal and analytical solution to this problem, combining techniques from game theory and machine learning. We first develop a set of game-theoretic definitions for different forms of balance, and then introduce the concept of a strategic abstraction. We show how machine classification techniques can be used to identify high-level player strategy in games, using the two principal methods of sequence alignment and Naive Bayes classification. Bioinformatics sequence alignment, when combined with a 3-nearest neighbor classification approach, can, with only 3 exemplars of each strategy, correctly identify the strategy used in 55% of cases using all data, and 77% of cases on data that experts indicated actually had a strategic class. Naive Bayes classification achieves similar results, with 65% accuracy on all data and 75% accuracy on data rated to have an actual class. We then show how these game theoretic and machine learning techniques can be combined to automatically build matrices that can be used to analyze game balance properties.
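The classification pipeline described above can be sketched as follows, using Levenshtein edit distance as a simple stand-in for the bioinformatics alignment scoring, and invented toy action sequences (a "rush" versus a "boom" strategy) as the exemplars:

```python
from collections import Counter

def edit_distance(a, b):
    # Levenshtein distance: a simple stand-in for the bioinformatics
    # sequence-alignment scoring used in the thesis.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def classify_3nn(query, exemplars):
    # exemplars: list of (action_sequence, strategy_label) pairs.
    # Vote among the 3 exemplars whose sequences align best with the query.
    nearest = sorted(exemplars, key=lambda e: edit_distance(query, e[0]))[:3]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Hypothetical action logs: 'r' = rush action, 'e' = economy action.
exemplars = [
    ("rrrd", "rush"), ("rrdr", "rush"), ("rrrr", "rush"),
    ("eeed", "boom"), ("eede", "boom"), ("eeee", "boom"),
]
```

With three exemplars per class, as in the thesis's setup, the majority vote of the three closest sequences decides the strategy label.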
Evaluation Functions in General Game Playing
While traditional computer game playing agents were designed solely to play one single game, General Game Playing is concerned with agents capable of playing classes of games. Given the game's rules and a few minutes' time, the agent is supposed to play any game of the class and eventually win it.
Since the game is unknown beforehand, previously optimized data structures or human-provided features are not applicable. Instead, the agent must derive a strategy on its own.
One approach to obtain such a strategy is to analyze the game rules and create a state evaluation function that can be subsequently used to direct the agent to promising states in the match.
In this thesis we will discuss existing methods and present a general approach on how to construct such an evaluation function.
Each topic is discussed in a modular fashion and evaluated along the lines of quality and efficiency, resulting in a strong agent.
Contents: Introduction; Game Playing; Evaluation Functions I - Aggregation; Evaluation Functions II - Features; General Evaluation; Related Work; Discussion
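As a minimal illustration of the aggregation idea, not the thesis's actual construction, a state evaluation function can combine automatically derived features by weighted linear aggregation; the feature names and weights here are invented for the sketch:

```python
def make_evaluator(features, weights):
    # Combine derived state features into a single score via a weighted
    # linear aggregation (one common aggregation scheme).
    def evaluate(state):
        return sum(w * f(state) for f, w in zip(features, weights))
    return evaluate

# Hypothetical features for an illustrative board-game state.
def material(s):
    return s["my_pieces"] - s["their_pieces"]

def mobility(s):
    return s["my_moves"] - s["their_moves"]

evaluate = make_evaluator([material, mobility], [1.0, 0.1])
```

Such a function can then direct search toward promising states; the hard part, which the thesis addresses, is deriving the features and weights from the game rules alone.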
Discovering robust dependencies from data
Science revolves around forming hypotheses, designing experiments, collecting data, and testing. It was not until recently, with the advent of modern hardware and data analytics, that science shifted towards a big-data-driven paradigm that led to unprecedented success across various fields. What is perhaps the most astounding feature of this new era is that interesting hypotheses can now be automatically discovered from observational data. This dissertation investigates knowledge discovery procedures that do exactly this. In particular, we seek algorithms that discover the most informative models able to compactly "describe" aspects of the phenomena under investigation, in both supervised and unsupervised settings. We consider interpretable models in the form of subsets of the original variable set. We want the models to capture all possible interactions, e.g., linear, non-linear, between all types of variables, e.g., discrete, continuous, and lastly, we want their quality to be meaningfully assessed. For this, we employ information-theoretic measures, and particularly, the fraction of information for the supervised setting, and the normalized total correlation for the unsupervised. The former measures the uncertainty reduction of the target variable conditioned on a model, and the latter measures the information overlap of the variables included in a model.
Without access to the true underlying data generating process, we estimate the aforementioned measures from observational data. This process is prone to statistical errors, and in our case, the errors manifest as biases towards larger models. This can lead to situations where the results are utterly random, thereby hindering further analysis. We correct this behavior with notions from statistical learning theory. In particular, we propose regularized estimators that are unbiased under the hypothesis of independence, leading to robust estimation from limited data samples and arbitrary dimensionalities. Moreover, we do this for models consisting of both discrete and continuous variables. Lastly, to discover the top-scoring models, we derive effective optimization algorithms for exact, approximate, and heuristic search. These algorithms are powered by admissible, tight, and efficient-to-compute bounding functions for our proposed estimators that can be used to greatly prune the search space. Overall, the products of this dissertation can successfully assist data analysts with data exploration, discovering powerful description models, or concluding that
no satisfactory models exist, implying that new experiments and data are required for the phenomena under investigation. This statement is supported by Materials Science researchers who corroborated our discoveries.
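To fix ideas, the fraction of information can be estimated from discrete observational data with a naive plug-in estimator. Note that this is precisely the kind of biased estimator the dissertation improves upon, not its regularized version:

```python
from collections import Counter
from math import log2

def entropy(values):
    # Plug-in Shannon entropy (in bits) of a sample of discrete values.
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def fraction_of_information(xs, ys):
    # Plug-in estimate of F(X; Y) = (H(Y) - H(Y|X)) / H(Y): the share of
    # uncertainty about the target Y removed by conditioning on X.
    # This naive estimator is biased upward for small samples, which is
    # the behavior the dissertation's regularized estimators correct.
    h_y = entropy(ys)
    h_y_given_x = entropy(list(zip(xs, ys))) - entropy(xs)  # H(X,Y) - H(X)
    return (h_y - h_y_given_x) / h_y
```

The score is 1 when X determines Y exactly and 0 when they are independent in the sample; extending X to a variable subset gives the supervised model score described above.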
A survey on online active learning
Online active learning is a paradigm in machine learning that aims to select
the most informative data points to label from a data stream. The problem of
minimizing the cost associated with collecting labeled observations has gained
a lot of attention in recent years, particularly in real-world applications
where data is only available in an unlabeled form. Annotating each observation
can be time-consuming and costly, making it difficult to obtain large amounts
of labeled data. To overcome this issue, many active learning strategies have
been proposed in the last decades, aiming to select the most informative
observations for labeling in order to improve the performance of machine
learning models. These approaches can be broadly divided into two categories:
static pool-based and stream-based active learning. Pool-based active learning
involves selecting a subset of observations from a closed pool of unlabeled
data, and it has been the focus of many surveys and literature reviews.
However, the growing availability of data streams has led to an increase in the
number of approaches that focus on online active learning, which involves
continuously selecting and labeling observations as they arrive in a stream.
This work aims to provide an overview of the most recently proposed approaches
for selecting the most informative observations from data streams in the
context of online active learning. We review the various techniques that have
been proposed and discuss their strengths and limitations, as well as the
challenges and opportunities that exist in this area of research. Our review
aims to provide a comprehensive and up-to-date overview of the field and to
highlight directions for future work
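A minimal sketch of one stream-based strategy, uncertainty sampling with an online logistic model (an illustrative example, not a specific method from the survey), where the labelling oracle is only queried for points the model is unsure about:

```python
import math
import random

def stream_active_learn(stream, oracle, threshold=0.2, lr=0.1):
    # Uncertainty sampling on a data stream: the label oracle is queried
    # only when the online logistic model's predicted probability is near
    # 0.5; the model then takes one SGD step on the log-loss.
    w = None
    queried = 0
    for x in stream:
        if w is None:
            w = [0.0] * len(x)
        p = 1.0 / (1.0 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        if abs(p - 0.5) < threshold:      # uncertain: ask for the label
            y = oracle(x)
            queried += 1
            for i, xi in enumerate(x):
                w[i] += lr * (y - p) * xi
    return w, queried

# Synthetic stream: the label is the sign of the first feature; the
# second feature is a constant bias term.
rng = random.Random(0)
stream = [[rng.uniform(-1, 1), 1.0] for _ in range(500)]
w, queried = stream_active_learn(stream, lambda x: 1 if x[0] > 0 else 0)
```

As the model grows confident on easy points, later observations stop triggering queries, so only a fraction of the stream is labeled, which is the cost saving that motivates online active learning.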
EXTENDING AND IMPROVING DESIGNS FOR LARGE-SCALE COMPUTER EXPERIMENTS
This research develops methods that increase the inventory of space-filling designs (SFDs) for large-scale computer-based experiments. We present a technique enabling researchers to add sequential blocks of design points effectively and efficiently to existing SFDs. We accomplish this through a quadratically constrained mixed-integer program that augments cataloged or computationally expensive designs by optimally permuting and stacking columns of an initial base design to minimize the maximum absolute pairwise correlation among columns in the new extended design. We extend many classes of SFDs to dimensions that are currently not easily obtainable. Adding new design points provides more degrees of freedom for building metamodels and assessing fit. The resulting extended designs have better correlation and space-filling properties than the original base designs and compare well with other types of SFDs created from scratch in the extended design space. In addition, through massive computer-based experimentation, we compare popular software packages for generating SFDs and provide insight into the methods and relationships among design measures of correlation and space-fillingness. These results provide experimenters with a broad understanding of SFD software packages, algorithms, and optimality criteria. Further, we provide a probability-distribution model for the maximum absolute pairwise correlation among columns in the widely used maximin Latin hypercube designs. Lieutenant Colonel, United States Marine Corps. Approved for public release. Distribution is unlimited.
The generation of longest optimal box repetition-free words
Thesis (MSc)--Stellenbosch University, 2022. ENGLISH ABSTRACT: This thesis focuses on a specific problem within the field of combinatorial generation, namely, the
generation of box repetition-free words.
A box is a word over a given alphabet, where the first symbol in the word is the same as the last
symbol. For example, the word abaca is a box. A box can contain other boxes. The box abaca
contains boxes aba and aca. Boxes can overlap, such as aba and aca in abaca.
This work investigates the generation of the longest possible sequence of symbols, over a given
alphabet, which does not contain any repeating boxes.
We show that an exhaustive enumeration based on a brute-force approach with backtracking is
not feasible. That is, the method checks whether adding a symbol to a word would create a
repeating box; if not, it recursively adds another symbol. This method will eventually find
all valid words, but takes an unreasonable amount of time for larger alphabets.
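Under one plausible formalisation of the definitions above (a box is a factor of length at least two whose first and last symbols match, and repetition-free means no box occurs as a factor at two distinct, possibly overlapping, positions; the thesis's exact definition may differ), the backtracking search can be sketched as:

```python
def is_box_repetition_free(word):
    # Check that no box (factor with length >= 2 and equal first and
    # last symbols, e.g. 'aba' or 'aca' in 'abaca') occurs at two
    # distinct starting positions.
    positions = {}
    for i in range(len(word)):
        for j in range(i + 2, len(word) + 1):
            if word[i] == word[j - 1]:        # word[i:j] is a box
                box = word[i:j]
                if box in positions and positions[box] != i:
                    return False
                positions.setdefault(box, i)
    return True

def longest_box_free(alphabet, word=""):
    # Brute-force backtracking: extend the word by each symbol and
    # recurse whenever no repeated box is created.  As the thesis
    # reports, this is feasible only for very small alphabets.
    best = word
    for c in alphabet:
        ext = word + c
        if is_box_repetition_free(ext):
            cand = longest_box_free(alphabet, ext)
            if len(cand) > len(best):
                best = cand
    return best
```

Each validity check is quadratic in the word length, and the search tree visits every valid word once, which is exactly why the enumeration blows up for alphabets beyond three symbols.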
As a non-enumerative attempt to find individual valid words, the Monte Carlo tree search is used.
The search is based on the assumption that prefixes with good past results will also give good
results in the future.
Based on an analysis of the properties of box repetition-free words, a new search is devised. Factors
of words are mapped onto a graph, and all non-optimal edges removed. It is then shown that any
Hamiltonian path on this graph will result in a longest optimal word.
The results of this work show that backtracking fails to generate longest optimal words within a
reasonable time for any alphabet with more than three symbols. The Monte Carlo tree search
performs better than backtracking, finding optimal words for an alphabet size of four, but failing
for larger alphabets. The new method outperforms both, and with a small optimization, is shown
to generate longest optimal words up to an alphabet size of six.