92 research outputs found

    Minimax Trees in Linear Time with Applications

    A minimax tree is similar to a Huffman tree except that, instead of minimizing the weighted average of the leaves' depths, it minimizes the maximum of any leaf's weight plus its depth. Golumbic (1976) introduced minimax trees and gave a Huffman-like, O(n log n)-time algorithm for building them. Drmota and Szpankowski (2002) gave another O(n log n)-time algorithm, which takes linear time when the weights are already sorted by their fractional parts. In this paper we give the first linear-time algorithm for building minimax trees for unsorted real weights.
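
    The Huffman-like construction that the paper improves on is easy to state: repeatedly merge the two nodes of smallest weight into a parent whose weight is the larger of the two plus one. Below is a minimal sketch of that O(n log n) baseline (not the paper's linear-time algorithm); the function name and the integer example weights are illustrative only.

```python
import heapq

def minimax_tree_cost(weights):
    """Huffman-like construction for minimax trees (after Golumbic, 1976).

    Repeatedly merges the two smallest-weight nodes into a parent whose
    weight is max(w1, w2) + 1; the final weight equals the optimal value
    of max over leaves of (leaf weight + leaf depth).
    """
    heap = list(weights)
    heapq.heapify(heap)
    while len(heap) > 1:
        w1 = heapq.heappop(heap)
        w2 = heapq.heappop(heap)
        heapq.heappush(heap, max(w1, w2) + 1)  # weight of the new parent node
    return heap[0]

# Example: four weight-0 leaves give cost 2 (a complete tree of depth 2).
print(minimax_tree_cost([0, 0, 0, 0]))  # -> 2
```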

    Man-in-the-Middle Attacks on MQTT-based IoT networks

    “The use of Internet-of-Things (IoT) devices has increased considerably in recent years due to the decreasing cost and increasing availability of transistors, semiconductors, and other components. Examples can be found in daily life through smart cities, consumer security cameras, agriculture sensors, and more. However, cyber security in these IoT devices is often an afterthought, making them susceptible to easy attacks. This can be due to multiple factors. An IoT device often comes in a small form factor and must be affordable to buy in large quantities; as a result, IoT devices have fewer resources than a typical computer, including less processing power, battery power, and random access memory (RAM). This limits the possibilities of traditional security in IoT devices. To help evaluate the state of IoT devices and further harden them, we present an easy-to-use program that requires little to no prior knowledge of the target infrastructure. The process is a Man-in-the-Middle (MITM) attack that hijacks packets sent between IoT devices using the popular MQTT protocol. We do this using a WiFi Pineapple from Hak5, which, in its raw form, is a WiFi access point with specific offensive capabilities installed as software. We then pass these packets into a custom Generative Adversarial Network (GAN) that utilizes a Natural Language Processing (NLP) model to generate a malicious message. Once malicious messages are generated, they are passed back to the WiFi Pineapple and sent as legitimate packets across the network. We then look at the efficiency of these malicious messages through different NLP algorithms. In this particular work, we analyze an array of BERT variants and GPT-2.” --Abstract, page iv
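
    As a rough illustration of the interception loop described above (not the authors' actual tool), the sketch below subscribes to MQTT traffic and re-publishes a tampered payload using the paho-mqtt client. The broker address, topic filter, and generate_malicious stub are hypothetical stand-ins; in particular, the stub replaces the paper's GAN/NLP generator with a trivial mutation.

```python
# pip install paho-mqtt  (the 1.x callback API is assumed here)
import paho.mqtt.client as mqtt

BROKER = "192.168.1.10"   # hypothetical: address reachable via the access point
TOPIC = "sensors/#"       # hypothetical topic filter for the victim devices

injected = set()  # payloads we generated, so we skip our own re-injected traffic

def generate_malicious(payload: bytes) -> bytes:
    """Stand-in for the paper's GAN/NLP generator; here a trivial mutation."""
    return payload.replace(b"OK", b"FAIL")

def on_message(client, userdata, msg):
    if msg.payload in injected:
        return  # ignore messages this script injected itself
    fake = generate_malicious(msg.payload)
    injected.add(fake)
    client.publish(msg.topic, fake)  # re-inject as a "legitimate" packet

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER, 1883, keepalive=60)
client.subscribe(TOPIC)
client.loop_forever()
```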

    Game theoretic and machine learning techniques for balancing games

    Game balance is the problem of determining the fairness of actions or sets of actions in competitive, multiplayer games. This problem primarily arises in the context of designing board and video games. Traditionally, balance has been achieved through large amounts of play-testing and trial-and-error on the part of the designers. In this thesis, it is our intent to lay down the beginnings of a framework for a formal and analytical solution to this problem, combining techniques from game theory and machine learning. We first develop a set of game-theoretic definitions for different forms of balance, and then introduce the concept of a strategic abstraction. We show how machine classification techniques can be used to identify high-level player strategy in games, using the two principal methods of sequence alignment and Naive Bayes classification. Bioinformatics sequence alignment, when combined with a 3-nearest-neighbor classification approach, can, with only 3 exemplars of each strategy, correctly identify the strategy used in 55% of cases using all data, and 77% of cases on data that experts indicated actually had a strategic class. Naive Bayes classification achieves similar results, with 65% accuracy on all data and 75% accuracy on data rated to have an actual class. We then show how these game-theoretic and machine learning techniques can be combined to automatically build matrices that can be used to analyze game balance properties.
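
    To make the classification pipeline concrete, here is a minimal sketch of 3-nearest-neighbor strategy identification over action sequences. difflib.SequenceMatcher stands in for the bioinformatics alignment score used in the thesis, and the exemplar sequences and strategy labels are invented for illustration.

```python
from collections import Counter
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in alignment score; the thesis uses bioinformatics sequence
    alignment over player action sequences instead."""
    return SequenceMatcher(None, a, b).ratio()

def classify_strategy(query, exemplars, k=3):
    """k-NN over labeled exemplars given as (action_sequence, strategy) pairs."""
    ranked = sorted(exemplars, key=lambda ex: similarity(query, ex[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical action sequences: 3 exemplars per strategy, as in the thesis.
exemplars = [
    ("expand expand attack", "rush"), ("expand attack attack", "rush"),
    ("attack attack attack", "rush"), ("build build defend", "turtle"),
    ("build defend defend", "turtle"), ("defend defend build", "turtle"),
]
print(classify_strategy("attack expand attack", exemplars))  # expected: "rush"
```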

    Evaluation Functions in General Game Playing

    While agents in traditional computer game playing are designed solely for the purpose of playing one single game, General Game Playing is concerned with agents capable of playing classes of games. Given the game's rules and a few minutes' time, the agent is supposed to play any game of the class and eventually win it. Since the game is unknown beforehand, previously optimized data structures or human-provided features are not applicable. Instead, the agent must derive a strategy on its own. One approach to obtaining such a strategy is to analyze the game rules and create a state evaluation function that can subsequently be used to direct the agent to promising states in the match. In this thesis we discuss existing methods and present a general approach for constructing such an evaluation function. Each topic is discussed in a modular fashion and evaluated along the lines of quality and efficiency, resulting in a strong agent. Outline: Introduction; Game Playing; Evaluation Functions I - Aggregation; Evaluation Functions II - Features; General Evaluation; Related Work; Discussion.
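
    As a sketch of the aggregation idea (a state evaluation function assembled from weighted features), the snippet below combines features linearly. The features, weights, and state representation are illustrative assumptions, not the features the thesis derives from game rules.

```python
from typing import Callable, Dict, List, Tuple

# A feature maps a game state to [0, 1]; states here are plain dicts.
Feature = Callable[[Dict], float]

def make_evaluation_function(features: List[Tuple[Feature, float]]) -> Callable[[Dict], float]:
    """Linear aggregation of weighted features into one state evaluation."""
    total = sum(w for _, w in features)
    def evaluate(state: Dict) -> float:
        return sum(w * f(state) for f, w in features) / total
    return evaluate

# Hypothetical features for a board game whose state tracks simple counts.
features = [
    (lambda s: s["own_pieces"] / 12.0, 0.7),  # material advantage
    (lambda s: s["mobility"] / 20.0, 0.3),    # number of legal moves
]
evaluate = make_evaluation_function(features)
print(evaluate({"own_pieces": 9, "mobility": 10}))  # -> 0.675
```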

    Discovering robust dependencies from data

    Science revolves around forming hypotheses, designing experiments, collecting data, and testing. It was not until recently, with the advent of modern hardware and data analytics, that science shifted towards a big-data-driven paradigm that led to unprecedented success across various fields. What is perhaps the most astounding feature of this new era is that interesting hypotheses can now be automatically discovered from observational data. This dissertation investigates knowledge discovery procedures that do exactly this. In particular, we seek algorithms that discover the most informative models able to compactly “describe” aspects of the phenomena under investigation, in both supervised and unsupervised settings. We consider interpretable models in the form of subsets of the original variable set. We want the models to capture all possible interactions, e.g., linear and non-linear, between all types of variables, e.g., discrete and continuous, and lastly, we want their quality to be meaningfully assessed. For this, we employ information-theoretic measures: the fraction of information for the supervised setting, and the normalized total correlation for the unsupervised. The former measures the uncertainty reduction of the target variable conditioned on a model, and the latter measures the information overlap of the variables included in a model. Without access to the true underlying data-generating process, we estimate the aforementioned measures from observational data. This process is prone to statistical errors, and in our case the errors manifest as biases towards larger models. This can lead to situations where the results are utterly random, thereby hindering further analysis. We correct this behavior with notions from statistical learning theory. In particular, we propose regularized estimators that are unbiased under the hypothesis of independence, leading to robust estimation from limited data samples and arbitrary dimensionalities. Moreover, we do this for models consisting of both discrete and continuous variables. Lastly, to discover the top-scoring models, we derive effective optimization algorithms for exact, approximate, and heuristic search. These algorithms are powered by admissible, tight, and efficient-to-compute bounding functions for our proposed estimators that can be used to greatly prune the search space. Overall, the products of this dissertation can successfully assist data analysts with exploring data, discovering powerful description models, or concluding that no satisfactory models exist, implying that new experiments and data are required for the phenomena under investigation. This statement is supported by Materials Science researchers who corroborated our discoveries.
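
    To make the supervised measure concrete, the sketch below computes the naive plug-in estimate of the fraction of information, F(X;Y) = I(X;Y) / H(Y), for discrete samples. Note that this is exactly the uncorrected estimator whose bias towards larger models the dissertation's regularized estimators address; the example data are invented.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Plug-in Shannon entropy of an empirical sample, in bits."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def fraction_of_information(xs, ys):
    """Plug-in estimate of F(X;Y) = I(X;Y) / H(Y).

    On limited samples this naive estimator is biased towards larger
    values, which is the behavior the dissertation corrects.
    """
    h_y = entropy(ys)
    h_xy = entropy(list(zip(xs, ys)))
    mi = entropy(xs) + h_y - h_xy  # I(X;Y) = H(X) + H(Y) - H(X,Y)
    return mi / h_y if h_y > 0 else 0.0

# X fully determines Y here, so the estimate is 1.
print(fraction_of_information([0, 0, 1, 1], ["a", "a", "b", "b"]))  # -> 1.0
```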

    A survey on online active learning

    Online active learning is a paradigm in machine learning that aims to select the most informative data points to label from a data stream. The problem of minimizing the cost associated with collecting labeled observations has gained a lot of attention in recent years, particularly in real-world applications where data is only available in unlabeled form. Annotating each observation can be time-consuming and costly, making it difficult to obtain large amounts of labeled data. To overcome this issue, many active learning strategies have been proposed in recent decades, aiming to select the most informative observations for labeling in order to improve the performance of machine learning models. These approaches can be broadly divided into two categories: static pool-based and stream-based active learning. Pool-based active learning involves selecting a subset of observations from a closed pool of unlabeled data, and it has been the focus of many surveys and literature reviews. However, the growing availability of data streams has led to an increase in the number of approaches that focus on online active learning, which involves continuously selecting and labeling observations as they arrive in a stream. This work provides a comprehensive and up-to-date overview of the most recently proposed approaches for selecting the most informative observations from data streams in the context of online active learning. We review the various techniques that have been proposed and discuss their strengths and limitations, as well as the challenges and opportunities in this area of research, highlighting directions for future work.
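
    A minimal sketch of one classic stream-based strategy of the kind such surveys cover, uncertainty sampling with a fixed threshold: an observation's label is queried only when the current model's prediction is sufficiently uncertain. The model, oracle, trainer, and threshold below are all hypothetical stand-ins.

```python
import random

def uncertainty(probs):
    """Margin-based uncertainty: high when the two top class probabilities
    are close, i.e., when the model cannot separate them."""
    top2 = sorted(probs, reverse=True)[:2]
    return 1.0 - (top2[0] - top2[1])

def stream_active_learner(stream, predict_proba, query_label, train, threshold=0.8):
    """Query a label only when the current model is uncertain enough."""
    for x in stream:
        if uncertainty(predict_proba(x)) >= threshold:
            y = query_label(x)   # costly step: ask the annotator
            train(x, y)          # update the model online
        # otherwise the observation passes by unlabeled, at zero cost

# Hypothetical stand-ins: fake probabilities, a simulated oracle, a no-op trainer.
stream = (random.random() for _ in range(1000))
predict_proba = lambda x: (x, 1.0 - x)   # two-class "model"
query_label = lambda x: int(x > 0.5)     # simulated annotator
train = lambda x, y: None                # no-op model update
stream_active_learner(stream, predict_proba, query_label, train)
```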

    EXTENDING AND IMPROVING DESIGNS FOR LARGE-SCALE COMPUTER EXPERIMENTS

    This research develops methods that increase the inventory of space-filling designs (SFDs) for large-scale computer-based experiments. We present a technique enabling researchers to add sequential blocks of design points effectively and efficiently to existing SFDs. We accomplish this through a quadratically constrained mixed-integer program that augments cataloged or computationally expensive designs by optimally permuting and stacking columns of an initial base design to minimize the maximum absolute pairwise correlation among columns in the new extended design. We extend many classes of SFDs to dimensions that are currently not easily obtainable. Adding new design points provides more degrees of freedom for building metamodels and assessing fit. The resulting extended designs have better correlation and space-filling properties than the original base designs and compare well with other types of SFDs created from scratch in the extended design space. In addition, through massive computer-based experimentation, we compare popular software packages for generating SFDs and provide insight into the methods and relationships among design measures of correlation and space-filling quality. These results provide experimenters with a broad understanding of SFD software packages, algorithms, and optimality criteria. Further, we provide a probability-distribution model for the maximum absolute pairwise correlation among columns in the widely used maximin Latin hypercube designs. Lieutenant Colonel, United States Marine Corps. Approved for public release; distribution is unlimited.
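
    The optimality criterion used throughout, the maximum absolute pairwise correlation among design columns, is straightforward to compute. The sketch below evaluates it for a randomly permuted Latin hypercube (a hypothetical 17-run, 5-factor example); it assumes NumPy and is not the work's mixed-integer augmentation procedure.

```python
import numpy as np

def random_latin_hypercube(n, k, rng):
    """n-run, k-factor Latin hypercube: each column is a permutation of 1..n."""
    return np.column_stack([rng.permutation(n) + 1 for _ in range(k)])

def max_abs_pairwise_correlation(design):
    """The criterion minimized when extending a design: the largest absolute
    correlation between any two distinct columns."""
    corr = np.corrcoef(design, rowvar=False)
    off_diag = corr[~np.eye(corr.shape[0], dtype=bool)]
    return np.abs(off_diag).max()

rng = np.random.default_rng(0)
design = random_latin_hypercube(n=17, k=5, rng=rng)
print(max_abs_pairwise_correlation(design))
```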

    The generation of longest optimal box repetition-free words

    Thesis (MSc)--Stellenbosch University, 2022. ENGLISH ABSTRACT: This thesis focuses on a specific problem within the field of combinatorial generation, namely, the generation of box repetition-free words. A box is a word over a given alphabet in which the first symbol is the same as the last symbol. For example, the word abaca is a box. A box can contain other boxes: abaca contains the boxes aba and aca, which overlap. This work investigates the generation of the longest possible sequence of symbols, over a given alphabet, that does not contain any repeating boxes. We show that an exhaustive enumeration based on a brute-force approach with backtracking is not feasible. That is, the approach checks whether adding a symbol to a word would create a repeating box and, if not, recursively adds another symbol; this eventually finds all valid words, but takes an unreasonable amount of time for larger alphabets. As a non-enumerative attempt to find individual valid words, Monte Carlo tree search is used, based on the assumption that prefixes with good past results will also give good results in the future. Based on an analysis of the properties of box repetition-free words, a new search is then devised: factors of words are mapped onto a graph, all non-optimal edges are removed, and it is shown that any Hamiltonian path on this graph yields a longest optimal word. The results of this work show that backtracking fails to generate longest optimal words within a reasonable time for any alphabet with more than three symbols. Monte Carlo tree search performs better, finding optimal words for an alphabet size of four but failing for larger alphabets. The new method outperforms both and, with a small optimization, is shown to generate longest optimal words up to an alphabet size of six.
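
    A minimal sketch of the brute-force backtracking baseline the thesis shows to be infeasible beyond three symbols: extend the word one symbol at a time, pruning whenever the newly completed suffix repeats a box. It assumes boxes have length at least two (consistent with the abaca example); the function names are illustrative.

```python
def new_box_is_repeated(w):
    """Check only boxes ending at the last position of w: a box is a factor
    of length >= 2 whose first and last symbols coincide."""
    n = len(w)
    for i in range(n - 1):
        if w[i] == w[n - 1]:
            box = w[i:n]
            if w.find(box) < i:  # the same box already occurs earlier in w
                return True
    return False

def longest_box_repetition_free(alphabet, prefix=""):
    """Backtracking: extend the word symbol by symbol, pruning as soon as a
    repeated box appears; returns one longest valid word found."""
    best = prefix
    for c in alphabet:
        w = prefix + c
        if not new_box_is_repeated(w):
            candidate = longest_box_repetition_free(alphabet, w)
            if len(candidate) > len(best):
                best = candidate
    return best

# Feasible for 2-3 symbols only; the thesis shows it breaks down beyond that.
print(longest_box_repetition_free("ab"))
```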