9,002 research outputs found

    Evaluation Methodologies in Software Protection Research

    Man-at-the-end (MATE) attackers have full control over the system on which the attacked software runs, and try to break the confidentiality or integrity of assets embedded in the software. Both companies and malware authors want to prevent such attacks. This has driven an arms race between attackers and defenders, resulting in a plethora of different protection and analysis methods. However, it remains difficult to measure the strength of protections because MATE attackers can reach their goals in many different ways and a universally accepted evaluation methodology does not exist. This survey systematically reviews the evaluation methodologies of papers on obfuscation, a major class of protections against MATE attacks. For 572 papers, we collected 113 aspects of their evaluation methodologies, ranging from sample set types and sizes, over sample treatment, to performed measurements. We provide detailed insights into how the academic state of the art evaluates both the protections and analyses thereon. In summary, there is a clear need for better evaluation methodologies. We identify nine challenges for software protection evaluations, which represent threats to the validity, reproducibility, and interpretation of research results in the context of MATE attacks.

    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many possible ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. It is necessary to investigate all of these processes in order to evaluate which variant may be causal for the deleterious phenotype. Variant effect scores are a great help in this regard. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow of this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation, and ultimately deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data that is based on variants selected by allele frequency. In conclusion, the developed workflow presents a flexible and scalable method to train variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.

    Waiting Nets: State Classes and Taxonomy

    In time Petri nets (TPNs), time and control are tightly connected: time measurement for a transition starts only when all resources needed to fire it are available. Further, upper bounds on the duration of enabledness can force transitions to fire (this is called urgency). For many systems, one wants to decouple control and time, i.e. start measuring time as soon as a part of the preset of a transition is filled, and fire it after some delay and when all needed resources are available. This paper considers an extension of TPNs called waiting nets that dissociates time measurement and control. Their semantics allows time measurement to start with incomplete presets, and can ignore urgency when upper bounds of intervals are reached but not all resources needed to fire are available yet. Firing of a transition is then allowed as soon as the missing resources are available. It is known that extending bounded TPNs with stopwatches leads to undecidability. Our extension is weaker, and we show how to compute a finite state class graph for bounded waiting nets, yielding decidability of reachability and coverability. We then compare the expressiveness of waiting nets with that of other models w.r.t. timed language equivalence, and show that they are strictly more expressive than TPNs.

    Agent Miner: An Algorithm for Discovering Agent Systems from Event Data

    Process discovery studies ways to use event data generated by business processes and recorded by IT systems to construct models that describe the processes. Existing discovery algorithms are predominantly concerned with constructing process models that represent the control flow of the processes. Agent system mining argues that business processes often emerge from interactions of autonomous agents and uses event data to construct models of the agents and their interactions. This paper presents and evaluates Agent Miner, an algorithm that uses event data to discover models of the agents, and of their interactions, that compose the system which executed the processes that generated the input data. The evaluation, conducted using our open-source implementation of Agent Miner and publicly available industrial datasets, confirms that our algorithm can provide insights into the process participants and their interaction patterns and often discovers models that describe the business processes more faithfully than process models discovered using conventional process discovery algorithms.

    2023-2024 Boise State University Undergraduate Catalog

    This catalog is primarily for and directed at students. However, it serves many audiences, such as high school counselors, academic advisors, and the public. In this catalog you will find an overview of Boise State University and information on admission, registration, grades, tuition and fees, financial aid, housing, student services, and other important policies and procedures. However, most of this catalog is devoted to describing the various programs and courses offered at Boise State.

    Succinct representation for (non)deterministic finite automata

    (Non-)deterministic finite automata are one of the simplest models of computation studied in automata theory. Here we study them through the lens of succinct data structures. Towards this goal, we design a data structure for any deterministic automaton D having n states over a σ-letter alphabet using (σ − 1)n log n (1 + o(1)) bits, that determines, given a string x, whether D accepts x in optimal O(|x|) time. We also consider the case when there are N < σn non-failure transitions, and obtain various time-space trade-offs. Here some of our results are better than the recent work of Cotumaccio and Prezza (SODA 2021). We also exhibit a data structure for a non-deterministic automaton N using σn^2 + n bits that takes O(n^2 |x|) time for string membership checking. Finally, we also provide time and space efficient algorithms for performing several standard operations on the languages accepted by finite automata.
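    The membership queries discussed in this abstract can be illustrated with a minimal sketch. It uses ordinary Python dictionaries for the transition tables, which is an assumption made for readability and is not the succinct representation the paper designs. The function names and the two example automata are hypothetical. The deterministic check performs one table lookup per input symbol; the non-deterministic check tracks the set of reachable states after each symbol.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

State = Hashable


def dfa_accepts(delta: Dict[Tuple[State, str], State],
                start: State,
                accepting: Set[State],
                x: str) -> bool:
    """Run a DFA on x with one dictionary lookup per symbol."""
    state = start
    for ch in x:
        key = (state, ch)
        if key not in delta:
            # A missing entry models a failure transition: reject.
            return False
        state = delta[key]
    return state in accepting


def nfa_accepts(delta: Dict[Tuple[State, str], Set[State]],
                starts: Iterable[State],
                accepting: Set[State],
                x: str) -> bool:
    """Run an NFA on x by tracking the set of currently reachable states."""
    current = set(starts)
    for ch in x:
        nxt: Set[State] = set()
        for q in current:
            nxt |= delta.get((q, ch), set())
        current = nxt
        if not current:
            # No state is reachable any more: reject early.
            return False
    return bool(current & accepting)


# Hypothetical DFA over {a, b}: accepts words with an even number of 'a'.
EVEN_A = {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1}

# Hypothetical NFA over {a, b}: accepts words ending in "ab".
ENDS_AB = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
```

    With these tables, `dfa_accepts(EVEN_A, 0, {0}, "abab")` and `nfa_accepts(ENDS_AB, [0], {2}, "aab")` both return `True`. The set-based simulation touches at most n states, each with at most n successors, per input symbol, which matches the O(n^2 |x|) shape of the bound quoted above but says nothing about the space bounds.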

    Modified Theories of Gravity and Cosmological Applications

    This reprint focuses on recent aspects of gravitational theory and cosmology. It contains subjects of particular interest for modified gravity theories and applications to cosmology; special attention is given to Einstein–Gauss–Bonnet gravity, f(R) gravity, anisotropic inflation, extra-dimension theories of gravity, black holes, dark energy, Palatini gravity, anisotropic spacetime, Einstein–Finsler gravity, off-diagonal cosmological solutions, Hawking temperature, and scalar-tensor-vector theories.

    Modelling, Monitoring, Control and Optimization for Complex Industrial Processes

    This reprint includes 22 research papers and an editorial, collected from the Special Issue "Modelling, Monitoring, Control and Optimization for Complex Industrial Processes", highlighting recent research advances and emerging research directions in complex industrial processes. This reprint aims to promote the research field and to benefit readers from both academic communities and industrial sectors.

    ON EXPRESSIVENESS, INFERENCE, AND PARAMETER ESTIMATION OF DISCRETE SEQUENCE MODELS

    Huge neural autoregressive sequence models have achieved impressive performance across different applications, such as NLP, reinforcement learning, and bioinformatics. However, some lingering problems (e.g., consistency and coherency of generated texts) continue to exist, regardless of the parameter count. In the first part of this thesis, we chart a taxonomy of the expressiveness of various sequence model families (Ch 3). In particular, we put forth complexity-theoretic proofs that string latent-variable sequence models are strictly more expressive than energy-based sequence models, which in turn are more expressive than autoregressive sequence models. Based on these findings, we introduce residual energy-based sequence models, a family of energy-based sequence models (Ch 4) whose sequence weights can be evaluated efficiently, and which also perform competitively against autoregressive models. However, we show how unrestricted energy-based sequence models can suffer from uncomputability, and how such a problem is generally unfixable without knowledge of the true sequence distribution (Ch 5). In the second part of the thesis, we study practical sequence model families and algorithms based on theoretical findings in the first part of the thesis. We introduce neural particle smoothing (Ch 6), a family of approximate sampling methods that work with conditional latent variable models. We also introduce neural finite-state transducers (Ch 7), which extend weighted finite-state transducers with the introduction of mark strings, allowing scoring transduction paths in a finite-state transducer with a neural network. Finally, we propose neural regular expressions (Ch 8), a family of neural sequence models that are easy to engineer, allowing a user to design flexible weighted relations using Marked FSTs, and combine these weighted relations together with various operations.

    On the growth rate of polyregular functions

    We consider polyregular functions, which are certain string-to-string functions that have polynomial output size. We prove that a polyregular function has output size O(n^k) if and only if it can be defined by an MSO interpretation of dimension k, i.e. a string-to-string transformation where every output position is interpreted, using monadic second-order logic MSO, in some k-tuple of input positions. We also show that this characterization does not extend to pebble transducers, another model for describing polyregular functions: we show that for every k ∈ {1, 2, …} there is a polyregular function of quadratic output size which needs at least k pebbles to be computed.
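    To make "quadratic output size" and "output positions interpreted in tuples of input positions" concrete, here is a small illustrative sketch. The function chosen (repeat the input word once per input position) and the name `repeat_per_position` are assumptions made for illustration and are not taken from the paper. Each output position corresponds to a pair (i, j) of input positions, ordered lexicographically, and carries the letter found at input position j, so an input of length n yields an output of length n^2.

```python
def repeat_per_position(w: str) -> str:
    """Map w to w repeated len(w) times.

    Output positions are indexed by pairs (i, j) of input positions,
    taken in lexicographic order; position (i, j) carries the letter w[j].
    """
    n = len(w)
    return "".join(w[j] for i in range(n) for j in range(n))
```

    For example, `repeat_per_position("ab")` returns `"abab"`, and any input of length 3 produces an output of length 9.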