17,058 research outputs found

    Evaluation Methodologies in Software Protection Research

    Man-at-the-end (MATE) attackers have full control over the system on which the attacked software runs, and they try to break the confidentiality or integrity of assets embedded in the software. Both companies and malware authors want to prevent such attacks. This has driven an arms race between attackers and defenders, resulting in a plethora of different protection and analysis methods. However, it remains difficult to measure the strength of protections, because MATE attackers can reach their goals in many different ways and a universally accepted evaluation methodology does not exist. This survey systematically reviews the evaluation methodologies of papers on obfuscation, a major class of protections against MATE attacks. For 572 papers, we collected 113 aspects of their evaluation methodologies, ranging from sample set types and sizes, through sample treatment, to the measurements performed. We provide detailed insights into how the academic state of the art evaluates both the protections and the analyses applied to them. In summary, there is a clear need for better evaluation methodologies. We identify nine challenges for software protection evaluations, which represent threats to the validity, reproducibility, and interpretation of research results in the context of MATE attacks.

    Using machine learning to predict pathogenicity of genomic variants throughout the human genome

    More than 6,000 diseases are estimated to be caused by genomic variants. This can happen in many ways: a variant may stop the translation of a protein, interfere with gene regulation, or alter splicing of the transcribed mRNA into an unwanted isoform. All of these processes must be investigated in order to evaluate which variant may be causal for the deleterious phenotype. Variant effect scores are a great help in this regard. Implemented as machine learning classifiers, they integrate annotations from different resources to rank genomic variants in terms of pathogenicity. Developing a variant effect score requires multiple steps: annotation of the training data, feature selection, model training, benchmarking, and finally deployment for the model's application. Here, I present a generalized workflow for this process. It makes it simple to configure how information is converted into model features, enabling the rapid exploration of different annotations. The workflow further implements hyperparameter optimization, model validation, and ultimately deployment of a selected model via genome-wide scoring of genomic variants. The workflow is applied to train Combined Annotation Dependent Depletion (CADD), a variant effect model that scores SNVs and InDels genome-wide. I show that the workflow can be quickly adapted to novel annotations by porting CADD to the genome reference GRCh38. Further, I demonstrate the integration of deep neural network scores as features into a new CADD model, improving the annotation of RNA splicing events. Finally, I apply the workflow to train multiple variant effect models from training data based on variants selected by allele frequency. In conclusion, the developed workflow is a flexible and scalable method for training variant effect scores. All software and developed scores are freely available from cadd.gs.washington.edu and cadd.bihealth.org.
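
    As an illustration of the kind of workflow described above (not the actual CADD pipeline; the file names, annotation columns, and classifier choice are assumptions), a minimal Python/scikit-learn sketch covering annotation loading, hyperparameter optimization, validation, and genome-wide scoring might look as follows:

    # Hypothetical sketch of a variant-effect-model training workflow.
    # File names, feature columns, and the classifier are illustrative assumptions.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV, train_test_split
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    FEATURES = ["conservation", "splice_score", "regulatory_score"]  # assumed annotations

    train = pd.read_csv("annotated_training_variants.tsv", sep="\t")
    X, y = train[FEATURES].fillna(0.0), train["is_deleterious"]
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    model = Pipeline([("scale", StandardScaler()),
                      ("clf", LogisticRegression(max_iter=1000))])

    # Hyperparameter optimization via cross-validated grid search
    search = GridSearchCV(model, {"clf__C": [0.01, 0.1, 1.0, 10.0]}, cv=5, scoring="roc_auc")
    search.fit(X_train, y_train)

    # Validation on held-out variants
    print("held-out AUC:", search.score(X_val, y_val))

    # Deployment: score a genome-wide variant table in chunks
    for chunk in pd.read_csv("all_variants.tsv", sep="\t", chunksize=1_000_000):
        scores = search.best_estimator_.predict_proba(chunk[FEATURES].fillna(0.0))[:, 1]
        chunk.assign(score=scores).to_csv("scored_variants.tsv", sep="\t",
                                          mode="a", header=False, index=False)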

    DATA AUGMENTATION FOR SYNTHETIC APERTURE RADAR USING ALPHA BLENDING AND DEEP LAYER TRAINING

    Human-based object detection in synthetic aperture radar (SAR) imagery is complex and technical, laboriously slow yet time critical, making it a natural application for machine learning (ML). Training an ML network for object detection requires very large image datasets with embedded objects that are accurately and precisely labeled. Unfortunately, no such SAR datasets exist. Therefore, this paper proposes a method to synthesize wide field of view (FOV) SAR images by combining two existing datasets: SAMPLE, which is composed of both real and synthetic single-object chips, and MSTAR Clutter, which is composed of real wide-FOV SAR images. Synthetic objects are extracted from SAMPLE using threshold-based segmentation before being alpha-blended onto patches from MSTAR Clutter. To validate the novel synthesis method, individual object chips are created and classified using a simple convolutional neural network (CNN); testing is performed against the measured SAMPLE subset. A novel technique is also developed to investigate training activity in deep layers. The proposed data augmentation technique produces a 17% increase in the accuracy of measured SAR image classification. This improvement shows that any residual artifacts from segmentation and blending do not negatively affect ML, which is promising for future use in wide-area SAR synthesis. Outstanding Thesis. Major, United States Air Force. Approved for public release; distribution is unlimited.
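
    As an illustration of the segmentation-and-blending step described above (not the thesis's implementation; the array names, fixed threshold, and feathered alpha map are assumptions), a minimal numpy sketch:

    # Hypothetical sketch: segment a bright target from a SAMPLE-style chip and
    # alpha-blend it onto a clutter patch. Threshold and alpha ramp are assumptions.
    import numpy as np

    def synthesize_chip(target_chip, clutter_patch, threshold=0.35, feather=0.1):
        """Blend a segmented target onto a clutter patch of the same shape (values in [0, 1])."""
        assert target_chip.shape == clutter_patch.shape
        mask = (target_chip > threshold).astype(float)                          # threshold-based segmentation
        alpha = np.clip((target_chip - threshold) / feather, 0.0, 1.0) * mask   # soft edges near the threshold
        return alpha * target_chip + (1.0 - alpha) * clutter_patch              # alpha blending

    # Toy example: insert a random "target" chip into a wide-FOV clutter image.
    rng = np.random.default_rng(0)
    chip = rng.random((64, 64)) ** 3          # mostly dark, a few bright scatterers
    clutter = 0.2 * rng.random((512, 512))    # stand-in for an MSTAR Clutter scene
    clutter[100:164, 200:264] = synthesize_chip(chip, clutter[100:164, 200:264])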

    Introduction to Facial Micro Expressions Analysis Using Color and Depth Images: A Matlab Coding Approach (Second Edition, 2023)

    The book provides a gentle introduction to the field of Facial Micro Expressions Recognition (FMER) using color and depth images, with the aid of the MATLAB programming environment. FMER is a subset of image processing and its analysis is multidisciplinary, so it requires familiarity with other topics of Artificial Intelligence (AI) such as machine learning, digital image processing, and psychology. This makes it a great opportunity to write a book that covers all of these topics for readers ranging from beginners to professionals in the field of AI, even those without a background in AI. Our goal is to provide a standalone introduction to FMER analysis in the form of theoretical descriptions for readers with no background in image processing, together with reproducible MATLAB practical examples. We also describe the basic definitions for FMER analysis and the MATLAB libraries used in the text, which helps the reader apply the experiments in real-world applications. We believe that this book is suitable for students, researchers, and professionals alike who need to develop practical skills along with a basic understanding of the field. We expect that, after reading this book, the reader feels comfortable with key stages such as color and depth image processing, color and depth image representation, classification, machine learning, facial micro-expression recognition, feature extraction, and dimensionality reduction. Comment: This is the second edition of the book.

    Integrating materials supply in strategic mine planning of underground coal mines

    In July 2005 the Australian Coal Industry’s Research Program (ACARP) commissioned Gary Gibson to identify constraints that would prevent development production rates from achieving full capacity. A “top 5” constraint was: “The logistics of supply transport distribution and handling of roof support consumables is an issue at older extensive mines immediately while the achievement of higher development rates will compound this issue at most mines.” Then in 2020, Walker, Harvey, Baafi, Kiridena, and Porter were commissioned by ACARP to investigate Australian best practice and the progress made since Gibson’s 2005 report, in a report titled “Benchmarking study in underground coal mining logistics”. It found that, even though logistics continues to be recognised as a critical constraint across many operations, particularly at a tactical, day-to-day level, no strategic thought had been given to logistics in underground coal mines; rather, it was always assumed that logistics could keep up with any future planned design and productivity. Consequently, without estimating the impact of any logistical constraint in a life-of-mine plan, the risk of overvaluing a mining operation is high. This thesis attempts to rectify this shortfall and develops a system to strategically identify logistics bottlenecks and the impacts that mine planning parameters might have on them at any point in time throughout a life-of-mine plan. By identifying logistics constraints as early as possible, the best opportunity to rectify the problem at the least expense is realised. At worst, if a logistics constraint were unsolvable, it could be understood, planned for, and reflected in the mine’s ongoing financial valuations. The system developed in this thesis, using a suite of unique algorithms, is designed to bolt onto existing mine plans in the XPAC mine scheduling software package and to identify, at a strategic level, the number of material delivery loads required to maintain planned productivity for a mining operation. Once an event is identified, the system drills down to a tactical level using FlexSim discrete event simulation to confirm the predicted impact and to understand whether a solution can be transferred back as a long-term solution. Most importantly, the system is designed to communicate to multiple non-technical stakeholders, through simple graphical outputs, whether there is a risk to planned production levels due to a logistics constraint.
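
    As a hedged illustration of the strategic-level check described above (not the thesis's algorithms; every rate, payload, and capacity figure below is an invented placeholder), a short Python sketch converting a planned development schedule into required delivery loads and flagging potential bottlenecks:

    # Hypothetical sketch: turn planned development metres into delivery loads per
    # period and flag periods exceeding an assumed delivery capacity.
    USAGE_PER_METRE = {"roof_bolts_t": 0.45, "mesh_t": 0.25, "grout_t": 0.10}  # tonnes per metre (assumed)
    PAYLOAD_T = 10.0                 # tonnes per delivery load (assumed)
    CAPACITY_LOADS_PER_SHIFT = 3.0   # deliverable loads per shift (assumed)
    SHIFTS_PER_PERIOD = 65           # production shifts per period (assumed)

    schedule = [("2025-Q1", 1800), ("2025-Q2", 2300), ("2025-Q3", 2900)]  # planned development metres

    for period, metres in schedule:
        tonnes = metres * sum(USAGE_PER_METRE.values())
        loads = -(-tonnes // PAYLOAD_T)                  # ceiling division
        per_shift = loads / SHIFTS_PER_PERIOD
        flag = "POTENTIAL BOTTLENECK" if per_shift > CAPACITY_LOADS_PER_SHIFT else "ok"
        print(f"{period}: {loads:.0f} loads, {per_shift:.1f} per shift -> {flag}")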

    Ordinal time series analysis with the R package otsfeatures

    The 21st century has witnessed a growing interest in the analysis of time series data. Whereas most of the literature on the topic deals with real-valued time series, ordinal time series have typically received much less attention. However, the development of specific analytical tools for the latter objects has increased substantially in recent years. The R package otsfeatures attempts to provide a set of simple functions for analyzing ordinal time series. In particular, several commands allowing the extraction of well-known statistical features and the execution of inferential tasks are available to the user. The output of several functions can be employed to perform traditional machine learning tasks, including clustering, classification, or outlier detection. otsfeatures also incorporates two datasets of financial time series which were used in the literature for clustering purposes, as well as three interesting synthetic databases. The main properties of the package are described and its use is illustrated through several examples. Researchers from a broad variety of disciplines could benefit from the powerful tools provided by otsfeatures.
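
    The abstract does not list the package's individual functions, so as a language-agnostic illustration (in Python, not the otsfeatures R API) of the kind of feature it extracts, here is a sketch of two standard ordinal-time-series statistics: the cumulative marginal frequencies and a lag-based Cohen's kappa measure of serial dependence.

    # Illustrative computation of two common ordinal-time-series features;
    # this is not the otsfeatures API, only the kind of statistic such packages extract.
    import numpy as np

    def cumulative_marginal_frequencies(x, n_states):
        """f_k = relative frequency of observations <= k, for states 0..n_states-1."""
        return np.cumsum(np.bincount(x, minlength=n_states) / len(x))

    def cohens_kappa(x, n_states, lag=1):
        """Serial dependence between X_t and X_{t-lag}, measured by Cohen's kappa."""
        p = np.bincount(x, minlength=n_states) / len(x)       # marginal probabilities
        joint = np.zeros((n_states, n_states))
        for i, j in zip(x[:-lag], x[lag:]):
            joint[i, j] += 1.0
        joint /= len(x) - lag
        agreement = np.trace(joint)                           # P(X_t == X_{t-lag})
        chance = np.sum(p ** 2)                               # expected agreement under independence
        return (agreement - chance) / (1.0 - chance)

    # Toy ordinal series with four ordered states (e.g. rating categories coded 0..3).
    rng = np.random.default_rng(1)
    series = np.clip(np.round(np.cumsum(rng.normal(size=500)) / 3), 0, 3).astype(int)
    print(cumulative_marginal_frequencies(series, 4), cohens_kappa(series, 4))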

    Soliton Gas: Theory, Numerics and Experiments

    The concept of soliton gas was introduced in 1971 by V. Zakharov as an infinite collection of weakly interacting solitons in the framework of the Korteweg-de Vries (KdV) equation. In this theoretical construction of a diluted soliton gas, solitons with random parameters are almost non-overlapping. More recently, the concept has been extended to dense gases, in which solitons interact strongly and continuously. The notion of soliton gas is inherently associated with integrable wave systems described by nonlinear partial differential equations, like the KdV equation or the one-dimensional nonlinear Schrödinger equation, that can be solved using the inverse scattering transform. Over the last few years, the field of soliton gases has received rapidly growing interest from both the theoretical and experimental points of view. In particular, it has been realized that soliton gas dynamics underlies some fundamental nonlinear wave phenomena such as spontaneous modulation instability and the formation of rogue waves. The recently discovered deep connections of soliton gas theory with generalized hydrodynamics have broadened the field and opened new fundamental questions related to soliton gas statistics and thermodynamics. We review the main recent theoretical and experimental results in the field of soliton gas. The key conceptual tools of the field, such as the inverse scattering transform, the thermodynamic limit of finite-gap potentials, and the Generalized Gibbs Ensembles, are introduced, and various open questions and future challenges are discussed. Comment: 35 pages, 8 figures.
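
    For reference, the two integrable models named above, written in one common normalization (conventions vary across the literature):

    u_t + 6\,u\,u_x + u_{xxx} = 0                                 (Korteweg-de Vries)
    i\,\psi_t + \tfrac{1}{2}\,\psi_{xx} + |\psi|^2\,\psi = 0      (focusing 1D nonlinear Schrödinger)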

    Model Diagnostics meets Forecast Evaluation: Goodness-of-Fit, Calibration, and Related Topics

    Principled forecast evaluation and model diagnostics are vital in fitting probabilistic models and forecasting outcomes of interest. A common principle is that fitted or predicted distributions ought to be calibrated, ideally in the sense that the outcome is indistinguishable from a random draw from the posited distribution. Much of this thesis is centered on calibration properties of various types of forecasts. In the first part of the thesis, a simple algorithm for exact multinomial goodness-of-fit tests is proposed. The algorithm computes exact p-values based on various test statistics, such as the log-likelihood ratio and Pearson's chi-square. A thorough analysis shows improvement on extant methods. However, the runtime of the algorithm grows exponentially in the number of categories and hence its use is limited. In the second part, a framework rooted in probability theory is developed, which gives rise to hierarchies of calibration, and applies to both predictive distributions and stand-alone point forecasts. Based on a general notion of conditional T-calibration, the thesis introduces population versions of T-reliability diagrams and revisits a score decomposition into measures of miscalibration, discrimination, and uncertainty. Stable and efficient estimators of T-reliability diagrams and score components arise via nonparametric isotonic regression and the pool-adjacent-violators algorithm. For in-sample model diagnostics, a universal coefficient of determination is introduced that nests and reinterprets the classical R^2 in least squares regression. In the third part, probabilistic top lists are proposed as a novel type of prediction in classification, which bridges the gap between single-class predictions and predictive distributions. The probabilistic top list functional is elicited by strictly consistent evaluation metrics, based on symmetric proper scoring rules, which admit comparison of various types of predictions.
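
    The abstract does not spell out the proposed algorithm, so as a point of reference here is a brute-force Python baseline for the exact multinomial goodness-of-fit p-value: enumerate every possible count vector and sum the null probabilities of all outcomes at least as extreme. The enumeration over all C(n+k-1, k-1) outcomes quickly becomes infeasible as the number of categories grows, mirroring the scaling limitation noted above.

    # Brute-force exact multinomial goodness-of-fit test (baseline sketch, not the
    # thesis's algorithm): sum H0-probabilities of all outcomes at least as extreme.
    import math

    def compositions(n, k):
        """Yield all k-tuples of non-negative integers summing to n."""
        if k == 1:
            yield (n,)
            return
        for first in range(n + 1):
            for rest in compositions(n - first, k - 1):
                yield (first,) + rest

    def multinomial_logpmf(counts, probs):
        n = sum(counts)
        return math.lgamma(n + 1) + sum(c * math.log(p) - math.lgamma(c + 1)
                                        for c, p in zip(counts, probs))

    def pearson_chi2(counts, probs):
        n = sum(counts)
        return sum((c - n * p) ** 2 / (n * p) for c, p in zip(counts, probs))

    def exact_multinomial_pvalue(observed, probs, statistic=pearson_chi2):
        t_obs = statistic(observed, probs)
        return sum(math.exp(multinomial_logpmf(outcome, probs))
                   for outcome in compositions(sum(observed), len(observed))
                   if statistic(outcome, probs) >= t_obs - 1e-12)

    # Example: 20 observations over 6 equally likely categories.
    print(exact_multinomial_pvalue([6, 2, 4, 2, 3, 3], [1 / 6] * 6))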

    Reduction of Petri net maintenance modeling complexity via Approximate Bayesian Computation

    This paper is part of the ENHAnCE ITN project (https://www.h2020-enhanceitn.eu/) funded by the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No. 859957. The authors would like to thank the Lloyd's Register Foundation (LRF), a charitable foundation in the U.K. helping to protect life and property by supporting engineering-related education, public engagement, and the application of research. The authors gratefully acknowledge the support of these organizations, which have enabled the research reported in this paper. The accurate modeling of engineering systems and processes using Petri nets often results in complex graph representations that are computationally intensive, limiting the potential of this modeling tool in real-life applications. This paper presents a methodology to properly define the optimal structure and properties of a reduced Petri net that mimics the output of a reference Petri net model. The methodology is based on Approximate Bayesian Computation to infer the plausible values of the model parameters of the reduced model in a rigorous probabilistic way. The method also provides a numerical measure of the level of approximation of the reduced model structure, thus allowing the selection of the optimal reduced structure among a set of potential candidates. The suitability of the proposed methodology is illustrated using a simple illustrative example and a system reliability engineering case study, showing satisfactory results. The results also show that the method allows flexible reduction of the structure of the complex Petri net model taken as reference, and provides numerical justification for the choice of the reduced model structure.
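
    As a minimal illustration of the general ABC idea (not the paper's method or its Petri net models; the two simulators, the summary statistics, the prior, and the tolerance are placeholders), a rejection-sampling sketch in Python:

    # Hypothetical ABC rejection sketch: accept reduced-model parameters whose
    # simulated output is close to the reference model's output.
    import numpy as np

    rng = np.random.default_rng(42)

    def simulate_reference(n):
        """Stand-in for the expensive reference Petri net (e.g. downtimes in hours)."""
        return rng.gamma(shape=2.0, scale=5.0, size=n)

    def simulate_reduced(theta, n):
        """Stand-in for a one-parameter reduced model."""
        return rng.exponential(scale=theta, size=n)

    def summary(x):
        return np.array([x.mean(), x.std()])

    def abc_rejection(n_draws=5_000, n_sim=2_000, tol=3.0):
        s_ref = summary(simulate_reference(n_sim))
        accepted = []
        for _ in range(n_draws):
            theta = rng.uniform(1.0, 30.0)                    # prior over the reduced parameter
            s_sim = summary(simulate_reduced(theta, n_sim))
            if np.linalg.norm(s_sim - s_ref) < tol:           # keep draws with close output
                accepted.append(theta)
        return np.array(accepted)

    posterior = abc_rejection()
    print(len(posterior), posterior.mean())
    # The acceptance rate gives a rough indication of how well this reduced structure
    # can reproduce the reference output, in the spirit of comparing candidate structures.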

    Discovering the hidden structure of financial markets through Bayesian modelling

    Understanding what drives the price of a financial asset is a question that remains largely unanswered. In this work we go beyond the classic one-step-ahead prediction and instead construct models that create new information about the behaviour of these time series. Our aim is to gain a better understanding of the hidden structures that drive the moves of each financial time series and thus the market as a whole. We propose a tool to decompose multiple time series into economically meaningful variables that explain the endogenous and exogenous factors driving their underlying variability. The methodology we introduce goes beyond the direct model forecast: since our model continuously adapts its variables and coefficients, we can study the time series of coefficients and selected variables. We also present a model to construct the causal graph of relations between these time series and include them in the exogenous factors. Hence, we obtain a model able to explain what is driving the move of both each specific time series and the market as a whole. In addition, the obtained graph of the time series provides new information on the underlying risk structure of this environment. With this deeper understanding of the hidden structure we propose novel ways to detect and forecast risks in the market. We investigate our results with inferences up to one month into the future using stocks, FX futures, and ETF futures, demonstrating superior performance in terms of the accuracy of large moves, longer-term prediction, and consistency over time. We also discuss in more detail the economic interpretation of the new variables and the constructed graph structure of the market.