    Modular design of data-parallel graph algorithms

    Amorphous Data Parallelism has proven to be a suitable vehicle for implementing concurrent graph algorithms effectively on multi-core architectures. In view of the growing complexity of graph algorithms for information analysis, there is a need to facilitate modular design techniques in the context of Amorphous Data Parallelism. In this paper, we investigate what it takes to formulate algorithms possessing Amorphous Data Parallelism in a modular fashion that enables a large degree of code re-use. Using the betweenness centrality algorithm, a widely popular algorithm in the analysis of social networks, we demonstrate that a single optimisation technique can suffice to enable a modular programming style without losing the efficiency of a tailor-made monolithic implementation.
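For readers unfamiliar with betweenness centrality, the sketch below shows Brandes' sequential algorithm for unweighted graphs in Python, split into two separately reusable phases: a breadth-first search that counts shortest paths, then a backward accumulation of dependencies. The split into functions is our own illustration of the modular style the abstract describes; it is not the paper's Amorphous Data Parallelism formulation and does no parallel work. The graph is assumed to be a dict mapping each vertex to its list of out-neighbours, with every vertex present as a key.

```python
from collections import deque


def shortest_path_dag(graph, source):
    """Phase 1: BFS from source, counting shortest paths (sigma) and predecessors."""
    sigma = {v: 0 for v in graph}
    dist = {v: -1 for v in graph}
    preds = {v: [] for v in graph}
    sigma[source] = 1
    dist[source] = 0
    order = []  # vertices in non-decreasing distance from source
    queue = deque([source])
    while queue:
        v = queue.popleft()
        order.append(v)
        for w in graph[v]:
            if dist[w] < 0:  # first time w is discovered
                dist[w] = dist[v] + 1
                queue.append(w)
            if dist[w] == dist[v] + 1:  # edge v -> w lies on a shortest path
                sigma[w] += sigma[v]
                preds[w].append(v)
    return order, sigma, preds


def accumulate_dependencies(order, sigma, preds, source, centrality):
    """Phase 2: walk vertices in reverse BFS order, propagating pair dependencies."""
    delta = {v: 0.0 for v in order}
    for w in reversed(order):
        for v in preds[w]:
            delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
        if w != source:
            centrality[w] += delta[w]


def betweenness_centrality(graph):
    """Compose the two phases once per source vertex."""
    centrality = {v: 0.0 for v in graph}
    for s in graph:
        order, sigma, preds = shortest_path_dag(graph, s)
        accumulate_dependencies(order, sigma, preds, s, centrality)
    return centrality
```

For an undirected graph stored with symmetric adjacency lists, each pair is counted from both ends, so the values should be halved to obtain the usual undirected scores.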

    Adaptive Constraint Solving for Information Flow Analysis

    In program analysis, unknown properties of terms are typically represented symbolically as variables. Bound constraints on these variables can then specify multiple optimisation goals for computer programs and find application in areas such as type theory, security, alias analysis and resource reasoning. Resolution of bound constraints is a problem steeped in graph theory; interdependencies between the variables are represented as a constraint graph. Additionally, constants are introduced into the system as concrete bounds over these variables, and the constants themselves are ordered over a lattice which is, once again, represented as a graph. Despite graph algorithms being central to bound constraint solving, most approaches to program optimisation that use bound constraint solving have treated their graph-theoretic foundations as a black box. Little has been done to investigate the computational costs of constraint resolution or to design efficient graph algorithms for it. Emerging examples of these lattices and bound constraint graphs, particularly from the domain of language-based security, show that these graphs and lattices are structurally diverse and can be arbitrarily large. Therefore, there is a pressing need to investigate the graph-theoretic foundations of bound constraint solving. In this thesis, we investigate the computational costs of bound constraint solving from a graph-theoretic perspective for Information Flow Analysis (IFA); IFA is a sub-field of language-based security which verifies whether the confidentiality and integrity of classified information are preserved as it is manipulated by a program. We present a novel framework based on graph decomposition for solving the (atomic) bound constraint problem for IFA. Our approach enables us to abstract away from connections between individual vertices to those between sets of vertices in both the constraint graph and an accompanying security lattice which defines the ordering over constants. 
Thereby, we are able to achieve significant speedups compared to state-of-the-art graph algorithms applied to bound constraint solving. More importantly, our algorithms adapt seamlessly to the structure of the constraint graph and the lattice. The computational cost of our approach is a function of the latent scope for decomposition in the constraint graph and the lattice; therefore, we enjoy the fastest runtime for every point in the structure-spectrum of these graphs and lattices. While the techniques in this dissertation are developed with IFA in mind, they can be extended to other applications of the bound constraints problem, such as type inference and program analysis frameworks which use annotated type systems, where constants are ordered over a lattice.
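To make the (atomic) bound constraint problem concrete, here is a minimal Python sketch of the textbook worklist approach: it computes the least solution by propagating joins along constraint edges until a fixed point is reached, then checks the upper bounds. This is the conventional baseline, not the decomposition-based framework the thesis proposes. The function name, its parameters and the two-point lattice used in the test are assumptions for illustration, and termination relies on the lattice having finite height.

```python
from collections import defaultdict, deque


def solve_bound_constraints(variables, flows, lower, upper, join, leq, bottom):
    """Least solution of atomic bound constraints over a finite-height lattice.

    variables: every variable name (all names in flows must appear here)
    flows:     list of (x, y) pairs, each meaning x <= y
    lower:     dict var -> constant c, each meaning c <= var
    upper:     dict var -> constant c, each meaning var <= c
    join, leq, bottom: the lattice operations and least element
    Returns (solution, violated_upper_bounds).
    """
    succ = defaultdict(list)
    for x, y in flows:
        succ[x].append(y)

    # Start every variable at bottom, raised by its constant lower bound.
    level = {v: bottom for v in variables}
    for v, c in lower.items():
        level[v] = join(level[v], c)

    # Propagate levels along x <= y edges until nothing changes.
    worklist = deque(level)
    while worklist:
        x = worklist.popleft()
        for y in succ[x]:
            new = join(level[y], level[x])
            if new != level[y]:
                level[y] = new
                worklist.append(y)

    # The least solution satisfies an upper bound iff any solution does.
    violations = [v for v, c in upper.items() if not leq(level[v], c)]
    return level, violations
```

With the two-point lattice Low ⊑ High encoded as 0 ⊑ 1 (join is `max`), a High constant on `a` flows to `b` and `c`, so an upper bound of Low on `c` is reported as an information-flow violation.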

    Directed Acyclic Graphs

    This code is copyright (2015) by the University of Hertfordshire and is made available to third parties for research or private study, criticism or review, and for the purpose of reporting the state of the art, under the normal fair use/fair dealing exceptions in Sections 29 and 30 of the Copyright, Designs and Patents Act 1988. Use of the code under this provision is limited to non-commercial use: please contact us if you wish to arrange a licence covering commercial use of the code. This source code implements a unified framework for pre-processing Directed Acyclic Graphs (DAGs) to look up reachability between two vertices, as well as to compute the least upper bound of two vertices, in constant time. Our framework builds on the adaptive pre-processing algorithm for constant-time reachability lookups and extends it to compute the least upper bound of a vertex pair in constant time. The theoretical details of this work can be found in the research paper available at http://uhra.herts.ac.uk/handle/2299/1215
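The constant-time query interface can be illustrated with a deliberately naive Python sketch (Python 3.9+ for `graphlib`): it precomputes each vertex's upper set as an integer bitset and then tabulates the least upper bound of every vertex pair, after which both reachability and least-upper-bound queries are lookups. This is not the adaptive pre-processing algorithm described above; its preprocessing is at least cubic in the number of vertices and its table is quadratic in size, which is the cost an adaptive scheme aims to avoid. Edges are assumed to point upwards in the order (an edge u → v means u ⊑ v), and `preprocess` is a hypothetical name.

```python
from graphlib import TopologicalSorter


def preprocess(dag):
    """dag maps each vertex to its successors; an edge u -> v means u <= v."""
    # TopologicalSorter expects node -> predecessors. Feeding it successor
    # lists instead makes static_order() emit each vertex after its successors.
    vertices = list(TopologicalSorter(dag).static_order())
    index = {v: i for i, v in enumerate(vertices)}

    # up[v]: bitset of every vertex reachable from v (its upper set, incl. v).
    up = {}
    for v in vertices:
        bits = 1 << index[v]
        for w in dag.get(v, ()):
            bits |= up[w]
        up[v] = bits

    # lub[(u, v)]: the common upper bound whose upper set contains all the
    # other common upper bounds, or None if the pair has no least upper bound.
    lub = {}
    for u in vertices:
        for v in vertices:
            common = up[u] & up[v]
            lub[(u, v)] = next(
                (w for w in vertices
                 if ((common >> index[w]) & 1) and (common & ~up[w]) == 0),
                None,
            )

    def reachable(u, v):
        """Constant-time test: is v reachable from u?"""
        return bool((up[u] >> index[v]) & 1)

    def least_upper_bound(u, v):
        """Constant-time table lookup of the least upper bound."""
        return lub[(u, v)]

    return reachable, least_upper_bound
```

On a four-element diamond lattice (bottom below `a` and `b`, both below top), `least_upper_bound('a', 'b')` returns the top element, while two incomparable vertices with two incomparable common successors have no least upper bound and yield `None`.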

    AWSomePy: A Dataset and Characterization of Serverless Applications

    Do Names Echo Semantics? A Large-Scale Study of Identifiers Used in C++'s Named Casts

    Developers relax restrictions on a type to reuse methods with other types. While type casts are prevalent, in weakly typed languages such as C++ they are also extremely permissive. If type conversions are performed without care, they can lead to software bugs. Therefore, there is a clear need to check whether a type conversion is essential and used according to the developer's intent. In this paper, we propose a technique to judge the fidelity of type conversions from an explicit cast operation, using the identifiers in an assignment. We measure accord between the identifiers using entropy and use it to check whether the semantics of the source expression in the cast match the semantics of the variable it is assigned to. We present the results of running our tool on 34 components of the Chromium project, which collectively account for 27 MLOC. Our tool identified 1,368 cases of discord indicating potential anti-patterns in the usage of explicit casts. We performed a manual evaluation of a uniformly random sample of these cases. Our evaluation shows that 25.6% of the cases identified by our tool represent incorrect implementations of named casts and 28.04% represent imprecise identifier names. Comment: The manuscript has 21 pages and contains 22 figures and a table. The preprint has been submitted to, and is currently under review at, the Journal of Systems and Software (Elsevier).
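One simple way to quantify how much two identifiers agree is to split both into lowercase sub-tokens and compute the Shannon entropy of the pooled tokens, normalised by its maximum, so that shared tokens pull the score down. The Python sketch below is a hypothetical measure written for illustration: the abstract does not specify the exact entropy formulation the tool uses, and the tokenisation rule (camelCase, acronyms, snake_case, digits) is our own assumption.

```python
import math
import re
from collections import Counter


def subtokens(identifier):
    """Split an identifier into lowercase sub-tokens (assumed tokenisation rule)."""
    # Acronym runs (HTTP in HTTPServer), Capitalised or lowercase words, digits.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [p.lower() for p in parts]


def normalized_entropy(lhs, rhs):
    """Entropy of the pooled sub-tokens divided by its maximum, log2(n).

    1.0 means every sub-token is distinct (discord); lower values mean the
    assigned variable and the cast's source expression share vocabulary.
    """
    tokens = subtokens(lhs) + subtokens(rhs)
    n = len(tokens)
    if n < 2:
        return 0.0
    counts = Counter(tokens)
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / math.log2(n)
```

Under this measure, a variable `bufferSize` assigned from a cast of `buffer_size` scores 0.5, while `count` assigned from `name` scores the maximum of 1.0, which a checker could flag for review.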

    Design of Phase Locked Loop

    In optical communication over a backbone infrastructure, flexibility means, for example, programmable bit rates, which require a PLL that operates robustly over a wide frequency range. A wide-range PLL can be used by different protocols and applications, so that we maximize reusability and reduce time to market. In this report we present an extended-frequency-range CMOS monolithic VCO design. A negative feedback control algorithm automatically adjusts the VCO range according to the control voltage. Based on this analog feedback control algorithm, the VCO achieves a wide range without any pre-register settings. We discuss the different components of a PLL (Phase-Locked Loop), focusing mainly on Phase Frequency Detectors (PFDs) and the VCO (Voltage-Controlled Oscillator). We propose several PFD and VCO architectures and design a number of them in Mentor Graphics.

    Transcend: Detecting Concept Drift in Malware Classification Models

    Building machine learning models of malware behavior is widely accepted as a panacea for effective malware classification. A crucial requirement for building sustainable learning models, though, is to train on a wide variety of malware samples. Unfortunately, malware evolves rapidly, and it thus becomes hard, if not impossible, to generalize learning models to reflect future, previously unseen behaviors. Consequently, most malware classifiers become unsustainable in the long run, rapidly becoming antiquated as malware continues to evolve. In this work, we propose Transcend, a framework to identify aging classification models in vivo during deployment, well before the machine learning model’s performance starts to degrade. This is a significant departure from conventional approaches that retrain aging models retrospectively when poor performance is observed. Our approach uses a statistical comparison of samples seen during deployment with those used to train the model, thereby building metrics for prediction quality. We show how Transcend can be used to identify concept drift based on two separate case studies on Android and Windows malware, raising a red flag before the model starts making consistently poor decisions due to out-of-date training.
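As an illustration of a statistical comparison between deployment and training samples, the Python sketch below computes a conformal-style p-value: the smoothed fraction of calibration nonconformity scores (taken from training data) that are at least as large as a new sample's score. A low p-value means a new sample looks unlike what the model was trained on. We assume this conformal formulation and a fixed, hypothetical rejection threshold for illustration; how nonconformity scores are derived from the classifier and how thresholds are chosen are not taken from the abstract.

```python
from bisect import bisect_left


def conformal_p_value(sorted_calibration, test_score):
    """(#calibration scores >= test_score + 1) / (n + 1).

    Scores are nonconformity scores: larger means more unusual. The
    calibration list must be sorted in ascending order.
    """
    n = len(sorted_calibration)
    at_least = n - bisect_left(sorted_calibration, test_score)
    return (at_least + 1) / (n + 1)


def flag_drifting(sorted_calibration, test_scores, threshold):
    """Return the deployment scores whose p-value falls below the threshold."""
    return [s for s in test_scores
            if conformal_p_value(sorted_calibration, s) < threshold]
```

Tracking how the fraction of flagged deployment samples grows over time is one way to raise the kind of early warning the abstract describes, before accuracy measured on labelled data visibly drops.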

    DroidSieve: Fast and Accurate Classification of Obfuscated Android Malware

    With more than two million applications, Android marketplaces require automatic and scalable methods to efficiently vet apps for the absence of malicious threats. Recent techniques have successfully relied on the extraction of lightweight syntactic features suitable for machine learning classification, but despite their promising results, the very nature of such features suggests that, on their own, they are unlikely to be suitable for detecting obfuscated Android malware. To address this challenge, we propose DroidSieve, an Android malware classifier based on static analysis that is fast, accurate, and resilient to obfuscation. For a given app, DroidSieve first decides whether the app is malicious and, if so, classifies it as belonging to a family of related malware. DroidSieve exploits obfuscation-invariant features and artifacts introduced by obfuscation mechanisms used in malware. At the same time, these purely static features are designed for processing at scale and can be extracted quickly. For malware detection, we achieve up to 99.82% accuracy with zero false positives; for family identification of obfuscated malware, we achieve 99.26% accuracy at a fraction of the computational cost of state-of-the-art techniques.

    Euphony: Harmonious Unification of Cacophonous Anti-Virus Vendor Labels for Android Malware

    Android malware is now pervasive and evolving rapidly. Thousands of malware samples are discovered every day with new models of attacks. The growth of these threats has come hand in hand with the proliferation of collective repositories sharing the latest specimens. Having access to a large number of samples opens new research directions aiming at efficiently vetting apps. However, automatically inferring a reference ground truth from those repositories is not straightforward and can inadvertently lead to unforeseen misconceptions. On the one hand, samples are often mislabeled, as different parties use distinct naming schemes for the same sample. On the other hand, samples are frequently misclassified due to conceptual errors made during labeling processes. In this paper, we analyze the associations between all labels given by different vendors, and we propose a system called EUPHONY to systematically unify common samples into family groups. The key novelty of our approach is that no a priori knowledge of malware families is needed. We evaluate our approach using reference datasets and more than 0.4 million additional samples outside of these datasets. Results show that EUPHONY provides competitive performance against the state of the art.
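To see why label unification is hard, consider the naive baseline sketched below in Python: it splits each vendor's label into tokens, discards a stop-list of generic tokens, and returns the token named by the most vendors. This is not EUPHONY's algorithm, which according to the abstract analyses associations between all vendors' labels and needs no a priori family knowledge. The stop-list, vendor names and labels are hypothetical, and a hand-maintained stop-list is exactly the kind of prior knowledge such a baseline depends on.

```python
import re
from collections import Counter

# Hypothetical stop-list of generic tokens; a real one would be far larger.
GENERIC = {"android", "trojan", "malware", "variant", "of", "generic",
           "riskware", "sms", "agent"}


def family_tokens(label):
    """Candidate family tokens in one vendor label (a set: one vote per vendor)."""
    tokens = re.split(r"[^a-z0-9]+", label.lower())
    return {t for t in tokens
            if len(t) > 2 and t not in GENERIC and not t.isdigit()}


def plurality_family(vendor_labels):
    """Return the candidate token named by the most vendors, or None."""
    votes = Counter()
    for label in vendor_labels.values():
        votes.update(family_tokens(label))
    if not votes:
        return None
    family, _ = votes.most_common(1)[0]
    return family
```

With labels such as `Trojan:Android/FakeInst.A`, `Android.Trojan.FakeInst.b` and `a variant of Android/TrojanSMS.Agent.X`, two vendors agree on `fakeinst` and the vote returns it; the fused token `trojansms` is the sort of naming inconsistency that defeats simple voting.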