
    Methods and Applications of Synthetic Data Generation

    The advent of data mining and machine learning has highlighted the value of large and varied sources of data, while increasing the demand for synthetic data that captures the structural and statistical characteristics of the original data without revealing personal or proprietary information contained in the original dataset. In this dissertation, we use examples from original research to show that, with appropriate models and input parameters, synthetic data that mimics the characteristics of real data can be generated at sufficient rate and quality to meet the volume, structural complexity, and statistical variation requirements of research and development of digital information processing systems. First, we present a progression of research studies that use a variety of tools to generate synthetic network traffic patterns, enabling us to observe relationships between network latency and communication pattern benchmarks at all levels of the network stack. We then present a scalable extraction and synthesis framework for generating large-scale IoT data with complex structural characteristics, and demonstrate the use of the generated data in benchmarking IoT middleware. Finally, we detail research on synthetic image generation for deep learning models using 3D modeling. We find that synthetic images can be an effective technique for augmenting limited sets of real training data, and that in use cases that benefit from incremental training or model specialization, pretraining on synthetic images provided a usable base model for transfer learning.
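The abstract does not name a specific generator. As a minimal sketch of the underlying idea (reproducing the statistical characteristics of a dataset without copying any original record), the code below fits the column means and covariance of a numeric table and samples new rows from a multivariate normal distribution with those moments, using a Cholesky factor of the covariance. The functions `column_stats`, `cholesky`, and `synthesize` are hypothetical and are not the dissertation's tools; the Gaussian assumption preserves only first and second moments, not higher-order structure such as skew or categorical fields.

```python
import math
import random
import statistics


def column_stats(rows):
    """Return per-column means and the sample covariance matrix of a numeric table."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    n, k = len(rows), len(cols)
    cov = [
        [
            sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / (n - 1)
            for j in range(k)
        ]
        for i in range(k)
    ]
    return means, cov


def cholesky(m):
    """Lower-triangular L with L * L^T == m (m must be positive definite)."""
    k = len(m)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = sum(L[i][p] * L[j][p] for p in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def synthesize(rows, n_rows, seed=0):
    """Sample rows from a multivariate normal fitted to `rows` (hypothetical generator).

    Only the means and covariances are reused; no original row is copied.
    """
    means, cov = column_stats(rows)
    L = cholesky(cov)
    rng = random.Random(seed)
    k = len(means)
    out = []
    for _ in range(n_rows):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent standard normals
        # x = mean + L z has covariance L L^T, i.e. the fitted covariance.
        out.append([means[i] + sum(L[i][p] * z[p] for p in range(i + 1)) for i in range(k)])
    return out
```

For example, `synthesize(real_rows, 5000, seed=1)` returns 5,000 rows whose means and pairwise correlations approximate those of `real_rows`; richer generators would replace the Gaussian with a model suited to the data at hand.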

    Keyword-based search in peer-to-peer networks

    Ph.D., Doctor of Philosophy

    A subsumption hierarchy of test case prioritization for composite services


    Enhancing Web Browsing Security

    Web browsing has become an integral part of our lives, and we use browsers to perform many important activities almost every day and everywhere. However, due to vulnerabilities in Web browsers and Web applications, and to Web users' lack of security knowledge, browser-based attacks are rampant on the Internet and have caused substantial damage to both Web users and service providers. Enhancing Web browsing security is therefore of great need and importance.
This dissertation concentrates on enhancing Web browsing security by exploring and experimenting with new approaches and software systems. Specifically, we have systematically studied four challenging Web browsing security problems: HTTP cookie management, phishing, insecure JavaScript practices, and browsing on untrusted public computers. We have proposed new approaches to address these problems and built systems to validate our approaches.
To manage HTTP cookies, we have proposed an approach that automatically validates the usefulness of HTTP cookies at the client side on behalf of users. By automatically removing useless cookies, our approach helps a user strike an appropriate balance between maximizing usability and minimizing security risks. To protect against phishing attacks, we have proposed an approach that transparently feeds a relatively large number of bogus credentials into a suspected phishing site. Using those bogus credentials, our approach conceals victims' real credentials and enables a legitimate website to identify stolen credentials in a timely manner. To identify insecure JavaScript practices, we have proposed an execution-based measurement approach and performed a large-scale measurement study. Our work sheds light on insecure JavaScript practices and, in particular, reveals the severity and nature of insecure JavaScript inclusion and dynamic generation practices on the Web.
To achieve secure and convenient Web browsing on untrusted public computers, we have proposed a simple approach that enables an extended browser on a mobile device and a regular browser on a public computer to collaboratively support a Web session. A user can securely perform sensitive interactions on the mobile device and conveniently perform other browsing interactions on the public computer.
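The abstract says the bogus credentials conceal the real one and let the legitimate site spot stolen credentials, but it does not describe how the site recognizes them. The sketch below assumes, purely for illustration, that the client and the legitimate site share a secret key from which decoy passwords are derived with HMAC-SHA256, so the site can flag an account whenever a decoy is tried against it. `decoy_passwords`, `phishing_submissions`, and `LegitimateSite` are hypothetical names, and this is not presented as the dissertation's mechanism.

```python
import hashlib
import hmac
import random
from typing import Optional

# Hypothetical scheme: the browser extension and the legitimate site share a secret
# key, so the site can recognise decoy passwords without storing them.


def decoy_passwords(key: bytes, username: str, count: int) -> list[str]:
    """Derive `count` deterministic bogus passwords for `username` from the shared key."""
    return [
        hmac.new(key, f"{username}:{i}".encode(), hashlib.sha256).hexdigest()[:12]
        for i in range(count)
    ]


def phishing_submissions(key: bytes, username: str, real_password: str,
                         count: int, seed: Optional[int] = None) -> list[tuple[str, str]]:
    """Credentials sent to a suspected phishing site: the real pair hidden among decoys."""
    creds = [(username, p) for p in decoy_passwords(key, username, count)]
    creds.append((username, real_password))
    random.Random(seed).shuffle(creds)  # the phisher cannot tell which pair is real
    return creds


class LegitimateSite:
    def __init__(self, key: bytes, accounts: dict[str, str], decoys_per_user: int):
        self.key = key
        self.accounts = accounts  # plaintext passwords only to keep the sketch short
        self.decoys_per_user = decoys_per_user
        self.flagged: set[str] = set()

    def login(self, username: str, password: str) -> bool:
        real = self.accounts.get(username)
        if real is None:
            return False
        if hmac.compare_digest(real.encode(), password.encode()):
            return True
        if password in decoy_passwords(self.key, username, self.decoys_per_user):
            # Under this scheme a decoy is only ever sent to a phishing site, so seeing
            # one suggests this account's real password was phished as well.
            self.flagged.add(username)
        return False
```

A phisher who replays all harvested pairs against the site succeeds once but also submits the decoys, which causes the account to be flagged, so the site learns that the account was likely phished.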

    Model based test suite minimization using metaheuristics

    Software testing is one of the most widely used methods for quality assurance and fault detection. However, it is also one of the most expensive, tedious, and time-consuming activities in the software development life cycle. Code-based and specification-based testing have been practiced for almost four decades. Model-based testing (MBT) is a relatively new approach to software testing in which software models, as opposed to other artifacts (e.g., source code), are used as the primary source of test cases. Models are simplified representations of a software system and are cheaper to execute than the original or deployed system. The main objective of the research presented in this thesis is to develop a framework for improving the efficiency and effectiveness of test suites generated from UML models. It focuses on three activities: transformation of an Activity Diagram (AD) model into a Colored Petri Net (CPN) model, generation and evaluation of an AD-based test suite, and optimization of the AD-based test suite. The Unified Modeling Language (UML) is a de facto standard for software system analysis and design. UML models can be categorized into structural and behavioral models. AD is a behavioral UML model, and since the major revision in UML 2.x it has Petri-net-like semantics. It has a wide application scope, including embedded, workflow, and web-service systems; for this reason, this thesis concentrates on AD models. The informal semantics of UML in general, and of AD in particular, is a major challenge in the development of UML-based verification and validation tools. One solution to this challenge is to transform a UML model into an executable formal model. The thesis proposes a three-step transformation methodology for resolving ambiguities in an AD model and then transforming it into a CPN representation, a well-known formal language with extensive tool support.
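The abstract does not detail the three-step AD-to-CPN methodology. As a simplified illustration only, the sketch below uses an uncoloured place/transition net and an assumed mapping (actions and fork/join nodes become transitions, control-flow edges become places), then enumerates every firing sequence of a small hypothetical diagram. It shows why concurrency matters for test generation: a single fork/join already yields more than one execution order. `PetriNet`, `runs`, and `ad_net` are hypothetical.

```python
from collections import Counter


class PetriNet:
    """Uncoloured place/transition net; a transition fires when each input place has a token."""

    def __init__(self, transitions: dict[str, tuple[list[str], list[str]]]):
        self.transitions = transitions  # name -> (input places, output places)

    def enabled(self, marking: Counter) -> list[str]:
        return [t for t, (ins, _) in self.transitions.items()
                if all(marking[p] >= 1 for p in ins)]

    def fire(self, marking: Counter, t: str) -> Counter:
        ins, outs = self.transitions[t]
        new = Counter(marking)  # markings are treated as immutable values
        for p in ins:
            new[p] -= 1
        for p in outs:
            new[p] += 1
        return new


def runs(net: PetriNet, marking: Counter, goal: str) -> list[list[str]]:
    """Every firing sequence that puts a token in `goal` (assumes an acyclic net)."""
    if marking[goal] >= 1:
        return [[]]
    return [[t] + rest
            for t in net.enabled(marking)
            for rest in runs(net, net.fire(marking, t), goal)]


# Hypothetical AD: start -> A -> fork -> (B || C) -> join -> end.
# Actions and fork/join nodes become transitions; control-flow edges become places.
ad_net = PetriNet({
    "A": (["start"], ["a_out"]),
    "fork": (["a_out"], ["b_in", "c_in"]),
    "B": (["b_in"], ["b_out"]),
    "C": (["c_in"], ["c_out"]),
    "join": (["b_out", "c_out"], ["end"]),
})
```

`runs(ad_net, Counter({"start": 1}), "end")` returns `[["A", "fork", "B", "C", "join"], ["A", "fork", "C", "B", "join"]]`, one sequence per interleaving of the concurrent actions B and C.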
Test case generation is one of the most critical and labor-intensive activities in the testing process. The flow-oriented semantics of AD suit the modeling of both sequential and concurrent systems. The thesis presents a novel technique for generating test cases from AD using a stochastic algorithm. To determine whether the generated test suite is adequate, two adequacy analysis techniques, based on structural coverage and on mutation, are proposed. For structural coverage, two separate coverage criteria are proposed to evaluate the adequacy of the test suite from both the sequential and the concurrent perspective. Mutation analysis is a fault-based technique for determining whether the test suite is adequate for detecting particular types of faults; four categories of mutation operators are defined to seed specific faults into the mutant model. Another focus of the thesis is improving test suite efficiency without compromising its effectiveness. One way to achieve this is to identify and remove redundant test cases. Test suite minimization by removing redundant test cases has been shown to be a combinatorial optimization problem. An evolutionary computation-based test suite minimization technique is developed to address this problem, and its performance is empirically compared with other well-known heuristic algorithms. Additionally, statistical analysis is performed to characterize the fitness landscape of test suite minimization problems. The proposed minimization solution is then extended to multi-objective minimization. Because redundancy is contextual, different criteria and their combinations can significantly change the resulting test suite; therefore, the last part of the thesis investigates multi-objective test suite minimization and optimization algorithms. The proposed framework is demonstrated and evaluated using prototype tools and case study models.
Empirical results have shown that the techniques developed within the framework are effective in model-based test suite generation and optimization.
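The abstract frames test suite minimization as a combinatorial optimization problem solved with evolutionary computation, but does not give the encoding or operators. The sketch below assumes a common formulation: each test is one bit of a chromosome, fitness is compared lexicographically as (uncovered requirements, suite size), and a genetic algorithm with tournament selection, uniform crossover, bit-flip mutation, and elitism searches for a smaller suite with the same coverage. The coverage sets (for example, AD edges exercised by each test) and all names are hypothetical; this is a single-objective illustration, not the thesis's algorithm.

```python
import random


def fitness(bits, coverage, requirements):
    """Lower is better: number of uncovered requirements first, then suite size."""
    covered = set()
    for selected, reqs in zip(bits, coverage):
        if selected:
            covered |= reqs
    return (len(requirements - covered), sum(bits))


def minimize(coverage, generations=200, pop_size=40, mutation_rate=0.05, seed=0):
    """Return indices of a reduced test suite covering everything the full suite covers."""
    rng = random.Random(seed)
    requirements = set().union(*coverage)
    n = len(coverage)

    def score(bits):
        return fitness(bits, coverage, requirements)

    # Seed with the full suite so a fully covering individual is always present.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]

    def tournament():
        a, b = rng.sample(population, 2)
        return a if score(a) <= score(b) else b

    for _ in range(generations):
        best = min(population, key=score)
        next_gen = [best[:]]  # elitism: the best individual survives unchanged
        while len(next_gen) < pop_size:
            p1, p2 = tournament(), tournament()
            child = [x if rng.random() < 0.5 else y for x, y in zip(p1, p2)]  # uniform crossover
            child = [1 - g if rng.random() < mutation_rate else g for g in child]  # bit-flip mutation
            next_gen.append(child)
        population = next_gen

    best = min(population, key=score)
    return [i for i, g in enumerate(best) if g]
```

Because the population starts with the full suite and the best individual is always kept, the returned suite never loses coverage. A greedy set-cover heuristic is one natural baseline to compare such a search against.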

    Managing Schema Change in an Heterogeneous Environment

    Change is inevitable, even for persistent information. Effectively managing change to persistent information, which includes the specification and execution of changes and the maintenance of any derived information, is critical and must be addressed by all database systems. Today, every data model has a well-defined set of change primitives that can alter both the structure (the schema) and the data. Several proposals also exist for incrementally propagating a primitive change to any derived information (or view). However, existing support is lacking in two ways. First, the change primitives presented in the literature are very limited in capability, allowing users only to add or remove schema elements; more complex changes, such as the merging or splitting of schema elements, are not supported in a principled manner. Second, algorithms for maintaining derived information often do not account for potential heterogeneity between the source and the target. The goal of this dissertation is to provide solutions that address these two key issues. The first part of this dissertation addresses the challenge of expressing a rich, complex set of changes. We propose the SERF (Schema Evolution through an Extensible, Re-usable and Flexible) framework, which allows users to perform a wide range of complex user-defined schema transformations. Our approach combines existing schema evolution primitives using OQL (Object Query Language) as the glue logic. Within the context of this work, we examine the different domains in which SERF can be applied, including web site management. To further enrich our framework, we also investigate the optimization and verification of SERF transformations. The second part of this dissertation addresses the problem of maintaining views in the face of source changes when the source and the view are not in the same data model.
With today's increasing heterogeneity in information structure, it is critical that view maintenance address data model boundaries. However, view definitions that span data models are limited to hard-coded algorithms, which makes it difficult to develop general maintenance algorithms. We provide a two-step solution to this problem. First, we have developed a cross algebra that defines views without requiring the view and the source to use the same data model. We then define update propagation algorithms that propagate changes from source to target irrespective of the exact translation and the data models involved. We validate our ideas by applying them to translation and change propagation between the XML and relational data models.
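SERF composes existing schema evolution primitives with OQL as the glue logic. As a rough analogue only, the sketch below uses Python in place of OQL and a toy in-memory schema to show how a merge, which the abstract notes is not available as a primitive, can be expressed as a sequence of primitive operations. The schema representation and all function names (`add_attribute`, `add_object`, `delete_class`, `merge_classes`) are hypothetical.

```python
from copy import deepcopy

# Hypothetical toy schema: class name -> {"attrs": [attribute names], "extent": [objects]}.


def add_attribute(schema, cls, attr, default=None):
    """Primitive: add `attr` to class `cls` and initialise it on every existing object."""
    schema[cls]["attrs"].append(attr)
    for obj in schema[cls]["extent"]:
        obj.setdefault(attr, default)


def add_object(schema, cls, values):
    """Primitive: create an object of `cls`; attributes missing from `values` become None."""
    schema[cls]["extent"].append({a: values.get(a) for a in schema[cls]["attrs"]})


def delete_class(schema, cls):
    """Primitive: drop a class together with its extent."""
    del schema[cls]


def merge_classes(schema, target, source):
    """Complex transformation composed only of the primitives above:
    fold the attributes and objects of `source` into `target`, then drop `source`."""
    result = deepcopy(schema)  # work on a copy so the input schema is left untouched
    for attr in result[source]["attrs"]:
        if attr not in result[target]["attrs"]:
            add_attribute(result, target, attr)
    for obj in result[source]["extent"]:
        add_object(result, target, obj)
    delete_class(result, source)
    return result
```

With `Person` holding `name` and `age` and `Staff` holding `name` and `salary`, `merge_classes(schema, "Person", "Staff")` yields a single `Person` class with all three attributes, filling missing values with `None`. A split could be composed from the same primitives in the opposite direction.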

    Enabling Multi-Perspective Business Process Compliance

    A particular challenge for any enterprise is to ensure that its business processes conform to compliance rules, i.e., semantic constraints on the multiple perspectives of the business processes. Compliance rules stem, for example, from legal regulations, corporate best practices, domain-specific guidelines, and industrial standards. In general, compliance rules are multi-perspective: they not only restrict the process behavior (i.e., control flow) but may also refer to other process perspectives (e.g., time, data, and resources) and to the interactions (i.e., message exchanges) of a business process with other processes. The aim of this thesis is to improve the specification and verification of multi-perspective process compliance through three contributions:
1. The extended Compliance Rule Graph (eCRG) language, which enables the visual modeling of multi-perspective compliance rules. Besides control flow, these rules may refer to the time, data, resource, and interaction perspectives of a business process.
2. A framework for multi-perspective monitoring of the compliance of running processes with a given set of eCRG compliance rules.
3. Techniques for verifying business process compliance with respect to the interaction perspective. In particular, we consider compliance verification for cross-organizational business processes, for which only incomplete process knowledge is available.
All contributions were thoroughly evaluated through proof-of-concept prototypes, case studies, empirical studies, and systematic comparisons with related work.
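The eCRG language is visual, so it cannot be reproduced here directly. As a textual sketch of what multi-perspective monitoring involves, the code below checks one hypothetical rule that combines the data, resource, and time perspectives over the event log of a running process, and classifies each activation as satisfied, violated, or still pending. The rule, the `Event` fields, and `monitor` are assumptions made for illustration, not eCRG semantics.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Event:
    activity: str
    user: str
    time: datetime
    data: dict = field(default_factory=dict)


def monitor(trace, now, threshold=10_000, deadline=timedelta(days=2)):
    """Classify each activation of a hypothetical rule over a time-ordered trace.

    Rule: every 'approve' of an order whose amount exceeds `threshold` (data)
    must be followed by a 'second_approve' by a different user (resource)
    within `deadline` (time).
    """
    states = []
    for i, ev in enumerate(trace):
        if ev.activity != "approve" or ev.data.get("amount", 0) <= threshold:
            continue  # this event does not activate the rule
        satisfied = any(
            later.activity == "second_approve"
            and later.user != ev.user
            and later.time - ev.time <= deadline
            for later in trace[i + 1:]
        )
        if satisfied:
            state = "satisfied"
        elif now - ev.time > deadline:
            state = "violated"  # the deadline passed without a valid response
        else:
            state = "pending"  # the running instance can still comply
        states.append((ev, state))
    return states
```

Reporting `pending` separately from `violated` reflects that, for a running process, a missing response only becomes a violation once its deadline has passed.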

    Knowledge visualizations: a tool to achieve optimized operational decision making and data integration

    The overabundance of data created by modern information systems (IS) has led to a breakdown in cognitive decision-making. Without authoritative source data, commanders’ decision-making processes are hindered as they attempt to paint an accurate shared operational picture (SOP). The decision-making process is further impeded by the lack of proper interface interaction to provide a visualization that aids in extracting the most relevant and accurate data. Using a decision support system (DSS) to present visualizations based on OLAP cube-integrated data allows decision-makers to rapidly glean information and build their situation awareness (SA). This yields a competitive advantage to the organization, whether in garrison or in combat. Additionally, OLAP cube data integration enables analysis of an organization’s data flows. This analysis is used to identify the critical path of data through the organization. Linking a decision-maker to the authoritative data along this critical path eliminates the many decision layers in a hierarchical command structure that can introduce latency or error into the decision-making process. Furthermore, the organization gains an integrated SOP from which to rapidly build SA and make effective and efficient decisions.
http://archive.org/details/knowledgevisuali1094545877
Outstanding Thesis
Major, United States Marine Corps
Captain, United States Marine Corps
Approved for public release; distribution is unlimited
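The thesis abstract does not describe its cube design. As a generic illustration of what OLAP cube integration provides, the sketch below pre-aggregates a measure over every combination of dimensions (each combination is a cuboid), so that a DSS can answer roll-up questions, such as totals per unit or per data source, by lookup rather than by rescanning the raw records. The functions `cuboid` and `cube` and the dimension and measure names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cuboid(records, dims, measure):
    """Sum `measure` grouped by the values of `dims` (a single group-by)."""
    totals = defaultdict(float)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def cube(records, dimensions, measure):
    """All 2^n cuboids over `dimensions`, keyed by the tuple of grouped dimensions."""
    return {
        dims: cuboid(records, dims, measure)
        for k in range(len(dimensions) + 1)
        for dims in combinations(dimensions, k)
    }
```

`cube(records, ("unit", "source"), "reports")` produces four cuboids: the grand total, totals per unit, totals per source, and the fully detailed unit-by-source view. The number of cuboids grows as 2^n in the number of dimensions, which is why OLAP systems often materialize only some of them.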