
    BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking

    Data generation is a key issue in big data benchmarking that aims to generate application-specific data sets to meet the 4V requirements of big data. Specifically, big data generators need to generate scalable data (Volume) of different types (Variety) under controllable generation rates (Velocity) while keeping the important characteristics of raw data (Veracity). This gives rise to new challenges in designing generators that are both efficient and effective. To date, most existing techniques can only generate limited types of data and support specific big data systems such as Hadoop. We therefore develop a tool, called Big Data Generator Suite (BDGS), to efficiently generate scalable big data while employing data models derived from real data to preserve data veracity. The effectiveness of BDGS is demonstrated by developing six data generators covering three representative data types (structured, semi-structured and unstructured) and three data sources (text, graph, and table data).
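    The abstract's core idea (derive a model from real data, then scale generation while preserving the data's character) can be illustrated with a toy sketch. This is not BDGS's actual implementation: the bigram model, seed text, and word count below are illustrative stand-ins for the paper's data models and its Volume/Veracity controls.

```python
import random
from collections import defaultdict

def train_bigram_model(seed_text):
    """Derive a bigram model from a sample of real text -- a crude stand-in
    for BDGS's 'data models derived from real data' (Veracity)."""
    words = seed_text.split()
    model = defaultdict(list)
    for prev, cur in zip(words, words[1:]):
        model[prev].append(cur)
    return model

def generate(model, n_words, seed=42):
    """Emit n_words of synthetic text; n_words scales Volume independently
    of the (fixed-size) learned model."""
    rng = random.Random(seed)
    word = rng.choice(list(model))
    out = [word]
    for _ in range(n_words - 1):
        followers = model.get(word)
        word = rng.choice(followers) if followers else rng.choice(list(model))
        out.append(word)
    return " ".join(out)

sample = ("big data benchmarking requires scalable data generation and "
          "data models derived from real data to preserve data veracity")
print(generate(train_bigram_model(sample), 24))
```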

    DeepMasterPrints: Generating MasterPrints for Dictionary Attacks via Latent Variable Evolution

    Recent research has demonstrated the vulnerability of fingerprint recognition systems to dictionary attacks based on MasterPrints. MasterPrints are real or synthetic fingerprints that can fortuitously match a large number of fingerprints, thereby undermining the security afforded by fingerprint systems. Previous work by Roy et al. generated synthetic MasterPrints at the feature level. In this work we generate complete image-level MasterPrints, known as DeepMasterPrints, whose attack accuracy is found to be far superior to that of previous methods. The proposed method, referred to as Latent Variable Evolution, is based on training a Generative Adversarial Network on a set of real fingerprint images. Stochastic search in the form of the Covariance Matrix Adaptation Evolution Strategy is then used to search for latent input variables to the generator network that maximize the number of impostor matches as assessed by a fingerprint recognizer. Experiments convey the efficacy of the proposed method in generating DeepMasterPrints. The underlying method is likely to have broad applications in fingerprint security as well as fingerprint synthesis. Comment: 8 pages; added new verification systems and diagrams. Accepted to the conference Biometrics: Theory, Applications, and Systems 2018.
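    A minimal sketch of the Latent Variable Evolution loop, under stated assumptions: the generator, matcher, latent dimensionality, enrolled set, and threshold below are toy stand-ins (the paper uses a GAN trained on real fingerprint images and an off-the-shelf fingerprint recognizer), while the CMA-ES loop uses the real `cma` package API.

```python
# pip install cma numpy
import cma
import numpy as np

LATENT_DIM = 32         # assumption: dimensionality of the GAN latent space
MATCH_THRESHOLD = 0.5   # assumption: toy matcher threshold, not a real FMR setting

rng = np.random.default_rng(0)
enrolled = rng.standard_normal((500, LATENT_DIM))  # toy stand-in for enrolled templates

def generator(z):
    """Stand-in for the trained GAN generator (latent vector -> fingerprint
    image). Identity here, so the skeleton runs end to end."""
    return np.asarray(z)

def match_score(image, template):
    """Stand-in for a fingerprint matcher's similarity score (cosine here)."""
    return image @ template / (np.linalg.norm(image) * np.linalg.norm(template) + 1e-9)

def fitness(z):
    """Negated impostor-match count: CMA-ES minimizes, the attack maximizes."""
    img = generator(z)
    return -sum(match_score(img, t) >= MATCH_THRESHOLD for t in enrolled)

es = cma.CMAEvolutionStrategy(LATENT_DIM * [0.0], 0.5, {'seed': 1, 'maxiter': 50})
while not es.stop():
    zs = es.ask()                        # sample candidate latent vectors
    es.tell(zs, [fitness(z) for z in zs])
masterprint = generator(es.result.xbest)
print("impostor matches:", -fitness(es.result.xbest))
```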

    Performance Testing of Distributed Component Architectures

    Performance characteristics, such as response time, throughput and scalability, are key quality attributes of distributed applications. Current practice, however, rarely applies systematic techniques to evaluate performance characteristics. We argue that evaluation of performance is particularly crucial in early development stages, when important architectural choices are made. At first glance, this contradicts the use of testing techniques, which are usually applied towards the end of a project. In this chapter, we assume that many distributed systems are built with middleware technologies, such as the Java 2 Enterprise Edition (J2EE) or the Common Object Request Broker Architecture (CORBA). These provide services and facilities whose implementations are available when architectures are defined. We also note that it is the middleware functionality, such as transaction and persistence services, remote communication primitives and threading policies, that dominates distributed system performance. Drawing on these observations, this chapter presents a novel approach to performance testing of distributed applications. We propose to derive application-specific test cases from architecture designs so that the performance of a distributed application can be tested based on the middleware software at early stages of a development process. We report empirical results that support the viability of the approach.
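    A minimal sketch of the chapter's idea: stub out the not-yet-written business logic, drive the middleware-level operation under concurrent load, and measure latency and throughput. The `middleware_call` stub and the client/request counts are assumptions; an actual early test would invoke real J2EE/CORBA services (remote calls, transactions) behind that function.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

def middleware_call():
    """Stand-in for a middleware-level operation (remote invocation,
    transaction demarcation, etc.). A sleep simulates its service time;
    a real early test would call the actual middleware with stubbed logic."""
    time.sleep(0.005)

def timed_request():
    start = time.perf_counter()
    middleware_call()
    return time.perf_counter() - start

def run_load_test(clients=20, requests_per_client=50):
    """Apply concurrent load and report throughput and latency percentiles."""
    t0 = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        latencies = list(pool.map(lambda _: timed_request(),
                                  range(clients * requests_per_client)))
    elapsed = time.perf_counter() - t0
    latencies.sort()
    print(f"throughput:     {len(latencies) / elapsed:.1f} req/s")
    print(f"median latency: {statistics.median(latencies) * 1e3:.2f} ms")
    print(f"p95 latency:    {latencies[int(0.95 * len(latencies))] * 1e3:.2f} ms")

run_load_test()
```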

    Benchmarking Distributed Stream Data Processing Systems

    The need for scalable and efficient stream analysis has led to the development of many open-source streaming data processing systems (SDPSs) with highly diverging capabilities and performance characteristics. While first initiatives try to compare the systems for simple workloads, there is a clear gap in detailed analyses of the systems' performance characteristics. In this paper, we propose a framework for benchmarking distributed stream processing engines. We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink. Our evaluation focuses in particular on measuring the throughput and latency of windowed operations, which are the basic type of operations in stream analytics. For this benchmark, we design workloads based on real-life, industrial use cases inspired by the online gaming industry. The contribution of our work is threefold. First, we give a definition of latency and throughput for stateful operators. Second, we carefully separate the system under test from the driver, in order to correctly represent the open-world model of typical stream processing deployments and thereby measure system performance under realistic conditions. Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems. Our detailed evaluation highlights the individual characteristics and use cases of each system. Comment: Published at ICDE 2018.
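    A toy sketch of the benchmark's core measurements, with the driver and the system under test collapsed into one process for brevity (the paper insists they be separated): the driver fixes the event rate independently of how fast the system consumes, and the latency of a tumbling-window count is measured against the window's end in event time. All names and parameters are illustrative, not the paper's framework.

```python
import time
from collections import defaultdict

WINDOW_MS = 1000  # tumbling window size; windowed operations are the workload

def driver(rate_per_s, duration_s):
    """Driver side: emit timestamped events at a controlled rate. The rate is
    set by the driver, not by the consumer -- the 'open world' model."""
    interval = 1.0 / rate_per_s
    end = time.time() + duration_s
    while time.time() < end:
        yield {"key": "player-1", "ts_ms": int(time.time() * 1000), "value": 1}
        time.sleep(interval)

def windowed_count(events):
    """Toy stateful operator: tumbling-window count per window. A result is
    emitted when a window closes; its event-time latency is the emission
    time minus the window's end time. (The last open window is not emitted.)"""
    counts = defaultdict(int)
    current = None
    for ev in events:
        w = ev["ts_ms"] // WINDOW_MS
        if current is not None and w != current:
            window_end_ms = (current + 1) * WINDOW_MS
            latency_ms = time.time() * 1000 - window_end_ms
            yield current, counts.pop(current), latency_ms
        current = w
        counts[w] += ev["value"]

for win, count, latency_ms in windowed_count(driver(rate_per_s=200, duration_s=5)):
    print(f"window {win}: count={count}, latency={latency_ms:.1f} ms")
```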