12,776 research outputs found

    Rumble: Data Independence for Large Messy Data Sets

    Full text link
    This paper introduces Rumble, an engine that executes JSONiq queries on large, heterogeneous and nested collections of JSON objects, leveraging the parallel capabilities of Spark so as to provide a high degree of data independence. The design is based on two key insights: (i) how to map JSONiq expressions to Spark transformations on RDDs and (ii) how to map JSONiq FLWOR clauses to Spark SQL on DataFrames. We have developed a working implementation of these mappings showing that JSONiq can efficiently run on Spark to query billions of objects into, at least, the TB range. The JSONiq code is concise in comparison to Spark's host languages while seamlessly supporting the nested, heterogeneous data sets that Spark SQL does not. The ability to process this kind of input, commonly found, is paramount for data cleaning and curation. The experimental analysis indicates that there is no excessive performance loss, occasionally even a gain, over Spark SQL for structured data, and a performance gain over PySpark. This demonstrates that a language such as JSONiq is a simple and viable approach to large-scale querying of denormalized, heterogeneous, arborescent data sets, in the same way as SQL can be leveraged for structured data sets. The results also illustrate that Codd's concept of data independence makes as much sense for heterogeneous, nested data sets as it does on highly structured tables.Comment: Preprint, 9 page

    Achieving Obfuscation Through Self-Modifying Code: A Theoretical Model

    Get PDF
    With the extreme amount of data and software available on networks, the protection of online information is one of the most important tasks of this technological age. There is no such thing as safe computing, and it is inevitable that security breaches will occur. Thus, security professionals and practices focus on two areas: security, preventing a breach from occurring, and resiliency, minimizing the damages once a breach has occurred. One of the most important practices for adding resiliency to source code is through obfuscation, a method of re-writing the code to a form that is virtually unreadable. This makes the code incredibly hard to decipher by attackers, protecting intellectual property and reducing the amount of information gained by the malicious actor. Achieving obfuscation through the use of self-modifying code, code that mutates during runtime, is a complicated but impressive undertaking that creates an incredibly robust obfuscating system. While there is a great amount of research that is still ongoing, the preliminary results of this subject suggest that the application of self-modifying code to obfuscation may yield self-maintaining software capable of healing itself following an attack

    Indexing Information for Data Forensics

    Get PDF
    We introduce novel techniques for organizing the indexing structures of how data is stored so that alterations from an original version can be detected and the changed values specifically identified. We give forensic constructions for several fundamental data structures, including arrays, linked lists, binary search trees, skip lists, and hash tables. Some of our constructions are based on a new reduced-randomness construction for nonadaptive combinatorial group testing

    Non-malleable secret sharing against joint tampering attacks

    Get PDF
    Since thousands of years ago, the goal of cryptography has been to hide messages from prying eyes. In recent times, cryptography two important changes: first, cryptography itself evolved from just being about encryption to a broader class of situations coming from the digital era; second, the way of studying cryptography evolved from creating ``seemingly hard'' cryptographic schemes to constructing schemes which are provably secure. However, once the mathematical abstraction of cryptographic primitives started to be too hard to break, attackers found another way to defeat security. Side channel attacks have been proved to be very effective in this task, breaking the security of otherwise provably secure schemes. Because of this, recent trends in cryptography aim to capture this situation and construct schemes that are secure even against such powerful attacks. In this setting, this thesis specializes in the study of secret sharing, an important cryptographic primitive that allows to balance privacy and integrity of data and also has applications to multi-party protocols. Namely, continuing the trend which aims to protect against side channel attacks, this thesis brings some contributions to the state of the art of the so-called leakage-resilient and non-malleable secret sharing schemes, which have stronger guarantees against attackers that are able to learn information from possibly all the shares and even tamper with the shares and see the effects of the tampering. The main contributions of this thesis are twofold. First, we construct secret sharing schemes that are secure against a very powerful class of attacks which, informally, allows the attacker to jointly leak some information and tamper with the shares in a continuous fashion. Second, we study the capacity of continuously non-malleable secret sharing schemes, that is, the maximum achievable information rate. Roughly speaking, we find some lower bounds to the size that the shares must have in order to achieve some forms of non-malleability

    Exploiting loop transformations for the protection of software

    Get PDF
    Il software conserva la maggior parte del know-how che occorre per svilupparlo. Poich\ue9 oggigiorno il software pu\uf2 essere facilmente duplicato e ridistribuito ovunque, il rischio che la propriet\ue0 intellettuale venga violata su scala globale \ue8 elevato. Una delle pi\uf9 interessanti soluzioni a questo problema \ue8 dotare il software di un watermark. Ai watermark si richiede non solo di certificare in modo univoco il proprietario del software, ma anche di essere resistenti e pervasivi. In questa tesi riformuliamo i concetti di robustezza e pervasivit\ue0 a partire dalla semantica delle tracce. Evidenziamo i cicli quali costrutti di programmazione pervasivi e introduciamo le trasformazioni di ciclo come mattone di costruzione per schemi di watermarking pervasivo. Passiamo in rassegna alcune fra tali trasformazioni, studiando i loro principi di base. Infine, sfruttiamo tali principi per costruire una tecnica di watermarking pervasivo. La robustezza rimane una difficile, quanto affascinante, questione ancora da risolvere.Software retains most of the know-how required fot its development. Because nowadays software can be easily cloned and spread worldwide, the risk of intellectual property infringement on a global scale is high. One of the most viable solutions to this problem is to endow software with a watermark. Good watermarks are required not only to state unambiguously the owner of software, but also to be resilient and pervasive. In this thesis we base resiliency and pervasiveness on trace semantics. We point out loops as pervasive programming constructs and we introduce loop transformations as the basic block of pervasive watermarking schemes. We survey several loop transformations, outlining their underlying principles. Then we exploit these principles to build some pervasive watermarking techniques. Resiliency still remains a big and challenging open issue

    Overlay networks for smart grids

    Get PDF

    Contextual biometric watermarking of fingerprint images

    Get PDF
    This research presents contextual digital watermarking techniques using face and demographic text data as multiple watermarks for protecting the evidentiary integrity of fingerprint image. The proposed techniques embed the watermarks into selected regions of fingerprint image in MDCT and DWT domains. A general image watermarking algorithm is developed to investigate the application of MDCT in the elimination of blocking artifacts. The application of MDCT has improved the performance of the watermarking technique compared to DCT. Experimental results show that modifications to fingerprint image are visually imperceptible and maintain the minutiae detail. The integrity of the fingerprint image is verified through high matching score obtained from the AFIS system. There is also a high degree of correlation between the embedded and extracted watermarks. The degree of similarity is computed using pixel-based metrics and human visual system metrics. It is useful for personal identification and establishing digital chain of custody. The results also show that the proposed watermarking technique is resilient to common image modifications that occur during electronic fingerprint transmission
    • …
    corecore