34 research outputs found

    Data warehousing technologies for large-scale and right-time data


    A proof-of-proximity framework for device pairing in ubiquitous computing environments

    Ad hoc interactions between devices over wireless networks in ubiquitous computing environments present a security problem: shared secrets must be generated to initialize secure communication over a medium that is inherently vulnerable to various attacks. However, these ad hoc scenarios also offer the potential for physical security of spaces and for protocols in which users must visibly demonstrate their presence and/or involvement to generate an association. As a consequence, secure device pairing has recently received significant attention from a wide community of academic and industrial researchers, and a plethora of schemes and protocols have been proposed that use various forms of out-of-band exchange to form an association between two unassociated devices. These protocols and schemes have different strengths and weaknesses, often in hardware requirements, strength against various attacks, or usability in particular scenarios. From an ordinary user's point of view, the problem then becomes which scheme to choose, or which is the best possible scheme in a particular scenario. We argue that in a world of modern heterogeneous devices and requirements, there is a need for mechanisms that allow automated selection of the best protocols without requiring the user to have in-depth knowledge of the minutiae of the underlying technologies. To this end, the main argument of this dissertation is that integrating a discovery mechanism and several pairing schemes into a single system is more efficient from a usability point of view, and from a security point of view in terms of the dynamic choice of pairing schemes. In pursuit of this, we propose a generic system for secure device pairing by demonstration of physical proximity. Our main contribution is the design and prototype implementation of the Proof-of-Proximity framework along with a novel Co-Location protocol.
Other contributions include a detailed analysis of existing device pairing schemes, a simple device discovery mechanism, a protocol selection mechanism that finds the best possible scheme for demonstrating the physical proximity of the devices in a given scenario, and a usability study of eight pairing schemes and the proposed system.

    Designing algorithms for big graph datasets : a study of computing bisimulation and joins


    Analysing sequencing data in Hadoop: The road to interactivity via SQL

    Analysis of high volumes of data has always been performed with distributed computing on computer clusters. But due to rapidly increasing data amounts in, for example, DNA sequencing, new approaches to data analysis are needed. Warehouse-scale computing environments with up to tens of thousands of networked nodes may be necessary to solve future Big Data problems related to sequencing data analysis. To utilize such systems effectively, specialized software is needed. Hadoop is a collection of software built specifically for Big Data processing, with a core consisting of the Hadoop MapReduce scalable distributed computing platform and the Hadoop Distributed File System, HDFS. This work explains the principles underlying Hadoop MapReduce and HDFS as well as certain prominent higher-level interfaces to them: Pig, Hive, and HBase. An overview of the current state of Hadoop usage in bioinformatics is then provided alongside brief introductions to the Hadoop-BAM and SeqPig projects of the author and his colleagues. Data analysis tasks are often performed interactively, exploring the data sets at hand in order to become familiar with them in preparation for well-targeted long-running computations. Hadoop MapReduce is optimized for throughput instead of latency, making it a poor fit for interactive use. This thesis presents two high-level alternatives designed especially with interactive data analysis in mind: Shark and Impala, both of which are Hive-compatible SQL-based systems. Aside from the computational framework used, the format in which the data sets are stored can greatly affect analytical performance. Thus new file formats are being developed to better cope with the needs of modern and future Big Data sets. This work analyses the current state-of-the-art storage formats used in the worlds of bioinformatics and Hadoop.
Finally, this thesis presents the results of experiments performed by the author with the goal of understanding how well the landscape of available frameworks and storage formats can tackle interactive sequencing data analysis tasks.