14 research outputs found

    Secure and efficient storage of multimedia: content in public cloud environments using joint compression and encryption

    Get PDF
    The Cloud Computing is a paradigm still with many unexplored areas ranging from the technological component to the de nition of new business models, but that is revolutionizing the way we design, implement and manage the entire infrastructure of information technology. The Infrastructure as a Service is the delivery of computing infrastructure, typically a virtual data center, along with a set of APIs that allow applications, in an automatic way, can control the resources they wish to use. The choice of the service provider and how it applies to their business model may lead to higher or lower cost in the operation and maintenance of applications near the suppliers. In this sense, this work proposed to carry out a literature review on the topic of Cloud Computing, secure storage and transmission of multimedia content, using lossless compression, in public cloud environments, and implement this system by building an application that manages data in public cloud environments (dropbox and meocloud). An application was built during this dissertation that meets the objectives set. This system provides the user a wide range of functions of data management in public cloud environments, for that the user only have to login to the system with his/her credentials, after performing the login, through the Oauth 1.0 protocol (authorization protocol) is generated an access token, this token is generated only with the consent of the user and allows the application to get access to data/user les without having to use credentials. With this token the framework can now operate and unlock the full potential of its functions. With this application is also available to the user functions of compression and encryption so that user can make the most of his/her cloud storage system securely. The compression function works using the compression algorithm LZMA being only necessary for the user to choose the les to be compressed. Relatively to encryption it will be used the encryption algorithm AES (Advanced Encryption Standard) that works with a 128 bit symmetric key de ned by user. We build the research into two distinct and complementary parts: The rst part consists of the theoretical foundation and the second part is the development of computer application where the data is managed, compressed, stored, transmitted in various environments of cloud computing. The theoretical framework is organized into two chapters, chapter 2 - Background on Cloud Storage and chapter 3 - Data compression. Sought through theoretical foundation demonstrate the relevance of the research, convey some of the pertinent theories and input whenever possible, research in the area. The second part of the work was devoted to the development of the application in cloud environment. We showed how we generated the application, presented the features, advantages, and safety standards for the data. Finally, we re ect on the results, according to the theoretical framework made in the rst part and platform development. We think that the work obtained is positive and that ts the goals we set ourselves to achieve. This research has some limitations, we believe that the time for completion was scarce and the implementation of the platform could bene t from the implementation of other features.In future research it would be appropriate to continue the project expanding the capabilities of the application, test the operation with other users and make comparative tests.A Computação em nuvem é um paradigma ainda com muitas áreas por explorar que vão desde a componente tecnológica à definição de novos modelos de negócio, mas que está a revolucionar a forma como projetamos, implementamos e gerimos toda a infraestrutura da tecnologia da informação. A Infraestrutura como Serviço representa a disponibilização da infraestrutura computacional, tipicamente um datacenter virtual, juntamente com um conjunto de APls que permitirá que aplicações, de forma automática, possam controlar os recursos que pretendem utilizar_ A escolha do fornecedor de serviços e a forma como este aplica o seu modelo de negócio poderão determinar um maior ou menor custo na operacionalização e manutenção das aplicações junto dos fornecedores. Neste sentido, esta dissertação propôs· se efetuar uma revisão bibliográfica sobre a temática da Computação em nuvem, a transmissão e o armazenamento seguro de conteúdos multimédia, utilizando a compressão sem perdas, em ambientes em nuvem públicos, e implementar um sistema deste tipo através da construção de uma aplicação que faz a gestão dos dados em ambientes de nuvem pública (dropbox e meocloud). Foi construída uma aplicação no decorrer desta dissertação que vai de encontro aos objectivos definidos. Este sistema fornece ao utilizador uma variada gama de funções de gestão de dados em ambientes de nuvem pública, para isso o utilizador tem apenas que realizar o login no sistema com as suas credenciais, após a realização de login, através do protocolo Oauth 1.0 (protocolo de autorização) é gerado um token de acesso, este token só é gerado com o consentimento do utilizador e permite que a aplicação tenha acesso aos dados / ficheiros do utilizador ~em que seja necessário utilizar as credenciais. Com este token a aplicação pode agora operar e disponibilizar todo o potencial das suas funções. Com esta aplicação é também disponibilizado ao utilizador funções de compressão e encriptação de modo a que possa usufruir ao máximo do seu sistema de armazenamento cloud com segurança. A função de compressão funciona utilizando o algoritmo de compressão LZMA sendo apenas necessário que o utilizador escolha os ficheiros a comprimir. Relativamente à cifragem utilizamos o algoritmo AES (Advanced Encryption Standard) que funciona com uma chave simétrica de 128bits definida pelo utilizador. Alicerçámos a investigação em duas partes distintas e complementares: a primeira parte é composta pela fundamentação teórica e a segunda parte consiste no desenvolvimento da aplicação informática em que os dados são geridos, comprimidos, armazenados, transmitidos em vários ambientes de computação em nuvem. A fundamentação teórica encontra-se organizada em dois capítulos, o capítulo 2 - "Background on Cloud Storage" e o capítulo 3 "Data Compression", Procurámos, através da fundamentação teórica, demonstrar a pertinência da investigação. transmitir algumas das teorias pertinentes e introduzir, sempre que possível, investigações existentes na área. A segunda parte do trabalho foi dedicada ao desenvolvimento da aplicação em ambiente "cloud". Evidenciámos o modo como gerámos a aplicação, apresentámos as funcionalidades, as vantagens. Por fim, refletimos sobre os resultados , de acordo com o enquadramento teórico efetuado na primeira parte e o desenvolvimento da plataforma. Pensamos que o trabalho obtido é positivo e que se enquadra nos objetivos que nos propusemos atingir. Este trabalho de investigação apresenta algumas limitações, consideramos que o tempo para a sua execução foi escasso e a implementação da plataforma poderia beneficiar com a implementação de outras funcionalidades. Em investigações futuras seria pertinente dar continuidade ao projeto ampliando as potencialidades da aplicação, testar o funcionamento com outros utilizadores e efetuar testes comparativos.Fundação para a Ciência e a Tecnologia (FCT

    Representation and Exploitation of Event Sequences

    Get PDF
    Programa Oficial de Doutoramento en Computación . 5009V01[Abstract] The Ten Commandments, the thirty best smartphones in the market and the five most wanted people by the FBI. Our life is ruled by sequences: thought sequences, number sequences, event sequences. . . a history book is nothing more than a compilation of events and our favorite film is just a sequence of scenes. All of them have something in common, it is possible to acquire relevant information from them. Frequently, by accumulating some data from the elements of each sequence we may access hidden information (e.g. the passengers transported by a bus on a journey is the sum of the passengers who got on in the sequence of stops made); other times, reordering the elements by any of their characteristics facilitates the access to the elements of interest (e.g. the publication of books in 2019 can be ordered chronologically, by author, by literary genre or even by a combination of characteristics); but it will always be sought to store them in the smallest space possible. Thus, this thesis proposes technological solutions for the storage and subsequent processing of events, focusing specifically on three fundamental aspects that can be found in any application that needs to manage them: compressed and dynamic storage, aggregation or accumulation of elements of the sequence and element sequence reordering by their different characteristics or dimensions. The first contribution of this work is a compact structure for the dynamic compression of event sequences. This structure allows any sequence to be compressed in a single pass, that is, it is capable of compressing in real time as elements arrive. This contribution is a milestone in the world of compression since, to date, this is the first proposal for a variable-to-variable dynamic compressor for general purpose. Regarding aggregation, a data warehouse-like proposal is presented capable of storing information on any characteristic of the events in a sequence in an aggregated, compact and accessible way. Following the philosophy of current data warehouses, we avoid repeating cumulative operations and speed up aggregate queries by preprocessing the information and keeping it in this separate structure. Finally, this thesis addresses the problem of indexing event sequences considering their different characteristics and possible reorderings. A new approach for simultaneously keeping the elements of a sequence ordered by different characteristics is presented through compact structures. Thus, it is possible to consult the information and perform operations on the elements of the sequence using any possible rearrangement in a simple and efficient way.[Resumen] Los diez mandamientos, los treinta mejores móviles del mercado y las cinco personas más buscadas por el FBI. Nuestra vida está gobernada por secuencias: secuencias de pensamientos, secuencias de números, secuencias de eventos. . . un libro de historia no es más que una sucesión de eventos y nuestra película favorita no es sino una secuencia de escenas. Todas ellas tienen algo en común, de todas podemos extraer información relevante. A veces, al acumular algún dato de los elementos de cada secuencia accedemos a información oculta (p. ej. los viajeros transportados por un autobús en un trayecto es la suma de los pasajeros que se subieron en la secuencia de paradas realizadas); otras veces, la reordenación de los elementos por alguna de sus características facilita el acceso a los elementos de interés (p. ej. la publicación de obras literarias en 2019 puede ordenarse cronológicamente, por autor, por género literario o incluso por una combinación de características); pero siempre se buscará almacenarlas en el espacio más reducido posible sin renunciar a su contenido. Por ello, esta tesis propone soluciones tecnológicas para el almacenamiento y posterior procesamiento de secuencias, centrándose concretamente en tres aspectos fundamentales que se pueden encontrar en cualquier aplicación que precise gestionarlas: el almacenamiento comprimido y dinámico, la agregación o acumulación de algún dato sobre los elementos de la secuencia y la reordenación de los elementos de la secuencia por sus diferentes características o dimensiones. La primera contribución de este trabajo es una estructura compacta para la compresión dinámica de secuencias. Esta estructura permite comprimir cualquier secuencia en una sola pasada, es decir, es capaz de comprimir en tiempo real a medida que llegan los elementos de la secuencia. Esta aportación es un hito en el mundo de la compresión ya que, hasta la fecha, es la primera propuesta de un compresor dinámico “variable to variable” de carácter general. En cuanto a la agregación, se presenta una propuesta de almacén de datos capaz de guardar la información acumulada sobre alguna característica de los eventos de la secuencia de modo compacto y fácilmente accesible. Siguiendo la filosofía de los actuales almacenes de datos, el objetivo es evitar repetir operaciones de acumulación y agilizar las consultas agregadas mediante el preprocesado de la información manteniéndola en esta estructura. Por último, esta tesis aborda el problema de la indexación de secuencias de eventos considerando sus diferentes características y posibles reordenaciones. Se presenta una nueva forma de mantener simultáneamente ordenados los elementos de una secuencia por diferentes características a través de estructuras compactas. Así se permite consultar la información y realizar operaciones sobre los elementos de la secuencia usando cualquier posible ordenación de una manera sencilla y eficiente

    New data structures and algorithms for the efficient management of large spatial datasets

    Get PDF
    [Resumen] En esta tesis estudiamos la representación eficiente de matrices multidimensionales, presentando nuevas estructuras de datos compactas para almacenar y procesar grids en distintos ámbitos de aplicación. Proponemos varias estructuras de datos estáticas y dinámicas para la representación de matrices binarias o de enteros y estudiamos aplicaciones a la representación de datos raster en Sistemas de Información Geográfica, bases de datos RDF, etc. En primer lugar proponemos una colección de estructuras de datos estáticas para la representación de matrices binarias y de enteros: 1) una nueva representación de matrices binarias con grandes grupos de valores uniformes, con aplicaciones a la representación de datos raster binarios; 2) una nueva estructura de datos para representar matrices multidimensionales; 3) una nueva estructura de datos para representar matrices de enteros con soporte para consultas top-k de rango. También proponemos una nueva representación dinámica de matrices binarias, una nueva estructura de datos que proporciona las mismas funcionalidades que nuestras propuestas estáticas pero también soporta cambios en la matriz. Nuestras estructuras de datos pueden utilizarse en distintos dominios. Proponemos variantes específicas y combinaciones de nuestras propuestas para representar grafos temporales, bases de datos RDF, datos raster binarios o generales y datos raster temporales. También proponemos un nuevo algoritmo para consultar conjuntamente un conjuto de datos raster (almacenado usando nuestras propuestas) y un conjunto de datos vectorial almacenado en una estructura de datos clásica, mostrando que nuestra propuesta puede ser más rápida y usar menos espacio que otras alternativas. Nuestras representaciones proporcionan interesantes trade-offs y son competitivas en espacio y tiempos de consulta con representaciones habituales en los diferentes dominios.[Resumo] Nesta tese estudiamos a representación eficiente de matrices multidimensionais, presentando novas estruturas de datos compactas para almacenar e procesar grids en distintos ámbitos de aplicación. Propoñemos varias estruturas de datos estáticas e dinámicas para a representación de matrices binarias ou de enteiros e estudiamos aplicacións á representación de datos raster en Sistemas de Información Xeográfica, bases de datos RDF, etc. En primeiro lugar propoñemos unha colección de estruturas de datos estáticas para a representación de matrices binarias e de enteiros: 1) unha nova representación de matrices binarias con grandes grupos de valores uniformes, con aplicacións á representación de datos raster binarios; 2) unha nova estrutura de datos para representar matrices multidimensionais; 3) unha nova estrutura de datos para representar matrices de enteiros con soporte para consultas top-k. Tamén propoñemos unha nova representación dinámica de matrices binarias, unha nova estrutura de datos que proporciona as mesmas funcionalidades que as nosas propostas estáticas pero tamén soporta cambios na matriz. As nosas estruturas de datos poden utilizarse en distintos dominios. Propoñemos variantes específicas e combinacións das nosas propostas para representar grafos temporais, bases de datos RDF, datos raster binarios ou xerais e datos raster temporais. Tamén propoñemos un novo algoritmo para consultar conxuntamente datos raster (almacenados usando as nosas propostas) con datos vectoriais almacenados nunha estrutura de datos clásica, amosando que a nosa proposta pode ser máis rápida e usar menos espazo que outras alternativas. As nosas representacións proporcionan interesantes trade-offs e son competitivas en espazo e tempos de consulta con representacións habituais nos diferentes dominios.[Abstract] In this thesis we study the efficient representation of multidimensional grids, presenting new compact data structures to store and query grids in different application domains. We propose several static and dynamic data structures for the representation of binary grids and grids of integers, and study applications to the representation of raster data in Geographic Information Systems, RDF databases, etc. We first propose a collection of static data structures for the representation of binary grids and grids of integers: 1) a new representation of bi-dimensional binary grids with large clusters of uniform values, with applications to the representation of binary raster data; 2) a new data structure to represent multidimensional binary grids; 3) a new data structure to represent grids of integers with support for top-k range queries. We also propose a new dynamic representation of binary grids, a new data structure that provides the same functionalities that our static representations of binary grids but also supports changes in the grid. Our data structures can be used in several application domains. We propose specific variants and combinations of our generic proposals to represent temporal graphs, RDF databases, OLAP databases, binary or general raster data, and temporal raster data. We also propose a new algorithm to jointly query a raster dataset (stored using our representations) and a vectorial dataset stored in a classic data structure, showing that our proposal can be faster and require less space than the usual alternatives. Our representations provide interesting trade-offs and are competitive in terms of space and query times with usual representations in the different domains

    Multimedia

    Get PDF
    The nowadays ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices, lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content requires constant re-evaluation and adaptation of multimedia methodologies, in order to meet the relentless change of requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures & wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content and novel multimedia applications

    Succinct and Self-Indexed Data Structures for the Exploitation and Representation of Moving Objects

    Get PDF
    Programa Oficial de Doutoramento en Computación . 5009V01[Abstract] This thesis deals with the efficient representation and exploitation of trajectories of objects that move in space without any type of restriction (airplanes, birds, boats, etc.). Currently, this is a very relevant problem due to the proliferation of GPS devices, which makes it possible to collect a large number of trajectories. However, until now there is no efficient way to properly store and exploit them. In this thesis, we propose eight structures that meet two fundamental objectives. First, they are capable of storing space-time data, describing the trajectories, in a reduced space, so that their exploitation takes advantage of the memory hierarchy. Second, those structures allow exploiting the information by object queries, given an object, they retrieve the position or trajectory of that object along that time; or space-time range queries, given a region of space and a time interval, the objects that are within the region at that time are obtained. It should be noted that state-of-the-art solutions are only capable of efficiently answering one of the two types of queries. All of these data structures have a common nexus, they all use two elements: snapshots and logs. Each snapshot works as a spatial index that periodically indexes the absolute position of each object or the Minimum Bounding Rectangle (MBR) of its trajectory. They serve to speed up the spatio-temporal range queries. We have implemented two types of snapshots: based on k2-trees or R-trees. With respect to the log, it represents the trajectory (sequence of movements) of each object. It is the main element of the structures, and facilitates the resolution of object and spatio-temporal range queries. Four strategies have been implemented to represent the log in a compressed form: ScdcCT, GraCT, ContaCT and RCT. With the combination of these two elements we build eight different structures for the representation of trajectories. All of them have been implemented and evaluated experimentally, showing that they reduce the space required by traditional methods by up to two orders of magnitude. Furthermore, they are all competitive in solving object queries as well as spatial-temporal ones.[Resumen] Esta tesis aborda la representación y explotación eficiente de trayectorias de objetos que se mueven en el espacio sin ningún tipo de restricción (aviones, pájaros, barcos, etc.). En la actualidad, este es un problema muy relevante debido a la proliferación de dispositivos GPS, lo que permite coleccionar una gran cantidad de trayectorias. Sin embargo, hasta ahora no existe un modo eficiente para almacenarlas y explotarlas adecuadamente. Esta tesis propone ocho estructuras que cumplen con dos objetivos fundamentales. En primer lugar, son capaces de almacenar en espacio reducido los datos espaciotemporales, que describen las trayectorias, de modo que su explotación saque partido a la jerarquía de memoria. En segundo lugar, las estructuras permiten explotar la información realizando consultas sobre objetos, dado el objeto se calcula su posición o trayectoria durante un intervalo de tiempo; o consultas de rango espacio-temporal, dada una región del espacio y un intervalo de tiempo se obtienen los objetos que estaban dentro de la región en ese tiempo. Hay que destacar que las soluciones del estado del arte solo son capaces de responder eficientemente uno de los dos tipos de consultas. Todas estas estructuras de datos tienen un nexo común, todas ellas usan dos elementos: snapshots y logs. Cada snapshot funciona como un índice espacial que periódicamente indexa la posición absoluta de cada objeto o el Minimum Bounding Rectangle (MBR) de su trayectoria. Sirven para agilizar las consultas de rango espacio-temporal. Hemos implementado dos tipos de snapshot: basadas en k2-trees o en R-trees. Con respecto al log, éste representa la trayectoria (secuencia de movimientos) de cada objeto. Es el principal elemento de nuestras estructuras, y facilita la resolución de consultas de objeto y de rango espacio-temporal. Se han implementado cuatro estrategias para representar el log de forma comprimida: ScdcCT, GraCT, ContaCT y RCT. Con la combinación de estos dos elementos construimos ocho estructuras diferentes para la representación de trayectorias. Todas ellas han sido implementadas y evaluadas experimentalmente, donde reducen hasta dos órdenes de magnitud el espacio que requieren los métodos tradicionales. Además, todas ellas son competitivas resolviendo tanto consultas de objeto como de rango espacio-temporal.[Resumo] Esta tese trata sobre a representación e explotación eficiente de traxectorias de obxectos que se moven no espazo sen ningún tipo de restrición (avións, paxaros, buques, etc.). Na actualidade, este é un problema moi relevante debido á proliferación de dispositivos GPS, o que fai posible a recollida dun gran número de traxectorias. Non obstante, ata o de agora non existe un xeito eficiente de almacenalos e explotalos. Esta tese propón oito estruturas que cumpren dous obxectivos fundamentais. En primeiro lugar, son capaces de almacenar datos espazo-temporais, que describen as traxectorias, nun espazo reducido, de xeito que a súa explotación aproveita a xerarquía da memoria. En segundo lugar, as estruturas permiten explotar a información realizando consultas de obxectos, dado o obxecto calcúlase a súa posición ou traxectoria nun período de tempo; ou consultas de rango espazo-temporal, dada unha rexión de espazo e un intervalo de tempo, obtéñense os obxectos que estaban dentro da rexión nese momento. Cómpre salientar que as solucións do estado do arte só son capaces de responder eficientemente a un dos dous tipos de consultas. Todas estas estruturas de datos teñen unha ligazón común, empregan dous elementos: snapshots e logs. Cada snapshot funciona como un índice espacial que indexa periodicamente a posición absoluta de cada obxecto ou o Minimum Bounding Rectangle (MBR) da súa traxectoria. Serven para acelerar as consultas de rango espazo-temporal. Implementamos dous tipos de snapshot: baseadas en k2-trees ou en R-trees. Con respecto ao log, este representa a traxectoria (secuencia de movementos) de cada obxecto. É o principal elemento das nosas estruturas, e facilita a resolución de consultas sobre obxectos e de rango espacio-temporal. Implementáronse catro estratexias para representar o log nunha forma comprimida: ScdcCT, GraCT, ContaCT e RCT. Coa combinación destes dous elementos construímos oito estruturas diferentes para a representación de traxectorias. Todas elas foron implementadas e avaliadas experimentalmente, onde reducen ata dúas ordes de magnitude o espazo requirido polos métodos tradicionais. Ademais, todas elas son competitivas para resolver tanto consultas de obxectos como espazo-temporais

    Sublinear Computation Paradigm

    Get PDF
    This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020, in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms

    Entropy in Image Analysis II

    Get PDF
    Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for those applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In the process of reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas

    A vector symbolic approach for cognitive services and decentralized workflows

    Get PDF
    The proliferation of smart devices and sensors known as the Internet of Things (IoT), along with the transformation of mobile phones into powerful handheld computers as well as the continuing advancement in high-speed communication technologies, introduces new possibilities for collaborative distributed computing and collaborative workflows along with a new set of problems to be solved. However, traditional service-based applications, in fixed networks, are typically constructed and managed centrally and assume stable service endpoints and adequate network connectivity. Constructing and maintaining such applications in dynamic heterogeneous wireless networked environments, where limited bandwidth and transient connectivity are commonplace, presents significant challenges and makes centralized application construction and management impossible. The key objective for this thesis can be summarised as follows: a means is required to discover and orchestrate sequences of micro-services, i.e., workflows, on-demand, using currently available distributed resources (compute devices, functional services, data and sensors) in spite of a poor quality (fragmented, low bandwidth) network infrastructure and without central control. It is desirable to be able to compose such workflows on-the-fly in order to fulfil an ‘intent’. The research undertaken investigates how service definition, service matching and decentralised service composition and orchestration can be achieved without centralised control using an approach based on a Binary Spatter Code Vector Symbolic Architec-ture and shows that the approach offers significant advantages in environments where communication networks are unreliable. The outcomes demonstrate a new cognitive workflow model that uses one-to-many communications to enable intelligent cooperation between self-describing service entities that can self-organise to complete a workflow task. Workflow orchestration overhead was minimised using two innovations, a local arbitration mechanism that uses a delayed response mechanism to suppress responses that are not an ideal match and the holographic nature of VSA descriptions enables messages to be truncated without loss of meaning. A new hierarchical VSA encoding scheme was created that is scaleable to any number of vector embeddings including workflow steps. The encoding can also facilitate learning since it provides unique contexts for each step in a workflow. The encoding also enables service pre-provisioning because individual workflow steps can be decoded easily by any service receiving a multicast workflow vector. This thesis brings the state-of-the-art closer to the ability to discover distributed services on-the-fly to fulfil an intent and without the need for centralised management or the imperative definition of all service steps, including locations. The use of a mathematically deterministic distributed vector representation in the form of BSC vectors for both service objects and workflows enables a common language for all elements required to discover and execute workflows in decentralised transient environments and opens up the possibilities of employing learning algorithms that can advance the state-of-the-art in distributed workflows towards a true cognitive distributed network architectur

    Scalable succinct indexing for large text collections

    Get PDF
    Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems

    Advanced Cryptographic Techniques for Protecting Log Data

    Get PDF
    This thesis examines cryptographic techniques providing security for computer log files. It focuses on ensuring authenticity and integrity, i.e. the properties of having been created by a specific entity and being unmodified. Confidentiality, the property of being unknown to unauthorized entities, will be considered, too, but with less emphasis. Computer log files are recordings of actions performed and events encountered in computer systems. While the complexity of computer systems is steadily growing, it is increasingly difficult to predict how a given system will behave under certain conditions, or to retrospectively reconstruct and explain which events and conditions led to a specific behavior. Computer log files help to mitigate the problem of retracing a system’s behavior retrospectively by providing a (usually chronological) view of events and actions encountered in a system. Authenticity and integrity of computer log files are widely recognized security requirements, see e.g. [Latham, ed., "Department of Defense Trusted Computer System Evaluation Criteria", 1985, p. 10], [Kent and Souppaya, "Guide to Computer Security Log Management", NIST Special Publication 800-92, 2006, Section 2.3.2], [Guttman and Roback, "An Introduction to Computer Security: The NIST Handbook", superseded NIST Special Publication 800-12, 1995, Section 18.3.1], [Nieles et al., "An Introduction to Information Security" , NIST Special Publication 800-12, 2017, Section 9.3], [Common Criteria Editorial Board, ed., "Common Criteria for Information Technology Security Evaluation", Part 2, Section 8.6]. Two commonly cited ways to ensure integrity of log files are to store log data on so-called write-once-read-many-times (WORM) drives and to immediately print log records on a continuous-feed printer. This guarantees that log data cannot be retroactively modified by an attacker without physical access to the storage medium. However, such special-purpose hardware may not always be a viable option for the application at hand, for example because it may be too costly. In such cases, the integrity and authenticity of log records must be ensured via other means, e.g. with cryptographic techniques. Although these techniques cannot prevent the modification of log data, they can offer strong guarantees that modifications will be detectable, while being implementable in software. Furthermore, cryptography can be used to achieve public verifiability of log files, which may be needed in applications that have strong transparency requirements. Cryptographic techniques can even be used in addition to hardware solutions, providing protection against attackers who do have physical access to the logging hardware, such as insiders. Cryptographic schemes for protecting stored log data need to be resilient against attackers who obtain control over the computer storing the log data. If this computer operates in a standalone fashion, it is an absolute requirement for the cryptographic schemes to offer security even in the event of a key compromise. As this is impossible with standard cryptographic tools, cryptographic solutions for protecting log data typically make use of forward-secure schemes, guaranteeing that changes to log data recorded in the past can be detected. Such schemes use a sequence of authentication keys instead of a single one, where previous keys cannot be computed efficiently from latter ones. This thesis considers the following requirements for, and desirable features of, cryptographic logging schemes: 1) security, i.e. the ability to reliably detect violations of integrity and authenticity, including detection of log truncations, 2) efficiency regarding both computational and storage overhead, 3) robustness, i.e. the ability to verify unmodified log entries even if others have been illicitly changed, and 4) verifiability of excerpts, including checking an excerpt for omissions. The goals of this thesis are to devise new techniques for the construction of cryptographic schemes that provide security for computer log files, to give concrete constructions of such schemes, to develop new models that can accurately capture the security guarantees offered by the new schemes, as well as to examine the security of previously published schemes. This thesis demands that cryptographic schemes for securely storing log data must be able to detect if log entries have been deleted from a log file. A special case of deletion is log truncation, where a continuous subsequence of log records from the end of the log file is deleted. Obtaining truncation resistance, i.e. the ability to detect truncations, is one of the major difficulties when designing cryptographic logging schemes. This thesis alleviates this problem by introducing a novel technique to detect log truncations without the help of third parties or designated logging hardware. Moreover, this work presents new formal security notions capturing truncation resistance. The technique mentioned above is applied to obtain cryptographic logging schemes which can be shown to satisfy these notions under mild assumptions, making them the first schemes with formally proven truncation security. Furthermore, this thesis develops a cryptographic scheme for the protection of log files which can support the creation of excerpts. For this thesis, an excerpt is a (not necessarily contiguous) subsequence of records from a log file. Excerpts created with the scheme presented in this thesis can be publicly checked for integrity and authenticity (as explained above) as well as for completeness, i.e. the property that no relevant log entry has been omitted from the excerpt. Excerpts provide a natural way to preserve the confidentiality of information that is contained in a log file, but not of interest for a specific public analysis of the log file, enabling the owner of the log file to meet confidentiality and transparency requirements at the same time. The scheme demonstrates and exemplifies the technique for obtaining truncation security mentioned above. Since cryptographic techniques to safeguard log files usually require authenticating log entries individually, some researchers [Ma and Tsudik, "A New Approach to Secure Logging", LNCS 5094, 2008; Ma and Tsudik, "A New Approach to Secure Logging", ACM TOS 2009; Yavuz and Peng, "BAF: An Efficient Publicly Verifiable Secure Audit Logging Scheme for Distributed Systems", ACSAC 2009] have proposed using aggregatable signatures [Boneh et al., "Aggregate and Verifiably Encrypted Signatures from Bilinear Maps", EUROCRYPT 2003] in order to reduce the overhead in storage space incurred by using such a cryptographic scheme. Aggregation of signatures refers to some “combination” of any number of signatures (for distinct or equal messages, by distinct or identical signers) into an “aggregate” signature. The size of the aggregate signature should be less than the total of the sizes of the orginal signatures, ideally the size of one of the original signatures. Using aggregation of signatures in applications that require storing or transmitting a large number of signatures (such as the storage of log records) can lead to significant reductions in the use of storage space and bandwidth. However, aggregating the signatures for all log records into a single signature will cause some fragility: The modification of a single log entry will render the aggregate signature invalid, preventing the cryptographic verification of any part of the log file. However, being able to distinguish manipulated log entries from non-manipulated ones may be of importance for after-the-fact investigations. This thesis addresses this issue by presenting a new technique providing a trade-off between storage overhead and robustness, i.e. the ability to tolerate some modifications to the log file while preserving the cryptographic verifiability of unmodified log entries. This robustness is achieved by the use of a special kind of aggregate signatures (called fault-tolerant aggregate signatures), which contain some redundancy. The construction makes use of combinatorial methods guaranteeing that if the number of errors is below a certain threshold, then there will be enough redundancy to identify and verify the non-modified log entries. Finally, this thesis presents a total of four attacks on three different schemes intended for securely storing log files presented in the literature [Yavuz et al., "Efficient, Compromise Resilient and Append-Only Cryptographic Schemes for Secure Audit Logging", Financial Cryptography 2012; Ma, "Practical Forward Secure Sequential Aggregate Signatures", ASIACCS 2008]. The attacks allow for virtually arbitrary log file forgeries or even recovery of the secret key used for authenticating the log file, which could then be used for mostly arbitrary log file forgeries, too. All of these attacks exploit weaknesses of the specific schemes. Three of the attacks presented here contradict the security properties of the schemes claimed and supposedly proven by the respective authors. This thesis briefly discusses these proofs and points out their flaws. The fourth attack presented here is outside of the security model considered by the scheme’s authors, but nonetheless presents a realistic threat. In summary, this thesis advances the scientific state-of-the-art with regard to providing security for computer log files in a number of ways: by introducing a new technique for obtaining security against log truncations, by providing the first scheme where excerpts from log files can be verified for completeness, by describing the first scheme that can achieve some notion of robustness while being able to aggregate log record signatures, and by analyzing the security of previously proposed schemes
    corecore