14 research outputs found
Secure and efficient storage of multimedia content in public cloud environments using joint compression and encryption
Cloud Computing is a paradigm with many areas still unexplored, ranging from the technological component to the definition of new business models, but it is revolutionizing the way we design, implement and manage the entire information technology infrastructure.
Infrastructure as a Service is the delivery of computing infrastructure, typically a virtual data center, together with a set of APIs that allow applications to control, automatically, the resources they wish to use. The choice of service provider, and the way that provider applies its business model, may lead to higher or lower costs in operating and maintaining applications with it.
In this sense, this dissertation set out to review the literature on Cloud Computing and on the secure storage and transmission of multimedia content using lossless compression in public cloud environments, and to implement such a system by building an application that manages data in public cloud environments (Dropbox and MEO Cloud).
An application meeting these objectives was built during the dissertation. The system offers the user a wide range of data-management functions in public cloud environments. The user only has to log in with his/her credentials; after login, an access token is generated through the OAuth 1.0 authorization protocol. This token is generated only with the user's consent and allows the application to access the user's data and files without having to use the credentials themselves. With this token the application can operate and unlock the full potential of its functions. The application also provides compression and encryption functions, so that the user can make the most of his/her cloud storage system securely. The compression function uses the LZMA algorithm; the user only has to choose the files to be compressed. For encryption, the AES (Advanced Encryption Standard) algorithm is used, with a 128-bit symmetric key defined by the user.
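As a sketch of the compress-then-encrypt pipeline described above (LZMA followed by AES-128), the following Python fragment uses the standard-library `lzma` module. Since the standard library has no AES implementation, the sketch stops at deriving a 128-bit key from the user's passphrase; the function name and the SHA-256 key derivation are illustrative assumptions, and a real deployment would encrypt `compressed` with AES-128 via a third-party library such as `cryptography`:

```python
import hashlib
import lzma

def prepare_for_upload(data: bytes, passphrase: str) -> bytes:
    # Step 1: lossless LZMA compression, as in the described system.
    compressed = lzma.compress(data)
    # Step 2: derive a 128-bit symmetric key from the user's passphrase.
    # (Illustrative only: the described system uses a user-defined AES-128
    # key; encrypting `compressed` with it needs a third-party AES library.)
    key = hashlib.sha256(passphrase.encode("utf-8")).digest()[:16]
    assert len(key) * 8 == 128
    return compressed  # would be AES-encrypted with `key` before upload

payload = prepare_for_upload(b"multimedia " * 1000, "user-secret")
print(len(payload) < 11000)  # True: LZMA shrinks the repetitive input
```

Decompressing `payload` with `lzma.decompress` recovers the original bytes, so the pipeline is lossless end to end.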
We built the research in two distinct and complementary parts: the first consists of the theoretical foundation, and the second is the development of the computer application in which data are managed, compressed, stored and transmitted in various cloud computing environments. The theoretical framework is organized into two chapters: Chapter 2, Background on Cloud Storage, and Chapter 3, Data Compression.
Through the theoretical foundation we sought to demonstrate the relevance of the research, convey some of the pertinent theories and introduce, whenever possible, existing research in the area. The second part of the work was devoted to the development of the application in a cloud environment.
We showed how we built the application and presented its features, advantages and data-safety standards. Finally, we reflect on the results in light of the theoretical framework established in the first part and the development of the platform.
We believe the resulting work is positive and fits the goals we set out to achieve. This research has some limitations: the time available for its completion was scarce, and the platform could benefit from the implementation of further features. In future research it would be appropriate to continue the project by expanding the capabilities of the application, testing its operation with other users and performing comparative tests.
Fundação para a Ciência e a Tecnologia (FCT)
Representation and Exploitation of Event Sequences
Programa Oficial de Doutoramento en Computación, 5009V01
[Abstract]
The Ten Commandments, the thirty best smartphones in the market and
the five most wanted people by the FBI. Our life is ruled by sequences:
thought sequences, number sequences, event sequences. . . a history book
is nothing more than a compilation of events and our favorite film is
just a sequence of scenes. All of them have something in common: relevant
information can be extracted from them. Frequently, by accumulating some
data over the elements of a sequence we can access hidden information
(e.g. the number of passengers transported by a bus on a journey is the
sum of the passengers who got on over the sequence of stops); other
times, reordering the elements by one of their characteristics eases
access to the elements of interest (e.g. the books published in 2019 can
be ordered chronologically, by author, by literary genre, or by a
combination of characteristics); but we will always seek to store them
in the smallest space possible.
Thus, this thesis proposes technological solutions for the storage
and subsequent processing of events, focusing specifically on three
fundamental aspects that can be found in any application that needs
to manage them: compressed and dynamic storage, aggregation or
accumulation of data over the elements of the sequence, and reordering
of the elements by their different characteristics or dimensions.
The first contribution of this work is a compact structure for the
dynamic compression of event sequences. This structure allows any
sequence to be compressed in a single pass, that is, it is capable of
compressing in real time as elements arrive. This contribution is
a milestone in the world of compression since, to date, it is the
first proposal for a general-purpose variable-to-variable dynamic compressor.
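The contribution itself is a new compressor, but the single-pass, on-arrival mode of operation it targets can be illustrated with any incremental codec. The sketch below uses Python's standard `lzma` incremental API purely as a stand-in (it is not the thesis's compressor, and the event strings are invented): each element is compressed the moment it arrives, and the sequence is never traversed twice.

```python
import lzma

compressor = lzma.LZMACompressor()
stream = b""
for event in [b"stop:A|on:5", b"stop:B|on:3", b"stop:C|on:7"]:
    # Each event is compressed as soon as it arrives (single pass).
    stream += compressor.compress(event)
stream += compressor.flush()  # finalize the compressed stream

print(lzma.decompress(stream))  # the concatenated event sequence
```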
Regarding aggregation, a data warehouse-like proposal is presented
capable of storing information on any characteristic of the events in a
sequence in an aggregated, compact and accessible way. Following the
philosophy of current data warehouses, we avoid repeating cumulative
operations and speed up aggregate queries by preprocessing the
information and keeping it in this separate structure.
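The bus example from the opening paragraph is exactly the kind of cumulative operation such a structure precomputes. A minimal illustration of the underlying idea, using plain prefix sums rather than the compact warehouse structure itself (the stop data are invented):

```python
from itertools import accumulate

boarded = [5, 3, 0, 7, 2]           # passengers boarding at each stop
prefix = list(accumulate(boarded))  # precomputed once: [5, 8, 8, 15, 17]

def boarded_between(i, j):
    """Passengers boarding from stop i to stop j (inclusive, 0-based),
    answered in O(1) from the precomputed aggregates."""
    return prefix[j] - (prefix[i - 1] if i > 0 else 0)

print(boarded_between(1, 3))  # 3 + 0 + 7 = 10
```

By paying a linear preprocessing pass once, every subsequent range aggregation avoids re-scanning the sequence, which is the data-warehouse philosophy the text describes.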
Finally, this thesis addresses the problem of indexing event sequences
considering their different characteristics and possible reorderings. A new
approach for simultaneously keeping the elements of a sequence ordered
by different characteristics is presented through compact structures.
Thus, it is possible to consult the information and perform operations
on the elements of the sequence using any possible rearrangement in a
simple and efficient way.
New data structures and algorithms for the efficient management of large spatial datasets
[Abstract] In this thesis we study the efficient representation of multidimensional grids,
presenting new compact data structures to store and query grids in different
application domains. We propose several static and dynamic data structures for the
representation of binary grids and grids of integers, and study applications to the
representation of raster data in Geographic Information Systems, RDF databases,
etc.
We first propose a collection of static data structures for the representation of
binary grids and grids of integers: 1) a new representation of bi-dimensional binary
grids with large clusters of uniform values, with applications to the representation
of binary raster data; 2) a new data structure to represent multidimensional binary
grids; 3) a new data structure to represent grids of integers with support for top-k
range queries. We also propose a new dynamic representation of binary grids, a new
data structure that provides the same functionalities as our static representations
of binary grids but also supports changes in the grid.
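The pruning idea behind such compact grid representations can be made concrete with a toy, depth-first sketch of a quadtree-style encoding of a binary grid. Real k2-trees store the levels breadth-first in bitmaps and navigate them with rank operations; nothing below is taken from the thesis, it only shows why large uniform clusters compress well:

```python
def encode(grid, r0=0, c0=0, size=None):
    """Emit 1 and recurse if the quadrant contains any set cell, else emit 0.
    An empty quadrant of any size costs a single bit, which is where the
    space saving for grids with large uniform clusters comes from."""
    size = size or len(grid)
    cells = [grid[r][c] for r in range(r0, r0 + size)
                        for c in range(c0, c0 + size)]
    if not any(cells):
        return [0]
    if size == 1:
        return [1]
    half = size // 2
    bits = [1]
    for dr in (0, half):
        for dc in (0, half):
            bits += encode(grid, r0 + dr, c0 + dc, half)
    return bits

grid = [[1, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 1]]
print(encode(grid))  # 13 bits instead of 16 raw cells
```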
Our data structures can be used in several application domains. We propose
specific variants and combinations of our generic proposals to represent temporal
graphs, RDF databases, OLAP databases, binary or general raster data, and
temporal raster data. We also propose a new algorithm to jointly query a raster
dataset (stored using our representations) and a vectorial dataset stored in a classic
data structure, showing that our proposal can be faster and require less space than
the usual alternatives. Our representations provide interesting trade-offs and are
competitive in terms of space and query times with usual representations in the
different domains.
Multimedia
The nowadays ubiquitous and effortless digital data capture and processing capabilities offered by the majority of devices lead to an unprecedented penetration of multimedia content in our everyday life. To make the most of this phenomenon, the rapidly increasing volume and usage of digitised content require constant re-evaluation and adaptation of multimedia methodologies in order to meet the relentlessly changing requirements from both the user and system perspectives. Advances in Multimedia provides readers with an overview of the ever-growing field of multimedia by bringing together various research studies and surveys from different subfields that point out such important aspects. Some of the main topics that this book deals with include: multimedia management in peer-to-peer structures and wireless networks, security characteristics in multimedia, semantic gap bridging for multimedia content, and novel multimedia applications.
Succinct and Self-Indexed Data Structures for the Exploitation and Representation of Moving Objects
Programa Oficial de Doutoramento en Computación, 5009V01
[Abstract]
This thesis deals with the efficient representation and exploitation of trajectories of
objects that move in space without any type of restriction (airplanes, birds, boats,
etc.). Currently, this is a very relevant problem due to the proliferation of GPS
devices, which makes it possible to collect a large number of trajectories. However,
until now there has been no efficient way to store and exploit them properly.
In this thesis, we propose eight structures that meet two fundamental objectives.
First, they are capable of storing the spatio-temporal data describing the
trajectories in reduced space, so that their exploitation takes advantage of the
memory hierarchy. Second, those structures allow the information to be exploited
through object queries (given an object, retrieve its position or trajectory over
a time interval) and spatio-temporal range queries (given a region of space and a
time interval, retrieve the objects that were within the region during that time).
It should be noted that state-of-the-art solutions can only answer one of these
two types of queries efficiently.
All of these data structures share a common nexus; they all use two elements:
snapshots and logs. Each snapshot works as a spatial index that periodically indexes
the absolute position of each object or the Minimum Bounding Rectangle (MBR) of
its trajectory. They serve to speed up the spatio-temporal range queries. We have
implemented two types of snapshots: based on k2-trees or R-trees.
With respect to the log, it represents the trajectory (sequence of movements) of
each object. It is the main element of the structures, and facilitates the resolution
of object and spatio-temporal range queries. Four strategies have been implemented
to represent the log in a compressed form: ScdcCT, GraCT, ContaCT and RCT.
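The shared snapshot-plus-log design can be illustrated with a toy trajectory store: absolute positions every K steps (the snapshots) and relative movements in between (the log). The period K, the move list and the function names are invented for illustration, and none of the four compressed log strategies is implemented here:

```python
K = 3  # snapshot period (illustrative)

moves = [(1, 0), (0, 1), (1, 1), (0, -1), (1, 0), (1, 0), (0, 1)]
snapshots = {0: (0, 0)}  # time -> absolute (x, y), taken every K steps
log = []                 # relative movements (the trajectory)
x = y = 0
for t, (dx, dy) in enumerate(moves, start=1):
    x, y = x + dx, y + dy
    log.append((dx, dy))
    if t % K == 0:
        snapshots[t] = (x, y)

def position_at(t):
    """Object query: jump to the nearest earlier snapshot, then replay
    at most K-1 log entries to recover the position at time t."""
    base = max(s for s in snapshots if s <= t)
    px, py = snapshots[base]
    for dx, dy in log[base:t]:
        px, py = px + dx, py + dy
    return (px, py)

print(position_at(5))  # (3, 1)
```

Because every lookup replays a bounded number of log entries, the snapshots cap query time while the small relative movements in the log keep the representation compact.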
With the combination of these two elements we build eight different structures for
the representation of trajectories. All of them have been implemented and evaluated
experimentally, showing that they reduce the space required by traditional methods
by up to two orders of magnitude. Furthermore, they are all competitive in solving
object queries as well as spatio-temporal ones.
Sublinear Computation Paradigm
This open access book gives an overview of cutting-edge work on a new paradigm called the “sublinear computation paradigm,” which was proposed in the large multiyear academic research project “Foundations of Innovative Algorithms for Big Data.” That project ran from October 2014 to March 2020, in Japan. To handle the unprecedented explosion of big data sets in research, industry, and other areas of society, there is an urgent need to develop novel methods and approaches for big data analysis. To meet this need, innovative changes in algorithm theory for big data are being pursued. For example, polynomial-time algorithms have thus far been regarded as “fast,” but if a quadratic-time algorithm is applied to a petabyte-scale or larger big data set, problems are encountered in terms of computational resources or running time. To deal with this critical computational and algorithmic bottleneck, linear, sublinear, and constant time algorithms are required. The sublinear computation paradigm is proposed here in order to support innovation in the big data era. A foundation of innovative algorithms has been created by developing computational procedures, data structures, and modelling techniques for big data. The project is organized into three teams that focus on sublinear algorithms, sublinear data structures, and sublinear modelling. The work has provided high-level academic research results of strong computational and algorithmic interest, which are presented in this book. The book consists of five parts: Part I, which consists of a single chapter on the concept of the sublinear computation paradigm; Parts II, III, and IV review results on sublinear algorithms, sublinear data structures, and sublinear modelling, respectively; Part V presents application results. The information presented here will inspire the researchers who work in the field of modern algorithms.
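A minimal illustration of the paradigm's motivation, using the generic textbook idea of estimating an aggregate from a constant-size random sample instead of scanning all n items (this is not a result from the book; the data and sample size are invented):

```python
import random

random.seed(42)
n = 1_000_000
data = [i % 100 for i in range(n)]   # stand-in for a huge data set

# Sublinear estimate: inspect only 1,000 of the 1,000,000 elements.
sample = [data[random.randrange(n)] for _ in range(1000)]
estimate = sum(sample) / len(sample)

exact = sum(data) / n  # 49.5; computed here only to check the estimate
print(abs(estimate - exact))  # small, despite touching 0.1% of the data
```

The sample size, and hence the running time, is independent of n, which is the sense in which such estimators run in sublinear (here constant) time.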
Entropy in Image Analysis II
Image analysis is a fundamental task for any application where extracting information from images is required. The analysis requires highly sophisticated numerical and analytical methods, particularly for those applications in medicine, security, and other fields where the results of the processing consist of data of vital importance. This fact is evident from all the articles composing the Special Issue "Entropy in Image Analysis II", in which the authors used widely tested methods to verify their results. In the process of reading the present volume, the reader will appreciate the richness of their methods and applications, in particular for medical imaging and image security, and a remarkable cross-fertilization among the proposed research areas.
A vector symbolic approach for cognitive services and decentralized workflows
The proliferation of smart devices and sensors known as the Internet of Things (IoT),
along with the transformation of mobile phones into powerful handheld computers
as well as the continuing advancement in high-speed communication technologies,
introduces new possibilities for collaborative distributed computing and collaborative
workflows along with a new set of problems to be solved.
However, traditional service-based applications in fixed networks are typically constructed and managed centrally, and assume stable service endpoints and adequate network connectivity. Constructing and maintaining such applications in dynamic, heterogeneous wireless networked environments, where limited bandwidth and transient
connectivity are commonplace, presents significant challenges and makes centralised
application construction and management impossible.
The key objective for this thesis can be summarised as follows: a means is required
to discover and orchestrate sequences of micro-services, i.e., workflows, on-demand,
using currently available distributed resources (compute devices, functional services,
data and sensors) in spite of a poor-quality (fragmented, low-bandwidth) network infrastructure and without central control. It is desirable to be able to compose such
workflows on-the-fly in order to fulfil an ‘intent’.
The research undertaken investigates how service definition, service matching, and decentralised service composition and orchestration can be achieved without centralised control, using an approach based on a Binary Spatter Code Vector Symbolic Architecture (VSA), and shows that the approach offers significant advantages in environments where communication networks are unreliable.
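The core Binary Spatter Code operations the thesis builds on can be sketched in a few lines: binding by elementwise XOR (which is its own inverse) and similarity by normalised Hamming agreement. The vector names and dimensionality below are illustrative, not taken from the thesis:

```python
import random

random.seed(1)
DIM = 10_000  # high dimensionality makes random vectors quasi-orthogonal

def rand_hv():
    return [random.getrandbits(1) for _ in range(DIM)]

def bind(a, b):        # role-filler binding: elementwise XOR
    return [x ^ y for x, y in zip(a, b)]

def similarity(a, b):  # 1.0 = identical, ~0.5 = unrelated
    return sum(x == y for x, y in zip(a, b)) / DIM

role, filler = rand_hv(), rand_hv()
bound = bind(role, filler)

# XOR binding is self-inverse: binding again with the role recovers the filler.
print(similarity(bind(bound, role), filler))  # 1.0
print(round(similarity(bound, filler), 1))    # ~0.5 (quasi-orthogonal)
```

Because the bound vector is dissimilar to both of its inputs yet exactly invertible, services can exchange composite descriptions and still recover individual fields, which underpins the self-describing service matching discussed above.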
The outcomes demonstrate a new cognitive workflow model that uses one-to-many communications to enable intelligent cooperation between self-describing service entities that can self-organise to complete a workflow task. Workflow orchestration overhead was minimised using two innovations: a local arbitration mechanism that uses delayed responses to suppress responses that are not an ideal match, and the holographic nature of VSA descriptions, which enables messages to be truncated without loss of meaning. A new hierarchical VSA encoding scheme was created that is scalable to any number of vector embeddings, including workflow steps. The encoding can also facilitate learning, since it provides a unique context for each step in a workflow, and it enables service pre-provisioning, because individual workflow steps can be decoded easily by any service receiving a multicast workflow vector.
This thesis brings the state-of-the-art closer to the ability to discover distributed services on-the-fly to fulfil an intent, without the need for centralised management or
the imperative definition of all service steps, including locations. The use of a mathematically deterministic distributed vector representation in the form of BSC vectors
for both service objects and workflows enables a common language for all elements
required to discover and execute workflows in decentralised transient environments
and opens up the possibility of employing learning algorithms that can advance the
state-of-the-art in distributed workflows towards a true cognitive distributed network
architecture.
Scalable succinct indexing for large text collections
Self-indexes save space by emulating operations of traditional data structures using basic operations on bitvectors. Succinct text indexes provide full-text search functionality which is traditionally provided by suffix trees and suffix arrays for a given text, while using space equivalent to the compressed representation of the text. Succinct text indexes can therefore provide full-text search functionality over inputs much larger than what is viable using traditional uncompressed suffix-based data structures. Fields such as Information Retrieval involve the processing of massive text collections. However, the in-memory space requirements of succinct text indexes during construction have hampered their adoption for large text collections. One promising approach to support larger data sets is to avoid constructing the full suffix array by using alternative indexing representations. This thesis focuses on several aspects related to the scalability of text indexes to larger data sets. We identify practical improvements in the core building blocks of all succinct text indexing algorithms, and subsequently improve the index performance on large data sets. We evaluate our findings using several standard text collections and demonstrate: (1) the practical applications of our improved indexing techniques; and (2) that succinct text indexes are a practical alternative to inverted indexes for a variety of top-k ranked document retrieval problems.
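The "basic operations on bitvectors" that self-indexes build on are rank and select. A toy rank structure can be sketched as follows; real succinct indexes answer rank in O(1) using machine-word popcounts and a two-level directory in o(n) extra bits, so this sketch only conveys the idea, not the engineering.

```python
class RankBitvector:
    """Toy rank support: cumulative 1-counts per fixed-size block,
    plus a linear scan inside the block."""

    BLOCK = 64

    def __init__(self, bits):
        self.bits = bits
        self.blocks = [0]  # blocks[b] = number of 1s in bits[0 : b*BLOCK]
        for i in range(0, len(bits), self.BLOCK):
            self.blocks.append(self.blocks[-1] + sum(bits[i:i + self.BLOCK]))

    def rank1(self, i):
        """Number of 1-bits in bits[0:i]."""
        b, r = divmod(i, self.BLOCK)
        start = b * self.BLOCK
        return self.blocks[b] + sum(self.bits[start:start + r])
```

The block directory trades a small amount of extra space for fast queries; shrinking that overhead, and the peak memory needed to build such structures, is exactly the scalability concern the abstract describes.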
Advanced Cryptographic Techniques for Protecting Log Data
This thesis examines cryptographic techniques providing security for computer log files.
It focuses on ensuring authenticity and integrity, i.e. the properties of having been created by a specific entity and being unmodified.
Confidentiality, the property of being unknown to unauthorized entities, will be considered, too, but with less emphasis.
Computer log files are recordings of actions performed and events encountered in
computer systems. As the complexity of computer systems steadily grows, it becomes increasingly difficult to predict how a given system will behave under certain
conditions, or to retrospectively reconstruct and explain which events and conditions led to a specific behavior.
Computer log files help to mitigate the problem of retracing a system’s behavior retrospectively by providing a (usually chronological) view of events
and actions encountered in a system.
Authenticity and integrity of computer log files are widely recognized security requirements, see e.g. [Latham, ed., "Department of Defense Trusted Computer System Evaluation Criteria", 1985, p. 10], [Kent and Souppaya, "Guide to Computer Security Log Management", NIST Special Publication 800-92, 2006, Section 2.3.2], [Guttman and Roback, "An Introduction to Computer Security: The NIST Handbook", superseded NIST Special Publication 800-12, 1995, Section 18.3.1],
[Nieles et al., "An Introduction to Information Security" , NIST Special Publication 800-12, 2017, Section 9.3], [Common Criteria Editorial Board, ed., "Common Criteria for Information Technology Security Evaluation", Part 2, Section 8.6].
Two commonly cited ways to ensure integrity of log files are to store log data on so-called write-once-read-many-times (WORM) drives and to immediately print log records on a continuous-feed printer.
This guarantees that log data cannot be retroactively modified by an attacker without physical access to the storage medium.
However, such special-purpose hardware may not always be a viable option for the application at hand, for example because it may be too costly.
In such cases, the integrity and authenticity of log records must be ensured via other means, e.g. with cryptographic techniques. Although these techniques cannot prevent the modification of log data, they can offer strong guarantees that modifications will be detectable, while being implementable in software.
Furthermore, cryptography can be used to achieve public verifiability of log files, which may be needed in applications that have strong transparency requirements. Cryptographic techniques can even be used in addition to hardware solutions, providing protection against attackers who do have physical access
to the logging hardware, such as insiders.
Cryptographic schemes for protecting stored log data need to be resilient against attackers who obtain control over the computer storing the log data.
If this computer operates in a standalone fashion, it is an absolute requirement for the cryptographic schemes to offer security even in the event of a key compromise.
As this is impossible with standard cryptographic tools, cryptographic solutions for protecting log data typically make use of forward-secure schemes, guaranteeing that changes to log data recorded in the past can be detected. Such schemes use a sequence of authentication keys instead of a single one, where earlier keys cannot be computed efficiently from later ones.
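The forward-secure idea can be sketched with an evolving MAC key; this is a generic textbook-style illustration, not any of the schemes analysed in the thesis, and the key material shown is a placeholder.

```python
import hashlib
import hmac

def evolve(key: bytes) -> bytes:
    """One-way key update: k_{i+1} = H(k_i). Recovering an earlier key
    from a later one would require inverting SHA-256, so compromising
    the current key does not allow forging tags for earlier records."""
    return hashlib.sha256(b"evolve" + key).digest()

def append_record(key: bytes, log: list, message: bytes) -> bytes:
    """Tag the record under the current key, then evolve the key;
    the logger erases the old key before the next record."""
    log.append((message, hmac.new(key, message, hashlib.sha256).digest()))
    return evolve(key)

def verify_log(k0: bytes, log: list) -> bool:
    """A verifier holding the initial key k_0 replays the key chain."""
    key = k0
    for message, tag in log:
        expected = hmac.new(key, message, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        key = evolve(key)
    return True

# Demo: three records under an evolving key.
k0 = b"initial-demo-key"  # in practice a random secret shared with the verifier
log = []
key = k0
for m in (b"boot", b"login", b"logout"):
    key = append_record(key, log, m)
```

Note that `verify_log(k0, log[:2])` also succeeds: plain forward-secure tagging cannot tell a truncated log from one that simply ended early, which is precisely the truncation problem the thesis addresses.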
This thesis considers the following requirements for, and desirable features of, cryptographic logging schemes:
1) security, i.e. the ability to reliably detect violations of integrity and authenticity, including detection of log truncations,
2) efficiency regarding both computational and storage overhead,
3) robustness, i.e. the ability to verify unmodified log entries even if others have been illicitly changed, and
4) verifiability of excerpts, including checking an excerpt for omissions.
The goals of this thesis are to devise new techniques for the construction of cryptographic schemes that provide security for computer log files, to give concrete constructions of such schemes, to develop new models that can accurately capture the security guarantees offered by the new schemes, as well as to examine the security of previously published schemes.
This thesis demands that cryptographic schemes for securely storing log data must be able to detect if log entries have been deleted from a log file. A special case of deletion is log truncation, where a continuous subsequence of log records from the end of the log file is deleted.
Obtaining truncation resistance, i.e. the ability to detect truncations, is one of the major difficulties when designing cryptographic logging schemes.
This thesis alleviates this problem by introducing a novel technique to detect log truncations without the help of third parties or designated logging hardware.
Moreover, this work presents new formal security notions capturing truncation resistance.
The technique mentioned above is applied to obtain cryptographic logging schemes which can be shown to satisfy these notions under mild assumptions, making them the first schemes with formally proven truncation security.
Furthermore, this thesis develops a cryptographic scheme for the protection of log files which can support the creation of excerpts.
For this thesis, an excerpt is a (not necessarily contiguous) subsequence of records from a log file.
Excerpts created with the scheme presented in this thesis can be publicly checked for integrity and authenticity (as explained above) as well as for completeness, i.e. the property that no relevant log entry has been omitted from the excerpt.
Excerpts provide a natural way to preserve the confidentiality of information that is contained in a log file, but not of interest for a specific public analysis of the log file, enabling the owner of the log file to meet confidentiality and transparency requirements at the same time.
The scheme demonstrates and exemplifies the technique for obtaining truncation security mentioned above.
Since cryptographic techniques to safeguard log files usually require authenticating log entries individually, some researchers [Ma and Tsudik, "A New Approach to Secure Logging", LNCS 5094, 2008; Ma and Tsudik, "A New Approach to Secure Logging", ACM TOS 2009; Yavuz and Peng, "BAF: An Efficient Publicly Verifiable Secure Audit Logging Scheme for Distributed Systems", ACSAC 2009] have proposed using aggregatable signatures [Boneh et al., "Aggregate and Verifiably Encrypted Signatures from Bilinear Maps", EUROCRYPT 2003] in order to reduce the overhead in storage space incurred by using such a cryptographic scheme.
Aggregation of signatures refers to some “combination” of any number of signatures (for distinct or equal messages, by distinct or identical signers) into an “aggregate” signature. The size of the aggregate signature should be less than the total of the sizes of the original signatures, ideally the size of one of the original signatures.
Using aggregation of signatures in applications that require storing or transmitting a large number of signatures (such as the storage of log
records) can lead to significant reductions in the use of storage space and bandwidth.
However, aggregating the signatures for all log records into a single signature causes some fragility:
the modification of a single log entry renders the aggregate signature invalid, preventing the cryptographic verification of any part of the log file.
Yet being able to distinguish manipulated log entries from non-manipulated ones may be of importance for after-the-fact investigations.
This thesis addresses this issue by presenting a new technique providing a trade-off between storage overhead and robustness, i.e. the ability to tolerate some modifications to the log file while preserving the cryptographic verifiability of unmodified log entries.
This robustness is achieved by the use of a special kind of aggregate signatures (called fault-tolerant aggregate signatures), which contain some redundancy.
The construction makes use of combinatorial methods guaranteeing that if the number of errors is below a certain threshold, then there will be enough redundancy to identify and verify the non-modified log entries.
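The storage/robustness trade-off can be illustrated with a deliberately simplified toy: XOR-combined HMAC tags kept per bucket rather than as a single value. This is only an illustration of the trade-off, not the thesis's fault-tolerant aggregate signature scheme, which uses genuine signature schemes and combinatorial (cover-free) redundancy; XOR-aggregated MACs are not a secure construction on their own.

```python
import hashlib
import hmac

KEY = b"demo-verification-key"  # placeholder; a real scheme uses signatures

def tag(entry: bytes) -> bytes:
    return hmac.new(KEY, entry, hashlib.sha256).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def aggregate(entries):
    """Collapse all per-entry tags into one value. One modified
    entry invalidates the whole aggregate."""
    agg = bytes(32)
    for e in entries:
        agg = xor_bytes(agg, tag(e))
    return agg

def bucketed_aggregate(entries, k=2):
    """One aggregate per bucket of k entries: modest extra storage,
    but a modification only invalidates its own bucket and the
    remaining buckets stay verifiable."""
    return [aggregate(entries[i:i + k]) for i in range(0, len(entries), k)]

def verify_buckets(entries, aggs, k=2):
    return [aggregate(entries[i:i + k]) == aggs[i // k]
            for i in range(0, len(entries), k)]

entries = [b"rec-0", b"rec-1", b"rec-2", b"rec-3"]
aggs = bucketed_aggregate(entries, k=2)
entries[0] = b"forged"  # only the first bucket becomes unverifiable
```

With a single aggregate, the forged entry would have made the entire log unverifiable; with buckets, `rec-2` and `rec-3` remain cryptographically checkable, which is the robustness property the fault-tolerant construction generalises.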
Finally, this thesis presents a total of four attacks on three different schemes intended for securely storing log files presented in the literature [Yavuz et al., "Efficient, Compromise Resilient and Append-Only Cryptographic Schemes for Secure Audit Logging", Financial Cryptography 2012; Ma, "Practical Forward Secure Sequential Aggregate Signatures", ASIACCS 2008].
The attacks allow for virtually arbitrary log file forgeries or even recovery of the secret key used for authenticating the log file, which could then be used for mostly arbitrary log file forgeries, too.
All of these attacks exploit weaknesses of the specific schemes. Three of the attacks presented here contradict the security properties claimed, and supposedly proven, by the respective authors for their schemes. This thesis briefly discusses these proofs and points out their flaws.
The fourth attack presented here is outside of the security model considered by the scheme’s authors, but nonetheless presents a realistic
threat.
In summary, this thesis advances the scientific state-of-the-art with regard to providing security for computer log files in a number of ways:
by introducing a new technique for obtaining security against log truncations,
by providing the first scheme where excerpts from log files can be verified for completeness,
by describing the first scheme that can achieve some notion of robustness while being able to aggregate log record
signatures, and
by analyzing the security of previously proposed schemes.