An incremental algorithm for computing ranked full disjunctions
The full disjunction is a variation of the join operator that maximally combines tuples from connected relations, while preserving all information in the relations. The full disjunction can be seen as a natural extension of the binary outerjoin operator to an arbitrary number of relations and is a useful operator for information integration. This paper presents the algorithm IncrementalFD for computing the full disjunction of a set of relations. IncrementalFD improves upon previous algorithms for computing the full disjunction in four ways. First, it has a lower total runtime when computing the full result and a lower runtime when computing only k tuples of the result, for any constant k. Second, for a natural class of ranking functions, IncrementalFD can be adapted to return tuples in ranking order. Third, a variation of IncrementalFD can be used to return approximate full disjunctions (which contain maximal approximately join consistent tuples). Fourth, IncrementalFD can be adapted to have a block-based execution, instead of a tuple-based execution.
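To make the operator concrete, the following is a minimal brute-force sketch of a full disjunction over tiny relations. It is not IncrementalFD (the abstract does not give that algorithm's details); it simply enumerates tuple sets and keeps the maximal ones that are connected and join consistent. The function names, the dict-based relation encoding, and the assumption that source rows contain no nulls are all illustrative choices, and the enumeration is exponential in the number of relations.

```python
from itertools import product


def consistent(rows):
    """True if all rows agree on every attribute they share."""
    merged = {}
    for row in rows:
        for attr, val in row.items():
            if merged.setdefault(attr, val) != val:
                return False
    return True


def connected(rows):
    """True if the rows form one component when linked by shared attributes."""
    if not rows:
        return False
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(rows)):
            if j not in seen and rows[i].keys() & rows[j].keys():
                seen.add(j)
                stack.append(j)
    return len(seen) == len(rows)


def full_disjunction(relations):
    """relations: dict mapping a relation name to a list of dict rows (no nulls).

    Returns (attribute list, sorted list of null-padded result tuples).
    """
    names = list(relations)
    # Each relation contributes either no tuple (None) or exactly one tuple.
    choices = [[None] + list(range(len(relations[n]))) for n in names]
    candidates = []
    for pick in product(*choices):
        chosen = frozenset((n, i) for n, i in zip(names, pick) if i is not None)
        rows = [relations[n][i] for n, i in chosen]
        if chosen and connected(rows) and consistent(rows):
            candidates.append(chosen)
    # Keep only tuple sets that are not strictly contained in another one.
    maximal = [c for c in candidates if not any(c < d for d in candidates)]
    attrs = sorted({a for rel in relations.values() for row in rel for a in row})
    result = []
    for c in maximal:
        merged = {}
        for n, i in c:
            merged.update(relations[n][i])
        # Attributes not covered by the tuple set are padded with None.
        result.append(tuple(merged.get(a) for a in attrs))
    return attrs, sorted(result, key=repr)


# Hypothetical toy relations R(a, b), S(b, c), T(c, d).
relations = {
    "R": [{"a": 1, "b": 2}, {"a": 3, "b": 4}],
    "S": [{"b": 2, "c": 5}, {"b": 6, "c": 7}],
    "T": [{"c": 5, "d": 8}],
}
attrs, fd = full_disjunction(relations)
for row in fd:
    print(dict(zip(attrs, row)))
```

In this toy input, the tuples that join across all three relations are merged into one row, while the tuples that join with nothing still appear, padded with nulls. This is the sense in which the operator preserves all information in the relations.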
Practical Isolated Searchable Encryption in a Trusted Computing Environment
Cloud computing has become a standard computational paradigm due to its numerous
advantages, including high availability, elasticity, and ubiquity. Both individual users and
companies are adopting more of its services, but not without loss of privacy and control.
Outsourcing data and computations to a remote server implies trusting its owners, a
problem many end-users are aware of. Recent news has shown that data stored on Cloud
servers is susceptible to leaks from the provider, third-party attackers, or even from
government surveillance programs, exposing users’ private data.
Different approaches to tackle these problems have surfaced throughout the years.
Naïve solutions involve storing data encrypted on the server, decrypting it only on the
client-side. Yet, this imposes a high overhead on the client, rendering such schemes
impractical. Searchable Symmetric Encryption (SSE) has emerged as a novel research
topic in recent years, allowing efficient querying and updating over encrypted datastores
in Cloud servers, while retaining privacy guarantees. Still, despite relevant recent advances,
existing SSE schemes make a critical trade-off between efficiency, security,
and query expressiveness, thus limiting their adoption as a viable technology, particularly
in large-scale scenarios.
New technologies providing Isolated Execution Environments (IEEs) may help advance
the SSE literature. These technologies allow applications to be run remotely with
privacy guarantees, in isolation from other, possibly privileged, processes inside the CPU,
such as the operating system kernel. Prominent example technologies are Intel SGX and
ARM TrustZone, which are being made available in today’s commodity CPUs.
In this thesis we study these new trusted hardware technologies in depth, while exploring
their application to the problem of searching over encrypted data, primarily focusing
on SGX. In more detail, we study the application of IEEs in SSE schemes, improving their
efficiency, security, and query expressiveness.
We design, implement, and evaluate three new SSE schemes for different query types,
namely Boolean queries over text, similarity queries over image datastores, and multimodal
queries over text and images. These schemes can support queries combining different
media formats simultaneously, envisaging applications such as privacy-enhanced medical diagnosis and management of electronic-healthcare records, or confidential photograph
catalogues, running without the danger of privacy breaches in Cloud-based provisioned
services.
Mixed Integer Conic Optimization and its Applications
In this dissertation, we present our work on the theory and applications of Mixed Integer Linear Optimization (MILO) and Mixed Integer Second Order Cone Optimization (MISOCO). The dissertation is divided into three parts. In the first part, we focus on th
Conceptual coherence in the generation of referring expressions
One of the challenges in the automatic
generation of referring expressions is to
identify a set of domain entities coherently,
that is, from the same conceptual
perspective. We describe and evaluate
an algorithm that generates a conceptually
coherent description of a target set. The
design of the algorithm is motivated by the
results of psycholinguistic experiments.
Latent Semantic Indexing (LSI) Based Distributed System and Search On Encrypted Data
Latent semantic indexing (LSI) was initially introduced to overcome the issues of synonymy and polysemy in the traditional vector space model (VSM). LSI, however, has challenges of its own, mainly scalability. Although LSI was introduced in 1990, there have been few attempts to provide an efficient solution for it; most of the literature focuses on LSI’s applications rather than on improving the original algorithm. In this work we analyze the first framework to provide a scalable implementation of LSI and report its performance on the distributed environment of RAAD.
The possibility of adopting LSI for searching over encrypted data is also investigated. The importance of that field stems from the need for cloud computing as an effective computing paradigm that provides affordable access to high computational power. Encryption is usually applied to prevent unauthorized access to the data (the host is assumed to be curious); however, this limits accessibility to the data, given that search over encrypted data has yet to catch up with the latest techniques adopted by the Information Retrieval (IR) community. In this work we propose a system that uses LSI for indexing and free-text queries for retrieval.
The results show that the available LSI framework does scale to large datasets; however, it has some limitations with respect to factors such as dictionary size and memory limits. When the exact settings of the baseline were replicated on RAAD, it ran comparatively slower. This could be because RAAD uses a distributed file system or because of network latency. The results also show that the proposed system for applying LSI to encrypted data retrieved documents in the same order as the baseline (unencrypted data).
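As a rough illustration of how LSI addresses synonymy, the sketch below builds a latent space from a toy term-document matrix with a truncated SVD and ranks documents for a one-word query. It is a generic single-machine example using NumPy, not the distributed framework or the encrypted-search system described in the abstract. The vocabulary, the matrix, the rank `k = 2`, and the helper names `lsi_fit`, `fold_in`, and `rank_documents` are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical toy term-document count matrix (rows: terms, columns: documents).
terms = ["car", "automobile", "engine", "flower", "petal"]
A = np.array([
    [1, 0, 1, 0],  # car
    [0, 1, 1, 0],  # automobile
    [1, 1, 0, 0],  # engine
    [0, 0, 0, 1],  # flower
    [0, 0, 0, 1],  # petal
], dtype=float)


def lsi_fit(A, k):
    """Truncated SVD: keep the k largest singular triplets of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]


def fold_in(q, U_k, s_k):
    """Project a term-space query vector into the k-dimensional latent space."""
    return (q @ U_k) / s_k


def rank_documents(q, U_k, s_k, Vt_k):
    """Return document indices by descending cosine similarity, plus the scores."""
    q_hat = fold_in(q, U_k, s_k)
    docs = Vt_k.T  # each row is one document in the latent space
    norms = np.linalg.norm(docs, axis=1) * np.linalg.norm(q_hat) + 1e-12
    sims = (docs @ q_hat) / norms
    return np.argsort(-sims), sims


U_k, s_k, Vt_k = lsi_fit(A, k=2)

# Query containing only the term "car".
q = np.zeros(len(terms))
q[terms.index("car")] = 1.0

order, sims = rank_documents(q, U_k, s_k, Vt_k)
print(order, np.round(sims, 3))
```

Document 1 contains "automobile" and "engine" but never "car", yet in the rank-2 latent space it scores about as high as the documents that do contain the query term, while the unrelated "flower" document scores near zero. A plain vector space model would give document 1 a similarity of zero for this query.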