Index compression for information retrieval systems

Author: Blanco González Roi
Publication date: 01/01/2008

[Abstract] Given the increasing amount of information available today, there is a clear need for Information Retrieval (IR) systems that can process this information efficiently and effectively. Efficient processing means minimising the time and space required to process data, whereas effective processing means accurately identifying which information is relevant to the user and which is not. Traditionally, efficiency and effectiveness pull in opposite directions (what benefits efficiency usually harms effectiveness, and vice versa), so the challenge for IR systems is to find a compromise between the two. This thesis investigates the efficiency of IR systems. It suggests several novel strategies that make IR systems more efficient by reducing the size of their index, an approach referred to as index compression. The index is the data structure that stores the information handled in the retrieval process. Two different approaches to index compression are proposed, namely document reordering and static index pruning. Both exploit characteristics of the document collection to reduce index size, either by reassigning the identifiers of the documents in the collection or by pruning the index, that is, selectively discarding information that is less relevant to the retrieval process. The index compression strategies proposed in this thesis can be grouped into two categories: (i) strategies which extend the state of the art in efficiency methods in novel ways; (ii) strategies which are derived from properties pertaining to the effectiveness of IR systems; these are novel because they are derived from effectiveness principles rather than efficiency principles, and because they show that efficiency and effectiveness can be successfully combined for retrieval.
The main contributions of this work are principled extensions of the state of the art in index compression and novel, theoretically driven index compression techniques derived from principles of IR effectiveness. All these techniques are evaluated extensively in experiments involving established datasets and baselines, which allow for a straightforward comparison with the state of the art. Moreover, the optimality of the proposed approaches is addressed from a theoretical perspective.
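As an illustration of the two ideas named in the abstract above, the following minimal sketch shows why reassigning document identifiers can shrink an index stored as gap-encoded posting lists, and what a simple static pruning step looks like. It is not the thesis's method: the variable-byte cost model, the toy index, the hand-picked identifier mapping, and the top-k pruning rule are all assumptions chosen for illustration.

```python
def vbyte_len(n: int) -> int:
    """Bytes needed to variable-byte encode n (7 payload bits per byte)."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def postings_size(doc_ids) -> int:
    """Bytes needed to store one posting list as variable-byte encoded gaps."""
    size, prev = 0, 0
    for d in sorted(doc_ids):
        size += vbyte_len(d - prev)  # store the gap to the previous identifier
        prev = d
    return size


def index_size(index) -> int:
    """Total bytes over all posting lists of a {term: [doc_id, ...]} index."""
    return sum(postings_size(p) for p in index.values())


def reassign(index, mapping):
    """Apply a document identifier reassignment (old id -> new id)."""
    return {t: [mapping[d] for d in p] for t, p in index.items()}


def prune_top_k(scored_index, k):
    """Static pruning sketch: keep only the k highest-scoring postings per term.

    scored_index maps term -> [(doc_id, score), ...]; the scores are assumed
    to come from some retrieval model and are hypothetical here.
    """
    return {
        t: sorted(p, key=lambda x: x[1], reverse=True)[:k]
        for t, p in scored_index.items()
    }


# Toy index (hypothetical): documents sharing a term have scattered identifiers.
original = {"a": [1, 1000, 2000, 3000], "b": [500, 1500, 2500]}

# Hand-picked reassignment that gives documents sharing a term adjacent ids.
mapping = {1: 1, 1000: 2, 2000: 3, 3000: 4, 500: 5, 1500: 6, 2500: 7}
reordered = reassign(original, mapping)

size_before = index_size(original)
size_after = index_size(reordered)

# Hypothetical scored postings, pruned to the single best posting per term.
scored = {"a": [(1, 0.2), (2, 0.9), (3, 0.5)]}
pruned = prune_top_k(scored, 1)
```

In this toy case the reordered index needs fewer bytes because every gap fits in a single byte. Pruning trades some retrieval information for space, which is why the abstract frames it in terms of effectiveness.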

Improving Efficiency, Expressiveness and Security of Searchable Encryption

Author: Demertzis Ioannis
Publication date: 01/01/2020

A large part of our personal data, ranging from medical and financial records to our social activity, is stored online in cloud servers. Frequent data breaches threaten to expose these data to malicious third parties, often with catastrophic consequences (estimated at several billion US dollars annually). In this thesis, we use, extend and improve Searchable Encryption (SE) in order to build the next generation of encrypted databases and systems that will prevent such undesirable situations. Our goal is to build systems that are both practical and provably secure, while allowing expressive search and computation on encrypted data. Towards this goal, we propose new SE schemes that (i) have better search and computation time, (ii) allow expressive queries such as range, join and group-by, as well as dynamic query workloads, and (iii) provide new adjustable security-efficiency trade-offs, leading to robust and efficient schemes even against very powerful adversaries.

Software similarity and classification

Author: Cesare Silvio
Publication venue: Deakin University, Faculty of Science, Engineering and Built Environment, School of Information Technology
Publication date: 01/06/2013

This thesis analyses software programs in terms of their similarity to other software programs. Applications proposed and implemented include detecting malicious software and discovering security vulnerabilities.

Error processes in the integration of digital cartographic data in geographic information systems.

Author: Rybaczuk Krysia
Publication date: 01/01/1992

Errors within a Geographic Information System (GIS) arise from several factors. Firstly, receiving data from a variety of different sources results in a degree of incompatibility between such information. Secondly, the very processes used to bring the information into the GIS may in fact degrade the quality of the data. If geometric overlay (the very raison d'être of many GISs) is to be performed, such inconsistencies need to be carefully examined and dealt with. A variety of techniques exist for the user to eliminate such problems, but all of these tend to rely on the geometry of the information rather than on its meaning or nature. This thesis explores the introduction of error into GISs and the consequences this has for any subsequent data analysis. Techniques for error removal at the overlay stage are also examined and improved solutions are offered. The thesis also looks at the role of the data model and the potentially detrimental effects it can have by forcing the data to be organised into a pre-defined structure.

Durham e-Theses