8 research outputs found

    Browsing Digital Collections with Reconfigurable Faceted Thesauri

    Faceted thesauri group classification terms into hierarchically arranged facets. They enable faceted browsing, a well-known technique that makes it possible to navigate digital collections by recursively choosing terms from the facet hierarchy. In this paper we develop an approach to faceted browsing in live collections, in which not only the contents but also the thesauri can be constantly reorganized. We start by introducing a digital collection model that lets users reconfigure facet hierarchies. Then we introduce navigation automata as an efficient way of supporting faceted browsing in these collections. Since, in the worst case, the number of states in these automata can grow exponentially, we propose two alternative indexing strategies able to tame this complexity: inverted indexes and navigation dendrograms. Finally, by comparing these strategies in the context of Clavy, a system for managing collections with reconfigurable structures in digital humanities and educational settings, we provide evidence that the navigation dendrogram organization outperforms the inverted-index-based one.
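    A minimal sketch of how the inverted-index strategy can support this kind of faceted filtering, assuming a toy collection: each facet term maps to the set of items classified under it, and a navigation state is the intersection of the chosen terms' posting sets. The facet terms, item identifiers, and helper functions below are illustrative stand-ins, not Clavy's data model or API.

```python
# Sketch: inverted-index faceted browsing (hypothetical data, not Clavy's API).
from functools import reduce

# Inverted index: facet term -> set of item ids classified under it.
index = {
    ("period", "medieval"): {1, 2, 5},
    ("period", "modern"): {3, 4},
    ("medium", "manuscript"): {1, 3, 5},
    ("medium", "print"): {2, 4},
}

def browse(chosen_terms):
    """Items reachable after recursively choosing the given facet terms."""
    postings = [index[t] for t in chosen_terms]
    return reduce(set.intersection, postings) if postings else set.union(*index.values())

def refinements(chosen_terms):
    """Facet terms that would not empty the current result set."""
    current = browse(chosen_terms)
    return {t for t, ids in index.items() if t not in chosen_terms and ids & current}

print(browse([("period", "medieval"), ("medium", "manuscript")]))  # {1, 5}
print(refinements([("period", "medieval")]))
```

    A navigation automaton would precompute such states as nodes; the sketch recomputes them on the fly, which is the cost the paper's indexing strategies are designed to control without materializing the exponential state space.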

    Experimental Analysis of a New Multidimensional Storage and Retrieval Method


    Management of digital collections with reconfigurable cataloging schemas (Gestión de colecciones digitales con esquemas de catalogación reconfigurables)

    I am grateful for the support I have received over these years from all the members of my research group, ILSA, at the Facultad de Informática of the Universidad Complutense de Madrid. I also thank the research groups LEETHI and LOEP, likewise at the Universidad Complutense, and the Fundación El Caño of Panama, without whom I would not have been able to carry out part of the experiments presented in these works. On a personal note, I wish to thank my advisors, José Luis Sierra, Ana Fernández-Pampillón, and Antonio Sarasa, and my research group colleagues Alfredo Fernández Valmayor, Daniel Rodríguez, Bryan Temprado, and César Ruiz, for giving me the opportunity to spend these years of research with them in this field, an effort that culminates in this thesis, and for everything they have taught me about being a good researcher. Within the university I also want to thank my colleagues from "Aula16": Toni, Dan, Iván, Víctor, Jesús, Pablo, Cristina, and Marta, with whom I have shared many lunches and coffees over these years while musing about computer science. I also want to thank my current colleagues from "420bip": Susana, Vicky, Carlos, and Noelia, who have watched me put the finishing touches on this thesis in recent months and have helped me in every way they could.

    Sequencing geographical data for efficient query processing on air in mobile computing

    Geographical data broadcasting is suitable for many large-scale dissemination-based applications because it is independent of the number of users, and it can therefore serve as an important part of intelligent information infrastructures for modern cities. In broadcast systems, query response time is greatly affected by the order in which data items are broadcast. However, existing broadcast ordering techniques are not suitable for geographical data because of its multidimensionality and rich semantics. This research develops cost models and methods for placing geographical data items in a broadcast channel based on their spatial semantics, in order to reduce response time and energy consumption when processing spatial queries on point data and graph data.

    Three cost models are derived: Data Broadcast Wait (DBW); Data Access Time in the multiplexing scheme (ATDataMul), where both data and indices are broadcast in the same channel; and Data Access Time in the separate-channel scheme (ATDataSep), where data and indices are broadcast in two separate channels. Hypergraph representations capture the spatial relationships of both point data and graph data, which converts the broadcast data placement problem into the graph layout problem. A framework for classifying ordering heuristics for different types of geographical data is presented. A low-polynomial-cost approximation graph layout method is used to solve the DBW minimization problem; based on the proven monotonic relationship between ATDataSep and DBW, the same approximation method is also used for ATDataSep optimization, and a novel method is developed to optimize ATDataMul. Experiments using both synthetic and real data evaluate the performance of the ordering heuristics and optimization methods. The results show that an R-tree traversal ordering heuristic in conjunction with the optimization methods is effective for sequencing point data for spatial range query processing, while a graph partition tree traversal ordering heuristic in conjunction with the optimization methods is suitable for sequencing graph data for network path query processing over the air.
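    To make the role of broadcast ordering concrete, the sketch below computes the wait a client experiences when it tunes in at a random point of a cyclic broadcast and must receive several items to answer one query; placing co-queried items close together in the cycle shortens the average wait. The item names, cycle, and query are made-up examples in the spirit of DBW, not the dissertation's cost models.

```python
# Sketch: broadcast wait for one multi-item query on a cyclic broadcast.
# Hypothetical example; the actual DBW/ATData cost models are more involved.

def wait_time(order, needed, tune_in):
    """Ticks from tune_in until every item in `needed` has been received,
    on a broadcast that repeats `order` forever, one item per tick.
    Assumes every needed item appears somewhere in the cycle."""
    remaining, t, n = set(needed), 0, len(order)
    while remaining:
        remaining.discard(order[(tune_in + t) % n])
        t += 1
    return t

clustered = ["a", "b", "c", "x", "y", "z"]   # co-queried items adjacent
scattered = ["a", "x", "b", "y", "c", "z"]   # co-queried items spread out

query = {"a", "b", "c"}
for order in (clustered, scattered):
    avg = sum(wait_time(order, query, s) for s in range(len(order))) / len(order)
    print(order, "average wait:", avg)
```

    On this toy cycle the clustered placement averages 5.0 ticks against 5.5 for the scattered one; the ordering heuristics in the dissertation pursue the same effect at scale by laying out the hypergraph of co-accessed items.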

    Efficient image duplicate detection based on image analysis

    This thesis is about the detection of duplicated images. More precisely, the developed system is able to discriminate possibly modified copies of original images from other, unrelated images. The proposed method is referred to as content-based since it relies only on content analysis techniques rather than on image tagging as done in watermarking. The proposed content-based duplicate detection system classifies a test image by associating it with a label that corresponds to one of the known original images. The classification is performed in four steps. In the first step, the test image is described using global statistics about its content. In the second step, the most likely original images are efficiently selected using a spatial indexing technique called the R-tree. The third step uses binary detectors to estimate the probability that the test image is a duplicate of the original images selected in the second step: each original image known to the system is associated with an adapted binary detector, based on a support vector classifier, that estimates the probability that a test image is one of its duplicates. Finally, the fourth and last step chooses the most probable original, namely the one with the highest estimated probability. Comparative experiments have shown that the proposed content-based image duplicate detector greatly outperforms detectors that use the same image description but rely on simpler distance functions rather than a classification algorithm. Additional experiments compare the proposed system with existing state-of-the-art methods. It also outperforms the perceptual distance function method, which uses similar statistics to describe the image. While the proposed method is slightly outperformed by the key-points method, it is five to ten times less complex in terms of computational requirements. Finally, note that the nature of this thesis is essentially exploratory, since it is one of the first attempts to apply machine learning techniques to the relatively recent field of content-based image duplicate detection.
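    A compact sketch of the four-step pipeline, assuming scikit-learn for the per-original support vector detectors; the global-statistics descriptor is a toy, and a brute-force nearest-neighbor shortlist stands in for the R-tree of step two, so this shows the shape of the method rather than the thesis implementation.

```python
# Sketch of the 4-step duplicate-detection pipeline (simplified stand-ins:
# toy global-statistics descriptor, brute-force shortlist instead of an
# R-tree; one probabilistic SVM per original, as the abstract describes).
import numpy as np
from sklearn.svm import SVC

def describe(image):
    """Step 1: global statistics of the image content (toy descriptor)."""
    return np.array([image.mean(), image.std(), np.median(image)])

class DuplicateDetector:
    def __init__(self, originals, duplicates, others):
        # originals: {label: image}; duplicates: {label: [modified copies]};
        # others: unrelated images used as negatives. Assumes enough
        # examples per class for SVC's internal probability calibration.
        self.keys = {lbl: describe(img) for lbl, img in originals.items()}
        neg = [describe(img) for img in others]
        self.detectors = {}
        for lbl, copies in duplicates.items():
            pos = [describe(img) for img in copies] + [self.keys[lbl]]
            X = np.vstack(pos + neg)
            y = np.array([1] * len(pos) + [0] * len(neg))
            # Step 3's detectors: one probabilistic SVM per original image.
            self.detectors[lbl] = SVC(probability=True).fit(X, y)

    def classify(self, image, k=3):
        d = describe(image)
        # Step 2: shortlist the k nearest originals (stand-in for the R-tree).
        short = sorted(self.keys, key=lambda l: np.linalg.norm(self.keys[l] - d))[:k]
        # Steps 3-4: duplicate probability per candidate, then argmax.
        probs = {l: self.detectors[l].predict_proba([d])[0, 1] for l in short}
        return max(probs, key=probs.get), probs
```

    The shortlist keeps the number of SVM evaluations per query bounded by k rather than by the number of known originals, which is the same role the R-tree plays in the thesis.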

    Performance comparison of index structures for multi-key retrieval
