15 research outputs found

    Eighth Biennial Report: April 2005 – March 2007


    Tools for image retrieval in large multimedia databases

    One of the challenges in developing an image retrieval system is to achieve an efficient indexing scheme, since both developers and users, who issue requests to find a multimedia element in a large database, can be frustrated by the long computational time of the search. Traditional indexing schemes neither fulfill the dynamic indexing requirement, which allows elements to be added to or removed from the structure, nor scale well in high-dimensional feature spaces, due to the phenomenon known as "the curse of dimensionality." After analyzing several indexing techniques from the literature, we decided to implement an indexing scheme called Hierarchical Cellular Tree (HCT), designed specifically to provide an effective solution for indexing large multimedia databases. The HCT has allowed us to improve the performance of our image retrieval system, which is based on the MPEG-7 visual descriptors. We have also contributed several modifications to the original HCT that improve its performance. First, we have proposed a redefinition of the covering radius that considers not only the elements belonging to the cell itself but all the elements hanging from that cell. Since this definition implies a much more computationally costly algorithm, we have proposed an approximation by excess for the covering radius value; we have also implemented a method that updates the covering radius to its exact value whenever desired. In addition, the pre-emptive insertion method has been adapted as a searching technique in order to improve the performance of the retrieval scheme called Progressive Query, which was originally proposed for use over the HCT. Furthermore, the HCT indexing scheme has been adapted to a client/server architecture through a messaging system called KSC, which keeps the HCT loaded on a server awaiting the query requests launched by the several clients of the retrieval system. Finally, the tool used to request a search over the indexed database has been given a graphical user interface, named GOS (Graphic Object Searcher), which lets the user perform retrievals in a more user-friendly way.
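
    To make the covering-radius proposal concrete, the sketch below contrasts the cheap upper bound maintained at insertion time with the on-demand exact recomputation over all elements hanging from a cell. It is a minimal Python illustration under naming of our own choosing (Cell, refresh_covering_radius, and the flat element lists are assumptions), not the actual HCT implementation described in the abstract.

        import math

        def dist(a, b):
            """Euclidean distance between two feature vectors."""
            return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

        class Cell:
            """Illustrative cell of a hierarchical cellular tree."""

            def __init__(self, nucleus):
                self.nucleus = nucleus        # representative feature vector
                self.elements = []            # elements stored directly in this cell
                self.children = []            # child cells at the lower level
                self.covering_radius = 0.0    # maintained as an upper bound

            def insert(self, vector):
                self.elements.append(vector)
                # Approximation by excess: grow the radius monotonically on
                # insertion, avoiding a walk over the whole subtree at the
                # price of a possible overestimate.
                self.covering_radius = max(self.covering_radius,
                                           dist(self.nucleus, vector))

            def all_elements(self):
                """Every element hanging from this cell: its own plus its descendants'."""
                yield from self.elements
                for child in self.children:
                    yield from child.all_elements()

            def refresh_covering_radius(self):
                # On-demand update to the exact value, mirroring the abstract's
                # "update the covering radius whenever it is desired" method.
                self.covering_radius = max(
                    (dist(self.nucleus, e) for e in self.all_elements()),
                    default=0.0)
                return self.covering_radius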

    Cartography

    Terrestrial space is the place where natural and social systems interact. Cartography is an essential tool for understanding the complexity of these systems, their interaction, and their evolution, which gives cartography an important place in the modern world. The book presents contributions from different areas and activities showing the importance of cartography to the perception and organization of territory. Whether learning from the past or understanding the present, cartography is presented as a way of looking at almost all fields of knowledge.

    Leveraging Semantic Annotations for Event-focused Search & Summarization

    In today's Big Data era, overwhelming amounts of textual information, spread across different sources with a high degree of redundancy, have made it hard for a consumer to look back on past events. A plausible solution is to link semantically similar information contained across the different sources to enforce a structure, thereby providing multiple access paths to relevant information. Keeping this larger goal in view, this work uses Wikipedia and online news articles as two prominent yet disparate information sources to address the following three problems:
    • We address a linking problem to connect Wikipedia excerpts to news articles by casting it as an IR task. Our novel approach integrates time, geolocations, and entities with text to identify relevant documents that can be linked to a given excerpt.
    • We address an unsupervised extractive multi-document summarization task to generate a fixed-length event digest that facilitates efficient consumption of the information contained within a large set of documents. Our novel approach proposes an ILP for global inference across text, time, geolocations, and entities associated with the event (a simplified sketch follows this abstract).
    • To estimate the temporal focus of short event descriptions, we present a semi-supervised approach that leverages redundancy within a longitudinal news collection to estimate accurate probabilistic time models.
    Extensive experimental evaluations demonstrate the effectiveness and viability of our proposed approaches towards achieving the larger goal.
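
    As a concrete illustration of the digest-length constraint in the summarization contribution, the toy ILP below selects sentences to maximize relevance under a fixed word budget. It is a deliberately simplified stand-in (the made-up scores and the use of the PuLP modelling library are our assumptions); the actual objective in the thesis also couples time, geolocations, and entities.

        from pulp import LpBinary, LpMaximize, LpProblem, LpVariable, lpSum

        # (sentence, relevance score, length in words) -- made-up toy data
        candidates = [
            ("Quake hits region X on May 2.", 0.9, 7),
            ("Rescue teams arrived on May 3.", 0.7, 6),
            ("The region is known for tourism.", 0.2, 6),
        ]
        BUDGET = 13  # fixed digest length in words

        prob = LpProblem("event_digest", LpMaximize)
        x = [LpVariable(f"s{i}", cat=LpBinary) for i in range(len(candidates))]

        # Objective: total relevance of the selected sentences.
        prob += lpSum(xi * c[1] for xi, c in zip(x, candidates))
        # Constraint: the selected sentences must fit the digest budget.
        prob += lpSum(xi * c[2] for xi, c in zip(x, candidates)) <= BUDGET

        prob.solve()
        digest = [c[0] for xi, c in zip(x, candidates) if xi.value() == 1]
        print(digest)  # -> the two most relevant sentences, which just fit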

    Interoperability of Enterprise Software and Applications


    New Fundamental Technologies in Data Mining

    The progress of data mining technology and its broad public popularity create a need for a comprehensive text on the subject. The series of books entitled "Data Mining" addresses that need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond helping readers understand each section deeply, the two books offer useful hints and strategies for solving the problems discussed in the subsequent chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant development in the field of data mining.

    Preserving Individuals' Privacy through Personal Data Management Systems (Préserver la vie privée des individus grâce aux Systèmes Personnels de Gestion des Données)

    Riding the wave of smart disclosure initiatives and new privacy-protection regulations, the Personal Cloud paradigm is emerging through a myriad of solutions offered to users to let them gather and manage their whole digital life. On the bright side, this opens the way to novel value-added services when crossing multiple sources of data of a given person or crossing the data of multiple people. Yet this paradigm shift towards user empowerment raises fundamental questions regarding the appropriateness of the functionalities and of the data management and protection techniques that existing solutions offer to lay users. Our work addresses these questions on three levels. First, we review, compare, and analyze personal cloud alternatives in terms of the functionalities they provide and the threat models they target. From this analysis, we derive a general set of functionality and security requirements that any Personal Data Management System (PDMS) should consider. We then identify the challenges of implementing such a PDMS and propose a preliminary design for an extensive and secure PDMS reference architecture satisfying the considered requirements. Second, we focus on personal computations for a specific hardware PDMS instance (i.e., a secure token with NAND Flash mass storage). In this context, we propose a scalable embedded full-text search engine to index large document collections and manage tag-based access control policies. Third, we address the problem of collective computations in a fully distributed architecture of PDMSs. We discuss the system and security requirements and propose protocols to enable distributed query processing with strong security guarantees against an attacker controlling many colluding corrupted nodes.
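
    To give a flavour of the second contribution, the sketch below shows tag-based access control enforced at query time over a tiny in-memory full-text index. The names (TaggedIndex, allowed_tags) and the flat term index are assumptions made for illustration; the embedded engine proposed in the thesis targets NAND Flash and is far more elaborate.

        from collections import defaultdict

        class TaggedIndex:
            def __init__(self):
                self.postings = defaultdict(set)  # term -> ids of documents containing it
                self.doc_tags = {}                # document id -> set of tags

            def add(self, doc_id, text, tags):
                for term in text.lower().split():
                    self.postings[term].add(doc_id)
                self.doc_tags[doc_id] = set(tags)

            def search(self, query, allowed_tags):
                terms = query.lower().split()
                if not terms:
                    return []
                hits = set.intersection(*(self.postings[t] for t in terms))
                # Access control: keep only documents carrying a tag that the
                # caller's policy allows.
                return [d for d in hits if self.doc_tags[d] & set(allowed_tags)]

        idx = TaggedIndex()
        idx.add(1, "blood test results", tags=["health"])
        idx.add(2, "bank statement March", tags=["finance"])
        print(idx.search("test results", allowed_tags=["health"]))   # -> [1]
        print(idx.search("test results", allowed_tags=["finance"]))  # -> []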

    A teachable semi-automatic web information extraction system based on evolved regular expression patterns

    This thesis explores Web Information Extraction (WIE) and how it has been used in decision making and to support businesses in their daily operations. The research focuses on a WIE system based on Genetic Programming (GP) with an extensible model to enhance the automatic extractor. The system uses a human teacher to identify and extract relevant information from semi-structured HTML webpages. Regular expressions, chosen as the pattern-matching tool, are automatically generated from the training data to provide an improved grammar and lexicon. This particularly benefits the GP system, which may need to extend its lexicon when new tokens appear in the web pages; these tokens allow the GP method to produce new extraction patterns for new requirements.
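
    The toy loop below illustrates the core idea of evolving regular-expression extraction patterns against user-labelled examples. The training pairs, the fragment alphabet, and the mutation scheme are invented for illustration and are much simpler than the GP system developed in the thesis.

        import random
        import re

        TRAINING = [("Price: 42 EUR", "42"), ("Price: 7 EUR", "7")]  # (page text, target)
        FRAGMENTS = [r"\d+", r"\d{2}", r"[A-Z]+", r"Price: ", r" EUR"]

        def fitness(pattern):
            """Number of training pages from which the pattern extracts the target."""
            score = 0
            for text, target in TRAINING:
                match = re.search(f"({pattern})", text)
                if match and match.group(1) == target:
                    score += 1
            return score

        def mutate(pattern):
            # Either extend the pattern with a fragment or restart from one.
            return random.choice([pattern + random.choice(FRAGMENTS),
                                  random.choice(FRAGMENTS)])

        population = [random.choice(FRAGMENTS) for _ in range(20)]
        for generation in range(50):
            population.sort(key=fitness, reverse=True)
            if fitness(population[0]) == len(TRAINING):
                break  # a pattern extracts the target from every page
            survivors = population[:10]  # keep the fittest half
            population = survivors + [mutate(random.choice(survivors))
                                      for _ in range(10)]

        print(population[0])  # e.g. r"\d+"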

    Untangling the Web: A Guide To Internet Research

    [Excerpt] Untangling the Web for 2007 is the twelfth edition of a book that started as a small handout. After more than a decade of researching, reading about, using, and trying to understand the Internet, I have come to accept that it is indeed a Sisyphean task. Sometimes I feel that all I can do is to push the rock up to the top of that virtual hill, then stand back and watch as it rolls down again. The Internet—in all its glory of information and misinformation—is for all practical purposes limitless, which of course means we can never know it all, see it all, understand it all, or even imagine all it is and will be. The more we know about the Internet, the more acute is our awareness of what we do not know. The Internet emphasizes the depth of our ignorance because our knowledge can only be finite, while our ignorance must necessarily be infinite. My hope is that Untangling the Web will add to our knowledge of the Internet and the world while recognizing that the rock will always roll back down the hill at the end of the day.

    Adaptive Application-Specific Processing of XML Documents (Adaptive anwendungsspezifische Verarbeitung von XML-Dokumenten)

    This work proposes a concept for building new higher-level operators on top of the existing operators of an XML transformation language. By condensing recurring operator combinations into higher-level operators, transformation definitions can be written more concisely and comprehensibly. To realize the concept, the execution environment XTC was built. XTC coordinates the process of translating higher-level operators into lower-level and, ultimately, elementary operators of a base transformation language. Alongside XTC, the generator system XOpGen was developed, which further reduces the implementation effort for new higher-level operators. The potential of higher-level operators is demonstrated on XSLT, the XML transformation language standardized by the W3C: XSLT is extended with various operators, both universal and domain-specific.
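
    The mechanism can be pictured as macro expansion: a higher-level operator is rewritten, possibly recursively, into a fixed combination of lower-level and ultimately elementary operators before execution. The Python sketch below conveys only this rewriting idea; the operator names and the expansion table are invented, and XTC itself targets XSLT rather than Python.

        ELEMENTARY = {"select", "rename", "copy"}

        EXPANSIONS = {
            # hypothetical higher operator: rename every element matched by a path
            "rename-all": lambda path, new: [("select", path), ("rename", new)],
        }

        def expand(op, *args):
            """Recursively rewrite an operator into elementary operators."""
            if op in ELEMENTARY:
                return [(op, *args)]
            return [step
                    for sub in EXPANSIONS[op](*args)
                    for step in expand(*sub)]

        print(expand("rename-all", "//item", "entry"))
        # -> [('select', '//item'), ('rename', 'entry')]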