
10 research outputs found

Dynamic Multimodal Fusion in Video Search

Author
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date
Field of study

Die Sphere-Search-Suchmaschine zur graphbasierten Suche auf heterogenen, semistrukturierten Daten

Author: Graupmann Jens
Publication venue: Sonstige Einrichtungen
Publication date: 01/01/2006
Field of study

This thesis presents the novel SphereSearch engine, which provides unified ranked retrieval on heterogeneous XML and Web data. Its search capabilities include vague structure and text content conditions, and relevance ranking based on IR statistics and a graph-based data model. Web pages in HTML or PDF are automatically converted into an intermediate XML format, with the option of generating semantic tags by means of linguistic annotation tools. For semi-structured data, the graph-based query engine provides rich search options that cannot be expressed in traditional Web or XML search engines: concept-aware and link-aware querying that takes into account the implicit structure and context of Web pages. The benefits of the SphereSearch engine are demonstrated by experiments with a large and richly tagged but non-schematic open encyclopedia extended with external documents, and with a standard XML benchmark.

Universaar

MPG.PuRe


Modèle flexible pour la Recherche d'Information dans des corpus de documents semi-structurés

Author: Sauvagnat Karen
Publication venue: HAL CCSD
Publication date: 30/06/2005
Field of study

The nature of information sources is changing: traditional flat digital documents containing only text are being enriched with structural and multimedia information. This evolution is accelerated by the expansion of the Web, and semi-structured documents such as XML (eXtensible Markup Language) documents tend to make up the majority of digital documents available to users. Developing automated tools that give efficient access to this new type of digital information appears to be a necessity. To make the best use of all the available information, existing Information Retrieval (IR) methods must be adapted. The structural information contained in semi-structured documents can be used to focus on relevant information and to refine the notion of document granule. The aim of an Information Retrieval System (IRS) is then to retrieve information units, rather than whole documents, that are relevant to user queries.
To address this fundamental problem, new models must be built that take the structural information of documents into account at the indexing, querying, and retrieval levels. The objective of our work is to propose a model for flexible retrieval in corpora of semi-structured documents. This led us to propose the XFIRM model (XML Flexible Information Retrieval Model), which is based on: (i) a generic data representation model, allowing the modelling of documents with heterogeneous structures; (ii) a flexible query language that lets users express their needs at varying degrees of precision, with or without conditions on document structure; (iii) a retrieval model based on a relevance propagation method. This model aims to find the most exhaustive and specific information units answering a user query, whether or not the query contains structural conditions. Semi-structured documents can be represented as trees, and the goal is then to find the minimal subtrees that answer the query. Content-only searches are carried out by taking into account the varying importance of the leaves of the subtrees and by placing the subtrees in their context, that is, by taking the relevance of the whole document into account. Searches on both content and structure are carried out through several relevance propagations in the document tree, in order to perform a vague match between the document tree and the query tree. The evaluation of our model, using the prototype we developed, shows the value of our proposals, both for content-only searches and for searches on content and structure.
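
The relevance propagation idea in the abstract above can be illustrated with a minimal sketch. It assumes one simple propagation rule, which is not confirmed by the abstract: each leaf of the document tree carries a content score, and an inner node receives the sum of the scores of the leaves below it, each damped by `alpha` raised to the leaf's distance from that node. The names `Node`, `leaf_score`, `propagate`, and `alpha`, and the example scores, are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional, Tuple


@dataclass
class Node:
    """One element of a document tree (hypothetical structure)."""
    name: str
    leaf_score: float = 0.0  # assumed content score; only meaningful for leaves
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node, depth: int = 0) -> Iterator[Tuple[float, int]]:
    """Yield (leaf_score, distance) for every leaf at or below `node`."""
    if not node.children:
        yield node.leaf_score, depth
    for child in node.children:
        yield from leaf_distances(child, depth + 1)


def propagate(node: Node, alpha: float = 0.5,
              scores: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Score every node as the distance-damped sum of its leaf scores."""
    if scores is None:
        scores = {}
    scores[node.name] = sum(s * alpha ** d for s, d in leaf_distances(node))
    for child in node.children:
        propagate(child, alpha, scores)
    return scores


# Toy document: an article with two sections and three paragraphs.
doc = Node("article", children=[
    Node("sec1", children=[Node("p1", 0.8), Node("p2", 0.0)]),
    Node("sec2", children=[Node("p3", 0.1)]),
])
scores = propagate(doc, alpha=0.5)
best = max(scores, key=scores.get)  # the most specific high-scoring unit
```

With damping below 1, a small element that concentrates the relevant content outranks its larger ancestors, which is one way to favour specific information units over whole documents.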

Thèses en Ligne

Scientific Publications of the University of Toulouse II Le Mirail

Focused Retrieval

Author: Itakura Kalista Yuki
Publication venue: University of Waterloo
Publication date: 01/01/2010
Field of study

Traditional information retrieval applications, such as Web search, return atomic units of retrieval, which are generically called "documents". Depending on the application, a document may be a Web page, an email message, a journal article, or any similar object. In contrast to this traditional approach, focused retrieval helps users better pinpoint their exact information needs by returning results at the sub-document level. These results may consist of predefined document components (such as pages, sections, and paragraphs) or they may consist of arbitrary passages, comprising any sub-string of a document. If a document is marked up with XML, a focused retrieval system might return individual XML elements or ranges of elements. This thesis proposes and evaluates a number of approaches to focused retrieval, including methods based on XML markup and methods based on arbitrary passages. It considers the best unit of retrieval, explores methods for efficient sub-document retrieval, and evaluates formulae for sub-document scoring. Focused retrieval is also considered in the specific context of Wikipedia, where methods for automatic vandalism detection and automatic link generation are developed and evaluated.

University of Waterloo's Institutional Repository

Un modèle de recherche d'information agrégée basée sur les réseaux bayésiens dans des documents semi-structurés

Author: Naffakhi Najeh
Publication venue
Publication date: 08/07/2013
Field of study

The work described in this thesis is concerned with aggregated search over XML elements. We propose new approaches to aggregating and pruning using different sources of evidence (content and structure). We propose an information retrieval model based on Bayesian networks, in which the dependency relationships between the query and terms, and between terms and elements, are quantified by probability measures. In this model, the user's query triggers a propagation process that selects the relevant XML elements. Rather than a list of elements, we aim to return an aggregate to the user. The aggregate built from a document is a set of elements, or an information unit (a portion of the document), that best answers the user's query. To qualify as an answer, the aggregate must satisfy three criteria: relevance, non-redundancy, and complementarity. The value of the returned aggregates is that they give the user an overview of the information relevant to the query in the document collection. To validate the model, we evaluated it in the INEX 2009 evaluation campaign, which uses more than 2,666,000 XML documents from the online encyclopedia Wikipedia. The experiments show the value of this approach by highlighting the impact of aggregating such elements.

Thèses en ligne de l'Université Toulouse III - Paul Sabatier

Représentation, gestion et exploitation de données hétérogènes en e-Health

Author: Hainaut Jean-Baptiste
Publication venue
Publication date: 01/01/2011
Field of study

Repository of the University of Namur

Semantics of video shots for content-based retrieval

Author: Volkmer T
Publication venue: RMIT University
Publication date: 01/01/2007
Field of study

Content-based video retrieval research combines expertise from many different areas, such as signal processing, machine learning, pattern recognition, and computer vision. As video extends into both the spatial and the temporal domain, we require techniques for the temporal decomposition of footage so that specific content can be accessed. This content may then be semantically classified - ideally in an automated process - to enable filtering, browsing, and searching. An important aspect that must be considered is that pictorial representation of information may be interpreted differently by individual users because it is less specific than its textual representation. In this thesis, we address several fundamental issues of content-based video retrieval for effective handling of digital footage. Temporal segmentation, the common first step in handling digital video, is the decomposition of video streams into smaller, semantically coherent entities. This is usually performed by detecting the transitions that separate single camera takes. While abrupt transitions - cuts - can be detected relatively well with existing techniques, effective detection of gradual transitions remains difficult. We present our approach to temporal video segmentation, proposing a novel algorithm that evaluates sets of frames using a relatively simple histogram feature. Our technique has been shown to rank among the best existing shot segmentation algorithms in large-scale evaluations. The next step is semantic classification of each video segment to generate an index for content-based retrieval in video databases. Machine learning techniques can be applied effectively to classify video content. However, these techniques require manually classified examples for training before automatic classification of unseen content can be carried out. Manually classifying training examples is not trivial because of the implied ambiguity of visual content.
We propose an unsupervised learning approach based on latent class modelling in which we obtain multiple judgements per video shot and model the users' response behaviour over a large collection of shots. This technique yields a more generic classification of the visual content. Moreover, it enables the quality assessment of the classification, and maximises the number of training examples by resolving disagreement. We apply this approach to data from a large-scale, collaborative annotation effort and present ways to improve the effectiveness of manual annotation of visual content by better design and specification of the process. Automatic speech recognition techniques along with semantic classification of video content can be used to implement video search using textual queries. This requires the application of text search techniques to video and the combination of different information sources. We explore several text-based query expansion techniques for speech-based video retrieval, and propose a fusion method to improve overall effectiveness. To combine both text and visual search approaches, we explore a fusion technique that combines spoken information and visual information using semantic keywords automatically assigned to the footage based on the visual content. The techniques that we propose help to facilitate effective content-based video retrieval and highlight the importance of considering different user interpretations of visual content. This allows better understanding of video content and a more holistic approach to multimedia retrieval in the future.
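
The abstract above does not specify how the speech-based and visual results are fused. As a rough illustration only, the sketch below uses a generic weighted linear combination of min-max normalised scores, a common late-fusion baseline rather than the thesis's method. The function names, the weight `w_text`, and the example scores are hypothetical.

```python
from typing import Dict, List, Tuple


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; an empty or constant input maps to zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {shot: 0.0 for shot in scores}
    return {shot: (s - lo) / (hi - lo) for shot, s in scores.items()}


def fuse(text_scores: Dict[str, float],
         visual_scores: Dict[str, float],
         w_text: float = 0.7) -> List[Tuple[str, float]]:
    """Rank shots by a weighted sum of normalised text and visual scores."""
    t, v = min_max(text_scores), min_max(visual_scores)
    shots = set(t) | set(v)
    # A shot missing from one source contributes 0 for that source.
    fused = {s: w_text * t.get(s, 0.0) + (1 - w_text) * v.get(s, 0.0)
             for s in shots}
    return sorted(fused.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical scores: speech-transcript search and visual keyword search.
text_scores = {"shot1": 2.0, "shot2": 1.0}
visual_scores = {"shot2": 0.9, "shot3": 0.1}
ranking = fuse(text_scores, visual_scores, w_text=0.7)
```

Normalising before combining matters because the two sources usually score on different scales; the weight then expresses how much each source is trusted.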

RMIT Research Repository

JuruXML - an XML retrieval system at INEX 02

Author: Aya Soffer, David Carmel, Einat Amitay, Matan M, Yoelle Maarek, Yosi Mass
Publication venue
Publication date
Field of study

XML documents represent a middle range between unstructured data such as textual documents and fully structured data encoded in databases. Typically, information retrieval techniques are used to support search on the “unstructured” end of this scale, while database techniques are used for the other end. To date, most of the work on XML query and search has stemmed from the structured side and is strongly inspired by database techniques. We describe here an approach that originates from the “unstructured” end and is based on augmentation of information retrieval techniques. It is specifically targeted at supporting the information needs of end users, in particular a generic querying mechanism and ranking of results for approximate needs. We describe our query format and ranking mechanism and demonstrate how it was used to run the INEX topics.

CiteSeerX