349 research outputs found

    Language-based multimedia information retrieval

    Get PDF
    This paper describes various methods and approaches for language-based multimedia information retrieval, which have been developed in the projects POP-EYE and OLIVE and which will be developed further in the MUMIS project. All of these project aim at supporting automated indexing of video material by use of human language technologies. Thus, in contrast to image or sound-based retrieval methods, where both the query language and the indexing methods build on non-linguistic data, these methods attempt to exploit advanced text retrieval technologies for the retrieval of non-textual material. While POP-EYE was building on subtitles or captions as the prime language key for disclosing video fragments, OLIVE is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which then serve as the basis for text-based retrieval functionality

    An analysis of the use of graphics for information retrieval

    Get PDF
    Several research groups have addressed the problem of retrieving vector graphics. This work has, however, focused either on domain-dependent areas or was based on very simple graphics languages. Here we take a fresh look at the issue of graphics retrieval in general and in particular at the tasks which retrieval systems must support. The paper presents a series of case studies which explored the needs of professionals in the hope that these needs can help direct future graphics IR research. Suggested modelling techniques for some of the graphic collections are also presented

    An Integrated Approach for Automatic\ud Aggregation of Learning Knowledge Objects

    Get PDF
    This paper presents the Knowledge Puzzle, an ontology-based platform designed to facilitate domain\ud knowledge acquisition from textual documents for knowledge-based systems. First, the\ud Knowledge Puzzle Platform performs an automatic generation of a domain ontology from documents’\ud content through natural language processing and machine learning technologies. Second,\ud it employs a new content model, the Knowledge Puzzle Content Model, which aims to model\ud learning material from annotated content. Annotations are performed semi-automatically based\ud on IBM’s Unstructured Information Management Architecture and are stored in an Organizational\ud memory (OM) as knowledge fragments. The organizational memory is used as a knowledge\ud base for a training environment (an Intelligent Tutoring System or an e-Learning environment).\ud The main objective of these annotations is to enable the automatic aggregation of Learning\ud Knowledge Objects (LKOs) guided by instructional strategies, which are provided through\ud SWRL rules. Finally, a methodology is proposed to generate SCORM-compliant learning objects\ud from these LKOs

    Graph-based methods for Significant Concept Selection

    Get PDF
    It is well known in information retrieval area that one important issue is the gap between the query and document vocabularies. Concept-based representation of both the document and the query is one of the most effective approaches that lowers the effect of text mismatch and allows the selection of relevant documents that deal with the shared semantics hidden behind both. However, identifying the best representative concepts from texts is still challenging. In this paper, we propose a graph-based method to select the most significant concepts to be integrated into a conceptual indexing system. More specifically, we build the graph whose nodes represented concepts and weighted edges represent semantic distances. The importance of concepts are computed using centrality algorithms that levrage between structural and contextual importance. We experimentally evaluated our method of concept selection using the standard ImageClef2009 medical data set. Results showed that our approach significantly improves the retrieval effectiveness in comparison to state-of-the-art retrieval models

    OLIVE: Speech-Based Video Retrieval

    Get PDF
    This paper describes the Olive project which aims to support automated indexing of video material by use of human language technologies. Olive is making use of speech recognition to automatically derive transcriptions of the sound tracks, generating time-coded linguistic elements which serve as the basis for text-based retrieval functionality. The retrieval demonstrator builds on and extends the architecture from the Pop-Eye project, a system applying human language technology on subtitles for the disclosure of video fragments

    Relating folksonomies with Dublin Core

    Get PDF
    This article presents a research carried out to continue the project Kinds of Tags, which intends to identify elements required for metadata originating from folksonomies. It will provide information that may be used by intelligent applications to assign tags to metadata elements. Despite the unquestionably high value of DC and DC Terms, the pilot study revealed a significant number of tags for which no corresponding properties yet existed. A need for new properties was determined. This article presents the problem, motivation and methodology of the underlying research. It further presents and discusses the findings from the pilot study.(undefined

    Lexicalização de ontologias : o relacionamento entre conteúdo e significado no contexto da Recuperação da Informação

    Get PDF
    Esta proposta visa representar a linguagem natural na forma adequada às ontologias e vice-versa. Para tanto, propõe-se à criação semiautomática de base de léxicos em português brasileiro, contendo informações morfológicas, sintáticas e semânticas apropriadas para a leitura por máquinas, permitindo vincular dados estruturados e não estruturados, bem como integrar a leitura em modelo de recuperação da informação para aumentar a precisão. Os resultados alcançados demonstram a utilização da metodologia, no domínio de risco financeiro em português, para a elaboração da ontologia, da base léxico-semântica e da proposta do modelo de recuperação da informação semântica. Para avaliar a performance do modelo proposto, foram selecionados documentos contendo as principais definições do domínio de risco financeiro. Esses foram indexados com e sem anotação semântica. Para possibilitar a comparação entre as abordagens, foram criadas duas bases, a primeira representando a busca tradicional, e a segunda contendo o índice construído, a partir dos textos com as anotações semânticas para representar a busca semântica. A avaliação da proposta é baseada na revocação e na precisão. As consultas submetidas ao modelo mostram que a busca semântica supera o desempenho da tradicional e validam a metodologia empregada. O procedimento, embora adicione complexidade em sua elaboração, pode ser reproduzido em qualquer outro domínio.The proposal presented in this study seeks to properly represent natural language to ontologies and vice-versa. Therefore, the semi-automatic creation of a lexical database in Brazilian Portuguese containing morphological, syntactic, and semantic information that can be read by machines was proposed, allowing the link between structured and unstructured data and its integration into an information retrieval model to improve precision. The results obtained demonstrated that the methodology can be used in the risco financeiro (financial risk) domain in Portuguese for the construction of an ontology and the lexical-semantic database and the proposal of a semantic information retrieval model. In order to evaluate the performance of the proposed model, documents containing the main definitions of the financial risk domain were selected and indexed with and without semantic annotation. To enable the comparison between the approaches, two databases were created based on the texts with the semantic annotations to represent the semantic search. The first one represents the traditional search and the second contained the index built based on the texts with the semantic annotations to represent the semantic search. The evaluation of the proposal was based on recall and precision. The queries submitted to the model showed that the semantic search outperforms the traditional search and validates the methodology used. Although more complex, the procedure proposed can be used in all kinds of domains

    Une approche d'ingénierie ontologique pour l'acquisition et l'exploitation des connaissances à partir de documents textuels : vers des objets de connaissances et d'apprentissage

    Full text link
    Thèse numérisée par la Division de la gestion de documents et des archives de l'Université de Montréal
    corecore