    Research on We-Media Information Retrieval Technology of Knowledge Map

    As information retrieval technology evolves toward searching the relationships between entities, traditional relational databases struggle to meet this need, whereas graph databases are designed specifically to handle relationships between data. This article explains the basic concepts of graph databases and, taking domain-specific database information retrieval as an example, analyzes their advantages and disadvantages as well as the challenges graph databases face in full-text information retrieval.

    Automatic detection and extraction of artificial text in video

    A significant challenge in large multimedia databases is the provision of efficient means for semantic indexing and retrieval of visual information. Artificial text in video is normally generated in order to supplement or summarise the visual content and thus is an important carrier of information that is highly relevant to the content of the video. As such, it is a potential ready-to-use source of semantic information. In this paper we present an algorithm for detection and localisation of artificial text in video using a horizontal difference magnitude measure and morphological processing. The result of character segmentation, based on a modified version of the Wolf-Jolion algorithm [1][2], is enhanced using smoothing and multiple binarisation. The output text is input to an “off-the-shelf” noncommercial OCR. Detection, localisation and recognition results for a 20-minute MPEG-1 encoded television programme are presented.
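
The detection step named in this abstract (a horizontal difference magnitude measure followed by morphological processing) can be illustrated with a minimal sketch. The abstract does not give the exact measure, threshold, or structuring element, so everything below is an assumption: the function names, the `threshold` and `radius` values, and the use of a simple one-dimensional horizontal dilation are hypothetical and are not the authors' algorithm.

```python
def horizontal_difference_magnitude(gray):
    """Absolute difference between horizontally adjacent pixels, row by row.

    `gray` is a list of rows of grayscale intensities (0-255).
    Each output row is one element shorter than the input row.
    """
    return [[abs(row[x + 1] - row[x]) for x in range(len(row) - 1)]
            for row in gray]


def dilate_rows(mask, radius):
    """Horizontal binary dilation: a cell becomes True if any cell
    within `radius` positions on the same row is True."""
    dilated = []
    for row in mask:
        n = len(row)
        dilated.append([any(row[max(0, x - radius):min(n, x + radius + 1)])
                        for x in range(n)])
    return dilated


def text_candidate_mask(gray, threshold=40, radius=2):
    """Mark regions with strong horizontal intensity changes.

    The threshold and radius are illustrative values (assumptions),
    not parameters taken from the paper.
    """
    diff = horizontal_difference_magnitude(gray)
    mask = [[d >= threshold for d in row] for row in diff]
    return dilate_rows(mask, radius)


# Toy frame: a flat background row and a high-contrast striped row.
frame = [[10] * 8, [0, 255] * 4]
mask = text_candidate_mask(frame)
```

Overlaid text tends to produce dense, sharp vertical strokes, which is why a horizontal difference is a plausible first cue; the dilation merges neighbouring strokes into contiguous candidate regions.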

    Contextual Information Retrieval based on Algorithmic Information Theory and Statistical Outlier Detection

    The main contribution of this paper is to design an Information Retrieval (IR) technique based on Algorithmic Information Theory (using the Normalized Compression Distance, NCD), statistical techniques (outliers), and a novel organization of the database structure. The paper shows how they can be integrated to retrieve information from generic databases using long (text-based) queries. Two important problems are analyzed in the paper. On the one hand, how to detect "false positives" when the distance among the documents is very low and there is actual similarity. On the other hand, we propose a way to structure a document database whose similarity distance estimation depends on the length of the selected text. Finally, the experimental evaluations that have been carried out to study the previous problems are shown. Comment: Submitted to 2008 IEEE Information Theory Workshop (6 pages, 6 figures)
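
The Normalized Compression Distance mentioned in this abstract has a standard definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. A minimal sketch of ranking documents against a long text query with it follows. The choice of `zlib` as the compressor and the helper name `rank_by_ncd` are assumptions made for illustration; the abstract does not say which compressor or ranking procedure the authors used, and the outlier detection and database structuring steps are not shown.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (compressor is an assumption)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_by_ncd(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Return (distance, document) pairs sorted from most to least similar.

    Hypothetical helper: a plain linear scan over all documents.
    """
    q = query.encode("utf-8")
    scored = [(ncd(q, doc.encode("utf-8")), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0])


# Usage: the document sharing the query's content should rank first.
query = "graph databases store relationships between entities " * 10
unrelated = "".join(chr(33 + (i * 37) % 90) for i in range(600))
ranking = rank_by_ncd(query, [unrelated, query])
```

Real compressors add fixed overhead, so NCD estimates are noisy for short texts; this is consistent with the abstract's remark that the distance estimation depends on the length of the selected text.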

    Enhancing Automatic Annotation for Optimal Image Retrieval

    Image search and retrieval based on content is a very cumbersome task, particularly when the image database is large. The accuracy of the retrieval and the processing speed are two important measures used for assessing and comparing the effectiveness of various systems. Text retrieval is more mature and advanced than image content retrieval. In this dissertation, the focus is on converting image content into text tags that can be easily searched using standard search engines, where the size and speed issues of the database have already been dealt with. Therefore, image tagging becomes an essential tool for image retrieval from large image databases. Automation of image tagging has received considerable attention from many researchers in recent years. The ultimate goal of image description is to automatically annotate images with tags that semantically represent the image content. The speed and accuracy of image retrieval from large databases are among the important areas that can benefit from automatic tagging. In this work, several state-of-the-art image classification and image tagging techniques are reviewed. We propose a new self-learning multilayered tagging framework that can address the limitations of current approaches and provide mutual accuracy improvement between the recognition layer and the annotation layer. Our results indicate that the proposed framework can improve the overall accuracy of information retrieval in a variety of image databases.

    Word Embedding Driven Concept Detection in Philosophical Corpora

    During the course of research, scholars often explore large textual databases for segments of text relevant to their conceptual analyses. This study proposes, develops and evaluates two algorithms for automated concept detection in theoretical corpora: ACS and WMD retrieval. Both novel algorithms are compared to keyword retrieval, using a test set from the Digital Ricoeur corpus tagged by scholarly experts. WMD retrieval outperforms keyword search on the concept detection task. Thus, WMD retrieval is a promising tool for concept detection and information retrieval systems focused on theoretical corpora.

    Appunti di informatica per le biblioteche

    This text is an overview of library automation. It starts with information retrieval, metadata, MARC, XML and other useful definitions of standards and formats. The authors then cover tools such as ILS, OPAC and discovery tools, and sources (full-text ones, such as digital libraries and e-journals, and bibliographic ones, such as citation databases). The last part is a brief history of library automation in Italy.

    Building End-User Thesauri from Full-Text

    We are interested in the possible contribution of end-user thesauri to the improvement of information retrieval by end users. Thesauri (from the Greek for treasure or treasury) in information retrieval attempt to record and display relations among concepts and terms, to be treasuries of concepts and the terms that represent them. End-user thesauri are designed to guide and facilitate end-user searching of textual databases (both full-text databases and reference databases that contain only surrogates of full texts, such as abstracts). End-user thesauri link the vocabulary of the searcher and the vocabulary of the database, functioning as part of the user-database interface. End-user thesauri are not designed to guide indexing, although they can be used to suggest terms, much as writers have used Roget's thesaurus for centuries.