107,628 research outputs found
Research on We-Media Information Retrieval Technology of Knowledge Map
As information retrieval technology evolves toward searching the relationships between entities, traditional relational databases struggle to meet the need, whereas graph databases are designed specifically to handle relationships between data. This article explains the basic concepts of graph databases and, taking domain-specific database information retrieval as an example, analyzes their advantages and disadvantages as well as the challenges graph databases face in full-text information retrieval
Automatic detection and extraction of artificial text in video
A significant challenge in large multimedia databases is the provision of efficient means for semantic indexing and retrieval of visual information. Artificial text in video is normally generated in order to supplement or summarise the visual content and thus is an important carrier of information that is highly relevant to the content of the video. As such, it is a potential ready-to-use source of semantic information. In this paper we present an algorithm for detection and localisation of artificial text in video using a horizontal difference magnitude measure and morphological processing. The result of character segmentation, based on a modified version of the Wolf-Jolion algorithm [1][2], is enhanced using smoothing and multiple binarisation. The output text is input to an "off-the-shelf" noncommercial OCR. Detection, localisation and recognition results for a 20min long MPEG-1 encoded television programme are presented
Contextual Information Retrieval based on Algorithmic Information Theory and Statistical Outlier Detection
The main contribution of this paper is to design an Information Retrieval (IR) technique based on Algorithmic Information Theory (using the Normalized Compression Distance, NCD), statistical techniques (outliers), and a novel organization of the database structure. The paper shows how they can be integrated to retrieve information from generic databases using long (text-based) queries. Two important problems are analyzed in the paper. On the one hand, how to detect "false positives" when the distance among the documents is very low and there is actual similarity. On the other hand, we propose a way to structure a document database whose similarity distance estimation depends on the length of the selected text. Finally, the experimental evaluations that have been carried out to study the previous problems are shown. Comment: Submitted to 2008 IEEE Information Theory Workshop (6 pages, 6 figures)
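As an illustration of the distance this abstract names, the Normalized Compression Distance is commonly defined as NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed size. The sketch below is a minimal example under assumptions: it uses zlib as the compressor, and the function names `compressed_size`, `ncd`, and `rank_by_ncd` are hypothetical. It does not reproduce the paper's outlier detection or database organization.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input (assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance between two byte strings.

    Values near 0 suggest the inputs share most of their content;
    values near 1 suggest they share little.
    """
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_by_ncd(query: str, documents: list[str]) -> list[str]:
    """Return documents ordered from smallest to largest NCD to the query."""
    q = query.encode("utf-8")
    return sorted(documents, key=lambda d: ncd(q, d.encode("utf-8")))
```

Real compressors only approximate the ideal compressed size, so NCD values are noisy for short inputs. This is one reason a distance that depends on text length, as the abstract mentions, needs care.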
Enhancing Automatic Annotation for Optimal Image Retrieval
Image search and retrieval based on content is a very cumbersome task, particularly when the image database is large. The accuracy of the retrieval and the processing speed are two important measures used for assessing and comparing the effectiveness of various systems.
Text retrieval is more mature and advanced than image content retrieval. In this dissertation, the focus is on converting image content into text tags that can be easily searched using standard search engines, where the size and speed issues of the database have already been dealt with.
Therefore, image tagging becomes an essential tool for image retrieval from large image databases. Automation of image tagging has received considerable attention from many researchers in recent years. The ultimate goal of image description is to automatically annotate images with tags that semantically represent the image content. The speed and accuracy of image retrieval from large databases are among the important areas that can benefit from automatic tagging.
In this work, several state-of-the-art image classification and image tagging techniques are reviewed. We propose a new self-learning multilayered tagging framework that can address the limitations of current approaches and provide mutual accuracy improvement between the recognition layer and the annotation layer. Our results indicate that the proposed framework can improve the overall accuracy of information retrieval in a variety of image databases
Word Embedding Driven Concept Detection in Philosophical Corpora
During the course of research, scholars often explore large textual databases for segments of text relevant to their conceptual analyses. This study proposes, develops and evaluates two algorithms for automated concept detection in theoretical corpora: ACS and WMD retrieval. Both novel algorithms are compared to keyword retrieval, using a test set from the Digital Ricoeur corpus tagged by scholarly experts. WMD retrieval outperforms keyword search on the concept detection task. Thus, WMD retrieval is a promising tool for concept detection and information retrieval systems focused on theoretical corpora
Appunti di informatica per le biblioteche
This text is an overview of library automation. It starts with information retrieval, metadata, MARC, XML and other useful definitions of standards and formats. The authors then cover tools such as ILS, OPAC and discovery tools, and sources (full-text ones, such as digital libraries and e-journals, and bibliographic ones, such as citation databases). The last part is a brief history of library automation in Italy
Building End-User Thesauri from Full-Text
We are interested in the possible contribution of end-user thesauri to the improvement of information retrieval by end users. Thesauri (from the Greek for treasure or treasury) in information retrieval attempt to record and display relations among concepts and terms -- to be treasuries of concepts and the terms that represent them. End-user thesauri are designed to guide and facilitate end-user searching of textual databases (both full-text databases and reference databases that contain only surrogates of full texts, such as abstracts). End-user thesauri link the vocabulary of the searcher and the vocabulary of the database, functioning as part of the user-database interface. End-user thesauri are not designed to guide indexing, although they can be used to suggest terms, much as writers have used Roget's thesaurus for centuries
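The linking role described in this abstract can be pictured as query expansion: each searcher term is mapped to related terms from the database vocabulary. The sketch below is a toy illustration under assumptions, not the authors' method. The function name `expand_query` and the thesaurus contents are hypothetical, and the thesaurus is represented as a plain dictionary.

```python
def expand_query(query: str, thesaurus: dict[str, list[str]]) -> list[str]:
    """Expand each query word with its related terms, keeping first-seen order.

    `thesaurus` maps a searcher's term to related database terms
    (an illustrative structure, assumed for this example).
    """
    terms: list[str] = []
    for word in query.lower().split():
        for candidate in [word, *thesaurus.get(word, [])]:
            if candidate not in terms:  # avoid duplicate search terms
                terms.append(candidate)
    return terms


# Hypothetical entries linking searcher vocabulary to database vocabulary.
example_thesaurus = {"car": ["automobile", "vehicle"]}
suggested = expand_query("car repair", example_thesaurus)
```

Here `suggested` holds the original words plus the related terms, which a search interface could offer to the user as suggestions rather than apply automatically.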