
    Experiences in Automatic Keywording of Particle Physics Literature

    Attributing keywords can assist in the classification and retrieval of documents in the particle physics literature. As information services face a future with less available manpower and an ever-growing volume of documents, the possibility of assisting keyword attribution with automatic classification software is explored. A project being carried out at CERN (the European Laboratory for Particle Physics) to develop and integrate automatic keywording is described.

    The Relevance of Relevance: Forgetting Strategies and Contingency in Postmodern Memory

    We live in a “search engine society”. Underlying this self-description of postmodern society is the crucial dependency of social memory on archives. Apart from moral and legal concerns, search engines are a sociologically intriguing subject because of their close connection with the evolution of social memory. In this contribution I argue that search engines are non-semantic indexing systems which turn the circular interplay between users and the machine into a cybernetic system. The main function of this cybernetic system is to minimize the deviation from a difference: that between relevant and not-relevant. Through mechanical archives, postmodern social memory can cope with increasing knowledge complexity. The main challenge in this respect is how to preserve the capability of discarding in order to produce information.

    Hybrid Information Retrieval Model For Web Images

    The Big Bang of the Internet in the early 1990s dramatically increased the number of images being distributed and shared over the web. As a result, image information retrieval systems were developed to index and retrieve image files spread over the Internet. Most of these systems are keyword-based: they search for images based on their textual metadata, and they are therefore imprecise, since describing an image in human language is inherently vague. There also exist content-based image retrieval systems, which search for images based on their visual information; however, content-based systems are still immature and not very effective, as they suffer from low retrieval recall/precision rates. This paper proposes a new hybrid image information retrieval model for indexing and retrieving web images published in HTML documents. The distinguishing mark of the proposed model is that it is based on both graphical content and textual metadata. The graphical content is denoted by color features and the color histogram of the image, while textual metadata are denoted by the terms that surround the image in the HTML document, more particularly the terms that appear in the p, h1, and h2 tags, in addition to the terms that appear in the image's alt attribute, filename, and class-label. Moreover, this paper presents a new term weighting scheme called VTF-IDF, short for Variable Term Frequency-Inverse Document Frequency, which, unlike traditional schemes, exploits the HTML tag structure and assigns an extra bonus weight to terms that appear within certain HTML tags correlated with the semantics of the image. Experiments conducted to evaluate the proposed IR model showed a high retrieval precision rate that outpaced other current models.
    Comment: LACSC - Lebanese Association for Computational Sciences, http://www.lacsc.org/; International Journal of Computer Science & Emerging Technologies (IJCSET), Vol. 3, No. 1, February 201
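    The abstract explains how VTF-IDF departs from plain TF-IDF: terms gathered from the p, h1, and h2 tags and from the image's alt attribute, filename, and class-label are weighted, with an extra bonus for the sources most correlated with the image's semantics. The Python sketch below illustrates that idea; the bonus factors, function names, and example values are assumptions made for illustration, not figures taken from the paper.

    import math
    from collections import Counter

    # Hypothetical bonus factors per HTML source of a term; the paper's actual
    # values are not given in the abstract.
    TAG_BONUS = {"alt": 2.0, "filename": 2.0, "h1": 1.5, "h2": 1.3, "p": 1.0}

    def vtf(term_counts_by_tag):
        """Variable term frequency: raw counts scaled by a tag-dependent bonus."""
        weights = Counter()
        for tag, counts in term_counts_by_tag.items():
            bonus = TAG_BONUS.get(tag, 1.0)
            for term, count in counts.items():
                weights[term] += bonus * count
        return weights

    def vtf_idf(term_counts_by_tag, doc_freq, n_docs):
        """Combine the tag-weighted term frequency with a standard IDF factor."""
        return {
            term: weight * math.log(n_docs / (1 + doc_freq.get(term, 0)))
            for term, weight in vtf(term_counts_by_tag).items()
        }

    # Terms collected around one web image, grouped by where they were found.
    counts = {
        "alt": Counter({"sunset": 1}),
        "p": Counter({"beach": 2, "sunset": 1}),
    }
    print(vtf_idf(counts, doc_freq={"sunset": 10, "beach": 50}, n_docs=1000))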

    Use of normalized word vector approach in document classification for an LKMC

    In order to realize the objective of expanding library services to provide knowledge management support for small businesses, a series of requirements must be met. This particular phase of a larger research project focuses on one of the requirements: the need for a document classification system to rapidly determine the content of digital documents. Document classification techniques are examined to assess the available alternatives for realization of Library Knowledge Management Centers (LKMCs). After evaluating prominent techniques the authors opted to investigate a less well-known method, the Normalized Word Vector (NWV) approach, which has been used successfully in classifying highly unstructured documents, i.e., student essays. The authors propose utilizing the NWV approach for LKMC automatic document classification with the goal of developing a system whereby unfamiliar documents can be quickly classified into existing topic categories. This conceptual paper will outline an approach to test NWV's suitability in this area.
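    The abstract names the Normalized Word Vector approach without spelling out its computation, so the Python sketch below shows only one plausible reading: words are canonicalized, each document becomes a length-normalized count vector, and unfamiliar documents are assigned to the nearest per-category centroid. The normalization step, function names, and example documents are illustrative assumptions, not details from the paper.

    import math
    from collections import Counter, defaultdict

    def normalize_word(word):
        # Placeholder canonicalization; a real NWV system would apply a fuller
        # normalization (e.g. stemming), which the abstract does not detail.
        return word.lower().strip(".,;:!?")

    def doc_vector(text):
        """Counts of canonicalized words, scaled to unit (L2) length."""
        counts = Counter(normalize_word(w) for w in text.split())
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        return {w: c / norm for w, c in counts.items()}

    def train_centroids(labelled_docs):
        """Average the normalized vectors of each topic category's training documents."""
        sums, sizes = defaultdict(Counter), Counter()
        for text, label in labelled_docs:
            sums[label].update(doc_vector(text))
            sizes[label] += 1
        return {label: {w: v / sizes[label] for w, v in vec.items()}
                for label, vec in sums.items()}

    def classify(text, centroids):
        """Assign the category whose centroid is most similar (dot product)."""
        vec = doc_vector(text)
        return max(centroids, key=lambda label: sum(
            vec.get(w, 0.0) * v for w, v in centroids[label].items()))

    centroids = train_centroids([
        ("the library catalogue lists new books", "library"),
        ("small businesses need a marketing plan", "business"),
    ])
    print(classify("a plan for marketing the business", centroids))  # -> business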

    Towards the Automatic Classification of Documents in User-generated Classifications

    There is a huge amount of information scattered across the World Wide Web. Because information flows through the WWW at high speed, it needs to be organized so that users can access it easily. Previously the organization of information was generally done manually, by matching document contents to pre-defined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach a human expert performs the classification task, while in the automatic approach supervised classifiers are used to classify resources. In a supervised classification, manual interaction is required to create some training data before the automatic classification task takes place. In our new approach, we propose the automatic classification of documents through semantic keywords and the generation of formulas built from these keywords. In this way we can reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification are related not only to search engines but also to many other fields, such as document organization, text filtering, and semantic index management.
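    As a rough illustration of the keyword-driven direction described above, the following Python sketch assigns a document to the category whose semantic keywords it overlaps most. The categories, keyword sets, and scoring are invented for this sketch and do not reproduce the thesis's formula-generation method.

    # Illustrative categories from a user-generated classification; the names and
    # keyword sets are invented for this sketch, not taken from the thesis.
    CATEGORIES = {
        "photography": {"camera", "lens", "exposure", "photo"},
        "cooking": {"recipe", "oven", "ingredient", "bake"},
    }

    def tokens(text):
        return {w.lower().strip(".,;:!?") for w in text.split()}

    def classify(document, categories=CATEGORIES):
        """Score each category by overlap with its semantic keywords; return the best."""
        words = tokens(document)
        scores = {name: len(words & keywords) for name, keywords in categories.items()}
        best = max(scores, key=scores.get)
        return best if scores[best] > 0 else None

    print(classify("A recipe that needs a hot oven."))  # -> cooking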