Experiences in Automatic Keywording of Particle Physics Literature
Attributing keywords can assist in the classification and retrieval of documents in the particle physics literature. As information services face a future with less available manpower and more and more documents being written, the possibility of keyword attribution being assisted by automatic classification software is explored. A project being carried out at CERN (the European Laboratory for Particle Physics) for the development and integration of automatic keywording is described
The Relevance of Relevance: Forgetting Strategies and Contingency in Postmodern Memory
We live in a "search engine society". Underlying this self-description of postmodern society is the crucial dependence of social memory on archives. Apart from moral and legal concerns, search engines are a sociologically intriguing subject because of their close connection with the evolution of social memory. In this contribution I argue that search engines are non-semantic indexing systems which turn the circular interplay between users and the machine into a cybernetic system. The main function of this cybernetic system is to minimize the deviation from a difference: that between relevant and not-relevant. Through mechanical archives, postmodern social memory can cope with increasing knowledge complexity. The main challenge in this respect is how to preserve the capability of discarding in order to produce information.
Hybrid Information Retrieval Model For Web Images
The Big Bang of the Internet in the early '90s dramatically increased the
number of images being distributed and shared over the web. As a result, image
information retrieval systems were developed to index and retrieve image files
spread over the Internet. Most of these systems are keyword-based: they search
for images based on their textual metadata, and are thus imprecise, since
describing an image in natural language is inherently vague. There also exist
content-based image retrieval systems, which search for images based on their
visual information. However, content-based systems are still immature and
not very effective, as they suffer from low retrieval recall and precision rates.
This paper proposes a new hybrid image information retrieval model for indexing
and retrieving web images published in HTML documents. The distinguishing mark
of the proposed model is that it is based on both graphical content and textual
metadata. The graphical content is denoted by color features and color
histogram of the image; while textual metadata are denoted by the terms that
surround the image in the HTML document, more particularly, the terms that
appear in the tags p, h1, and h2, in addition to the terms that appear in the
image's alt attribute, filename, and class label. Moreover, this paper presents
a new term weighting scheme called VTF-IDF, short for Variable Term
Frequency-Inverse Document Frequency, which, unlike traditional schemes,
exploits the HTML tag structure and assigns an extra bonus weight to terms
that appear within certain HTML tags that are correlated with the
semantics of the image. Experiments conducted to evaluate the proposed IR model
showed a high retrieval precision rate that outpaced other current models.

Comment: LACSC - Lebanese Association for Computational Sciences,
http://www.lacsc.org/; International Journal of Computer Science & Emerging
Technologies (IJCSET), Vol. 3, No. 1, February 201
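The tag-sensitive weighting idea behind VTF-IDF can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-tag bonus multipliers below are assumptions chosen for the sketch, not values taken from the paper.

```python
import math
from collections import Counter

# Illustrative tag bonus weights -- these multipliers are assumptions for
# the sketch; the paper's actual VTF-IDF weights are not reproduced here.
TAG_BONUS = {"alt": 3.0, "h1": 2.0, "h2": 1.5, "p": 1.0}

def vtf(term_occurrences):
    """Variable term frequency: each occurrence is weighted by the HTML
    tag it appeared in, instead of counting all occurrences equally.

    term_occurrences: list of (term, tag) pairs extracted from one document.
    Returns a dict mapping term -> tag-weighted frequency.
    """
    freq = Counter()
    for term, tag in term_occurrences:
        freq[term] += TAG_BONUS.get(tag, 1.0)
    return dict(freq)

def vtf_idf(docs):
    """docs: list of documents, each a list of (term, tag) pairs.
    Returns one VTF-IDF weight dict per document."""
    n = len(docs)
    vtfs = [vtf(d) for d in docs]
    # Document frequency of each term across the collection.
    df = Counter()
    for weights in vtfs:
        df.update(weights.keys())
    return [
        {t: w * math.log(n / df[t]) for t, w in weights.items()}
        for weights in vtfs
    ]
```

With this scheme, a term appearing in an image's alt attribute outweighs the same term appearing only in body text, which matches the intuition that alt text is more strongly correlated with the image's semantics.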
Use of normalized word vector approach in document classification for an LKMC
In order to realize the objective of expanding library services to provide knowledge management support for small businesses, a series of requirements must be met. This particular phase of a larger research project focuses on one of the requirements: the need for a document classification system to rapidly determine the content of digital documents. Document classification techniques are examined to assess the available alternatives for realization of Library Knowledge Management Centers (LKMCs). After evaluating prominent techniques the authors opted to investigate a less well-known method, the Normalized Word Vector (NWV) approach, which has been used successfully in classifying highly unstructured documents, i.e., student essays. The authors propose utilizing the NWV approach for LKMC automatic document classification, with the goal of developing a system whereby unfamiliar documents can be quickly classified into existing topic categories. This conceptual paper outlines an approach to test NWV's suitability in this area.
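The core mechanism of a normalized-word-vector classifier can be sketched in a few lines: represent each document and each topic category as a term-frequency vector normalized to unit length, then assign the document to the category with the most similar vector. This is a generic sketch of the idea, not the NWV method as specified by its authors; the centroid representation and cosine similarity are assumptions of the illustration.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a term-frequency vector to unit length, so that document
    length does not dominate the similarity comparison."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse unit vectors (dicts)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def classify(doc_terms, category_centroids):
    """Assign the document to the category whose (normalized) centroid
    vector is closest, by cosine similarity, to the document's vector.

    doc_terms: list of terms in the unfamiliar document.
    category_centroids: dict mapping category name -> term-count Counter.
    """
    doc_vec = normalize(Counter(doc_terms))
    return max(
        category_centroids,
        key=lambda c: cosine(doc_vec, normalize(category_centroids[c])),
    )
```

Because both vectors are normalized, a short document and a long one with the same term distribution classify identically, which is what makes the approach workable on highly unstructured text.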
Towards the Automatic Classification of Documents in User-generated Classifications
There is a huge amount of information scattered across the World Wide Web. As information flows through the WWW at high speed, there is a need to organize it properly so that a user can access it easily. Previously, the organization of information was generally done manually, by matching document contents to pre-defined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic case, supervised classifiers are used to classify resources. In a supervised classification, manual interaction is required to create some training data before the automatic classification task takes place. In our new approach, we intend to propose automatic classification of documents through semantic keywords and the generation of classification formulas from these keywords. We can thus reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification are not limited to search engines; they extend to many other fields, such as document organization, text filtering, and semantic index management.
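The supervised baseline the abstract describes (train on labelled examples, then classify new documents) can be illustrated with a minimal multinomial Naive Bayes classifier. This is a standard textbook technique used here for illustration, not the thesis's semantic-keyword method.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """labelled_docs: list of (category, [terms]) training pairs.
    Returns per-category term counts and per-category document counts."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for cat, terms in labelled_docs:
        term_counts[cat].update(terms)
        doc_counts[cat] += 1
    return term_counts, doc_counts

def classify(terms, term_counts, doc_counts):
    """Pick the category maximizing log P(category) + sum log P(term|category),
    with Laplace (add-one) smoothing for unseen terms."""
    vocab = {t for counts in term_counts.values() for t in counts}
    total_docs = sum(doc_counts.values())

    def log_score(cat):
        counts = term_counts[cat]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[cat] / total_docs)
        for t in terms:
            score += math.log((counts[t] + 1) / denom)
        return score

    return max(term_counts, key=log_score)
```

The "manual interaction" the abstract mentions is exactly the labelled training pairs passed to `train`; the thesis's proposal is to reduce that labelling effort by deriving classification formulas from semantic keywords instead.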