Experiences in Automatic Keywording of Particle Physics Literature
Attributing keywords can assist in the classification and retrieval of documents in the particle physics literature. As information services face a future with less available manpower and more and more documents being written, the possibility of keyword attribution being assisted by automatic classification software is explored. A project being carried out at CERN (the European Laboratory for Particle Physics) for the development and integration of automatic keywording is described
The Relevance of Relevance: Forgetting Strategies and Contingency in Postmodern Memory
We live in a "search engine society". Underlying this self-description of postmodern society is the crucial dependence of social memory on archives. Apart from moral and legal concerns, search engines are a sociologically intriguing subject because of their close connection with the evolution of social memory. In this contribution I argue that search engines are non-semantic indexing systems which turn the circular interplay between users and the machine into a cybernetic system. The main function of this cybernetic system is to minimize the deviation from a difference: that between relevant and not-relevant. Through mechanical archives, postmodern social memory can cope with increasing knowledge complexity. The main challenge in this respect is how to preserve the capability of discarding in order to produce information.
Hybrid Information Retrieval Model For Web Images
The Big Bang of the Internet in the early '90s dramatically increased the
number of images being distributed and shared over the web. As a result, image
information retrieval systems were developed to index and retrieve image files
spread over the Internet. Most of these systems are keyword-based: they search
for images based on their textual metadata, and are thus imprecise, since
describing an image in natural language is inherently vague. There also exist
content-based image retrieval systems, which search for images based on their
visual information. However, content-based systems are still immature and
not very effective, as they suffer from low retrieval recall and precision rates.
This paper proposes a new hybrid image information retrieval model for indexing
and retrieving web images published in HTML documents. The distinguishing mark
of the proposed model is that it is based on both graphical content and textual
metadata. The graphical content is denoted by color features and color
histogram of the image; while textual metadata are denoted by the terms that
surround the image in the HTML document, more particularly, the terms that
appear in the tags p, h1, and h2, in addition to the terms that appear in the
image's alt attribute, filename, and class label. Moreover, this paper presents
a new term weighting scheme called VTF-IDF, short for Variable Term
Frequency-Inverse Document Frequency, which, unlike traditional schemes,
exploits the HTML tag structure and assigns an extra bonus weight to terms
that appear within certain HTML tags that are correlated with the
semantics of the image. Experiments conducted to evaluate the proposed IR model
showed a high retrieval precision rate that outpaced other current models.

Comment: LACSC - Lebanese Association for Computational Sciences,
http://www.lacsc.org/; International Journal of Computer Science & Emerging
Technologies (IJCSET), Vol. 3, No. 1, February 201
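The tag-sensitive weighting idea behind VTF-IDF can be sketched as follows. This is a minimal illustration, not the paper's implementation: the per-tag bonus multipliers below are assumptions chosen for the sketch, not values taken from the paper.

```python
import math
from collections import Counter

# Illustrative tag bonus weights -- these multipliers are assumptions for
# the sketch; the paper's actual VTF-IDF weights are not reproduced here.
TAG_BONUS = {"alt": 3.0, "h1": 2.0, "h2": 1.5, "p": 1.0}

def vtf(term_occurrences):
    """Variable term frequency: each occurrence is weighted by the HTML
    tag it appeared in, instead of counting all occurrences equally.

    term_occurrences: list of (term, tag) pairs extracted from one document.
    Returns a dict mapping term -> tag-weighted frequency.
    """
    freq = Counter()
    for term, tag in term_occurrences:
        freq[term] += TAG_BONUS.get(tag, 1.0)
    return dict(freq)

def vtf_idf(docs):
    """docs: list of documents, each a list of (term, tag) pairs.
    Returns one VTF-IDF weight dict per document."""
    n = len(docs)
    vtfs = [vtf(d) for d in docs]
    # Document frequency of each term across the collection.
    df = Counter()
    for weights in vtfs:
        df.update(weights.keys())
    return [
        {t: w * math.log(n / df[t]) for t, w in weights.items()}
        for weights in vtfs
    ]
```

With this scheme, a term appearing in an image's alt attribute outweighs the same term appearing only in body text, which matches the intuition that alt text is more strongly correlated with the image's semantics.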
Use of normalized word vector approach in document classification for an LKMC
In order to realize the objective of expanding library services to provide knowledge management support for small businesses, a series of requirements must be met. This particular phase of a larger research project focuses on one of the requirements: the need for a document classification system to rapidly determine the content of digital documents. Document classification techniques are examined to assess the available alternatives for realization of Library Knowledge Management Centers (LKMCs). After evaluating prominent techniques the authors opted to investigate a less well-known method, the Normalized Word Vector (NWV) approach, which has been used successfully in classifying highly unstructured documents, i.e., student essays. The authors propose utilizing the NWV approach for LKMC automatic document classification, with the goal of developing a system whereby unfamiliar documents can be quickly classified into existing topic categories. This conceptual paper outlines an approach to test NWV's suitability in this area.
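The core mechanism of a normalized-word-vector classifier can be sketched in a few lines: represent each document and each topic category as a term-frequency vector normalized to unit length, then assign the document to the category with the most similar vector. This is a generic sketch of the idea, not the NWV method as specified by its authors; the centroid representation and cosine similarity are assumptions of the illustration.

```python
import math
from collections import Counter

def normalize(vec):
    """Scale a term-frequency vector to unit length, so that document
    length does not dominate the similarity comparison."""
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else dict(vec)

def cosine(a, b):
    """Cosine similarity between two sparse unit vectors (dicts)."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def classify(doc_terms, category_centroids):
    """Assign the document to the category whose (normalized) centroid
    vector is closest, by cosine similarity, to the document's vector.

    doc_terms: list of terms in the unfamiliar document.
    category_centroids: dict mapping category name -> term-count Counter.
    """
    doc_vec = normalize(Counter(doc_terms))
    return max(
        category_centroids,
        key=lambda c: cosine(doc_vec, normalize(category_centroids[c])),
    )
```

Because both vectors are normalized, a short document and a long one with the same term distribution classify identically, which is what makes the approach workable on highly unstructured text.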
Towards the Automatic Classification of Documents in User-generated Classifications
There is a huge amount of information scattered across the World Wide Web. As information flows through the WWW at high speed, there is a need to organize it properly so that a user can access it easily. Previously, the organization of information was generally done manually, by matching document contents to pre-defined categories. There are two approaches to this text-based categorization: manual and automatic. In the manual approach, a human expert performs the classification task; in the automatic case, supervised classifiers are used to classify resources. In a supervised classification, manual interaction is required to create some training data before the automatic classification task takes place. In our new approach, we intend to propose automatic classification of documents through semantic keywords and the generation of classification formulas from these keywords. We can thus reduce human participation by combining the knowledge of a given classification with the knowledge extracted from the data. The main focus of this PhD thesis, supervised by Prof. Fausto Giunchiglia, is the automatic classification of documents into user-generated classifications. The key benefits foreseen from this automatic document classification are not limited to search engines; they extend to many other fields, such as document organization, text filtering, and semantic index management.
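The supervised baseline the abstract describes (train on labelled examples, then classify new documents) can be illustrated with a minimal multinomial Naive Bayes classifier. This is a standard textbook technique used here for illustration, not the thesis's semantic-keyword method.

```python
import math
from collections import Counter, defaultdict

def train(labelled_docs):
    """labelled_docs: list of (category, [terms]) training pairs.
    Returns per-category term counts and per-category document counts."""
    term_counts = defaultdict(Counter)
    doc_counts = Counter()
    for cat, terms in labelled_docs:
        term_counts[cat].update(terms)
        doc_counts[cat] += 1
    return term_counts, doc_counts

def classify(terms, term_counts, doc_counts):
    """Pick the category maximizing log P(category) + sum log P(term|category),
    with Laplace (add-one) smoothing for unseen terms."""
    vocab = {t for counts in term_counts.values() for t in counts}
    total_docs = sum(doc_counts.values())

    def log_score(cat):
        counts = term_counts[cat]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(doc_counts[cat] / total_docs)
        for t in terms:
            score += math.log((counts[t] + 1) / denom)
        return score

    return max(term_counts, key=log_score)
```

The "manual interaction" the abstract mentions is exactly the labelled training pairs passed to `train`; the thesis's proposal is to reduce that labelling effort by deriving classification formulas from semantic keywords instead.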