23,776 research outputs found

    A New Similarity Measure for Document Classification and Text Mining

    Accurate, efficient and fast processing of textual data and classification of electronic documents have become an important key factor in knowledge management and related businesses in today’s world. Text mining, information retrieval, and document classification systems have a strong positive impact on digital libraries and electronic content management, e-marketing, electronic archives, customer relationship management, decision support systems, copyright infringement, and plagiarism detection, which strictly affect economics, businesses, and organizations. In this study, we propose a new similarity measure that can be used with k-nearest neighbors (k-NN) and Rocchio algorithms, which are some of the well-known algorithms for document classification, information retrieval, and some other text mining purposes. We have tested our novel similarity measure with some structured textual data sets and we have compared the results with some other standard distance metrics and similarity measures such as Cosine similarity, Euclidean distance, and Pearson correlation coefficient. We have obtained some promising results, which show that this proposed similarity measure could be alternatively used within all suitable algorithms, methods, and models for text mining, document classification, and relevant knowledge management systems. Keywords: text mining, document classification, similarity measures, k-NN, Rocchio algorith
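
    The abstract above does not define the proposed measure, so the following is only a minimal sketch of the three standard baselines it names (Cosine similarity, Euclidean distance, Pearson correlation coefficient), plugged into a simple k-NN vote. The function names, the toy term-frequency vectors, and the labels are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Pearson correlation coefficient between two vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # convention chosen here for constant vectors
    return cov / math.sqrt(var_a * var_b)


def knn_classify(query, labelled_docs, k, similarity):
    """Majority vote among the k documents most similar to the query.

    `similarity` must return larger values for closer documents; to use a
    distance instead, pass e.g. `lambda a, b: -euclidean_distance(a, b)`.
    """
    ranked = sorted(
        labelled_docs,
        key=lambda item: similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus: term counts over a four-word vocabulary.
TRAINING_DOCS = [
    ([3, 0, 1, 0], "sports"),
    ([2, 1, 0, 0], "sports"),
    ([0, 2, 0, 3], "finance"),
    ([0, 1, 1, 2], "finance"),
]

if __name__ == "__main__":
    print(knn_classify([1, 0, 0, 0], TRAINING_DOCS, k=3, similarity=cosine_similarity))
```

    Because the similarity function is passed in as a parameter, any alternative measure could be swapped in for comparison without touching the classifier.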

    Z39.50 broadcast searching and Z-server response times: perspectives from CC-interop

    This paper begins by briefly outlining the evolution of Z39.50 and the current trends, including the work of the JISC CC-interop project. The research crux of the paper focuses on an investigation conducted with respect to testing Z39.50 server (Z-server) response times in a broadcast (parallel) searching environment. Customised software was configured to broadcast a search to all test Z-servers once an hour, for eleven weeks. The results were logged for analysis. Most Z-servers responded rapidly. 'Network congestion' and local OPAC usage were not found to significantly influence Z-server performance. Response time issues encountered by implementers may be the result of non-response by the Z-server and how Z-client software deals with this. The influence of 'quick and dirty' Z39.50 implementations is also identified as a potential cause of slow broadcast searching. The paper indicates various areas for further research, including setting shorter time-outs and greater end-user behavioural research to ascertain user requirements in this area. The influence more complex searches, such as Boolean, have on response times and suboptimal Z39.50 implementations are also emphasised for further study. This paper informs the LIS research community and has practical implications for those establishing Z39.50 based distributed systems, as well as those in the Web Services community. The paper challenges popular LIS opinion that Z39.50 is inherently sluggish and thus unsuitable for the demands of the modern user
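
    The broadcast (parallel) search with a time-out described above can be sketched as follows. This is not the project's customised software and does not speak Z39.50: `query_server` is a hypothetical stand-in that simply sleeps, and the server names and delays are invented for illustration. The sketch only shows how a client might separate responders from non-responders once a time-out expires.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def query_server(name, delay_seconds):
    """Hypothetical stand-in for one search; sleeps instead of using the network."""
    start = time.monotonic()
    time.sleep(delay_seconds)
    return name, time.monotonic() - start


def broadcast_search(servers, timeout_seconds):
    """Send the same query to every server at once and stop waiting at the time-out.

    `servers` maps a server name to its simulated response delay in seconds.
    Returns (responded, timed_out): a dict of name -> elapsed seconds, and a
    sorted list of the names that had not answered when the time-out expired.
    """
    pool = ThreadPoolExecutor(max_workers=len(servers))
    futures = {
        pool.submit(query_server, name, delay): name
        for name, delay in servers.items()
    }
    done, not_done = wait(futures, timeout=timeout_seconds)

    responded = {}
    for future in done:
        name, elapsed = future.result()
        responded[name] = elapsed
    timed_out = sorted(futures[future] for future in not_done)

    # Do not block on the stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    return responded, timed_out


if __name__ == "__main__":
    simulated = {"fast-a": 0.01, "fast-b": 0.02, "slow-c": 1.0}
    answered, missing = broadcast_search(simulated, timeout_seconds=0.3)
    print("responded:", sorted(answered))
    print("timed out:", missing)
```

    With a design like this, one unresponsive server costs the user at most the time-out rather than stalling the whole broadcast, which is the client-side behaviour the abstract singles out.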

    Management of e-Resources in R & D Centers: A Case Study of the Information Center at NAL

    The developments in information technology and their applications to library and information services have given a new dimension to the entire spectrum of information management. The information generated is usually stored in four physical media: paper, film, optical, and magnetic disks. The e-document, be it a book, journal, technical report, or conference proceedings, is portable; has random access to its contents; and can also be a multimedia object, in that it may contain not only text but also graphics, drawings, photographs, or video. Publications over electronic networks have now emerged, and the activity took off in a big way following the invention of the World Wide Web. The Open Access movement is becoming the order of the day. More than 3000 journals are free on the net for anybody to access. A number of institutional repositories and e-Prints archives have thrown down a challenge to the publishing industry. The consortium approach, through different pricing, management, and licensing models, is enabling libraries to provide access to thousands of e-journals, e-books, and other kinds of e-documents. The Information Center at NAL, with its state-of-the-art library, has progressed a good deal in this direction by acquiring different kinds of documents, especially in e-form; cataloguing and processing them appropriately; and storing them and giving access to its patrons not only on the library premises but also on desktops spread across three different campuses through the laboratory LAN, while also extending selected services through the Internet for the benefit of anybody from any part of the world. Created and maintained by ICAST, the portal ‘AeroInfo’ (www.aeroinfo.org.in) serves as a one-window information search facility for Web sources in aerospace science and technology. This virtual library offers information seekers multiple approaches, as the web sources are indexed and organised using different schemes of classification, including NASA subject categories. Care is taken to cover Indian aerospace sources exhaustively. The ICAST site (www.icast.org.in), apart from giving detailed information about library sources including books, journals, e-journals, databases, and technical reports, makes different search tools available to its users. Other details like working hours, library rules, staff details, contact persons, etc. are provided. One can submit an online query and suggest documents for acquisition using the online forms provided. The Library Database (OPAC) is probably the single largest in the country, with more than 3.25 lakh bibliographic records of books, technical reports, patents, standards, journals, etc. ICAST users can search international databases like Aerospace Database, NTIS, J-Gate, Medline, etc. through the campus LAN. Users can access more than 2500 full-text journals covering titles published by Elsevier (ScienceDirect), ASME, AIAA, Springer, John Wiley, OUP, CUP, AMS, World Scientific, a few Annual Series, etc. An e-journals gateway created by ICAST, with browse and search (alphabetical and subject-wise) facilities for titles, provides access to more than 700 journals available free on the net. The Centre provides a number of innovative web/e-mail based information services, including Journal Contents Service, News Clipping Service, Monthly Documents Additions Lists covering both Books and Technical Reports, Web Alert Service, and Union Catalogue of Journals - CSIR and Aerospace Libraries, etc

    Developing information services for special library users by designing a low cost digital library : the experiment of NOC-Digital Library

    This research originates from a belief that special libraries in developing countries need to modernise and implement their ICT infrastructure and articulate information policies that will facilitate the optimum exploitation of information resources to increase national productivity. Special libraries and information centres in developing countries in general, and in the Arab world in particular, should start building their local digital libraries, as the benefits of establishing such electronic services are considerable and well known, both for the expansion of research activities and for delivering services that satisfy the needs of targeted users. The aim of this paper is to provide general guidelines for designing a low-cost digital library that provides the services most frequently required by various categories of special library users in developing countries. This paper also aims at illustrating strategies and methods that can be adopted for building such projects. The paper describes the phases and stages implemented in building low-cost digital library services for the NOC. It also aims at highlighting the barriers and obstacles facing Arabic content in the digitization stage

    How to make best use of the intellectual output of a country? a simple approach to the design of a digital library of theses and dissertations in Indian universities

    In India there are hundreds of universities and other academic institutions catering for the needs of millions of users. INFLIBNET is in an excellent position to spearhead in building such digital libraries, and in this paper I would like to propose some simple guidelines for this

    Why Print and Electronic Resources Are Essential to the Academic Law Library

    Libraries have supported multiple formats for decades, from paper and microforms to audiovisual tapes and CDs. However, the newest medium, digital transmission, has presented a wider scope of challenges and caused library patrons to question the established and recognized multiformat library. Within the many questions posed, two distinct ones echo repeatedly. The first doubts the need to sustain print in an increasingly digital world, and the second warns of the dangers of relying on a still-developing technology. This article examines both of these positions and concludes that abandoning either format would translate into a failure of service to patrons, both present and future

    Digital archiving of manuscripts and other heritage items for conservation and information retrieval

    Viewed from the informatics angle, expressions of cultural heritage fall into text, image, video, and sound categories. ICT can be used to conserve all of these heritage items: text information consisting of palm leaf manuscripts, stone tablets, handwritten paper documents, old printed records, books, microfilms, fiche, etc.; images including paintings, drawings, photographs, and the like; sound items, which include musical concerts, poetry recitations, chanting of mantras, talks of important persons, etc.; and video items like archival films of historical importance. There are several limitations in retrieving required information from such a large mass of materials in different formats and in transmitting them across space and time. Digital technology allows hitherto unavailable facilities for durable storage and for speedy and efficient transmission and retrieval of information contained in all the above formats. Hypertext and hypermedia features of digital media enable integrating text with graphics, sound, video, and animation. This paper discusses the international and national efforts for digitizing heritage items, the digital archiving solutions available, the possibilities of the media, and the need to follow standards prescribed by organizations like UNESCO to enable easy exchange and pooling of information and documents generated in digital archiving systems at national and international level. The need to develop language technology for local scripts for organizing and preserving our cultural heritage is also stressed

    Browsing a digital library: A new approach for the New Zealand digital library

    Browsing is part of the information seeking process, used when information needs are ill-defined or unspecific. Browsing and searching are often interleaved during information seeking to accommodate changing awareness of information needs. Digital Libraries often support full-text search, but are not so helpful in supporting browsing. Described here is a novel browsing system created for the Greenstone software used by the New Zealand Digital Library that supports users in a more natural approach to the information seeking process. © Springer-Verlag Berlin Heidelberg 2003

    Access to information in digital libraries : users and digital divide

    Recognising the importance of information and knowledge in all spheres of human life, the recently held World Summit on Information Society came up with a plan of action for building a global information society. The goal of the world information society initiatives is the same as that of digital library research and development - to make information and knowledge accessible to everyone in the world. Digital libraries have progressed very rapidly over the past ten or so years. This paper addresses the two most important aspects of the information society - information users and digital divide. Findings of some large-scale studies on human information behaviour on the web and digital libraries are discussed. The major findings of a study on access to electronic resources by university students are then presented. It is proposed that a one-stop window approach with a task-based information organisation and access system may be the way forward

    Catalogers explore a new frontier: establishing a NEASC evidence center

    This article describes how cataloging staff at the Roger Williams University Library established, managed, and planned to preserve an online NEASC Evidence Center for the University’s reaccreditation process. It highlights use of MARC and AACR2rev for effective organization of the Center’s records and the continuing importance of professional cataloging skills