
    Automatic Classification of Text Databases through Query Probing

    Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier's rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report some initial exploratory experiments that show that our approach is promising for automatically characterizing the contents of text databases accessible on the web. Comment: 7 pages, 1 figure
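
The probing idea in the abstract above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the rules in `RULES`, the toy `count_matches` search interface, and the `min_share` threshold are all hypothetical. The abstract says only that databases are classified by the number of matches each query produces.

```python
from collections import Counter

# Hypothetical classifier rules: a conjunction of words implies a category.
RULES = [
    (("ibm", "computer"), "Computers"),
    (("lung", "cancer"), "Health"),
    (("hepatitis",), "Health"),
]


def count_matches(database, query_terms):
    """Stand-in for a search-only interface that reports a match count.

    Here the "database" is just a list of strings; a real hidden database
    would be probed through its search form instead.
    """
    hits = 0
    for doc in database:
        words = set(doc.lower().split())
        if all(term in words for term in query_terms):
            hits += 1
    return hits


def classify_database(database, rules, min_share=0.5):
    """Assign categories whose probes attract at least min_share of all matches.

    The share-based threshold is an assumption made for this sketch.
    """
    matches = Counter()
    for terms, category in rules:
        # Each rule becomes one probing query; only the match count is used.
        matches[category] += count_matches(database, terms)
    total = sum(matches.values())
    if total == 0:
        return []
    return sorted(c for c, n in matches.items() if n / total >= min_share)


toy_db = [
    "Lung cancer screening trial",
    "Hepatitis vaccine update",
    "IBM computer sales report",
]
print(classify_database(toy_db, RULES))
```

Note that the sketch never reads the documents' contents beyond what the toy search interface needs; the classification decision uses match counts alone, which is what makes the approach applicable to search-only databases.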

    Electronic Resources and Academic Libraries, 1980-2000: A Historical Perspective

    Published or submitted for publication

    Library Research Instruction for Doctor of Ministry Students: Outcomes of Instruction Provided by a Theological Librarian and by a Program Faculty Member

    At some seminaries, who is more effective at teaching library research remains an open question. There are two camps of thought: (1) that the program faculty member is more effective in providing library research instruction, as he or she is intimately engaged in the subject of the course(s), or (2) that the theological librarian is more effective in providing library research instruction, as he or she is more familiar with the scope of resources that are available, as well as how to obtain “hard to get” resources.

    Keywords given by authors of scientific articles in database descriptors

    This paper analyses the keywords given by authors of scientific articles and the descriptors assigned to the articles in order to ascertain the presence of the keywords in the descriptors. A total of 640 INSPEC, CAB Abstracts, ISTA and LISA database records were consulted. After detailed comparisons it was found that keywords provided by authors have an important presence in the database descriptors studied, since nearly 25% of all the keywords appeared in exactly the same form as descriptors, and another 21%, once normalized, are still detected in the descriptors. This means that almost 46% of keywords appear in the descriptors, either as such or after normalization. In addition, three distinct indexing policies appear: one represented by INSPEC and LISA (indexers seem to have freedom to assign the descriptors they deem necessary); another represented by CAB (no record has fewer than four descriptors and, in general, a large number of descriptors is employed); and, in contrast, a third in ISTA, where a certain institutional code towards economy in indexing appears, since 84% of records contain only four descriptors.
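
The two figures in the abstract above (exact matches and matches after normalization) could be computed roughly as follows. This is a sketch under assumptions: the abstract does not say how terms were normalized, so the `normalize` function (lowercasing, dropping hyphens, stripping a trailing plural "s") and the sample record are hypothetical.

```python
def normalize(term):
    """Crude normalization (an assumption): lowercase, drop hyphens, strip plural 's'."""
    words = term.lower().replace("-", " ").split()
    return " ".join(w[:-1] if w.endswith("s") and len(w) > 3 else w for w in words)


def match_rates(records):
    """Return (exact_share, normalized_only_share) over all author keywords.

    Each record is a pair (author_keywords, database_descriptors).
    """
    total = exact = normalized = 0
    for keywords, descriptors in records:
        desc_set = set(descriptors)
        desc_norm = {normalize(d) for d in descriptors}
        for kw in keywords:
            total += 1
            if kw in desc_set:
                exact += 1  # keyword appears verbatim as a descriptor
            elif normalize(kw) in desc_norm:
                normalized += 1  # keyword matches only after normalization
    if total == 0:
        return 0.0, 0.0
    return exact / total, normalized / total


# Hypothetical record: four author keywords, three assigned descriptors.
sample = [
    (
        ["information retrieval", "Databases", "thesauri", "web"],
        ["information retrieval", "database", "libraries"],
    )
]
exact_share, normalized_share = match_rates(sample)
print(exact_share, normalized_share, exact_share + normalized_share)
```

Counting a keyword as "normalized" only when it is not already an exact match keeps the two shares disjoint, so they can be added, as the abstract does when it combines nearly 25% and 21% into almost 46%.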

    XML content warehousing: Improving sociological studies of mailing lists and web data

    In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows data sources to be added or cross-referenced without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly used advanced software devoted to sociological analysis.

    Natural language processing

    Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

    Patenting insurance related business methods: predictability and risk

    This paper raises and responds to questions concerning the patentability of business methods. It explores the utility of patent applications in informing business method innovators of the risks associated with using the patent system. The insurance industry was chosen since its survival depends on an ability to adapt rapidly in the face of unrelenting, unpredictable change. Inventive changes in the insurance industry include new business models and e-business technologies to improve operating efficiency or to build customer focus. Using the European Patent Office's esp@cenet free patent database, a sample of patent applications for insurance industry innovations was retrieved. The paper then analyses the information contained in the patent application documents. A patent application requires public description of the invention in full enough detail to enable a person familiar with that business to produce it. If the application is successful, a granted patent gives the owner the valuable commercial advantage of a 20-year monopoly. If unsuccessful, the applicant will have disclosed the innovation to competitors.

    US government information: selected current issues in public access vs. private competition

    Web information systems are having a profound effect on the way information is being disseminated today. Current technological advances have caused many government agencies to re-evaluate their practice of contracting with private sector vendors who have traditionally repackaged and marketed the agency's raw data. These new opportunities for government agencies wishing to make information publicly accessible have blurred the traditional distinctions between public and private dissemination activities. Low-cost public dissemination of information has resulted in private sector vendors arguing that public electronic distribution and publication creates unfair competition. New partnerships, such as the recent venture between the National Technical Information Service (NTIS) and the commercial search engine Northern Light in developing the "usgovsearch" product, are also being explored. From another viewpoint, library associations are strongly supporting legislation that would broaden, strengthen, and enhance public access to electronic government information. Key issues to be discussed are: (1) the debate concerning public vs. private access to government information; (2) whether electronic access to government information eliminates the need for printed documents; and (3) joint efforts: when the government should team up with private sector allies to charge for information services and access.

    Searching with Tags: Do Tags Help Users Find Things?

    This study examines whether tags can be useful in the process of information retrieval. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (PubMed). Participant actions were recorded using screen capture software, and participants were asked to describe their search process. Users did make use of tags in their search process, both as a guide to searching and as hyperlinks to potentially useful articles. However, users also made use of controlled vocabularies in the journal database to locate useful search terms, and of links to related articles supplied by the database.