Automatic Classification of Text Databases through Query Probing
Many text databases on the web are "hidden" behind search interfaces, and
their documents are only accessible through querying. Search engines typically
ignore the contents of such search-only databases. Recently, Yahoo-like
directories have started to manually organize these databases into categories
that users can browse to find these valuable resources. We propose a novel
strategy to automate the classification of search-only text databases. Our
technique starts by training a rule-based document classifier, and then uses
the classifier's rules to generate probing queries. The queries are sent to the
text databases, which are then classified based on the number of matches that
they produce for each query. We report some initial exploratory experiments
that show that our approach is promising to automatically characterize the
contents of text databases accessible on the web. Comment: 7 pages, 1 figure
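The probing idea in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the rules in `RULES`, the category names, and the `count_matches` stand-in for a real search interface are all hypothetical, and the database is modelled as an in-memory list of strings.

```python
from collections import Counter

# Hypothetical classifier rules: each pairs a probe query (a set of required
# terms) with the category it indicates. A real system would learn these.
RULES = [
    ({"ibm", "computers"}, "Computers"),
    ({"hepatitis"}, "Health"),
    ({"baseball"}, "Sports"),
]

def count_matches(database, terms):
    """Stand-in for a search interface: count documents containing all terms."""
    return sum(1 for doc in database if terms <= set(doc.lower().split()))

def classify_database(database, rules=RULES):
    """Send one probe per rule and pick the category with the most matches."""
    matches = Counter()
    for terms, category in rules:
        matches[category] += count_matches(database, terms)
    if not matches or max(matches.values()) == 0:
        return None, matches  # no probe matched anything
    return matches.most_common(1)[0][0], matches

if __name__ == "__main__":
    demo_db = [
        "Hepatitis vaccine trial results",
        "hepatitis b treatment",
        "IBM sells computers",
    ]
    category, counts = classify_database(demo_db)
    print(category, dict(counts))
```

Only match counts are used here, which mirrors the abstract's point that the documents themselves need not be retrieved.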
Electronic Resources and Academic Libraries, 1980-2000: A Historical Perspective
published or submitted for publication
Library Research Instruction for Doctor of Ministry Students: Outcomes of Instruction Provided by a Theological Librarian and by a Program Faculty Member
At some seminaries, who teaches library research more effectively remains an open question. There are two camps of thought: (1) that the program faculty member is more effective in providing library research instruction, as he or she is intimately engaged in the subject of the course(s), or (2) that the theological librarian is more effective in providing library research instruction, as he or she is more familiar with the scope of resources that are available, as well as how to obtain “hard to get” resources
Keywords given by authors of scientific articles in database descriptors
This paper analyses the keywords given by authors of scientific articles and the descriptors assigned to the articles in order to ascertain the presence of the keywords in the descriptors. 640 INSPEC, CAB abstracts, ISTA and LISA database records were consulted. After detailed comparisons it was found that keywords provided by authors have an important presence in the database descriptors studied: nearly 25% of all the keywords appeared in exactly the same form as descriptors, and another 21%, once normalized, are still detected in the descriptors. This means that almost 46% of keywords appear in the descriptors, either as such or after normalization. Elsewhere, three distinct indexing policies appear: one represented by INSPEC and LISA (indexers seem to have freedom to assign the descriptors they deem necessary); another represented by CAB (no record has fewer than four descriptors and, in general, a large number of descriptors is employed); and, in contrast, a third in ISTA, where there appears to be a certain institutional code towards economy in indexing, since 84% of records contain only four descriptors
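The exact-versus-normalized comparison described in this abstract could be computed along the following lines. This is only a sketch under stated assumptions: the abstract does not say how keywords were normalized, so the `normalize` function here (lowercasing, unifying hyphens and spaces, stripping a trailing "s") is a hypothetical stand-in, and the sample record is invented for illustration.

```python
import re

def normalize(term):
    """Hypothetical normalization: lowercase, unify hyphens/spaces, drop a final 's'."""
    t = re.sub(r"[-\s]+", " ", term.strip().lower())
    return t[:-1] if t.endswith("s") and len(t) > 3 else t

def match_rates(records):
    """Return (exact share, normalized-only share) of author keywords found in descriptors.

    `records` is a list of (keywords, descriptors) pairs, one per database record.
    """
    total = exact = normalized = 0
    for keywords, descriptors in records:
        desc_exact = set(descriptors)
        desc_norm = {normalize(d) for d in descriptors}
        for kw in keywords:
            total += 1
            if kw in desc_exact:
                exact += 1            # identical form
            elif normalize(kw) in desc_norm:
                normalized += 1       # matches only after normalization
    if total == 0:
        return 0.0, 0.0
    return exact / total, normalized / total

if __name__ == "__main__":
    sample = [(
        ["indexing", "Search Engines", "thesauri", "web"],
        ["indexing", "search engine", "information retrieval"],
    )]
    print(match_rates(sample))
```

Summing the two shares gives the overall presence figure, in the same way the abstract adds its exact and normalized percentages.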
XML content warehousing: Improving sociological studies of mailing lists and web data
In this paper, we present the guidelines for an XML-based approach for the
sociological study of Web data such as the analysis of mailing lists or
databases available online. The use of an XML warehouse is a flexible solution
for storing and processing this kind of data. We propose an implemented
solution and show possible applications with our case study of profiles of
experts involved in W3C standard-setting activity. We illustrate the
sociological use of semi-structured databases by presenting our XML Schema for
mailing-list warehousing. An XML Schema allows many adjunctions or crossings of
data sources, without modifying existing data sets, while allowing possible
structural evolution. We also show that the existence of hidden data implies
increased complexity for traditional SQL users. XML content warehousing allows
altogether exhaustive warehousing and recursive queries through contents, with
far less dependence on the initial storage. We finally present the possibility
of exporting the data stored in the warehouse to commonly-used advanced
software devoted to sociological analysis
Natural language processing
Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST Chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems - text summarization, information extraction, information retrieval, etc., including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems
Patenting insurance related business methods: predictability and risk
This paper raises and responds to questions concerning the patentability of business method patents. It explores the utility of patent applications in informing business method innovators of the risks associated with using the patent system. The insurance industry was chosen since its survival depends on an ability to adapt rapidly in the face of unrelenting, unpredictable change. Inventive changes in the insurance industry include new business models and e-business technologies to improve operating efficiency or to build customer focus.
Using the European Patent Office's esp@cenet free patent database, a sample of patent applications for insurance industry innovations was retrieved. The paper then analyses the information contained in the patent application documents. A patent application requires public description of the invention in full enough detail to enable a person familiar with that business to produce it. If the application is successful, a granted patent gives the owner the valuable commercial advantage of a 20-year monopoly. If unsuccessful, the applicant will have disclosed the innovation to competitors
US government information: selected current issues in public access vs. private competition
Web information systems are having a profound effect on the way information is being disseminated today. Current technological advances have caused many government agencies to re-evaluate their practice of contracting with private sector vendors who have traditionally repackaged and marketed the agency's raw data. These new opportunities for government agencies wishing to make information publicly accessible have blurred the traditional distinctions between public and private dissemination activities. Low-cost public dissemination of information has resulted in private sector vendors arguing that public electronic distribution and publication creates unfair competition. New partnerships, such as the recent venture between the National Technical Information Service (NTIS) and the commercial search engine, Northern Light, in developing the "usgovsearch" product are also being explored. From another viewpoint, library associations are strongly supporting legislation that would broaden, strengthen, and enhance public access to electronic government information. Key issues to be discussed are: (1) the debate concerning public vs. private access to government information; (2) Does electronic access to government information eliminate the need for printed documents? and (3) Joint efforts -- when should the government team up with private sector allies to charge for information services and access
Searching with Tags: Do Tags Help Users Find Things?
This study examines the question of whether tags can be useful in the process of information retrieval. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (Pubmed). Participant actions were captured using screen capture software and they were asked to describe their search process. Users did make use of tags in their search process, as a guide to searching and as hyperlinks to potentially useful articles. However, users also made use of controlled vocabularies in the journal database to locate useful search terms and of links to related articles supplied by the database