Automatic Classification of Text Databases through Query Probing

Author: D. Hawking
D. Koller
J. P. Callan
J. Xu
L. Gravano
M. Perkowitz
S. Gauch
W. Meng
W. W. Cohen
Publication date: 01/01/2000

Many text databases on the web are "hidden" behind search interfaces, and their documents are only accessible through querying. Search engines typically ignore the contents of such search-only databases. Recently, Yahoo-like directories have started to manually organize these databases into categories that users can browse to find these valuable resources. We propose a novel strategy to automate the classification of search-only text databases. Our technique starts by training a rule-based document classifier, and then uses the classifier's rules to generate probing queries. The queries are sent to the text databases, which are then classified based on the number of matches that they produce for each query. We report initial exploratory experiments suggesting that our approach is promising for automatically characterizing the contents of text databases accessible on the web. Comment: 7 pages, 1 figure
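
The probing idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rules, the category names, the `min_share` threshold, and the in-memory `count_matches` stand-in for a real search interface are all assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical classifier rules: each pairs a conjunction of query words
# with the category it indicates. A real system would learn these rules
# from training documents.
RULES = [
    (("ibm", "computer"), "Computers"),
    (("jordan", "bulls"), "Sports"),
    (("cancer",), "Health"),
]


def count_matches(database, words):
    """Stand-in for a search interface: count documents containing all words.

    A hidden-web database would only report a match count for the query;
    here the "database" is simply a list of strings (an assumption).
    """
    return sum(all(w in doc.lower().split() for w in words) for doc in database)


def classify_database(database, rules=RULES, min_share=0.4):
    """Send one probe per rule and keep categories with a large share of matches."""
    matches = Counter()
    for words, category in rules:
        matches[category] += count_matches(database, words)
    total = sum(matches.values())
    if total == 0:
        return []  # no probe matched, so no category can be assigned
    return sorted(c for c, n in matches.items() if n / total >= min_share)


if __name__ == "__main__":
    db = ["IBM ships new computer", "the computer from ibm", "cancer research update"]
    print(classify_database(db))
```

Only match counts are used for the decision, which mirrors the constraint that the documents themselves cannot be retrieved in bulk. The threshold on the share of matches is one possible decision rule among many.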

arXiv.org e-Print Archive

CiteSeerX

Crossref

Columbia University Academic Commons

Electronic Resources and Academic Libraries, 1980-2000: A Historical Perspective

Author: Miller Ruth H.
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/2000

Published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository

Library Research Instruction for Doctor of Ministry Students: Outcomes of Instruction Provided by a Theological Librarian and by a Program Faculty Member

Author: Birch Rodney
Kamilos Charles D
Publication venue: Digital Commons @ George Fox University
Publication date: 01/01/2014

At some seminaries, who teaches library research more effectively remains an open question. There are two schools of thought: (1) that the program faculty member is more effective in providing library research instruction, as he or she is intimately engaged in the subject of the course(s), or (2) that the theological librarian is more effective in providing library research instruction, as he or she is more familiar with the scope of resources that are available, as well as how to obtain “hard to get” resources.

Directory of Open Access Journals

Digital Commons @ George Fox University

Keywords given by authors of scientific articles in database descriptors

Author: Alimohammadi
Ansari
Boger
Craven
Craven
Gbur
Gil-Leiva
Gil-Leiva
Gil-Leiva
Gil-Leiva
Gross
Hartley
Hersh
Hmeidi
International Association for Standardization (ISO)
Jones
Kishida
Ko
Lancheng
Montejo Ráez
Ripplinger
Silvester
Taghva
Tillotson
Turney
Voorbij
Publication date: 01/01/2007

This paper analyses the keywords given by authors of scientific articles and the descriptors assigned to those articles in order to ascertain the presence of the keywords in the descriptors. 640 INSPEC, CAB Abstracts, ISTA and LISA database records were consulted. Detailed comparison showed that keywords provided by authors have an important presence in the database descriptors studied: nearly 25% of all the keywords appeared in exactly the same form as descriptors, and another 21% were still detected in the descriptors after normalization. This means that almost 46% of keywords appear in the descriptors, either as such or after normalization. In addition, three distinct indexing policies appear: one represented by INSPEC and LISA (indexers seem to have freedom to assign the descriptors they deem necessary); another represented by CAB (no record has fewer than four descriptors and, in general, a large number of descriptors is employed); and, in contrast, a third in ISTA, where there appears to be an institutional code favouring economy in indexing, since 84% of records contain only four descriptors.

E-LIS

Crossref

XML content warehousing: Improving sociological studies of mailing lists and web data

Author: Colazzo Dario
Dudouet François-Xavier
Manolescu Ioana
Nguyen Benjamin
Senellart Pierre
Vion Antoine
Publication date: 01/01/2011

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing and processing this kind of data. We propose an implemented solution and show possible applications with our case study of profiles of experts involved in W3C standard-setting activity. We illustrate the sociological use of semi-structured databases by presenting our XML Schema for mailing-list warehousing. An XML Schema allows many adjunctions or crossings of data sources, without modifying existing data sets, while allowing possible structural evolution. We also show that the existence of hidden data implies increased complexity for traditional SQL users. XML content warehousing allows both exhaustive warehousing and recursive queries through contents, with far less dependence on the initial storage. We finally present the possibility of exporting the data stored in the warehouse to commonly used advanced software devoted to sociological analysis.

arXiv.org e-Print Archive

Base de publications de l'université Paris-Dauphine

Crossref

INRIA a CCSD electronic archive server

HAL UVSQ

HAL-Rennes 1

Natural language processing

Author: Adams
Amsler
Bangalore
Barker
Benoît
Bian
Bondale
Carrick
Ceric
Chandrasekar
Chang
Charniak
Chen
Chowdhury
Chowdhury
Costantino
Cowie
Craven
Craven
Craven
Dogru
Evans
Feldman
Fernandez
Gaizauskas
Glasgow
Haas
Hayes
Hayes
Hedlund
Herath
Ide
Isahara
Jelinek
Jeong
Jurafsky
Kazakov
Kehler
Khoo
Kim
King
Lange
Lee
Lehmam
Lehtokangas
Lewis
Liddy
Liddy
Lovis
Ma
Magnini
Mani
Manning
Marquez
Martinez
Martinez
McMurchie
Meyer
Mihalcea
Mock
Moens
Morin
Narita
Nerbonne
Oard
Ogura
Oudet
Owei
Paris
Pasero
Pedersen
Perez-Carballo
Petreley
Pirkola
Poesio
Rosenfield
Roux
Say
Scarlett
Schenker
Silber
Smeaton
Smeaton
Smith
Sokol
Song
Sparck Jones
Staab
Stock
Tolle
Trybula
Tsuda
Vickery
Waldrop
Warner
Weigard
Wilks
Wong
Yang
Yang
Zadrozny
Zweigenbaum
Publication venue: Wiley
Publication date: 01/01/2003

Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney

Patenting insurance related business methods: predictability and risk

Author: Soetendorp Ruth
Publication date: 05/01/2003

This paper raises and responds to questions concerning the patentability of business methods. It explores the utility of patent applications in informing business method innovators of the risks associated with using the patent system. The insurance industry was chosen since its survival depends on an ability to adapt rapidly in the face of unrelenting, unpredictable change. Inventive changes in the insurance industry include new business models and e-business technologies to improve operating efficiency or to build customer focus. Using the European Patent Office's esp@cenet free patent database, a sample of patent applications for insurance industry innovations was retrieved. The paper then analyses the information contained in the patent application documents. A patent application requires public description of the invention in full enough detail to enable a person familiar with that business to produce it. If the application is successful, a granted patent gives the owner the valuable commercial advantage of a 20-year monopoly. If unsuccessful, the applicant will have disclosed the innovation to competitors.

Bournemouth University Research Online

US government information: selected current issues in public access vs. private competition

Author: McMullen Susan
Publication venue: DOCS@RWU
Publication date: 30/03/2000

Web information systems are having a profound effect on the way information is being disseminated today. Current technological advances have caused many government agencies to re-evaluate their practice of contracting with private sector vendors who have traditionally repackaged and marketed the agency's raw data. These new opportunities for government agencies wishing to make information publicly accessible have blurred the traditional distinctions between public and private dissemination activities. Low-cost public dissemination of information has resulted in private sector vendors arguing that public electronic distribution and publication creates unfair competition. New partnerships, such as the recent venture between the National Technical Information Service (NTIS) and the commercial search engine Northern Light in developing the "usgovsearch" product, are also being explored. From another viewpoint, library associations are strongly supporting legislation that would broaden, strengthen, and enhance public access to electronic government information. Key issues to be discussed are: (1) the debate concerning public vs. private access to government information; (2) whether electronic access to government information eliminates the need for printed documents; and (3) joint efforts, that is, when the government should team up with private sector allies to charge for information services and access.

DOCS@RWU

HELIN Digital Commons

Searching with Tags: Do Tags Help Users Find Things?

Author: Campbell D. Grant
Kipp Margaret E.I.
Publication date: 01/01/2010

This study examines whether tags can be useful in the process of information retrieval. Participants searched a social bookmarking tool specialising in academic articles (CiteULike) and an online journal database (PubMed). Participant actions were captured using screen capture software, and participants were asked to describe their search process. Users did make use of tags in their search process, both as a guide to searching and as hyperlinks to potentially useful articles. However, users also made use of controlled vocabularies in the journal database to locate useful search terms, and of links to related articles supplied by the database.

E-LIS