Safeguarding Old and New Journal Tables for the VO: Status for Extragalactic and Radio Data
Independent of established data centers, and partly for my own research,
since 1989 I have been collecting the tabular data from over 2600 articles
concerned with radio sources and extragalactic objects in general. Optical
character recognition (OCR) was used to recover tables from 740 papers. Tables
from only 41 percent of the 2600 articles are available in the CDS or CATS
catalog collections, and only slightly better coverage is estimated for the NED
database. This fraction is not better for articles published electronically
since 2001. Both object databases (NED, SIMBAD, LEDA) and catalog
browsers (VizieR, CATS) need to be consulted to obtain the most complete
information on astronomical objects. More human resources at the data centers
and better collaboration between authors, referees, editors, publishers, and
data centers are required to improve data coverage and accessibility. The
current efforts within the Virtual Observatory (VO) project, to provide
retrieval and analysis tools for different types of published and archival data
stored at various sites, should be balanced by an equal effort to recover and
include large amounts of published data not currently available in this way.
Comment: 11 pages, 4 figures; accepted for publication in Data Science
Journal, vol. 8 (2009), http://dsj.codataweb.org; presented at Special
Session "Astronomical Data and the Virtual Observatory" at the conference
"CODATA 21", Kiev, Ukraine, October 5-8, 200
A Machine Learning Approach to the Classification of Dialogue Utterances
The purpose of this paper is to present a method for automatic classification
of dialogue utterances and the results of applying that method to a corpus.
Superficial features of a set of training utterances (which we will call cues)
are taken as the basis for finding relevant utterance classes and for
extracting rules for assigning these classes to new utterances. Each cue is
assumed to partially contribute to the communicative function of an utterance.
Instead of relying on subjective judgments for the tasks of finding classes and
rules, we opt for using machine learning techniques to guarantee objectivity.
Comment: 12 pages, using nemlap.sty, harvard.sty and agsm.bst, to appear in
Proceedings of NeMLaP-2, Bilkent University, Ankara, Turke