Optimising Selective Sampling for Bootstrapping Named Entity Recognition
Training a statistical named entity recognition system in a new domain requires costly manual annotation of large quantities of in-domain data. Active learning promises to reduce the annotation cost by selecting only highly informative data points. This paper is concerned with a real active learning experiment to bootstrap a named entity recognition system for a new domain of radio astronomical abstracts. We evaluate several committee-based metrics for quantifying the disagreement between classifiers built using multiple views, and demonstrate that the choice of metric can be optimised in simulation experiments with existing annotated data from different domains. A final evaluation shows that we gained substantial savings compared to a randomly sampled baseline.
Grounding Gene Mentions with Respect to Gene Database Identifiers
We describe our submission for task 1B of the BioCreAtIvE competition which is concerned with grounding gene mentions with respect to databases of organism gene identifiers. Several approaches to gene identification, lookup, and disambiguation are presented. Results are presented with two possible baseline systems and a discussion of the source of precision and recall errors as well as an estimate of precision and recall for an organism-specific tagger bootstrapped from gene synonym lists and the task 1B training data.
Quantum error correction of systematic errors using a quantum search framework
Composite pulses are a quantum control technique for canceling out systematic
control errors. We present a new composite pulse sequence inspired by quantum
search. Our technique can correct a wider variety of systematic errors --
including, for example, nonlinear over-rotational errors -- than previous
techniques. Concatenation of the pulse sequence can reduce a systematic error
to an arbitrarily small level. Comment: 6 pages, 2 figures
Solving tile drainage problems by using model data
Our purpose in this bulletin is to report, to analyze, and to use in problem solving, extensive model data of tile drainage of land.
The data were obtained with a glassbead-glycerol model (Grover et al., 1960; Grover and Kirkham, 1961) and include: (a) values of depths and of corresponding times of fall of the surface of saturation to these depths at various distances from the drain tubes and (b) values of the drain tube discharge rates. The zero reference time for the fall of the surface of saturation and also for the discharge rate is the instant at which the surface of saturation passes through the simulated soil surface from a ponded condition.
Models were made of 109 different combinations of drain depth, drain spacing and soil stratification. For each of these 109 model conditions, the surfaces of saturation were photographed at about eight different depths through the transparent front face of the model. Photographs were read under a magnifying glass to obtain distances and times of fall. Times were obtained from a clock that was started at the zero reference time and photographed with the water tables
Sentence classification experiments for legal text summarisation
We describe experiments in building a classifier which determines the rhetorica
A rhetorical status classifier for legal text summarisation
We describe a classifier which determines the rhetorical status of sentences in texts from a corpus of judgments of the UK House of Lords. Our summarisation system is based on the work of Teufel and Moens where sentences are classified for rhetorical status to aid sentence selection. We experiment with a variety of linguistic features with results comparable to Teufel and Moens, thereby demonstrating the feasibility of porting this kind of system to a new domain.
Summarising Legal Texts: Sentential Tense and Argumentative Roles
We report on the SUM project which applies automatic summarisation techniques to the legal domain. We pursue a methodology based on Teufel and Moens (2002) where sentences are classified according to their argumentative role. We describe some experiments with judgments of the House of Lords where we have performed automatic linguistic annotation of a small sample set in order to explore correlations between linguistic features and argumentative roles. We use state-of-the-art NLP techniques to perform the linguistic annotation using XML-based tools and a combination of rule-based and statistical methods. We focus here on the predictive capacity of tense and aspect features for a classifier
The HOLJ corpus: supporting summarisation of legal texts
We describe an XML-encoded corpus of texts in the legal domain which was gathered for an automatic summarisation project. We describe two distinct layers of annotation: manual annotation of the rhetorical status of sentences and an entirely automatic annotation process incorporating a host of individual linguistic processors. The manual rhetorical status annotation has been developed as training and testing material for a summarisation system based on the work of Teufel and Moens, while the automatic layer of annotation encodes linguistic information as features for a machine learning approach to rhetorical status classification.