1,451 research outputs found

    Improving the Performance of a Tagger Generator in an Information Extraction Application

    Get PDF
    In this paper we present an experience in the extraction of named entities from Spanish texts using stacking. Named Entity Extraction (NEE) is a subtask of Information Extraction that involves the identification of groups of words that make up the name of an entity, and the classification of these names into a set of predefined categories. Our approach is corpus-based, we use a re-trainable tagger generator to obtain a named entity extractor from a set of tagged examples. The main contribution of our work is that we obtain the systems needed in a stacking scheme without making use of any additional training material or tagger generators. Instead of it, we have generated the variability needed in stacking by applying corpus transformation to the original training corpus. Once we have several versions of the training corpus we generate several extractors and combine them by means of a machine learning algorithm. Experiments show that the combination of corpus transformation and stacking improve the performance of the tagger generator in this kind of natural language processing applications. The best of our experiments achieves an improvement of more than six percentual points respect to the predefined baseline

    Applying Stacking and Corpus Transformation to a Chunking Task

    Get PDF
    In this paper we present an application of the stacking technique to a chunking task: named entity recognition. Stacking consists in applying machine learning techniques for combining the results of different models. Instead of using several corpus or several tagger generators to obtain the models needed in stacking, we have applied three transformations to a single training corpus and then we have used the four versions of the corpus to train a single tagger generator. Taking as baseline the results obtained with the original corpus (Fβ=1 value of 81.84), our experiments show that the three transformations improve this baseline (the best one reaches 84.51), and that applying stacking also improves this baseline reaching an Fβ=1 measure of 88.43

    A Survey of Paraphrasing and Textual Entailment Methods

    Full text link
    Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

    Measuring the Top Yukawa Coupling at 100 TeV

    Full text link
    We propose a measurement of the top Yukawa coupling at a 100 TeV hadron collider, based on boosted Higgs and top decays. We find that the top Yukawa coupling can be measured to 1%, with excellent handles for reducing systematic and theoretical uncertainties, both from side bands and from ttˉH/ttˉZt\bar{t}H/t\bar{t}Z ratios.Comment: v2: expanded contents and authorshi

    A facility to Search for Hidden Particles (SHiP) at the CERN SPS

    Get PDF
    A new general purpose fixed target facility is proposed at the CERN SPS accelerator which is aimed at exploring the domain of hidden particles and make measurements with tau neutrinos. Hidden particles are predicted by a large number of models beyond the Standard Model. The high intensity of the SPS 400~GeV beam allows probing a wide variety of models containing light long-lived exotic particles with masses below O{\cal O}(10)~GeV/c2^2, including very weakly interacting low-energy SUSY states. The experimental programme of the proposed facility is capable of being extended in the future, e.g. to include direct searches for Dark Matter and Lepton Flavour Violation.Comment: Technical Proposa

    Dictionary writing system (DWS) plus corpus query package (CQP): the case of TshwaneLex

    Get PDF
    In this article the integrated corpus query functionality of the dictionary compilation software TshwanelLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as Such the encouraging outcomes of this study are far-reaching
    corecore