Improving the Performance of a Tagger Generator in an Information Extraction Application

Author: Cañete Valdeón José Miguel
Cruz Mata Fermín
Enríquez de Salamanca Ros Fernando
Ortega Rodríguez Francisco Javier
Troyano Jiménez José Antonio
Publication venue: Graz University of Technology, Institut für Informationssysteme und Computer Medien (IICM)
Publication date: 01/01/2007

In this paper we present an experience in the extraction of named entities from Spanish texts using stacking. Named Entity Extraction (NEE) is a subtask of Information Extraction that involves the identification of groups of words that make up the name of an entity, and the classification of these names into a set of predefined categories. Our approach is corpus-based: we use a re-trainable tagger generator to obtain a named entity extractor from a set of tagged examples. The main contribution of our work is that we obtain the systems needed in a stacking scheme without making use of any additional training material or tagger generators. Instead, we have generated the variability needed in stacking by applying corpus transformation to the original training corpus. Once we have several versions of the training corpus, we generate several extractors and combine them by means of a machine learning algorithm. Experiments show that the combination of corpus transformation and stacking improves the performance of the tagger generator in this kind of natural language processing application. The best of our experiments achieves an improvement of more than six percentage points with respect to the predefined baseline.

idUS. Depósito de Investigación Universidad de Sevilla

Proceedings of QG2010: The Third Workshop on Question Generation

Author: Boyer Kristy Elizabeth
Piwek Paul
Publication venue: questiongeneration.org
Publication date: 18/06/2010

These are the peer-reviewed proceedings of "QG2010, The Third Workshop on Question Generation". The workshop included a special track for "QGSTEC2010: The First Question Generation Shared Task and Evaluation Challenge". QG2010 was held as part of The Tenth International Conference on Intelligent Tutoring Systems (ITS2010).

Open Research Online (The Open University)

Applying Stacking and Corpus Transformation to a Chunking Task

Author: Carrillo Montero Vicente
Cruz Mata Fermín
Díaz Madrigal Víctor Jesús
Enríquez de Salamanca Ros Fernando
Troyano Jiménez José Antonio
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

In this paper we present an application of the stacking technique to a chunking task: named entity recognition. Stacking consists of applying machine learning techniques to combine the results of different models. Instead of using several corpora or several tagger generators to obtain the models needed in stacking, we have applied three transformations to a single training corpus and then used the four versions of the corpus to train a single tagger generator. Taking as baseline the results obtained with the original corpus (Fβ=1 value of 81.84), our experiments show that the three transformations improve on this baseline (the best one reaches 84.51), and that applying stacking also improves on this baseline, reaching an Fβ=1 measure of 88.43.

idUS. Depósito de Investigación Universidad de Sevilla

A Survey of Paraphrasing and Textual Entailment Methods

Author: Androutsopoulos Ion
Malakasiotis Prodromos
Publication venue: AI Access Foundation
Publication date: 30/05/2010

Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of natural language expressions, such that a human who reads (and trusts) the first element of a pair would most likely infer that the other element is also true. Paraphrasing can be seen as bidirectional textual entailment and methods from the two areas are often similar. Both kinds of methods are useful, at least in principle, in a wide range of natural language processing applications, including question answering, summarization, text generation, and machine translation. We summarize key ideas from the two areas by considering in turn recognition, generation, and extraction methods, also pointing to prominent articles and resources.

Comment: Technical Report, Natural Language Processing Group, Department of Informatics, Athens University of Economics and Business, Greece, 201

arXiv.org e-Print Archive

Crossref

Measuring the Top Yukawa Coupling at 100 TeV

Author: Mangano Michelangelo L.
Plehn Tilman
Reimitz Peter
Schell Torben
Shao Hua-Sheng
Publication venue: IOP Publishing
Publication date: 29/07/2015

We propose a measurement of the top Yukawa coupling at a 100 TeV hadron collider, based on boosted Higgs and top decays. We find that the top Yukawa coupling can be measured to 1%, with excellent handles for reducing systematic and theoretical uncertainties, both from side bands and from $t\bar{t}H/t\bar{t}Z$ ratios.

Comment: v2: expanded contents and authorship

arXiv.org e-Print Archive

CERN Document Server

A facility to Search for Hidden Particles (SHiP) at the CERN SPS

Author: Alaoui M. A. El
Anelli M.
Aoki S.
Arduini G.
Back J. J.
Bagulya A.
Baldini W.
Baranov A.
Barker G. J.
Barsuk S.
Battistin M.
Bauche J.
Bay A.
Bayliss V.
Bellagamba L.
Bencivenni G.
Bertani M.
Bezshyyko O.
Bick D.
Bingefors N.
Blondel A.
Bogomilov M.
Bonacorsi D.
Bondarenko D.
Bonivento W.
Borburgh J.
Boyarsky A.
Bradshaw T.
Brenner R.
Breton D.
Brook N.
Bruschi M.
Buonaura A.
Buontempo S.
Cadeddu S.
Calcaterra A.
Calviani M.
Campanelli M.
Capoccia C.
Cecchetti A.
Chatterjee A.
Chauveau J.
Chepurnov A.
Chernyavskiy M.
Ciambrone P.
Cicalo C.
Conti G.
Cornelis K.
Courthold M.
D'Ambrosio N.
Dallavalle M. G.
De Lellis G.
De Serio M.
Dedenko L.
Di Crescenzo A.
Di Marco N.
Dib C.
Dietrich J.
Dijkstra H.
Domenici D.
Donskov S.
Druzhkin D.
Ebert J.
Egede U.
Egorov A.
Egorychev V.
Enik T.
Etenko A.
Fabbri F.
Fabbri L.
Fedorova G.
Felici G.
Ferro-Luzzi M.
Fini R. A.
Franke M.
Fraser M.
Galati G.
Giacobbe B.
Goddard B.
Golinka-Bezshyyko L.
Golubkov D.
Golutvin A.
Gorbunov D.
Graverini E.
Grenard J-L
Guler A. M.
Hagner C.
Hakobyan H.
Helo J. C.
Horvath D.
Iacovacci M.
Iaselli G.
Jacobsson R.
Kadenko I.
Kamiscioglu C.
Kamiscioglu M.
Khaustov G.
Khotjansev A.
Kilminster B.
Kim V.
Kitagawa N.
Kodama K.
Kolesnikov A.
Kolev D.
Komatsu M.
Konovalova N.
Koretskiy S.
Korolko I.
Korzenev A.
Kovalenko S.
Kudenko Y.
Kuznetsova E.
Lacker H.
Lai A.
Lanfranchi G.
Lauria A.
Lebbolo H.
Levy J.-M.
Lista L.
Loverre P.
Lukiashin A.
Lyubovitskij V. E.
Malinin A.
Manfredi M.
Marrone A.
Matev R.
Mermod P.
Messomo E. N.
Mikado S.
Mikhaylov Yu.
Miller J.
Milstead D.
Mineev O.
Mingazheva R.
Mitselmakher G.
Miyanishi M.
Monacelli P.
Montanari A.
Montesi M. C.
Morello G.
Morishima K.
Movtchan S.
Murzin V.
Naganawa N.
Naka T.
Nakamura M.
Nakano T.
Nurakhov N.
Obinyakov B.
Ocalan K.
Ogawa S.
Oreshkin V.
Orlov A.
Osborne J.
Pacholek P.
Panman J.
Paoloni A.
Paparella L.
Pastore A.
Patel M.
Perillo-Marcone A.
Petridis K.
Petrushin M.
Poli-Lener M.
Polukhina N.
Polyakov V.
Prokudin M.
Puddu G.
Pupilli F.
Rademakers F.
Rakai A.
Rawlings T.
Redi F.
Ricciardi S.
Rinaldesi R.
Roganova T.
Rogozhnikov A.
Rokujo H.
Romaniouk A.
Rosa G.
Rostovtseva I.
Rovelli T.
Ruchayskiy O.
Ruf T.
Saitta G.
Samoylenko V.
Samsonov V.
Saputi A.
Sato O.
Schmidt-Parzefall W.
Serra N.
Sgobba S.
Shaposhnikov M.
Shatalov P.
Shaykhiev A.
Shchutska L.
Shevchenko V.
Shibuya H.
SHiP Collaboration
Shitov Y.
Silverstein S.
Simone S.
Skorokhvatov M.
Smirnov S.
Solodko E.
Sosnovtsev V.
Spighi R.
Spinetti M.
Starkov N.
Storaci B.
Strabel C.
Strolin P.
Takahashi S.
Teterin P.
Tioukov V.
Tommasini D.
Treille D.
Tsenov R.
Tshchedrina T.
Ull A. Sanz
Ustyuzhanin A.
van Herwijnen E.
Vankova-Kirilova G.
Vannucci F.
Venturi V.
Villa M.
Vincke Heinz
Vincke Helmut
Vladymyrov M.
Xella S.
Yalvac M.
Yershov N.
Yilmaz D.
Yilmazer A. U.
Zaitsev Y.
Zoccoli A.
Publication date: 08/04/2015

A new general purpose fixed target facility is proposed at the CERN SPS accelerator which is aimed at exploring the domain of hidden particles and making measurements with tau neutrinos. Hidden particles are predicted by a large number of models beyond the Standard Model. The high intensity of the SPS 400 GeV beam allows probing a wide variety of models containing light long-lived exotic particles with masses below ${\cal O}(10)$ GeV/$c^2$, including very weakly interacting low-energy SUSY states. The experimental programme of the proposed facility is capable of being extended in the future, e.g. to include direct searches for Dark Matter and Lepton Flavour Violation.

Comment: Technical Proposal

arXiv.org e-Print Archive

OPUS

CERN Document Server

Dictionary writing system (DWS) plus corpus query package (CQP): the case of TshwaneLex

Author: De Pauw Guy
de Schryver Gilles-Maurice
Publication date: 01/01/2007

In this article the integrated corpus query functionality of the dictionary compilation software TshwaneLex is analysed. Attention is given to the handling of both raw corpus data and annotated corpus data. With regard to the latter it is shown how, with a minimum of human effort, machine learning techniques can be employed to obtain part-of-speech tagged corpora that can be used for lexicographic purposes. All points are illustrated with data drawn from English and Northern Sotho. The tools and techniques themselves, however, are language-independent, and as such the encouraging outcomes of this study are far-reaching.

Ghent University Academic Bibliography