A Support Tool for Tagset Mapping
Many different tagsets are used in existing corpora; these tagsets vary
according to the objectives of specific projects (which may be as far apart as
robust parsing vs. spelling correction). In many situations, however, one would
like to have uniform access to the linguistic information encoded in corpus
annotations without having to know the classification schemes in detail. This
paper describes a tool which maps unstructured morphosyntactic tags to a
constraint-based, typed, configurable specification language, a ``standard
tagset''. The mapping relies on a manually written set of mapping rules, which
is automatically checked for consistency. In certain cases, unsharp mappings
are unavoidable, and noise, i.e. groups of word forms {\sl not} conforming to
the specification, will appear in the output of the mapping. The system
automatically detects such noise and informs the user about it. The tool has
been tested with rules for the UPenn tagset \cite{up} and the SUSANNE tagset
\cite{garside}, in the framework of the EAGLES\footnote{LRE project EAGLES, cf.
\cite{eagles}.} validation phase for standardised tagsets for European
languages. Comment: EACL-SIGDAT 95; contains 4 PS figures (minor graphic changes).
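The rule-based mapping with an automatic consistency check described above can be sketched as follows. This is a minimal illustration, not the tool itself: the tag names and feature structures are invented, and the real system uses a constraint-based, typed specification language rather than plain dictionaries.

```python
# Sketch of rule-based tag mapping with a consistency check. Tag names and
# feature structures are invented for illustration only.

# Each rule maps a corpus tag to (part of) a typed feature structure.
RULES = [
    ("NN",  {"cat": "noun", "num": "sg"}),
    ("NNS", {"cat": "noun", "num": "pl"}),
    ("VBD", {"cat": "verb", "tense": "past"}),
]

def check_consistency(rules):
    """Reject rule sets in which two rules give one source tag
    conflicting values for the same feature."""
    seen = {}
    for tag, feats in rules:
        for feat, val in feats.items():
            if seen.get((tag, feat), val) != val:
                raise ValueError(f"conflicting rules for {tag}.{feat}")
            seen[(tag, feat)] = val

def map_tag(tag, rules):
    """Map one corpus tag; tags with no applicable rule are flagged as noise."""
    feats = {}
    for t, f in rules:
        if t == tag:
            feats.update(f)
    if not feats:
        return None, f"noise: no rule for tag {tag!r}"
    return feats, None

check_consistency(RULES)
print(map_tag("NNS", RULES))   # ({'cat': 'noun', 'num': 'pl'}, None)
print(map_tag("XX", RULES))    # (None, "noise: no rule for tag 'XX'")
```

The "noise" return value corresponds to the paper's unsharp-mapping case: the system does not silently drop unmappable tags but reports them to the user.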
Argumentative zoning information extraction from scientific text
Let me tell you, writing a thesis is not always a barrel of laughs, and strange things can happen, too. For example, at the height of my thesis paranoia, I had a recurrent dream in which my cat Amy gave me detailed advice on how to restructure the thesis chapters, which was awfully nice of her. But I also had a lot of human help throughout this time, whether things were going fine or berserk. Most of all, I want to thank Marc Moens: I could not have had a better or more knowledgeable supervisor. He always took time for me, however busy he might have been, reading chapters thoroughly in two days. He both had the calmness of mind to give me lots of freedom in research, and the right judgement to guide me away, tactfully but determinedly, from the occasional catastrophe or other waiting along the way. He was great fun to work with and also became a good friend. My work has profited from the interdisciplinary, interactive and enlightened atmosphere at the Human Communication Research Centre and the Centre for Cognitive Science (which is now called something else). The Language Technology Group was a great place to work in, as my research was grounded in practical applications.
Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017)
The large scale of scholarly publications poses a challenge for scholars in
information seeking and sensemaking. Bibliometrics, information retrieval (IR),
text mining and NLP techniques could help in these search and look-up
activities, but are not yet widely used. This workshop is intended to stimulate
IR researchers and digital library professionals to elaborate on new approaches
in natural language processing, information retrieval, scientometrics, text
mining and recommendation techniques that can advance the state-of-the-art in
scholarly document understanding, analysis, and retrieval at scale. The BIRNDL
workshop at SIGIR 2017 will incorporate an invited talk, paper sessions and the
third edition of the Computational Linguistics (CL) Scientific Summarization
Shared Task. Comment: 2 pages, workshop paper accepted at SIGIR 2017.
Whose idea was this, and why does it matter? Attributing scientific work to citations
Scientific papers revolve around citations, and for many discourse-level tasks one needs to know whose work is being talked about at any point in the discourse. In this paper, we introduce the scientific attribution task, which links different linguistic expressions to citations. We discuss the suitability of different evaluation metrics and evaluate our classification approach to deciding attribution both intrinsically and in an extrinsic evaluation, where information about scientific attribution is shown to improve performance on Argumentative Zoning, a rhetorical classification task.
An annotation scheme for citation function
We study the interplay of the discourse structure of a scientific argument with formal citations. One subproblem of this is to classify academic citations in scientific articles according to their rhetorical function, e.g., as a rival approach, as a part of the solution, or as a flawed approach that justifies the current research. Here, we introduce our annotation scheme with 12 categories, and present an agreement study.
An overview of evaluation methods in TREC ad hoc information retrieval and TREC question answering
This chapter gives an overview of the current evaluation strategies and problems in the fields of information retrieval (IR) and question answering (QA), as instantiated in the Text Retrieval Conference (TREC). Whereas IR has a long tradition as a task, QA is a relatively new task which had to quickly develop its evaluation metrics, based on experiences gained in IR. This chapter will contrast the two tasks, their difficulties, and their evaluation metrics. We will end this chapter by pointing out limitations of the current evaluation strategies and potential future developments.
Corpora for the conceptualisation and zoning of scientific papers
We present two complementary annotation schemes for sentence based annotation of full scientific papers, CoreSC and AZ-II, which have been applied to primary research articles in chemistry. The AZ scheme is based on the rhetorical structure of a scientific paper and follows the knowledge claims made by the authors. It has been shown to be reliably annotated by independent human coders and has proven useful for various information access tasks. AZ-II is its extended version, which has been successfully applied to chemistry. The CoreSC scheme takes a different view of scientific papers, treating them as the humanly readable representations of scientific investigations.
It therefore seeks to retrieve the structure of the investigation from the paper as generic high-level Core Scientific Concepts (CoreSC). CoreSCs have been annotated by 16 chemistry experts over a total of 265 full papers in physical chemistry and biochemistry. We describe the differences and similarities between the two schemes in detail and present the two corpora produced using each scheme. There are 36 shared papers in the corpora, which allows us to quantitatively compare aspects of the annotation schemes. We show the correlation between the two schemes, their strengths and weaknesses, and discuss the benefits of combining a rhetoric-based analysis of the papers with a content-based one.
Identifying problems and solutions in scientific text
Research is often described as a problem-solving activity, and as a result, descriptions of problems and solutions are an essential part of the scientific discourse used to describe research activity. We present an automatic classifier that, given a phrase that may or may not be a description of a scientific problem or a solution, makes a binary decision about the problemhood and solutionhood of that phrase. We cast this as a supervised machine learning task, define a set of 15 features correlated with the target categories, and use several machine learning algorithms on this task. We also create our own corpus of 2,000 positive and negative examples of problems and solutions. We find that we can distinguish problems from non-problems with an accuracy of 82.3%, and solutions from non-solutions with an accuracy of 79.7%. Our three most helpful features for the task are syntactic information (POS tags), document embeddings, and word embeddings.
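The phrase-level binary decision described above can be illustrated with a toy scorer. The cue lists, features, and weights below are invented for the example; the paper's actual classifier uses 15 learned features, including POS tags and embeddings, trained with standard machine learning algorithms.

```python
# Toy illustration of phrase-level "problemhood" classification.
# Cue words and weights are invented for this sketch, not taken from the paper.

PROBLEM_CUES = {"difficult", "lack", "fail", "error", "limitation", "hard"}
NEGATIONS = {"not", "no", "cannot"}

def features(phrase):
    """Extract a few illustrative binary/numeric features from a phrase."""
    tokens = [t.strip(".,") for t in phrase.lower().split()]
    return {
        "n_tokens": len(tokens),
        "has_problem_cue": any(t in PROBLEM_CUES for t in tokens),
        "has_negation": any(t in NEGATIONS for t in tokens),
    }

def is_problem(phrase):
    """Binary decision on problemhood using hand-set (not learned) weights."""
    f = features(phrase)
    score = 0.0
    score += 1.5 if f["has_problem_cue"] else 0.0
    score += 0.5 if f["has_negation"] else 0.0
    return score >= 1.0

print(is_problem("the lack of annotated data"))   # True
print(is_problem("a simple lookup table"))        # False
```

In the real system the weights would come from a trained model rather than being set by hand, and solutionhood would be decided by a second classifier of the same shape.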
MEAD - A Platform for Multidocument Multilingual Text Summarization
This paper describes the functionality of MEAD, a comprehensive, public domain, open source, multidocument multilingual summarization environment that has been thus far downloaded by more than 500 organizations. MEAD has been used in a variety of summarization applications ranging from summarization for mobile devices to Web page summarization within a search engine and to novelty detection.
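Centroid-style extractive scoring of the kind MEAD popularized can be sketched in a few lines. This is a from-scratch toy, not MEAD's actual code, feature set, or weights: it scores each sentence by the average frequency weight of its words across the input plus a simple position bonus.

```python
# Toy centroid-style extractive summarizer; a sketch in the spirit of MEAD,
# not its actual implementation.
from collections import Counter

def summarize(sentences, n=1):
    """Score sentences by average centroid (word-frequency) weight plus a
    position bonus, and return the top n in document order."""
    words = [w.lower().strip(".,") for s in sentences for w in s.split()]
    centroid = Counter(words)

    def score(i, s):
        toks = [w.lower().strip(".,") for w in s.split()]
        centroid_score = sum(centroid[t] for t in toks) / max(len(toks), 1)
        position_score = 1.0 / (i + 1)  # earlier sentences preferred
        return centroid_score + position_score

    ranked = sorted(range(len(sentences)), key=lambda i: -score(i, sentences[i]))
    return [sentences[i] for i in sorted(ranked[:n])]

docs = [
    "Text summarization condenses documents into short summaries.",
    "Centroid methods score sentences for summarization.",
    "The weather was pleasant that day.",
]
print(summarize(docs, n=1))
```

A full system would compute centroid weights over a whole document cluster with tf-idf, and add further features such as sentence length and overlap with the first sentence.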
Predicting the impact of scientific concepts using full-text features
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/134425/1/asi23612.pd