
    NP Animacy Identification for Anaphora Resolution

    In anaphora resolution for English, animacy identification can play an integral role in applying agreement restrictions between pronouns and candidates and, as a result, can improve the accuracy of anaphora resolution systems. In this paper, two methods for animacy identification are proposed and evaluated using intrinsic and extrinsic measures. The first method is a rule-based one which uses information about the unique beginners in WordNet to classify NPs on the basis of their animacy. The second method relies on a machine learning algorithm which exploits a WordNet enriched with animacy information for each sense. The effect of word sense disambiguation on the two methods is also assessed. The intrinsic evaluation reveals that the machine learning method reaches human levels of performance. The extrinsic evaluation demonstrates that animacy identification can be beneficial in anaphora resolution, especially in cases where animate entities are identified with high precision.
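    The rule-based method can be pictured with a short sketch. The snippet below is a minimal illustration, assuming NLTK's WordNet interface and treating the 'noun.person' and 'noun.animal' lexicographer files (which correspond to unique beginners) as the animate classes; the paper's actual rule set may differ.

```python
# Minimal sketch: rule-based animacy via WordNet unique beginners.
# Requires: pip install nltk; then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

# Assumption for this sketch: senses filed under these lexicographer
# files (unique beginners) count as animate.
ANIMATE_FILES = {"noun.person", "noun.animal"}

def is_animate(noun: str) -> bool:
    """A noun is animate if any of its senses falls under an animate
    unique beginner. Without word sense disambiguation every sense
    votes, which is one effect the paper assesses."""
    return any(
        s.lexname() in ANIMATE_FILES for s in wn.synsets(noun, pos=wn.NOUN)
    )

print(is_animate("teacher"))  # True  (noun.person)
print(is_animate("table"))    # False (noun.artifact and others)
```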

    RGCL at GermEval 2019: offensive language detection with deep learning

    This paper describes the system submitted by the RGCL team to GermEval 2019 Shared Task 2: Identification of Offensive Language. We experimented with five different neural network architectures in order to classify tweets in terms of offensive language. By means of comparative evaluation, we selected the best-performing architecture for each of the three subtasks. Overall, we demonstrate that using only minimal preprocessing we are able to obtain competitive results.
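    To make the task concrete, here is a minimal neural baseline for the binary subtask. It is an illustrative sketch, not one of the five architectures from the paper; the toy tweets and the scikit-learn pipeline are assumptions for the example (the coarse GermEval labels OFFENSE/OTHER are real).

```python
# Minimal sketch: binary offensive-language classification with a small
# feed-forward network over TF-IDF features (illustrative only).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

tweets = ["you are wonderful", "great game today",
          "I hate you", "shut up idiot"]           # invented toy data
labels = ["OTHER", "OTHER", "OFFENSE", "OFFENSE"]  # GermEval coarse labels

# Minimal preprocessing (lowercasing only), in the spirit of the paper's
# finding that little normalisation already yields competitive results.
clf = make_pipeline(
    TfidfVectorizer(lowercase=True),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0),
)
clf.fit(tweets, labels)
print(clf.predict(["what an idiot"]))  # expected: ['OFFENSE']
```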

    Large-scale data harvesting for biographical data

    This paper explores automatic methods to identify relevant biography candidates in large databases, and to extract biographical information from encyclopedia entries and databases. In this work, relevant candidates are defined as people who have made an impact in a certain country or region within a pre-defined time frame. We investigate the case of people who had an impact in the Republic of Austria and died between 1951 and 2019. We use Wikipedia and Wikidata as data sources and compare the performance of our information extraction methods on these two databases. We demonstrate the usefulness of a natural language processing pipeline to identify suitable biography candidates and, in a second stage, to extract relevant information about them. Even though the two are often considered an identical resource, our results show that the data from Wikipedia and Wikidata differ in some cases and can be used in a complementary way, providing more data for the compilation of biographies.
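    The candidate-identification step can be approximated with a single query against Wikidata's public SPARQL endpoint. The sketch below encodes the stated selection (people connected to Austria who died between 1951 and 2019) as citizenship plus date of death; the paper's actual candidate criteria may be broader.

```python
# Sketch: harvest biography candidates from Wikidata via SPARQL.
# P27 = country of citizenship, Q40 = Austria, P570 = date of death.
import requests

QUERY = """
SELECT ?person ?personLabel ?death WHERE {
  ?person wdt:P27 wd:Q40 ;
          wdt:P570 ?death .
  FILTER(YEAR(?death) >= 1951 && YEAR(?death) <= 2019)
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "biography-harvest-demo/0.1"},  # WDQS asks for one
    timeout=60,
)
for row in resp.json()["results"]["bindings"]:
    print(row["personLabel"]["value"], row["death"]["value"])
```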

    Measuring text simplification with the crowd

    Text can often be complex and difficult to read, especially for people with cognitive impairments or low literacy skills. Text simplification is a process that reduces the complexity of both wording and structure in a sentence, while retaining its meaning. However, this is currently a challenging task for machines, and thus, providing effective on-demand text simplification to those who need it remains an unsolved problem. Even evaluating the simplicity of text remains a challenging problem for both computers, which cannot understand the meaning of text, and humans, who often struggle to agree on what constitutes a good simplification. This paper focuses on the evaluation of English text simplification using the crowd. We show that leveraging crowds can result in a collective decision that is accurate and converges to a consensus rating. Our results from 2,500 crowd annotations show that the crowd can effectively rate levels of simplicity. This may allow simplification systems and system builders to get better feedback about how well content is being simplified, as compared to standard measures which classify content into ‘simplified’ or ‘not simplified’ categories. Our study provides evidence that the crowd could be used to evaluate English text simplification, as well as to create simplified text in future work.
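    The aggregation such an evaluation implies can be sketched in a few lines: average several workers' ratings per sentence pair and use the spread as a rough consensus signal. The ratings, scale and threshold below are invented for illustration, not taken from the paper.

```python
# Sketch: aggregate crowd simplicity ratings and flag consensus.
from statistics import mean, stdev

# scores from five workers per original/simplified pair, on a 1-5 scale
ratings = {
    "pair-1": [4, 5, 4, 4, 5],   # raters agree: likely a clear case
    "pair-2": [1, 5, 2, 4, 3],   # raters split: no consensus yet
}

for pair, scores in ratings.items():
    consensus = stdev(scores) < 1.0  # low spread ~ raters converge
    print(f"{pair}: mean={mean(scores):.2f}, consensus={consensus}")
```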

    Are decision trees a feasible knowledge representation to guide extraction of critical information from randomized controlled trial reports?

    Background: This paper proposes the use of decision trees as the basis for automatically extracting information from published randomized controlled trial (RCT) reports. An exploratory analysis of RCT abstracts is undertaken to investigate the feasibility of using decision trees as a semantic structure. Quality-of-paper measures are also examined.
    Methods: A subset of 455 abstracts (randomly selected from a set of 7620 retrieved from Medline from 1998 to 2006) is examined for the quality of RCT reporting, the identifiability of RCTs from abstracts, and the completeness and complexity of RCT abstracts with respect to key decision tree elements. Abstracts were manually assigned to six sub-groups distinguishing primary RCTs from other design types. For primary RCT studies, we analyzed and annotated the reporting of intervention comparison, population assignment and outcome values. To measure completeness, the frequencies with which complete intervention, population and outcome information are reported in abstracts were measured. A qualitative examination of the reporting language was conducted.
    Results: Decision tree elements are manually identifiable in the majority of primary RCT abstracts. 73.8% of a random subset were primary studies with a single population assigned to two or more interventions. 68% of these primary RCT abstracts were structured, 63% contained pharmaceutical interventions, and 84% reported the total number of study subjects. In a subset of 21 abstracts examined, 71% reported numerical outcome values.
    Conclusion: The manual identifiability of decision tree elements in the abstract suggests that decision trees could be a suitable construct to guide machine summarisation of RCTs. The presence of decision tree elements could also act as an indicator of RCT report quality in terms of completeness and uniformity.
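    The decision-tree construct the paper has in mind (one population assigned to two or more intervention arms, each with outcome values) can be written down as a small data structure. The field names and the example trial below are assumptions for illustration only.

```python
# Sketch: the RCT decision-tree elements as a typed structure.
from dataclasses import dataclass, field

@dataclass
class Arm:
    intervention: str   # e.g. drug and dose, or control/placebo
    outcome: str        # outcome measure reported for this arm
    value: float        # numerical outcome value

@dataclass
class RCTDecisionTree:
    population: str             # single population assigned to the arms
    arms: list[Arm] = field(default_factory=list)

# Invented example, purely to show the shape of the structure.
trial = RCTDecisionTree(
    population="120 adults with type 2 diabetes",
    arms=[
        Arm("metformin 500 mg", "mean HbA1c reduction (%)", 1.1),
        Arm("placebo", "mean HbA1c reduction (%)", 0.2),
    ],
)
print(f"{trial.population}: {len(trial.arms)} arms")
```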

    Comparing pronoun resolution algorithms

    This paper discusses the comparative evaluation of five well-known pronoun resolution algorithms, conducted with the help of a purpose-built tool for consistent evaluation in anaphora resolution, termed the evaluation workbench. The workbench enables the evaluation and comparison of pronoun resolution algorithms on the basis of the same preprocessing tools and test data. The tool is controlled by the user, who can conduct the evaluation according to a variety of parameters with regard to the types of anaphors and the samples used for evaluation. The extensive comparative evaluation of the pronoun resolution algorithms showed that their performance was significantly lower than the figures reported in the original papers describing the algorithms. The evaluation study concluded that the main reason for this drop in performance is the fact that all algorithms operate in a fully automatic mode.
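    The core of such a workbench is a harness that runs every resolver over the same preprocessed test data and scores them identically. The resolver interface, the toy baseline and the success-rate metric below are assumptions for illustration, not the workbench's actual design.

```python
# Sketch: a common evaluation harness for pronoun resolvers.
from typing import Callable

# A resolver maps (pronoun, candidate antecedents) to one chosen candidate.
Resolver = Callable[[str, list[str]], str]

def pick_most_recent(pronoun: str, candidates: list[str]) -> str:
    return candidates[-1]  # toy baseline: choose the most recent candidate

def success_rate(resolver: Resolver, test_data) -> float:
    correct = sum(
        resolver(pron, cands) == gold for pron, cands, gold in test_data
    )
    return correct / len(test_data)

# Shared test data: (pronoun, candidates in order of mention, gold answer).
test_data = [("she", ["the report", "Dr Smith"], "Dr Smith")]
for name, resolver in {"most-recent": pick_most_recent}.items():
    print(name, success_rate(resolver, test_data))
```

Holding the preprocessing and test data fixed across all resolvers is what makes the reported performance drops comparable across systems.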

    The QALL-ME Framework: A Specifiable-Domain Multilingual Question Answering Architecture.

    This paper presents the QALL-ME Framework, a reusable architecture for building multilingual Question Answering (QA) systems working on structured data. The framework is released as free open source software with a set of demo components and extensive documentation. The main characteristics of the QALL-ME Framework are: (i) domain portability, achieved by an ontology modelling of the target domain; (ii) context awareness regarding the space and time of the question; (iii) the use of textual entailment engines as the core of the question interpretation; and (iv) a Service Oriented Architecture, realized using interchangeable web services. Furthermore, we present a running example to clarify how the framework processes questions, as well as a case study that successfully shows a QA application built with the QALL-ME Framework for cinema/movie events in the tourism domain.
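    Characteristic (iii), entailment-driven question interpretation, can be reduced to a small interface sketch: a question is mapped to a structured query when an entailment engine judges that it entails a known question pattern. The protocol, the toy overlap engine and the pattern table below are assumptions for illustration; the framework itself exposes such components as interchangeable web services rather than Python classes.

```python
# Sketch: entailment-based question interpretation behind a swappable
# component interface (the web-service boundary is elided here).
from typing import Optional, Protocol

class EntailmentEngine(Protocol):
    def entails(self, question: str, pattern: str) -> bool: ...

class OverlapEngine:
    """Toy stand-in: real engines use textual entailment, not word overlap."""
    def entails(self, question: str, pattern: str) -> bool:
        return all(w in question.lower() for w in pattern.lower().split())

def interpret(question: str, patterns: dict[str, str],
              engine: EntailmentEngine) -> Optional[str]:
    """Return the structured query whose pattern the question entails."""
    for query, pattern in patterns.items():
        if engine.entails(question, pattern):
            return query
    return None

patterns = {"SHOWTIME_QUERY(movie, cinema)": "what time cinema"}
print(interpret("What time does the cinema show Casablanca?",
                patterns, OverlapEngine()))
```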

    Sentence retrieval for abstracts of randomized controlled trials

    Background: The practice of evidence-based medicine (EBM) requires clinicians to integrate their expertise with the latest scientific research, but this is becoming increasingly difficult with the growing number of published articles. There is a clear need for better tools to improve clinicians' ability to search the primary literature. Randomized clinical trials (RCTs) are the most reliable source of evidence documenting the efficacy of treatment options. This paper describes the retrieval of key sentences from abstracts of RCTs as a step towards helping users find relevant facts about the experimental design of clinical studies.
    Method: Using Conditional Random Fields (CRFs), a popular and successful method for natural language processing problems, sentences referring to Intervention, Participants and Outcome Measures are automatically categorized. This is done by extending a previous approach for labeling sentences in an abstract for general categories associated with scientific argumentation or rhetorical roles: Aim, Method, Results and Conclusion. The methods are tested on several corpora of RCT abstracts. First, structured abstracts with headings specifically indicating Intervention, Participants and Outcome Measures are used. A manually annotated corpus of structured and unstructured abstracts is also prepared for testing a classifier that identifies sentences belonging to each category.
    Results: Using CRFs, sentences can be labeled for the four rhetorical roles with F-scores from 0.93 to 0.98, outperforming Support Vector Machines. Furthermore, sentences can be automatically labeled for Intervention, Participants and Outcome Measures in unstructured and structured abstracts where the section headings do not specifically indicate these three topics. F-scores of up to 0.83 and 0.84 are obtained for Intervention and Outcome Measure sentences.
    Conclusion: The results indicate that some of the methodological elements of RCTs are identifiable at the sentence level in both structured and unstructured abstract reports. This is promising in that automatically labeled sentences could form concise summaries and assist in information retrieval and finer-grained extraction.
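    The sequence-labelling setup can be sketched with the sklearn-crfsuite package: each abstract is a sequence of sentences, each sentence a feature dictionary, and the CRF learns one rhetorical-role label per sentence. The features (relative position plus a few lexical cues) and the one-abstract toy corpus below are simplified assumptions, not the paper's feature set.

```python
# Sketch: sentence-level rhetorical-role labelling with a linear-chain CRF.
# Requires: pip install sklearn-crfsuite
import sklearn_crfsuite

def sent_features(sents: list[str]) -> list[dict]:
    feats = []
    for i, s in enumerate(sents):
        feats.append({
            "position": i / len(sents),               # roles track position
            "has_randomised": "random" in s.lower(),  # crude Method cue
            "has_measured": "measur" in s.lower(),    # crude Outcome cue
            "first_word": s.split()[0].lower(),
        })
    return feats

# One invented abstract: a sentence sequence with rhetorical-role labels.
abstract = [
    "We aimed to compare drug A with placebo.",
    "Patients were randomised to two groups.",
    "Outcomes were measured at 12 weeks.",
    "Drug A reduced symptoms significantly.",
]
labels = ["Aim", "Method", "Method", "Results"]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit([sent_features(abstract)], [labels])   # toy: trains on one sequence
print(crf.predict([sent_features(abstract)]))
```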