Using distributional similarity to organise biomedical terminology

Author: Dowdall James
Keller Bill
Schneider Gerold
Weeds Julie
Weir David
Publication venue: John Benjamins Publishing Company
Publication date: 01/01/2005

We investigate an application of distributional similarity techniques to the problem of structurally organising biomedical terminology. Our application domain is the relatively small GENIA corpus. Using terms that have been accurately marked up by hand within the corpus, we consider the problem of automatically determining semantic proximity. For our purposes, terminological units are defined as normalised classes of individual terms. Syntactic analysis of the corpus data is carried out using the Pro3Gres parser and provides the data required to calculate distributional similarity using a variety of different measures. Evaluation is performed against a hand-crafted gold standard for this domain in the form of the GENIA ontology. We show that distributional similarity can be used to predict semantic type with a good degree of accuracy.

ZORA

Sussex Research Online
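To make the idea in the first abstract above ("Using distributional similarity to organise biomedical terminology") a little more concrete, here is a minimal sketch of one common way distributional similarity can drive semantic-type prediction: each term is represented by counts of (grammatical relation, co-occurring word) pairs, similarity is measured with cosine, and an unlabelled term receives the type of its most similar labelled neighbour. Everything here is an assumption for illustration: the feature format, the counts, the type labels, and the choice of cosine as the measure. The abstract does not name the measures the authors used, and this is not their method.

```python
from collections import Counter
from math import sqrt

# Hypothetical feature vectors: each term maps to counts of
# (grammatical relation, co-occurring word) pairs, the kind of
# context a dependency parser could supply. All counts are invented.
term_vectors = {
    "IL-2": Counter({("obj", "activate"): 3, ("subj", "bind"): 2}),
    "IL-4": Counter({("obj", "activate"): 2, ("subj", "bind"): 1, ("obj", "inhibit"): 1}),
    "T cell": Counter({("subj", "express"): 4, ("obj", "activate"): 1}),
}

# Hypothetical gold-standard semantic types for the already-labelled terms.
known_types = {"IL-4": "protein", "T cell": "cell_type"}


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[f] * b[f] for f in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_type(term: str, vectors: dict, types: dict) -> str:
    """Assign the type of the most distributionally similar labelled term."""
    target = vectors[term]
    best = max(
        (other for other in types if other != term),
        key=lambda other: cosine(target, vectors[other]),
    )
    return types[best]


print(predict_type("IL-2", term_vectors, known_types))  # protein
```

With these invented counts, "IL-2" shares both of its contexts with "IL-4" (cosine about 0.91) but only one with "T cell" (about 0.20), so it is assigned the type "protein". Swapping in a different similarity measure only requires replacing `cosine`.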

Resources for Evaluation of Summarization Techniques

Author: Kan Min-Yen
Klavans Judith L.
Lee Susan
McKeown Kathleen R.
Publication date: 01/01/1998

We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with construction of the test set which apply broadly to the construction of evaluation resources for language technologies. Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st

arXiv.org e-Print Archive

CiteSeerX

Developing professionalism in new IT graduates? Who needs it?

Author: Bromley Kay S.
Hinton Jacky M.
Palmer Mark I.
Parr Sue M.
Rae Simon A.
Streater Kevin
Publication date: 16/02/2011

A new graduate may require a period of ‘acclimatisation’ through a process of ‘developing their professionalism’ to fit into their work environment. The e-Skills UK Technology Counts Insights 2010 report suggests that 110,500 new entrants a year are required to fill IT & Telecoms professional job roles, with 20,800 coming from education (predominantly graduate level and higher). However, 43% of recruiters reported a lack of suitable candidates for IT & Telecoms posts, where growing importance will be placed on relationship management, business process analysis and design, and project and programme management. IT & Telecoms professionals are increasingly expected to be multi-skilled, with sophisticated business and interpersonal skills as well as technical competence. As the report also says: ‘UK growth will continue to be primarily in high-value roles with an increasing need for customer and business-oriented skills as well as sophisticated technical competencies.’ The diverse needs and requirements of the IT sector, as specified by various employer groups and professional bodies including BCS, IET, eSkills, the CBI and the SFIA Foundation, are discussed. According to the CBI, ‘62% of entrants to the IT sector need to draw on managerial and professional business skills almost immediately.’ For organisations to succeed, their IT graduate recruits must supplement their IT skills with managerial and professional business skills. Well-considered CPD will ensure that recent graduates can enhance their ‘academic’ skills with the necessary work-based skills, for the benefit of both themselves and their new employer. The focus of the improvement will balance the student-centred needs for development with the engaging employer’s commercial needs.

Open Research Online (The Open University)

Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

Author: A Roberts
A Shah
Aleksandar Savkov
B Efron
G Hripcsak
G Savova
J Cohen
J Foster
J-W Fan
Jackie Cassell
John Carroll
K Verspoor
KH Krippendorff
LK Tanabe
M Bada
MP Marcus
Rob Koeling
S Abney
W Sun
Ö Uzuner
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2016

The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

Crossref

Springer - Publisher Connector

PubMed Central

Sussex Research Online

Drawing Elena Ferrante's Profile. Workshop Proceedings, Padova, 7 September 2017

Author: Arjuna Tuzzi
Michele A. Cortelazzo
Publication venue: Padova
Publication date: 01/01/2018

Elena Ferrante is an internationally acclaimed Italian novelist whose real identity has been kept secret by the E/O publishing house for more than 25 years. Owing to her popularity, major Italian and foreign newspapers have long tried to discover her real identity. However, only a few attempts have been made to foster a scientific debate on her work. In 2016, Arjuna Tuzzi and Michele Cortelazzo led an Italian research team that conducted a preliminary study and collected a well-founded, large corpus of Italian novels comprising 150 works published in the last 30 years by 40 different authors. Moreover, they shared their data with a select group of international experts on authorship attribution, profiling, and analysis of textual data: Maciej Eder and Jan Rybicki (Poland), Patrick Juola (United States), Vittorio Loreto and his research team, Margherita Lalli and Francesca Tria (Italy), George Mikros (Greece), Pierre Ratinaud (France), and Jacques Savoy (Switzerland). The chapters of this volume report the results of this endeavour, which were first presented during the international workshop Drawing Elena Ferrante's Profile in Padua on 7 September 2017 as part of the 3rd IQLA-GIAT Summer School in Quantitative Analysis of Textual Data. The research findings suggest that Elena Ferrante’s work definitely deserves “many hands” as well as an extensive effort to understand her distinct writing style and the reasons for her worldwide success.

Archivio istituzionale della ricerca - Università di Padova