49 research outputs found

A Dataset and Evaluation Metrics for Abstractive Compression of Sentences and Short Paragraphs

Author: Amershi S.
Brockett C.
Toutanova K.
Tran K.M.
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2016

International Migration, Integration and Social Cohesion online publications

Methods and algorithms for unsupervised learning of morphology

Author: A. Gelbukh
A. Gispert de
B. Can
C. Monson
D. Blackwell
D. Harman
D.R. Morrison
E. Arısoy
E. Minkov
H. Ishwaran
H. Poon
J. Goldsmith
K. Järvelin
K. Kettunen
K. Kirchhoff
K. Sirts
K. Toutanova
L. Aunimo
M. Creutz
M. Kurimo
M.A. Hafer
M.R. Brent
N.A. Smith
P.F. Brown
R. Krovetz
S. Bordag
S. Manandhar
S. Neuvel
Z.S. Harris
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

This is an accepted manuscript of a chapter published by Springer in Computational Linguistics and Intelligent Text Processing. CICLing 2014. Lecture Notes in Computer Science, vol 8403 in 2014, available online: https://doi.org/10.1007/978-3-642-54906-9_15 The accepted version of the publication may differ from the final published version. This paper is a survey of methods and algorithms for unsupervised learning of morphology. We provide a description of the methods and algorithms used for morphological segmentation from a computational linguistics point of view. We survey morphological segmentation methods, covering methods based on MDL (minimum description length), MLE (maximum likelihood estimation), MAP (maximum a posteriori), and parametric and non-parametric Bayesian approaches. A review of the evaluation schemes for unsupervised morphological segmentation is also provided, along with a summary of evaluation results on the Morpho Challenge evaluations. Published version

Crossref

Wolverhampton Intellectual Repository and E-theses

A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Answering Confucius: The Reason Why We Complicate

Author: K. Toutanova
R. Navigli
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013

Crossref

English nominal compound detection with Wikipedia-based methods

Author: F. Bonin
I.A. Sag
J.D. Lafferty
J.R. Finkel
K. Toutanova
Publication venue: Axel Springer
Publication date: 01/01/2013

Crossref

SZTE Publicatio Repozitórium - SZTE - Repository of Publications

SUBTLEX-UK: a new and improved word frequency database for British English

Author: Baayen R. H.
Cuetos F.
Dimitropoulou M.
Kuperman V.
Kučera H.
Toutanova K.
Zipf G. K.
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2014

We present word frequencies based on subtitles of British television programmes. We show that the SUBTLEX-UK word frequencies explain more of the variance in the lexical decision times of the British Lexicon Project than the word frequencies based on the British National Corpus and the SUBTLEX-US frequencies. In addition to the word form frequencies, we also present measures of contextual diversity, part-of-speech specific word frequencies, word frequencies in children's programmes, and word bigram frequencies, giving researchers of British English access to the full range of norms recently made available for other languages. Finally, we introduce a new measure of word frequency, the Zipf scale, which we hope will stop the current misunderstandings of the word frequency effect.

Nottingham ePrints

Nottingham eTheses

Crossref

Repository@Nottingham

Ghent University Academic Bibliography