
    Quantity superlatives in Germanic, or, ‘Life on the fault line between adjective and determiner'

    This paper concerns the superlative forms of the words many, much, few, and little, and their equivalents in other Germanic languages (German, Dutch, Swedish, Norwegian, Danish, Dalecarlian, Icelandic, and Faroese). It demonstrates that every possible relationship between definiteness and interpretation is attested. It also demonstrates that agreement mismatches are found with both relative and proportional readings, but of different kinds in each case. One consistent pattern is that a quantity superlative with adverbial morphology and neuter singular agreement features is used with relative superlatives; quantity superlatives with proportional readings, on the other hand, always agree in number. I conclude that quantity superlatives are not structurally analogous to quality superlatives on either relative or proportional readings, but that they depart from a plain attributive structure in different ways. On relative readings they can be akin to pseudopartitives (as in a cup of tea), while proportional readings are more closely related to partitives (as in a piece of the cake). More specifically, I suggest that the agreement features a superlative exhibits depend on the domain from which the target is drawn (the target-domain hypothesis). When the target is a degree, as it is with adverbial superlatives and certain relative superlatives, default neuter singular emerges; definiteness there is driven by the same process that drives definiteness with adverbial superlatives. With proportional readings, the target argument of the superlative is a subpart or subset of the domain indicated by the substance noun, hence number agreement. Subtle aspects of how the comparison class and the superlative marker are construed determine definiteness for proportional readings.
    http://eecoppock.info/germanic.pdf
    Accepted manuscript

    A Large-Scale Comparison of Historical Text Normalization Systems

    There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character-based statistical machine translation, and neural encoder--decoder models, but studies have used different datasets and different evaluation methods, and have come to different conclusions. This paper presents the largest study of historical text normalization done so far. We critically survey the existing literature and report experiments on eight languages, comparing systems spanning all categories of proposed normalization techniques, analysing the effect of training data quantity, and using different evaluation methods. The datasets and scripts are made publicly available.
    Comment: Accepted at NAACL 2019
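    One of the technique families the abstract lists, distance metrics, can be sketched minimally: map each historical spelling to the nearest entry in a modern lexicon by Levenshtein edit distance. The tiny lexicon and the function names below are illustrative assumptions, not taken from the paper.

    ```python
    def levenshtein(a: str, b: str) -> int:
        """Classic dynamic-programming edit distance."""
        prev = list(range(len(b) + 1))
        for i, ca in enumerate(a, 1):
            curr = [i]
            for j, cb in enumerate(b, 1):
                curr.append(min(prev[j] + 1,                  # deletion
                                curr[j - 1] + 1,              # insertion
                                prev[j - 1] + (ca != cb)))    # substitution
            prev = curr
        return prev[-1]

    def normalize(historical: str, lexicon: list[str]) -> str:
        """Return the modern lexicon entry nearest to the historical form."""
        return min(lexicon, key=lambda w: levenshtein(historical, w))

    lexicon = ["very", "many", "move", "over"]
    print(normalize("verie", lexicon))  # -> very
    ```

    Real systems in this family add weighted edit costs learned from aligned pairs; the unweighted distance above is only the baseline idea.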

    Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

    The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: 1. how to create a simple, language-independent corpus-based stemmer, 2. how to identify sub-words and which types of sub-words are suitable as indexing units, and 3. how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful when few language-specific resources are available. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages: for English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length.
    The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms across languages.
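    The two sub-word indexing units the abstract highlights, overlapping character 3-grams and fixed-length word prefixes, can be sketched in a few lines. The function names and parameter defaults are illustrative assumptions, not the paper's implementation.

    ```python
    def char_ngrams(word: str, n: int = 3) -> list[str]:
        """Overlapping character n-grams; words shorter than n index as themselves."""
        if len(word) <= n:
            return [word]
        return [word[i:i + n] for i in range(len(word) - n + 1)]

    def prefix(word: str, k: int = 4) -> str:
        """k-prefix indexing unit (e.g. the 4-prefixes used for Hindi)."""
        return word[:k]

    print(char_ngrams("retrieval"))  # -> ['ret', 'etr', 'tri', 'rie', 'iev', 'eva', 'val']
    print(prefix("retrieval"))       # -> retr
    ```

    Note how one word form yields several short index terms, matching the abstract's observation that sub-word indexing increases the number of index terms while decreasing their average length.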

    AAA: Fair Evaluation for Abuse Detection Systems Wanted


    Apportioning Development Effort in a Probabilistic LR Parsing System through Evaluation

    We describe an implemented system for robust domain-independent syntactic parsing of English, using a unification-based grammar of part-of-speech and punctuation labels coupled with a probabilistic LR parser. We present evaluations of the system's performance along several different dimensions; these enable us to assess the contribution that each individual part makes to the success of the system as a whole, and thus to prioritise the effort devoted to its further enhancement. Currently, the system parses around 80% of sentences in a substantial corpus of general text containing a number of distinct genres. On a random sample of 250 such sentences the system has a mean crossing bracket rate of 0.71, and recall and precision of 83% and 84% respectively, when evaluated against manually disambiguated analyses.
    Comment: 10 pages, 1 Postscript figure. To appear in Proceedings of the Conference on Empirical Methods in Natural Language Processing, University of Pennsylvania, May 1996
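    The bracketing metrics cited in this abstract (crossing brackets, bracket precision and recall) can be computed on constituent spans represented as (start, end) pairs. The spans below are invented for illustration; only the metric definitions follow the standard PARSEVAL-style scheme.

    ```python
    def crossing(span, gold_spans):
        """A test span crosses a gold span if they overlap without nesting."""
        s, e = span
        return any(gs < s < ge < e or s < gs < e < ge for gs, ge in gold_spans)

    def bracket_scores(test_spans, gold_spans):
        """Return (precision, recall, crossing-bracket count) over span sets."""
        test, gold = set(test_spans), set(gold_spans)
        match = len(test & gold)
        precision = match / len(test)
        recall = match / len(gold)
        crossings = sum(crossing(sp, gold) for sp in test)
        return precision, recall, crossings

    gold = [(0, 7), (0, 3), (4, 7)]
    test = [(0, 7), (0, 3), (2, 5)]
    p, r, x = bracket_scores(test, gold)
    print(round(p, 2), round(r, 2), x)  # -> 0.67 0.67 1
    ```

    A paper's reported mean crossing bracket rate is then the crossing count averaged over sentences, which is how a figure like 0.71 per sentence arises.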