1,774 research outputs found

Rapid Development of Morphological Descriptions for Full Language Processing Systems

Author: Carter David
Publication date: 01/01/1995

I describe a compiler and development environment for feature-augmented two-level morphology rules integrated into a full NLP system. The compiler is optimized for a class of languages including many or most European ones, and for rapid development and debugging of descriptions of new languages. The key design decision is to compose morphophonological and morphosyntactic information, but not the lexicon, when compiling the description. This results in typical compilation times of about a minute, and has allowed a reasonably full, feature-based description of French inflectional morphology to be developed in about a month by a linguist new to the system.
Comment: 8 pages, LaTeX (2.09 preferred); eaclap.sty; Procs of Euro ACL-9

arXiv.org e-Print Archive

CiteSeerX

Crossref

A Formal Framework for Linguistic Annotation

Author: Bird Steven
Liberman Mark
Publication date: 01/01/1999

'Linguistic annotation' covers any descriptive or analytic notations applied to raw language data. The basic data may be in the form of time functions -- audio, video and/or physiological recordings -- or it may be textual. The added notations may include transcriptions of all sorts (from phonetic features to discourse structures), part-of-speech and sense tagging, syntactic analysis, 'named entity' identification, co-reference annotation, and so on. While there are several ongoing efforts to provide formats and tools for such annotations and to publish annotated linguistic databases, the lack of widely accepted standards is becoming a critical problem. Proposed standards, to the extent they exist, have focussed on file formats. This paper focuses instead on the logical structure of linguistic annotations. We survey a wide variety of existing annotation formats and demonstrate a common conceptual core, the annotation graph. This provides a formal framework for constructing, maintaining and searching linguistic annotations, while remaining consistent with many alternative data structures and file formats.
Comment: 49 page

arXiv.org e-Print Archive

CiteSeerX

ScholarlyCommons@Penn

GreekLex 2: a comprehensive lexical database with part-of-speech, syllabic, phonological, and stress information

Author: A Botinis
A Grimani
A Protopapas
Antonios Kyparissiadis
B New
C Burani
E Keuleers
E Selkirk
G Babiniotis
G Mikros
J Arciuli
J Yang
JL Fleiss
K Rastle
L Colombo
M Brysbaert
M Carreiras
M Coltheart
M Conrad
M Dimitropoulou
M Gimenes
M Ktori
M Triantafyllidis
M Tzakosta
Manuel Perea
MJ Hofmann
N Hatzigeorgiu
Nicola J. Pitchford
NO Schiller
O Jouravlev
P Brown
P Gakis
P Monaghan
Q Cai
RH Baayen
S Mathey
T Yarkoni
Timothy Ledgeway
Walter J. B. van Heuven
Publication venue: Public Library of Science (PLoS)
Publication date: 23/02/2017

Databases containing lexical properties for any given orthography are crucial for psycholinguistic research. In the last ten years, a number of lexical databases have been developed for Greek. However, these lack important part-of-speech information. Furthermore, the need for alternative procedures for calculating syllabic measurements and stress information, as well as for combining several metrics to investigate linguistic properties of the Greek language, is highlighted. To address these issues, we present a new extensive lexical database of Modern Greek (GreekLex 2) with part-of-speech information for each word and accurate syllabification and orthographic information predictive of stress, as well as several measurements of word similarity and phonetic information. The addition of detailed statistical information about Greek part-of-speech, syllabification, and stress neighbourhood allowed novel analyses of stress distribution within different grammatical categories and syllabic lengths to be carried out. Results showed that the statistical preponderance of stress position on the pre-final syllable that is reported for the Greek language is dependent upon grammatical category. Additionally, analyses showed that more than 90% of the tokens in the database would be stressed correctly solely by relying on stress neighbourhood information. The database and the scripts for orthographic and phonological syllabification as well as phonetic transcription are available at http://www.psychology.nottingham.ac.uk/greeklex/

Nottingham ePrints

Public Library of Science (PLOS)

Nottingham eTheses

Crossref

Repository@Nottingham

Directory of Open Access Journals

PubMed Central

FigShare

Numerical orthographic coding: merging Open Bigrams and Spatial Coding theories

Author: Courrieu Pierre
Madec Sylvain
Rey Arnaud
Publication venue: HAL CCSD
Publication date: 23/04/2019

Simple numerical versions of the Spatial Coding and of the Open Bigrams coding of character strings are presented, together with a natural merging of these two approaches. Comparing the predictive performance of these three orthographic coding schemes on orthographic masked priming data, we observe that the merged coding scheme always provides the best fits. Testing the ability of the orthographic codes, used as regressors, to capture relevant regularities in lexical decision data, we also observe that the merged code provides the best fits and that both the spatial coding component and the open bigrams component provide specific and significant contributions. This sheds new light on the probable mechanisms involved in orthographic coding, and provides new tools for modelling behavioural and electrophysiological data collected in word recognition tasks.

Gene/protein name recognition based on support vector machine using dictionary as features

Author: Doi Hirohumi
Doi Kouichi
Fation Sevrani
Mitsumori Tomohiro
Murata Masaki
Publication venue: BioMed Central
Publication date: 01/01/2005

Crossref

Springer - Publisher Connector

PubMed Central

Subspace procrustes analysis

Author: Angulo Bahón Cecilio
De La Torre Fernando
Escalera Sergio
Igual Laura
Perez-Sala Xavier
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2017

Postprint (author's final draft)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC