Beyond definition: Organising semantic information in bilingual dictionaries

Author: Fraser BL
Publication venue: INT J LEXICOGR
Publication date: 01/03/2008

This paper considers the process of organising semantic information in bilingual dictionaries with diachronic coverage, from selecting the textual source-material to designing the entries. The discussion centres on practical aspects of ancient Greek lexicography. First, the traditional semantic frameworks are described. Then, more recent approaches are noted, notably those of Adrados and of Chadwick, both of which aim to integrate contextual data within a semantic framework. Since the relevance of contextual information varies with lemma part of speech, different configurations are required for entries describing nouns, adjectives, and verbs. These are illustrated by three entries from a Greek-English dictionary currently being written at Cambridge. In order to organise data to this level of specificity, stylistic templates are indispensable, and digital software offers a means of providing them. However, systems designed for writing new dictionaries require different features from those designed for encoding pre-existing texts. A description is given of how the lexicographic requirements of the Cambridge dictionary were met by a user-designed system

Apollo (Cambridge)

Morphological paradigms in language processing and language disorders

Author: Harald Clahsen
Ingrid Sonnenstuhl
Meike Hadler
Sonja Eisenbeiss
Publication venue: Essex Research Reports in Linguistics
Publication date: 01/01/2000

We present results from two cross-modal morphological priming experiments investigating regular person and number inflection on finite verbs in German. We found asymmetries in the priming patterns between different affixes that can be predicted from the structure of the paradigm. We also report data from language disorders which indicate that inflectional errors produced by language-impaired adults and children tend to occur within a given paradigm dimension, rather than randomly across the paradigm. We conclude that morphological paradigms are used by the human language processor and can be systematically affected in language disorders

University of Essex Research Repository

CiteSeerX

Kölner UniversitätsPublikationsServer

From Frequency to Meaning: Vector Space Models of Semantics

Author: Pantel Patrick
Turney Peter D.
Publication venue: AI Access Foundation
Publication date: 01/01/2010

Computers understand very little of the meaning of human language. This profoundly limits our ability to give instructions to computers, the ability of computers to explain their actions to us, and the ability of computers to analyse and process text. Vector space models (VSMs) of semantics are beginning to address these limits. This paper surveys the use of VSMs for semantic processing of text. We organize the literature on VSMs according to the structure of the matrix in a VSM. There are currently three broad classes of VSMs, based on term-document, word-context, and pair-pattern matrices, yielding three classes of applications. We survey a broad range of applications in these three categories and we take a detailed look at a specific open source project in each category. Our goal in this survey is to show the breadth of applications of VSMs for semantics, to provide a new perspective on VSMs for those who are already familiar with the area, and to provide pointers into the literature for those who are less familiar with the field

arXiv.org e-Print Archive

CiteSeerX

NRC Publications Archive

Crossref

Lexical typology : a programmatic sketch

Author: Behrens Leila
Sasse Hans-Jürgen
Publication date: 01/01/1997

The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar

Hochschulschriftenserver - Universität Frankfurt am Main

Sharing Cultural Heritage: the Clavius on the Web Project

Author: Abrate Matteo
Del Grosso Angelo Mario
Giovannetti Emiliano
Lo Duca Angelica
Luzzi Damiana
Mancini Lorenzo
Marchetti Andrea
Pedretti Irene
Piccini Silvia
Publication date: 01/01/2014

In the last few years the number of manuscripts digitized and made available on the Web has been constantly increasing. However, there is still a considerable lack of results concerning both the explicitation of their content and the tools developed to make it available. The objective of the Clavius on the Web project is to develop a Web platform exposing a selection of Christophorus Clavius's letters along with three different levels of analysis: linguistic, lexical and semantic. The multilayered annotation of the corpus involves an XML-TEI encoding followed by a tokenization step where each token is univocally identified through a CTS URN notation and then associated with a part-of-speech and a lemma. The text is lexically and semantically annotated on the basis of a lexicon and a domain ontology, the former structuring the most relevant terms occurring in the text and the latter representing the domain entities of interest (e.g. people, places, etc.). Moreover, each entity is connected to linked and non-linked resources, including DBpedia and VIAF. Finally, the results of the three layers of analysis are gathered and shown through interactive visualization and storytelling techniques. A demo version of the integrated architecture was developed

Archivio della ricerca- Università di Roma La Sapienza

Ontologies and Information Extraction

Author: Nazarenko Adeline
Nédellec Claire
Publication date: 01/01/2005

This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of texts are interpreted with respect to a predefined partial domain model. This report shows that depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which becomes a major application domain for IE

arXiv.org e-Print Archive

HAL Descartes

HAL-Paris 13

Evaluating phonological status : significance of paradigm uniformity vs. prosodic group effects

Author: Brinckmann Caren
Raffelsiefen Renate
Publication date: 28/04/2009

A central concern of linguistic phonetics is to define criteria for determining the phonological status of sounds or sound properties observed in phonetic surface form. Based on acoustic measurements, we show that the occurrence of syllabic sonorants vs. schwa-sonorant sequences in German is determined exclusively by segmental and prosodic structure, with no paradigm uniformity effects. We argue that these findings are consistent with a uniform representation of syllabic sonorants as schwa-sonorant sequences in the lexicon. The stability of schwa in CVC-suffixes (e.g. the German diminutive suffix -chen), as opposed to its phonetic absence in a segmentally comparable underived context, is argued to be conditioned by the prosodic organisation of such suffixes external to the phonological word of the stem

Hochschulschriftenserver - Universität Frankfurt am Main

Mapping the Changes in the Mental Lexicon of Pre-Intermediate Learners of English

Author: Dóczi Brigitta
Publication date: 01/01/2012

ELTE Digital Institutional Repository (EDIT)

Effects of Lexical Class and Word Frequency on the L1 and L2 English-Based Lexical Connections

Author: Zareva Alla
Publication venue: ODU Digital Commons
Publication date: 01/01/2011

Three groups of participants—L1 speakers of English, L2 advanced, and intermediate users of English—responded in writing to a word association test containing words balanced for lexical class (nouns, verbs, adjectives) and frequency of occurrence (high, mid, low). The questions addressed in the study concerned the way two word-related factors (i.e., lexical category and word frequency) interplayed with two learner-related characteristics (i.e., proficiency and word familiarity) and influenced 1) the participants’ knowledge of vocabulary, 2) their preference to build specific types of lexical connections among the words they know, and 3) their ability to maintain networks of associations as an indicator of the connectivity of their lexicons. The findings revealed a complex picture of interactions between the word-related and learner-related factors but, whenever the effects of the variables could be disentangled, proficiency and lexical class had a stronger influence on the organization of the L1 and L2 lexicons than word frequency alone

Old Dominion University

Implanting Rational Knowledge into Distributed Representation at Morpheme Level

Author: Lin Zi
Liu Yang
Publication date: 26/11/2018

Previously, researchers paid no attention to the creation of unambiguous morpheme embeddings independent from the corpus, while such information plays an important role in expressing the exact meanings of words for parataxis languages like Chinese. In this paper, after constructing the Chinese lexical and semantic ontology based on word-formation, we propose a novel approach to implanting the structured rational knowledge into distributed representation at morpheme level, naturally avoiding heavy disambiguation in the corpus. We design a template to create the instances as pseudo-sentences merely from the pieces of knowledge of morphemes built in the lexicon. To exploit hierarchical information and tackle the data sparseness problem, the instance proliferation technique is applied based on similarity to expand the collection of pseudo-sentences. The distributed representation for morphemes can then be trained on these pseudo-sentences using word2vec. For evaluation, we validate the paradigmatic and syntagmatic relations of morpheme embeddings, and apply the obtained embeddings to word similarity measurement, achieving significant improvements over the classical models by more than 5 Spearman scores or 8 percentage points, which shows very promising prospects for adoption of the new source of knowledge. Comment: AAAI 201

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications