Beyond definition: Organising semantic information in bilingual dictionaries
This paper considers the process of organising semantic information in bilingual dictionaries with diachronic coverage, from selecting the textual source-material to designing the entries. The discussion centres on practical aspects of ancient Greek lexicography. First, the traditional semantic frameworks are described. Then, more recent approaches are noted, notably those of Adrados and of Chadwick, both of which aim to integrate contextual data within a semantic framework. Since the relevance of contextual information varies with lemma part of speech, different configurations are required for entries describing nouns, adjectives, and verbs. These are illustrated by three entries from a Greek-English dictionary currently being written at Cambridge. In order to organise data to this level of specificity, stylistic templates are indispensable, and digital software provides a means of providing them. However, systems designed for writing new dictionaries require different features from those designed for encoding pre-existing texts. A description is given of how the lexicographic requirements of the Cambridge dictionary were met by a user-designed system
Morphological paradigms in language processing and language disorders
We present results from two cross-modal morphological priming experiments investigating regular person and number inflection on finite verbs in German. We found asymmetries in the priming patterns between different affixes that can be predicted from the structure of the paradigm. We also report data from language disorders which indicate that inflectional errors produced by language-impaired adults and children tend to occur within a given paradigm dimension, rather than randomly across the paradigm. We conclude that morphological paradigms are used by the human language processor and can be systematically affected in language disorders
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field
Lexical typology : a programmatic sketch
The present paper is an attempt to lay the foundation for Lexical Typology as a new kind of linguistic typology. The goal of Lexical Typology is to investigate crosslinguistically significant patterns of interaction between lexicon and grammar
Sharing Cultural Heritage: the Clavius on the Web Project
In the last few years the number of manuscripts digitized and made available on the Web has been constantly increasing. However, there is still a considerable lack of results concerning both making their content explicit and the tools developed to make it available. The objective of the Clavius on the Web project is to develop a Web platform exposing a selection of Christophorus Clavius's letters along with three different levels of analysis: linguistic, lexical and semantic. The multilayered annotation of the corpus involves an XML-TEI encoding followed by a tokenization step where each token is univocally identified through CTS URN notation and then associated with a part of speech and a lemma. The text is lexically and semantically annotated on the basis of a lexicon and a domain ontology, the former structuring the most relevant terms occurring in the text and the latter representing the domain entities of interest (e.g. people, places, etc.). Moreover, each entity is connected to linked and non-linked resources, including DBpedia and VIAF. Finally, the results of the three layers of analysis are gathered and shown through interactive visualization and storytelling techniques. A demo version of the integrated architecture was developed
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE
Evaluating phonological status : significance of paradigm uniformity vs. prosodic group effects
A central concern of linguistic phonetics is to define criteria for determining the phonological status of sounds or sound properties observed in phonetic surface form. Based on acoustic measurements we show that the occurrence of syllabic sonorants vs. schwa-sonorant sequences in German is determined exclusively by segmental and prosodic structure, with no paradigm uniformity effects. We argue that these findings are consistent with a uniform representation of syllabic sonorants as schwa sonorant sequences in the lexicon. The stability of schwa in CVC-suffixes (e.g. the German diminutive suffix -chen), as opposed to its phonetic absence in a segmentally comparable underived context, is argued to be conditioned by the prosodic organisation of such suffixes external to the phonological word of the stem
Effects of Lexical Class and Word Frequency on the L1 and L2 English-Based Lexical Connections
Three groups of participants—L1 speakers of English, L2 advanced, and intermediate users of English—responded in writing to a word association test containing words balanced for lexical class (nouns, verbs, adjectives) and frequency of occurrence (high, mid, low). The questions addressed in the study concerned the way two word-related factors (i.e., lexical category and word frequency) interplayed with two learner-related characteristics (i.e., proficiency and word familiarity) and influenced 1) the participants’ knowledge of vocabulary, 2) their preference to build specific types of lexical connections among the words they know, and 3) their ability to maintain networks of associations as an indicator of the connectivity of their lexicons. The findings revealed a complex picture of interactions between the word-related and learner-related factors but, whenever the effects of the variables could be disentangled, proficiency and lexical class had a stronger influence on the organization of the L1 and L2 lexicons than word frequency alone
Implanting Rational Knowledge into Distributed Representation at Morpheme Level
Previously, researchers paid no attention to the creation of unambiguous
morpheme embeddings independent from the corpus, while such information plays
an important role in expressing the exact meanings of words for parataxis
languages like Chinese. In this paper, after constructing the Chinese lexical
and semantic ontology based on word-formation, we propose a novel approach to
implanting the structured rational knowledge into distributed representation at
morpheme level, naturally avoiding heavy disambiguation in the corpus. We
design a template to create the instances as pseudo-sentences merely from the
pieces of knowledge of morphemes built in the lexicon. To exploit hierarchical
information and tackle the data sparseness problem, the instance proliferation
technique is applied based on similarity to expand the collection of
pseudo-sentences. The distributed representation for morphemes can then be
trained on these pseudo-sentences using word2vec. For evaluation, we validate
the paradigmatic and syntagmatic relations of morpheme embeddings, and apply
the obtained embeddings to word similarity measurement, achieving significant
improvements over the classical models by more than 5 Spearman scores or 8
percentage points, which shows very promising prospects for adoption of the new
source of knowledge. Comment: AAAI 201