Word graphs: The third set

Author: Hoede C.
Zhang Lei
Publication venue: Department of Applied Mathematics, University of Twente
Publication date: 01/01/2000

This is the third paper in a series on natural language processing in terms of knowledge graphs. A word is a basic unit in natural language processing, which is why we study word graphs. Word graphs were already built for prepositions and adwords (including adjectives, adverbs and Chinese quantity words) in two other papers. In this paper, we propose the concept of the logic word and classify logic words into groups in terms of semantics and the way they are used in describing reasoning processes. A start is made with the building of the lexicon of logic words in terms of knowledge graphs.

University of Twente Research Information

Why We Still Need Knowledge of Language

Author: Smith Barry
Publication date: 01/01/2006

Article

PhilPapers

SAS-SPACE

Producing power-law distributions and damping word frequencies with two-stage language models

Author: Goldwater Sharon
Griffiths Thomas L.
Johnson Mark
Publication date: 01/01/2011

Standard statistical models of language fail to capture one of the most striking properties of natural languages: the power-law distribution in the frequencies of word tokens. We present a framework for developing statistical models that can generically produce power laws, breaking generative models into two stages. The first stage, the generator, can be any standard probabilistic model, while the second stage, the adaptor, transforms the word frequencies of this model to provide a closer match to natural language. We show that two commonly used Bayesian models, the Dirichlet-multinomial model and the Dirichlet process, can be viewed as special cases of our framework. We discuss two stochastic processes, the Chinese restaurant process and its two-parameter generalization based on the Pitman-Yor process, that can be used as adaptors in our framework to produce power-law distributions over word frequencies. We show that these adaptors justify common estimation procedures based on logarithmic or inverse-power transformations of empirical frequencies. In addition, taking the Pitman-Yor Chinese restaurant process as an adaptor justifies the appearance of type frequencies in formal analyses of natural language and improves the performance of a model for unsupervised learning of morphology. 48 page(s)
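The generator/adaptor split described in this abstract can be illustrated with a minimal sketch. It assumes the standard seating rule of the two-parameter (Pitman-Yor) Chinese restaurant process: a new table is opened with probability (b + a·K)/(n + b), and an existing table j is chosen with weight (n_j - a). The function names, the uniform toy generator, and the parameter values are hypothetical choices made for illustration and are not taken from the paper.

```python
import random
from collections import Counter


def uniform_generator(rng, vocab_size=1000):
    """Hypothetical stage-one generator: a uniform draw from a toy vocabulary."""
    return f"w{rng.randrange(vocab_size)}"


def two_stage_sample(n_tokens, generator, a=0.5, b=1.0, seed=0):
    """Sample tokens from a generator wrapped in a Pitman-Yor CRP adaptor.

    a is the discount (0 <= a < 1) and b the concentration (b > 0 here,
    an assumption that keeps the very first draw well defined).
    Returns the token sequence and the list of table labels.
    """
    rng = random.Random(seed)
    table_counts = []  # number of tokens seated at each table
    table_labels = []  # word produced by the generator for each table
    tokens = []
    for n in range(n_tokens):
        k = len(table_counts)
        # Probability of opening a new table and calling the generator.
        p_new = (b + a * k) / (n + b)
        if rng.random() < p_new:
            table_counts.append(1)
            table_labels.append(generator(rng))
            tokens.append(table_labels[-1])
        else:
            # Reuse an existing table with weight (count - a): rich get richer.
            weights = [c - a for c in table_counts]
            j = rng.choices(range(k), weights=weights)[0]
            table_counts[j] += 1
            tokens.append(table_labels[j])
    return tokens, table_labels


if __name__ == "__main__":
    tokens, labels = two_stage_sample(20000, uniform_generator)
    freqs = Counter(tokens).most_common(5)
    print("tables opened:", len(labels))
    print("most frequent words:", freqs)
```

Although the toy generator is uniform, the adaptor reuses earlier draws in proportion to their counts, so a few words come to dominate the token frequencies. That is the qualitative effect the abstract attributes to the second stage.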

Edinburgh Research Explorer

Macquarie University ResearchOnline

An Essentialist Theory of the Meaning of Slurs

Author: Neufeld Eleonore
Publication date: 01/01/2019

In this paper, I develop an essentialist model of the semantics of slurs. I defend the view that slurs are a species of kind terms: Slur concepts encode mini-theories which represent an essence-like element that is causally connected to a set of negatively-valenced stereotypical features of a social group. The truth-conditional contribution of slur nouns can then be captured by the following schema: For a given slur S of a social group G and a person P, S is true of P iff P bears the “essence” of G, whatever this essence is, which is causally responsible for stereotypical negative features associated with G and predicted of P. Since there is no essence that is causally responsible for stereotypical negative features of a social group, slurs have null-extension, and consequently, many sentences containing them are either meaningless or false. After giving a detailed outline of my theory, I show that it receives strong linguistic support. In particular, it can account for a wide range of linguistic cases that are regarded as challenging, central data for any theory of slurs. Finally, I show that my theory also receives convergent support from cognitive psychology and psycholinguistics.

PhilPapers

Dependency parsing of Turkish

Author: Eryiğit Gülşen
Nivre Joakim
Oflazer Kemal
Publication venue: MIT Press - Journals
Publication date: 01/09/2006

The suitability of different parsing methods for different languages is an important topic in syntactic parsing. Lesser-studied languages in particular, typologically different from the languages for which methods were originally developed, pose interesting challenges in this respect. This article presents an investigation of data-driven dependency parsing of Turkish, an agglutinative free constituent order language that can be seen as the representative of a wider class of languages of similar type. Our investigations show that morphological structure plays an essential role in finding syntactic relations in such a language. In particular, we show that employing sublexical representations called inflectional groups, rather than word forms, as the basic parsing units improves parsing accuracy. We compare two different parsing methods, one based on a probabilistic model with beam search, the other based on discriminative classifiers and a deterministic parsing strategy, and show that the usefulness of sublexical units holds regardless of parsing method. We examine the impact of morphological and lexical information in detail and show that, properly used, this kind of information can improve parsing accuracy substantially. Applying the techniques presented in this article, we achieve the highest reported accuracy for parsing the Turkish Treebank.

CiteSeerX

Crossref

Sabanci University Research Database

Chinese Function Tag Labeling

Author: Sui Zhifang
Sun Weiwei
Publication venue: City University of Hong Kong
Publication date: 01/01/2009

PACLIC 23 / City University of Hong Kong / 3-5 December 2009

Waseda University Repository

Modelling the Lexicon in Unsupervised Part of Speech Induction

Author: Blunsom Phil
Dubbin Greg
Publication date: 01/01/2014

Automatically inducing the syntactic part-of-speech categories for words in text is a fundamental task in Computational Linguistics. While the performance of unsupervised tagging models has been slowly improving, current state-of-the-art systems make the obviously incorrect assumption that all tokens of a given word type must share a single part-of-speech tag. This one-tag-per-type heuristic counters the tendency of Hidden Markov Model based taggers to overgenerate tags for a given word type. However, it is clearly incompatible with basic syntactic theory. In this paper we extend a state-of-the-art Pitman-Yor Hidden Markov Model tagger with an explicit model of the lexicon. In doing so we are able to incorporate a soft bias towards inducing few tags per type. We develop a particle filter for drawing samples from the posterior of our model and present empirical results that show that our model is competitive with and faster than the state-of-the-art without making any unrealistic restrictions. Comment: To be presented at the 14th Conference of the European Chapter of the Association for Computational Linguistics

arXiv.org e-Print Archive

CiteSeerX

Crossref

Oxford University Research Archive

Review of Laurence R. Horn and Yasuhiko Kato (eds) (2000) Negation and polarity: syntactic and semantic perspectives. (Oxford University Press.)

Author: Rowlett PA
Publication venue: Cambridge University Press (CUP)
Publication date: 01/01/2002

University of Salford Institutional Repository

Crossref