Cross-lingual Word Clusters for Direct Transfer of Linguistic Structure

Author: McDonald Ryan
Täckström Oscar
Uszkoreit Jakob
Publication date: 01/01/2012

It has been established that incorporating word cluster features derived from large unlabeled corpora can significantly improve prediction of linguistic structure. While previous work has focused primarily on English, we extend these results to other languages along two dimensions. First, we show that these results hold true for a number of languages across families. Second, and more interestingly, we provide an algorithm for inducing cross-lingual clusters and we show that features derived from these clusters significantly improve the accuracy of cross-lingual structure prediction. Specifically, we show that by augmenting direct-transfer systems with cross-lingual cluster features, the relative error of delexicalized dependency parsers, trained on English treebanks and transferred to foreign languages, can be reduced by up to 13%. When applying the same method to direct transfer of named-entity recognizers, we observe relative improvements of up to 26%.

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Meta-Learning for Phonemic Annotation of Corpora

Author: Daelemans W.
Gillis S.
Hoste V.
Tjong Kim Sang E.F.
van den Bosch A.
Weigand H.
Publication date: 01/01/2000

We apply rule induction, classifier combination and meta-learning (stacked classifiers) to the problem of bootstrapping high-accuracy automatic annotation of corpora with pronunciation information. The task we address in this paper consists of generating phonemic representations reflecting the Flemish and Dutch pronunciations of a word on the basis of its orthographic representation (which in turn is based on the actual speech recordings). We compare several possible approaches to achieve the text-to-pronunciation mapping task: memory-based learning, transformation-based learning, rule induction, maximum entropy modeling, combination of classifiers in stacked learning, and stacking of meta-learners. We are interested both in optimal accuracy and in obtaining insight into the linguistic regularities involved. As far as accuracy is concerned, an already high accuracy level (93% for Celex and 86% for Fonilex at word level) for single classifiers is boosted significantly with additional error reductions of 31% and 38% respectively using combination of classifiers, and a further 5% using combination of meta-learners, bringing overall word level accuracy to 96% for the Dutch variant and 92% for the Flemish variant. We also show that the application of machine learning methods indeed leads to increased insight into the linguistic regularities determining the variation between the two pronunciation variants studied. Comment: 8 pages

arXiv.org e-Print Archive

CiteSeerX

Ghent University Academic Bibliography

Archivsystem Ask23

Institutional Repository Universiteit Antwerpen

Tilburg University Repository

Machine Learning Theory and Practice as a Source of Insight into Universal Grammar

Author: Lappin Shalom
Shieber S
Publication date: 01/01/2007

Article

SAS-SPACE

Induction, complexity, and economic methodology

Author: Smith Peter

This paper focuses on induction, because the supposed weaknesses of that process are the main reason for favouring falsificationism, which plays an important part in scientific methodology generally; the paper is part of a wider study of economic methodology. The standard objections to, and paradoxes of, induction are reviewed, and this leads to the conclusion that the supposed ‘problem’ or ‘riddle’ of induction is a false one. It is an artefact of two assumptions: that the classic two-valued logic (CL) is appropriate for the contexts in which induction is relevant; and that it is the touchstone of rational thought. The status accorded to CL is the result of historical and cultural factors. The material we need to reason about falls into four distinct domains; these are explored in turn, while progressively relaxing the restrictions that are essential to the valid application of CL. The restrictions include the requirement for a pre-existing, independently-guaranteed classification, into which we can fit all new cases with certainty; and non-ambiguous relationships between antecedents and consequents. Natural kinds, determined by the existence of complex entities whose characteristics cannot be unbundled and altered in a piecemeal, arbitrary fashion, play an important part in the review; so also does fuzzy logic (FL). These are used to resolve two famous paradoxes about induction (the grue and raven paradoxes); and the case for believing that conventional logic is a subset of fuzzy logic is outlined. The latter disposes of all questions of justifying induction deductively. The concept of problem structure is used as the basis for a structured concept of rationality that is appropriate to all four of the domains mentioned above. The rehabilitation of induction supports an alternative definition of science: that it is the business of developing networks of contrastive, constitutive explanations of reproducible, inter-subjective (‘objective’) data. 
Social and psychological obstacles ensure the progress of science is slow and convoluted; however, the relativist arguments against such a project are rejected.

Keywords: induction; economics; methodology; complexity

Research Papers in Economics

Linguistic Modelling Using a semi-naive Bayes Framework

Author: Lawry J
Randon NJ
Publication date: 01/07/2002

Explore Bristol Research

Apperceptive patterning: Artefaction, extensional beliefs and cognitive scaffolding

Author: Erkan Ekin
Publication date: 01/01/2020

In “Psychopower and Ordinary Madness” my ambition, as it relates to Bernard Stiegler’s recent literature, was twofold: 1) critiquing Stiegler’s work on exosomatization and artefactual posthumanism (or, more specifically, nonhumanism) to problematize approaches to media archaeology that rely upon technical exteriorization; 2) challenging how Stiegler engages with Giuseppe Longo and Francis Bailly’s conception of negative entropy. These efforts were directed by a prevalent techno-cultural qualifier: the rise of Synthetic Intelligence (including neural nets, deep learning, predictive processing and Bayesian models of cognition). This paper continues this project but first directs a critical analytic lens at the Derridean practice of the ontologization of grammatization from which Stiegler emerges while also distinguishing how metalanguages operate in relation to object-oriented environmental interaction by way of inferentialism. Stalking continental (Kapp, Simondon, Leroi-Gourhan, etc.) and analytic traditions (e.g., Carnap, Chalmers, Clark, Sutton, Novaes, etc.), we move from artefacts to AI and Predictive Processing so as to link theories related to technicity with philosophy of mind. Simultaneously drawing forth Robert Brandom’s conceptualization of the roles that commitments play in retrospectively reconstructing the social experiences that lead to our endorsement(s) of norms, we complement this account with Reza Negarestani’s deprivatized account of intelligence while analyzing the equipollent role between language and media (both digital and analog).

PhilPapers