Learning with noisy supervision for Spoken Language Understanding

Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'

Crossref

Improving Text Classification Accuracy by Training Label Cleaning

Author: Andrea Esuli
Fabrizio Sebastiani
Publication venue: 'Association for Computing Machinery (ACM)'

Crossref

High-performance Word Sense Disambiguation with Less Manual Effort

Author: Dligach Dmitriy
Publication venue: CU Scholar
Publication date: 01/01/2010

Supervised learning is a widely used paradigm in Natural Language Processing. This paradigm involves learning a classifier from annotated examples and applying it to unseen data. We cast word sense disambiguation, our task of interest, as a supervised learning problem. We then formulate the end goal of this dissertation: to develop a series of methods aimed at achieving the highest possible word sense disambiguation performance with the least reliance on manual effort. We begin by implementing a word sense disambiguation system that uses rich linguistic features to better represent the contexts of ambiguous words. Our state-of-the-art system captures three types of linguistic features: lexical, syntactic, and semantic. Traditionally, semantic features are extracted with the help of expensive hand-crafted lexical resources. We propose a novel unsupervised approach to extracting a similar type of semantic information from unlabeled corpora. We show that incorporating this information into a classification framework leads to performance improvements. The result is a system that outperforms traditional methods while eliminating the reliance on manual effort for extracting semantic data. We then attack the problem of reducing manual effort from a different direction. Supervised word sense disambiguation relies on annotated data for learning sense classifiers. However, annotation is expensive since it requires a large time investment from expert labelers. We examine various annotation practices and propose several approaches for making them more efficient. We evaluate the proposed approaches and compare them to the existing ones. We show that the annotation effort can often be reduced significantly without sacrificing the performance of the models trained on the annotated data.

CU Scholar Institutional Repository

Detecting Errors in Corpora Using Support Vector Machines

Author: Tetsuji Nakagawa
Yuji Matsumoto
Publication date: 01/01/2002

While corpus-based research relies on human-annotated corpora, it is often said that a non-negligible number of errors remains even in frequently used corpora such as the Penn Treebank. Detecting errors in annotated corpora is therefore important for corpus-based natural language processing. In this paper, we propose a method to detect errors in corpora using support vector machines (SVMs). The method is based on the idea of extracting exceptional elements that violate consistency: we use SVMs to assign a weight to each element and to find errors in a POS-tagged corpus. We apply the method to English and Japanese POS-tagged corpora and achieve high precision in detecting errors.

CiteSeerX

Crossref

Stochastic chaos and thermodynamic phase transitions : theory and Bayesian estimation algorithms

Author: Deng Zhi-De
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (M. Eng. and S.B.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. Includes bibliographical references (p. 177-200). The chaotic behavior of dynamical systems underlies the foundations of statistical mechanics through ergodic theory. This putative connection is made more concrete in Part I of this thesis, where we show how to quantify certain chaotic properties of a system that are of relevance to statistical mechanics and kinetic theory. We consider the motion of a particle trapped in a double-well potential coupled to a noisy environment. By use of the classic Langevin and Fokker-Planck equations, we investigate Kramers' escape rate problem. We show that there is a deep analogy between kinetic rate theory and stochastic chaos, for which we propose a novel definition. In Part II, we develop techniques based on Volterra series modeling and Bayesian non-linear filtering to distinguish between dynamic noise and measurement noise. We quantify how much of the system's ergodic behavior can be attributed to intrinsic deterministic dynamical properties vis-a-vis inevitable extrinsic noise perturbations.

DSpace@MIT