7 research outputs found

Mistake-Driven Learning in Text Categorization

Author: Dagan Ido
Karov Yael
Roth Dan
Publication date: 01/01/1997

Learning problems in the text processing domain often map the text to a space whose dimensions are the measured features of the text, e.g., its words. Three characteristic properties of this domain are (a) very high dimensionality, (b) both the learned concepts and the instances reside very sparsely in the feature space, and (c) a high variation in the number of active features in an instance. In this work we study three mistake-driven learning algorithms for a typical task of this nature -- text categorization. We argue that these algorithms -- which categorize documents by learning a linear separator in the feature space -- have a few properties that make them ideal for this domain. We then show that a quantum leap in performance is achieved when we further modify the algorithms to better address some of the specific characteristics of the domain. In particular, we demonstrate (1) how variation in document length can be tolerated by either normalizing feature weights or by using negative weights, (2) the positive effect of applying a threshold range in training, (3) alternatives in considering feature frequency, and (4) the benefits of discarding features while training. Overall, we present an algorithm, a variation of Littlestone's Winnow, which performs significantly better than any other algorithm tested on this task using a similar feature set.
Comment: 9 pages, uses aclap.st
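To make the idea of a mistake-driven linear separator concrete, here is a minimal sketch of the basic textbook Winnow update on sparse binary features. It is not the modified variant the abstract reports: it uses positive weights only and has no threshold range, length normalization, or feature discarding. The names `train_winnow`, `predict`, and `EXAMPLES`, the promotion factor `alpha = 2.0`, and the threshold `theta = n_features` are assumptions chosen for illustration.

```python
from typing import List, Set, Tuple

Example = Tuple[Set[int], bool]  # (indices of active features, is the document in the category?)


def predict(weights: List[float], theta: float, active: Set[int]) -> bool:
    """Linear separator: sum the weights of the active features only."""
    return sum(weights[i] for i in active) >= theta


def train_winnow(
    examples: List[Example],
    n_features: int,
    alpha: float = 2.0,   # assumed promotion/demotion factor
    epochs: int = 10,     # assumed cap on passes over the data
) -> Tuple[List[float], float]:
    """Basic Winnow: weights change only when the current prediction is wrong."""
    weights = [1.0] * n_features
    theta = float(n_features)  # assumed threshold; a common textbook choice
    for _ in range(epochs):
        mistakes = 0
        for active, label in examples:
            if predict(weights, theta, active) == label:
                continue  # mistake-driven: no update on a correct prediction
            mistakes += 1
            # Promote active features on a missed positive, demote them on a false positive.
            factor = alpha if label else 1.0 / alpha
            for i in active:
                weights[i] *= factor
        if mistakes == 0:
            break  # a full pass without mistakes: the training set is separated
    return weights, theta


# Hypothetical toy data: a document is positive iff feature 0 or feature 1 is active.
EXAMPLES: List[Example] = [
    ({0, 2}, True),
    ({1, 3}, True),
    ({2, 3}, False),
    ({4, 5}, False),
    ({0, 4}, True),
    ({3, 5}, False),
]

if __name__ == "__main__":
    w, t = train_winnow(EXAMPLES, n_features=6)
    print(w, t)
```

Because each update touches only the features active in the misclassified instance, the cost per mistake scales with document length rather than with the size of the vocabulary, which suits the sparse, high-dimensional setting the abstract describes.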

arXiv.org e-Print Archive

CiteSeerX

Similarity-based word sense disambiguation

Author: Shimon Edelman
Yael Karov
Publication date: 01/01/1998

We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.

CiteSeerX

Similarity-based Word Sense Disambiguation

Author: Edelman Shimon
Karov Yael
Publication date: 01/06/1996

We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method can learn even from very sparse training data, achieving over 92% correct disambiguation performance.

Learning Similarity-Based Word Sense Disambiguation from Sparse Data

Author: Shimon Edelman
Yael Karov
Publication date: 01/01/1996

We describe a method for automatic word sense disambiguation using a text corpus and a machine-readable dictionary (MRD). The method is based on word similarity and context similarity measures. Words are considered similar if they appear in similar contexts; contexts are similar if they contain similar words. The circularity of this definition is resolved by an iterative, converging process, in which the system learns from the corpus a set of typical usages for each of the senses of the polysemous word listed in the MRD. A new instance of a polysemous word is assigned the sense associated with the typical usage most similar to its context. Experiments show that this method performs well, and can learn even from very sparse training data.
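The circular definition in this abstract (similar words appear in similar contexts, similar contexts contain similar words) can be illustrated with a small fixed-point style iteration. The sketch below is a loose simplification under stated assumptions, not the authors' published formulas: the averaging-of-best-matches rule, the fixed iteration count, and the names `context_sim`, `learn_word_sim`, `assign_sense`, `CORPUS`, and `SENSES` are all hypothetical, and contexts are assumed to be non-empty sets of words.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

WordSim = Dict[Tuple[str, str], float]


def context_sim(c1: Set[str], c2: Set[str], word_sim: WordSim) -> float:
    """Contexts are similar if they contain similar words (assumed rule:
    average, over the words of one context, of the best match in the other)."""
    def sim(w: str, v: str) -> float:
        # Unknown pairs fall back to identity: 1 for the same word, else 0.
        return word_sim.get((w, v), 1.0 if w == v else 0.0)

    def directed(a: Set[str], b: Set[str]) -> float:
        return sum(max(sim(w, v) for v in b) for w in a) / len(a)

    return 0.5 * (directed(c1, c2) + directed(c2, c1))


def learn_word_sim(contexts: List[Set[str]], iterations: int = 3) -> WordSim:
    """Alternate between context similarity and word similarity."""
    vocab = sorted(set().union(*contexts))
    occurs = {w: [i for i, c in enumerate(contexts) if w in c] for w in vocab}
    # Start from identity: each word is similar only to itself.
    word_sim: WordSim = {(w, v): 1.0 if w == v else 0.0
                         for w, v in product(vocab, repeat=2)}
    for _ in range(iterations):
        csim = [[context_sim(a, b, word_sim) for b in contexts] for a in contexts]

        def directed(w: str, v: str) -> float:
            # Words are similar if they appear in similar contexts.
            return sum(max(csim[i][j] for j in occurs[v]) for i in occurs[w]) / len(occurs[w])

        word_sim = {(w, v): 1.0 if w == v else 0.5 * (directed(w, v) + directed(v, w))
                    for w, v in product(vocab, repeat=2)}
    return word_sim


def assign_sense(context: Set[str], sense_usages: Dict[str, List[Set[str]]],
                 word_sim: WordSim) -> str:
    """Pick the sense whose typical usage is most similar to the new context."""
    return max(sense_usages,
               key=lambda s: max(context_sim(context, u, word_sim) for u in sense_usages[s]))


# Hypothetical toy corpus and sense inventory for the ambiguous word "bank".
CORPUS: List[Set[str]] = [
    {"money", "loan", "deposit"},
    {"loan", "interest", "money"},
    {"river", "water", "shore"},
    {"water", "fishing", "shore"},
]
SENSES: Dict[str, List[Set[str]]] = {
    "financial": [{"deposit", "interest"}],
    "river": [{"river", "fishing"}],
}

if __name__ == "__main__":
    sims = learn_word_sim(CORPUS)
    print(assign_sense({"loan"}, SENSES, sims))
```

In this toy setup the context `{"loan"}` shares no word with the typical usage `{"deposit", "interest"}`, so a match is only possible through the learned word similarities, which is the property the abstract relies on when training data is sparse.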

CiteSeerX

MicroRNA expression detected by oligonucleotide microarrays: System establishment and expression profiling in human tissues

Author: Aharonov Ranit
Avniel Amir
Barad Omer
Barzilai Adi
Bentwich Isaac
Bentwich Zvi
Einat Paz
Einav Uri
Gilad Shlomit
Hurban Patrick
Karov Yael
Lobenhofer Edward K.
Meiri Eti
Sharon Eilon
Shiboleth Yoel M.
Shtutman Marat
Publication venue: Cold Spring Harbor Laboratory Press
Publication date: 01/12/2004

MicroRNAs (MIRs) are a novel group of conserved short ∼22 nucleotide-long RNAs with important roles in regulating gene expression. We have established a MIR-specific oligonucleotide microarray system that enables efficient analysis of the expression of the human MIRs identified so far. We show that the 60-mer oligonucleotide probes on the microarrays hybridize with labeled cRNA of MIRs, but not with their precursor hairpin RNAs, derived from amplified, size-fractionated, total RNA of human origin. Signal intensity is related to the location of the MIR sequences within the 60-mer probes, with location at the 5′ region giving the highest signals, and at the 3′ end, giving the lowest signals. Accordingly, 60-mer probes harboring one MIR copy at the 5′ end gave signals of similar intensity to probes containing two or three MIR copies. Mismatch analysis shows that mutations within the MIR sequence significantly reduce or eliminate the signal, suggesting that the observed signals faithfully reflect the abundance of matching MIRs in the labeled cRNA. Expression profiling of 150 MIRs in five human tissues and in HeLa cells revealed a good overall concordance with previously published results, but also with some differences. We present novel data on MIR expression in thymus, testes, and placenta, and have identified MIRs highly enriched in these tissues. Taken together, these results highlight the increased sensitivity of the DNA microarray over other methods for the detection and study of MIRs, and the immense potential in applying such microarrays for the study of MIRs in health and disease.

Crossref

PubMed Central

Metaphor: A Computational Perspective

Author: Aarts Jan
Arciuli Joanne
Badryzlova Yulia
Baldwin Timothy
Barnden John
Baumer Eric
Beata Beigman Klebanov
Berger Adam L.
Bibik Janice
Birke Julia
Black Max
Blei David
Brants Thorsten
Brône Geert
Burstein Jill
Cameron Lynne
Carbonell Jaime
Casonato Marco
Cater Arthur
Charteris-Black Jonathan
Clarken Rodney
Cortes Corinna
Cox DR
Croft William
Cucerzan Silviu
Danesi Marcel
den Boon Ton
Dunn Jonathan
Ekaterina Shutova
Entman Robert
Erk Katrin
Fairclough Norman
Fass Dan
Fauconnier Gilles
Feldman Jerome
Fellbaum Christiane
Gandy Lisa
Gentner Dedre
Goldberg Adele E.
Grice Paul
Heintz Ilana
Hobbs Jerry
Hovy Dirk
Hutton James
Indurkhya Bipin
Izwaini Sattar
Johnson Mark
Karov Yael
Kaviani Hossein
Keller Kevin
Kipper-Schuler Karin
Klebanov Beata Beigman
Kutz Oliver
Kövecses Zoltán
Lakoff George
Lapata Mirella
Lee Mark
Leino Anna-Liisa
Levin Beth
Levin Lori
Lewis David
Li Hongsong
Li Linlin
Little William
Littlemore Jeannette
Lönneker Birte
Marcus Mitchell
Martin James
Meila Marina
Melamed Dan
Mikolov Tomas
Mishra Taniya
Mitchell Jeff
Mohler Michael
Moreno Marco
Morgan Gareth
Moschitti Alessandro
Musolff Andreas
Nelson Francis W.
Nikitina Larisa
Niles Ian
Norvig Peter
Pasanek Brad
Pereira Francisco C.
Peters Wim
Pichotta Karl
Pustejovsky James
Rash Felicity
Reddy Michael
Resnik Philip
Roberts John
Rundell Michael
Sandhaus Evan
Shutova Ekaterina
Steen Gerard
Stefanowitsch Anatol
Stone Biz
Strzalkowski Tomek
Taylor Archer
Thater Stefan
Tony Veale
Tsvetkov Yulia
Turney Peter
Van de Cruys Tim
Veale Tony
Way Eileen Cornell
Wilks Yorick
Yu Shipeng
Zaltman Gerald
Özbal Gozde
Publication venue: Morgan & Claypool Publishers LLC

Crossref