14 research outputs found

French-English Terminology Extraction from Comparable Corpora

Author: B. Daille
C. Jacquemin
E. Gaussier
G. Salton
I.D. Melamed
P. Fung
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2005

Crossref

Mining Parenthetical Translations for Polish-English Lexica

Author: I.D. Melamed
P. Resnik
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Crossref

Chinese Ancient-Modern Sentence Alignment

Author: C. Sun
I.D. Melamed
J. Veronis
M. Kay
X. Wang
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2007

Crossref

Optimal designs for two-color microarray experiments in multi-factorial models

Author: H.M. Caseli
I.D. Melamed
J. Chen
P.F. Brown
W.A. Gale
Publication venue: Publikationsserver der RWTH Aachen University
Publication date: 01/01/2005

Two-color microarray experiments are an important tool in gene expression analysis. They are often used to identify candidate genes that may be responsible for the genesis of a certain disease. Because microarray experiments are costly, it is essential to design them carefully and, in particular, to specify which samples should be allocated to the same microarray. Two samples are hybridized together on one array, and the assignment of samples to arrays influences the precision of the results. Design issues for microarray experiments have therefore been investigated intensively in recent years. However, only a few authors (e.g. Stanzel (2007)) have focused on more than one factor of interest. We extend Stanzel's work and derive approximate optimal designs for estimating interactions in multi-factorial settings. Optimality of candidate designs is shown using equivalence theorems (Pukelsheim (1993)). Another practically important but less studied topic is the derivation of exact optimal designs. Most research considers approximate designs, or exact designs for special contrast sets and selected numbers of arrays. We therefore focus on exact designs and present a method to construct A-optimal microarray designs for arbitrary numbers of arrays and arbitrary contrast sets. This method is applied to derive optimal designs for estimating treatment-control comparisons, all-to-next contrasts, Helmert contrasts and all pairwise comparisons. Furthermore, we derive robust designs, which achieve efficient results even if observations are missing. Missing values are a crucial topic in the context of microarray experiments, since they often occur due to scratches on the slide or other damage. In applications, recommendations for the choice of efficient experimental layouts can be derived from our constructed designs.

Crossref

Publikationsserver der RWTH Aachen University

Evaluation of Methods for Sentence and Lexical Alignment of Brazilian Portuguese and English Parallel Texts

Author: E. Gaussier
I.D. Melamed
J. Véronis
K. Hofland
O. Kraif
S. Piperidis
W.A. Gale
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2004

Crossref

A two-level structure for compressing aligned bitexts

Author: C.G. Nevill-Manning
D.E. Knuth
E.S. Conley
F.J. Och
G. Navarro
H.S. Heaps
I.D. Melamed
J.G. Cleary
N.R. Brisaboa
R. Mihalcea
R.N. Horspool
R.S. Boyer
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2009

A bitext, or bilingual parallel corpus, consists of two texts, each in a different language, that are mutual translations. Bitexts are very useful in linguistic engineering because they serve as a source of knowledge for different purposes. In this paper we propose a strategy to efficiently compress and use bitexts, saving not only space but also processing time when exploiting them. Our strategy is based on a two-level structure for the vocabularies, and on the use of biwords (pairs of associated words, one from each language) as the basic symbols to be encoded with an ETDC compressor. The resulting compressed bitext needs around 20% of the space and allows more efficient implementations of the different types of searches and operations that linguistic engineering needs to perform on bitexts. We discuss and provide results for compression, decompression, different types of searches, and bilingual snippet extraction. Supported by Spanish projects TIN2006-15071-C03-01, TIN2006-15071-C03-02 and TIN2006-15071-C03-03, the Regional Government of Castilla y León and the European Social Fund.

Repositorio Institucional de la Universidad de Alicante

Crossref

Adaptive Bilingual Sentence Alignment

Author: I.D. Melamed
J.S. Chang
K.L. Kwok
Longman Group
M. Kay
P.F. Brown
S.J. Ker
Publication venue: Springer Science and Business Media LLC

Crossref

N-gram similarity and distance

Author: A. Marzal
B. Smyth
B.L. Lambert
E. Ukkonen
I.D. Melamed
R.A. Wagner
T.H. Cormen
V. Chvátal
Publication date: 01/01/2005

In many applications, it is necessary to algorithmically quantify the similarity exhibited by two strings composed of symbols from a finite alphabet. Numerous string similarity measures have been proposed. Particularly well-known measures are based on edit distance and the length of the longest common subsequence. We develop a notion of n-gram similarity and distance. We show that edit distance and the length of the longest common subsequence are special cases of n-gram distance and similarity, respectively. We provide formal, recursive definitions of n-gram similarity and distance, together with efficient algorithms for computing them. We formulate a family of word similarity measures based on n-grams, and report the results of experiments that suggest that the new measures outperform their unigram equivalents.

CiteSeerX

Crossref

Alignment of Paragraphs in Bilingual Texts Using Bilingual Dictionaries and Dynamic Programming

Author: A. Gelbukh
A. Meyers
C. Kit
C.M. Robert
F. Velásquez
H.M. Caseli
I.D. Melamed
K. Martin
M. Mikhailov
R. Baeza-Yates
S. Chen
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2006

Parallel text alignment is a special type of pattern recognition task that aims to discover the similarity between two sequences of symbols. Given the same text in two different languages, the task is to decide which elements (paragraphs, in the case of paragraph alignment) in one text are translations of which elements of the other text. One of the applications is training statistical machine translation algorithms. The task is not trivial unless detailed text understanding can be afforded. In our previous work we presented a simple technique that relies on bilingual dictionaries but does not perform any syntactic analysis of the texts. In this paper we give a formal definition of the task and present an exact optimization algorithm for finding the best alignment.

CiteSeerX

Crossref

Human Evaluation of a German Surface Realisation Ranker

Author: A. Belz
A. Cahill
A. Gatt
A. Nenkova
C.Y. Lin
E. Velldal
H. Nakanishi
I.D. Melamed
J. Bresnan
J. Hall
K. Filippova
K. Owczarzak
K. Papineni
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2010

Crossref