A Conditional Random Field for Discriminatively-trained Finite-state String Edit Distance

Author: Bellare Kedar
McCallum Andrew
Pereira Fernando
Publication venue: ScholarWorks@UMass Amherst
Publication date: 01/01/2005

The need to measure sequence similarity arises in information extraction, object identity, data mining, biological sequence analysis, and other domains. This paper presents discriminative string-edit CRFs, a finite-state conditional random field model for edit sequences between strings. Conditional random fields have advantages over generative approaches to this problem, such as pair HMMs or the work of Ristad and Yianilos, because as conditionally-trained methods, they enable the use of complex, arbitrary actions and features of the input strings. As in generative models, the training data does not have to specify the edit sequences between the given string pairs. Unlike generative models, however, our model is trained on both positive and negative instances of string pairs. We present positive experimental results on several data sets.
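For orientation, the notion of an edit sequence between two strings can be illustrated with classic weighted edit distance. This is a minimal sketch of the standard dynamic program, not the discriminative string-edit CRF described in the abstract; the function name and the unit costs are illustrative assumptions.

```python
def edit_distance(source, target, ins_cost=1.0, del_cost=1.0, sub_cost=1.0):
    """Cost of the cheapest insert/delete/substitute sequence turning source into target."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]

    # Turning a prefix of source into the empty string takes only deletions.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # Turning the empty string into a prefix of target takes only insertions.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost

    for i in range(1, rows):
        for j in range(1, cols):
            # A matching character pair is copied at zero cost.
            step = 0.0 if source[i - 1] == target[j - 1] else sub_cost
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,   # delete source[i-1]
                dist[i][j - 1] + ins_cost,   # insert target[j-1]
                dist[i - 1][j - 1] + step,   # substitute or copy
            )
    return dist[-1][-1]
```

With unit costs this reduces to Levenshtein distance. The fixed costs here are the part a learned model would replace with trained scores.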

CiteSeerX

ScholarWorks@UMass Amherst

Learning to match names across languages

Author: Alex Yeh
Inderjeet Mani
Sherri Condon
Publication venue: ACL
Publication date: 01/01/2008

We report on research on matching names in different scripts across languages. We explore two trainable approaches based on comparing pronunciations. The first, a cross-lingual approach, uses an automatic name-matching program that exploits rules based on phonological comparisons of the two languages carried out by humans. The second, a monolingual approach, relies only on automatic comparison of the phonological representations of each pair. Alignments produced by each approach are fed to a machine learning algorithm. Results show that the monolingual approach yields machine-learning-based comparison of person names in English and Chinese with an F-measure of over 97.0.
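The F-measure quoted above is the harmonic mean of precision and recall. A minimal sketch of the standard formula follows; the function name and the counts in the usage note are made up for illustration and are not taken from the paper.

```python
def f_measure(true_pos, false_pos, false_neg, beta=1.0):
    """F-beta score from match counts; beta=1 gives the usual F1."""
    if true_pos == 0:
        # No correct matches: precision and recall are both zero (or undefined).
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, `f_measure(97, 3, 3)` is approximately 0.97. Multiplying by 100 gives the 0 to 100 scale used in the abstract.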

CiteSeerX

Crossref

Alignment Analysis of Sequential Segmentation of Lexicons to Improve Automatic Cognate Detection

Author: A Pranav
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2018

Ranking functions in information retrieval are often used in search engines to recommend relevant answers to a query. This paper borrows this notion from information retrieval and applies it to the problem of cognate detection. The main contributions of this paper are: (1) positional segmentation, which incorporates the sequential notion; (2) graphical error modelling, which deduces the transformations. Current research focuses on the classification problem, that is, distinguishing whether a pair of words are cognates. This paper focuses on a harder problem: whether a possible cognate can be predicted from the given input. Our study shows that language modelling smoothing methods, when applied as retrieval functions and used in conjunction with positional segmentation and error modelling, give better results than competing baselines in both classification and prediction of cognates. Source code is at: https://github.com/pranav-ust/cognates
Comment: Published at ACL-SRW 201
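The idea of using a smoothed language model as a retrieval function over candidate words can be sketched as follows. This is a hedged illustration only: it uses Jelinek-Mercer smoothing over padded character bigrams as one well-known smoothing method, and the abstract does not say which methods or segmentation the paper actually uses. All function names, the `#` padding symbol, and the `lam` parameter are assumptions.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Character n-grams of a word padded with '#' boundary markers."""
    padded = "#" + word + "#"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def jm_score(query, candidate, coll_counts, coll_total, lam=0.5, n=2):
    """Log-likelihood of the query n-grams under a Jelinek-Mercer smoothed candidate model."""
    cand_counts = Counter(char_ngrams(candidate, n))
    cand_total = sum(cand_counts.values())
    score = 0.0
    for gram in char_ngrams(query, n):
        p_cand = cand_counts[gram] / cand_total
        p_coll = coll_counts.get(gram, 0) / coll_total
        # Interpolate the candidate model with the collection model.
        prob = (1 - lam) * p_cand + lam * p_coll
        # Floor n-grams unseen anywhere so the logarithm stays defined.
        score += math.log(max(prob, 1e-12))
    return score


def rank_candidates(query, candidates, lam=0.5, n=2):
    """Sort candidate words by smoothed query likelihood, best first."""
    coll_counts = Counter()
    for word in candidates:
        coll_counts.update(char_ngrams(word, n))
    coll_total = sum(coll_counts.values())
    return sorted(
        candidates,
        key=lambda w: jm_score(query, w, coll_counts, coll_total, lam, n),
        reverse=True,
    )
```

Calling `rank_candidates("nacht", ["water", "night", "stone"])` puts `"night"` first, because it shares the bigrams `#n`, `ht`, and `t#` with the query. Ranking every candidate, rather than classifying one pair, corresponds to the prediction setting the abstract describes.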

arXiv.org e-Print Archive

Crossref

Grammatical-Restrained Hidden Conditional Random Fields for Bioinformatics applications

Author: A Krogh
A McCallum
A Quattoni
C Manning
C Sutton
Castrense Savojardo
CT Li
H Bigelow
J Lafferty
K Sato
L Wang
MH Li
P Bagos
P Baldi
P Fariselli
P Martelli
Pier Luigi Martelli
Piero Fariselli
R Durbin
Rita Casadio
S Wang
TH Dang
W Kabsch
X Xia
Y Liu
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Discriminative models are designed to naturally address classification tasks. However, some applications require the inclusion of grammar rules, and in these cases generative models, such as Hidden Markov Models (HMMs) and Stochastic Grammars, are routinely applied.

Results: We introduce Grammatical-Restrained Hidden Conditional Random Fields (GRHCRFs) as an extension of Hidden Conditional Random Fields (HCRFs). GRHCRFs, while preserving the discriminative character of HCRFs, can assign labels in agreement with the production rules of a defined grammar. The main GRHCRF novelty is the possibility of including in HCRFs prior knowledge of the problem by means of a defined grammar. Our current implementation allows regular grammar rules. We test our GRHCRF on a typical biosequence labeling problem: the prediction of the topology of Prokaryotic outer-membrane proteins.

Conclusion: We show that in a typical biosequence labeling problem the GRHCRF performs better than CRF models of the same complexity, indicating that GRHCRFs can be useful tools for biosequence analysis applications.

Availability: GRHCRF software is available under the GPLv3 licence at http://www.biocomp.unibo.it/~savojard/biocrf-0.9.tar.gz
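Assigning labels "in agreement with the production rules of a defined grammar" can be illustrated, for a regular grammar, by Viterbi decoding restricted to permitted state transitions. The sketch below is not the GRHCRF model or its released software; it only shows the constraint mechanism under stated assumptions. The function name, the score dictionaries, and the state labels `i`, `m`, `o` in the usage note are hypothetical.

```python
import math


def constrained_viterbi(emission_scores, transition_scores, allowed, start_states):
    """Best-scoring label path that only uses transitions listed in `allowed`.

    emission_scores: list of {state: score}, one dict per sequence position.
    transition_scores: {(prev, cur): score}; missing pairs score 0.0.
    allowed: set of (prev, cur) pairs permitted by the grammar.
    start_states: states permitted at the first position.
    Returns a list of states, or None if no path satisfies the grammar.
    """
    states = list(emission_scores[0].keys())
    neg_inf = -math.inf
    best = {s: (emission_scores[0][s] if s in start_states else neg_inf) for s in states}
    backpointers = []

    for scores in emission_scores[1:]:
        new_best, pointer = {}, {}
        for cur in states:
            best_prev, best_score = None, neg_inf
            for prev in states:
                # Skip transitions the grammar forbids and unreachable predecessors.
                if (prev, cur) not in allowed or best[prev] == neg_inf:
                    continue
                cand = best[prev] + transition_scores.get((prev, cur), 0.0)
                if cand > best_score:
                    best_prev, best_score = prev, cand
            new_best[cur] = best_score + scores[cur] if best_prev is not None else neg_inf
            pointer[cur] = best_prev
        best = new_best
        backpointers.append(pointer)

    last = max(states, key=lambda s: best[s])
    if best[last] == neg_inf:
        return None

    # Follow the back-pointers from the final state to the first position.
    path = [last]
    for pointer in reversed(backpointers):
        path.append(pointer[path[-1]])
    path.reverse()
    return path
```

As a usage example, take three hypothetical topology labels where `i` and `o` may only be reached from each other through `m`. With emissions `[{"i": 2, "m": 0, "o": 0}, {"i": 0, "m": 1, "o": 2}, {"i": 2, "m": 0, "o": 0}]`, no transition scores, and `allowed` containing every pair except `("i", "o")` and `("o", "i")`, the unconstrained optimum `i, o, i` is ruled out and the decoder returns `["i", "m", "i"]`.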

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Archivio istituzionale della ricerca - Università di Padova