
    Robust domain adaptation for relation extraction via clustering consistency

    We propose a two-phase framework to adapt existing relation extraction classifiers to extract relations for new target domains. We address two challenges: negative transfer, which occurs when knowledge from source domains is used without considering the differences in relation distributions; and the lack of adequate labeled samples for rarer relations in the new domain, due to a small labeled data set and an imbalanced relation distribution. Our framework leverages both labeled and unlabeled data in the target domain. First, we determine the relevance of each source domain to the target domain for each relation type, using the consistency between the clustering given by the target-domain labels and the clustering given by the predictors trained for the source domain. To overcome the lack of labeled samples for rarer relations, these clusterings operate on both the labeled and unlabeled data in the target domain. Second, we trade off between using relevance-weighted source-domain predictors and the labeled target data. Again, to overcome the imbalanced distribution, the source-domain predictors operate on the unlabeled target data. Our method outperforms numerous baselines and a weakly supervised relation extraction method on ACE 2004 and YAGO. © 2014 Association for Computational Linguistics
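    As a concrete illustration of the relevance-scoring step, the sketch below compares the clustering induced by a source-domain predictor's outputs with the clustering given by the target labels, then uses the resulting score to weight a vote over unlabeled target instances. The function names, the scikit-learn-style predictor interface, and the use of the adjusted Rand index are illustrative assumptions, not the paper's exact formulation.

        # Illustrative sketch only: score a source-domain predictor by how well the
        # clustering induced by its predictions agrees with the target-domain labels,
        # then combine relevance-weighted predictors on unlabeled target data.
        # The adjusted Rand index is an assumed stand-in for the paper's consistency measure.
        from sklearn.metrics import adjusted_rand_score

        def source_relevance(source_clf, X_target_labeled, y_target_labeled):
            """Higher score = the predictor's output clusters agree more with target labels."""
            predicted = source_clf.predict(X_target_labeled)
            return adjusted_rand_score(y_target_labeled, predicted)

        def relevance_weighted_vote(source_clfs, relevances, X_target_unlabeled):
            """Label unlabeled target instances by a relevance-weighted vote of source predictors."""
            votes = [dict() for _ in range(len(X_target_unlabeled))]
            for clf, weight in zip(source_clfs, relevances):
                for i, label in enumerate(clf.predict(X_target_unlabeled)):
                    votes[i][label] = votes[i].get(label, 0.0) + max(weight, 0.0)
            return [max(v, key=v.get) for v in votes]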

    Conditional random field with high-order dependencies for sequence labeling and segmentation

    Dependencies among neighboring labels in a sequence are important sources of information for sequence labeling and segmentation. However, only first-order dependencies, which are dependencies between adjacent labels or segments, are commonly exploited in practice because of the high computational complexity of typical inference algorithms when longer-distance dependencies are taken into account. In this paper, we give efficient inference algorithms to handle high-order dependencies between labels or segments in conditional random fields, under the assumption that the number of distinct label patterns used in the features is small. This leads to efficient learning algorithms for these conditional random fields. We show experimentally that exploiting high-order dependencies can lead to substantial performance improvements for some problems, and we discuss conditions under which high-order features can be effective. © 2014 Nguyen Viet Cuong, Nan Ye, Wee Sun Lee and Hai Leong Chieu
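    For reference, one standard way to write a conditional random field whose features look at label windows of length up to m, which is the kind of high-order dependency the abstract refers to (the notation below is ours, not taken from the paper):

        p(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})} \exp\Bigg( \sum_{t=1}^{T} \sum_{k} \lambda_k \, f_k\big(y_{t-m+1}, \ldots, y_t, \mathbf{x}, t\big) \Bigg)

    Here Z(x) is the partition function. When the features f_k fire only on a small set of distinct label patterns, which is the abstract's assumption, inference over these windows can remain efficient even though m exceeds one.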

    A split-merge framework for comparing clusterings

    Clustering evaluation measures are frequently used to evaluate the performance of clustering algorithms. However, most measures are not properly normalized, and they ignore some information in the inherent structure of clusterings. We model the relation between two clusterings as a bipartite graph and propose a general component-based decomposition formula based on the components of the graph. Most existing measures are examples of this formula. To satisfy consistency within each component, we further propose a split-merge framework for comparing clusterings of different data sets. Our framework gives measures that are conditionally normalized, and it can make use of data-point information, such as feature vectors and pairwise distances. We use an entropy-based instance of the framework and a coreference resolution data set to demonstrate empirically the utility of our framework over other measures. Copyright 2012 by the author(s)/owner(s)
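    A minimal sketch of the bipartite-graph view described above, assuming two flat clusterings of the same points: each cluster becomes a node, clusters from the two clusterings are connected when they share a point, and the connected components of the graph are enumerated. Only the component decomposition is shown; the split-merge scoring itself is not reproduced here.

        # Illustrative sketch: enumerate connected components of the bipartite graph
        # whose nodes are clusters and whose edges join clusters sharing a data point.
        from collections import defaultdict

        def bipartite_components(assign_a, assign_b):
            """assign_a / assign_b: lists mapping each point index to a cluster id."""
            adj = defaultdict(set)
            for a, b in zip(assign_a, assign_b):
                adj[('A', a)].add(('B', b))
                adj[('B', b)].add(('A', a))
            seen, components = set(), []
            for node in adj:
                if node in seen:
                    continue
                stack, comp = [node], set()
                while stack:
                    n = stack.pop()
                    if n in comp:
                        continue
                    comp.add(n)
                    stack.extend(adj[n] - comp)
                seen |= comp
                components.append(comp)
            return components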

    Domain adaptation for coreference resolution: An adaptive ensemble approach

    We propose an adaptive ensemble method to adapt coreference resolution across domains. This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific predictions rather than relying on a fixed base model or a fixed set of base models; (3) it can automatically adjust the active ensemble members during prediction. With simplification, this method can be used in the traditional within-domain case while still retaining the above features. To the best of our knowledge, this work is the first to both (i) develop a domain adaptation algorithm for the coreference resolution problem and (ii) offer the above features as an ensemble method. Empirically, we show the benefits of (i) on the six domains of the ACE 2005 data set in the domain adaptation setting, and of (ii) on both the MUC-6 and the ACE 2005 data sets in the within-domain setting. © 2012 Association for Computational Linguistics
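    The sketch below shows only the generic shell such an ensemble needs: a per-document weighted vote over base models' pairwise coreference decisions, with weights supplied by some document-specific selection rule. The selection rule itself, which is the paper's contribution, is left as a caller-supplied function; the names and the pairwise-decision interface are assumptions made for illustration.

        # Illustrative shell only: combine base coreference models with per-document
        # weights. How choose_weights picks or zeroes out members for a given document
        # is the part this sketch does not attempt to reproduce.
        def ensemble_pairwise_decisions(base_models, choose_weights, document, mention_pairs):
            """Return a coreference decision (True/False) for each mention pair in one document."""
            weights = choose_weights(base_models, document)  # document-specific weights
            decisions = []
            for pair in mention_pairs:
                score = sum(w * (1.0 if model.corefers(document, pair) else -1.0)
                            for model, w in zip(base_models, weights))
                decisions.append(score > 0.0)
            return decisions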

    NetiNeti: discovery of scientific names from text using machine learning methods

    Background: A scientific name for an organism can be associated with almost all biological data. Name identification is an important step in many text mining tasks that aim to extract useful information from biological, biomedical and biodiversity text sources. A scientific name acts as an important metadata element to link biological information.
    Results: We present NetiNeti (Name Extraction from Textual Information-Name Extraction for Taxonomic Indexing), a machine learning based approach for recognizing scientific names, including the discovery of new species names, in text; it also handles misspellings, OCR errors and other variations in names. The system generates candidate names using rules for scientific names and applies probabilistic machine learning methods to classify candidates based on their structural features and on features derived from their contexts. NetiNeti can also disambiguate scientific names from other names using contextual information. We evaluated NetiNeti on legacy biodiversity texts and biomedical literature (MEDLINE). NetiNeti performs better (precision = 98.9% and recall = 70.5%) than a popular dictionary-based approach (precision = 97.5% and recall = 54.3%) on a 600-page biodiversity book that was manually marked by an annotator. On a small set of PubMed Central's full-text articles annotated with scientific names, the precision and recall values are 98.5% and 96.2% respectively. NetiNeti found more than 190,000 unique binomial and trinomial names in more than 1,880,000 PubMed records when run on the full MEDLINE database. NetiNeti also successfully identifies almost all of the new species names mentioned within web pages.
    Conclusions: We present NetiNeti, a machine learning based approach for the identification and discovery of scientific names. The system implementing the approach can be accessed at http://namefinding.ubio.org.
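    To make the two-stage idea concrete, the sketch below generates binomial-name candidates with a crude rule and turns each candidate into a small structural feature vector that a probabilistic classifier could consume. The regular expression and the features are illustrative assumptions, not NetiNeti's actual rules or feature set.

        # Illustrative sketch: rule-based candidate generation for binomial names,
        # followed by toy structural features for a downstream probabilistic classifier.
        import re

        BINOMIAL = re.compile(r'\b([A-Z][a-z]+)\s+([a-z]{3,})\b')  # assumed, simplified rule

        def candidate_names(text):
            """Yield (candidate, start_offset) pairs matching a crude binomial pattern."""
            for m in BINOMIAL.finditer(text):
                yield m.group(0), m.start()

        def structural_features(candidate):
            """Toy feature vector: token lengths and Latin-like species endings."""
            genus, species = candidate.split()
            return {
                'genus_len': len(genus),
                'species_len': len(species),
                'latin_ending': species.endswith(('us', 'um', 'is', 'ii', 'ae')),
            }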