
133 research outputs found

Large Margin Nearest Neighbor Embedding for Knowledge Representation

Author: Fan Miao
Grishman Ralph
Zheng Thomas Fang
Zhou Qiang
Publication date: 07/04/2015

The traditional way of storing facts as triplets (head_entity, relation, tail_entity), abbreviated as (h, r, t), makes knowledge intuitive to display and easy for humans to acquire, but hard for AI systems to compute over or reason with. Inspired by the success of distributed representations in AI-related fields, recent studies represent each entity and relation with a unique low-dimensional embedding, in contrast to the symbolic, atomic framework of displaying knowledge in triplets. In this way, knowledge computing and reasoning can be facilitated by a simple vector calculation, i.e. $\mathbf{h} + \mathbf{r} \approx \mathbf{t}$. We thus contribute an effective model that learns better embeddings satisfying this formula by pulling the positive tail entities $\mathbf{t}^{+}$ together and close to $\mathbf{h} + \mathbf{r}$ (Nearest Neighbor), while simultaneously pushing the negatives $\mathbf{t}^{-}$ away from the positives $\mathbf{t}^{+}$ by keeping a Large Margin. We also design a corresponding learning algorithm, based on Stochastic Gradient Descent, that efficiently finds the optimal solution in an iterative fashion. Quantitative experiments show that our approach achieves state-of-the-art performance compared with several recent methods on some benchmark datasets for two classical applications, i.e. link prediction and triplet classification. Moreover, we analyze the parameter complexity of all the evaluated models, and the analytical results indicate that our model needs fewer computational resources while outperforming the other methods.

Comment: arXiv admin note: text overlap with arXiv:1503.0815
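The translation idea h + r ≈ t combined with a margin can be illustrated with a small sketch. This is a generic TransE-style margin ranking loss, not the objective from the paper itself: the abstract describes pushing negatives away from the positive tails, whereas this simplified version measures both tails against h + r. The function names, the Euclidean distance, and the margin value of 1.0 are all assumptions chosen for illustration.

```python
import math


def l2_distance(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def translate(h, r):
    """Return h + r, the point where a correct tail is expected to lie."""
    return [a + b for a, b in zip(h, r)]


def margin_loss(h, r, t_pos, t_neg, margin=1.0):
    """Hinge loss that is zero once the negative tail is at least
    `margin` farther from h + r than the positive tail is.
    (Assumed, simplified objective; not the paper's exact formulation.)"""
    target = translate(h, r)
    d_pos = l2_distance(target, t_pos)
    d_neg = l2_distance(target, t_neg)
    return max(0.0, margin + d_pos - d_neg)


# Toy 2-dimensional embeddings (hypothetical values).
h = [0.0, 1.0]
r = [1.0, 0.0]
t_pos = [1.0, 1.0]       # exactly h + r
t_neg_far = [4.0, 5.0]   # distance 5 from h + r
t_neg_near = [1.0, 1.5]  # distance 0.5 from h + r

loss_far = margin_loss(h, r, t_pos, t_neg_far)
loss_near = margin_loss(h, r, t_pos, t_neg_near)
print(loss_far, loss_near)
```

A negative tail that already lies beyond the margin contributes no loss, while one inside the margin is penalized in proportion to how far it intrudes. A stochastic gradient step would then adjust the embeddings only for the violating triplets.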

arXiv.org e-Print Archive

Crossref


Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task

Author: Coyne Robert Eric
Diab Mona T.
Grishman Ralph
Hakkani-Tür Dilek
Harper Mary
Ji Heng
Ma Wei Yun
McKeown Kathleen
Meyers Adam
Parton Kristen
Rosenthal Sara
Sun Ang
Tur Gokhan
Xu Wei
Yaman Sibel
Publication venue: Columbia University Libraries/Information Services
Publication date: 01/01/2009

Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem.

Columbia University Academic Commons

Study of the Antimicrobial Activity of Di(imidazol-1-yl)alkanes and Their Derivatives

Author: Favre Benoit
Grishman Ralph
Hakkani-Tur Dilek
Harper Mary
Hillard Dustin
Hirschberg Julia
Ji Heng
Kahn Jeremy G.
Liu Yang
Maskey Sameer
Matusov Evgeny
Ney Hermann
Ostendorf Mari
Rosenberg Andrew
Shriberg Elizabeth
Wang Wen
Wooter Chuck
Publication venue: TPU Publishing House (Изд-во ТПУ)
Publication date: 01/01/2008

Electronic archive of Tomsk Polytechnic University

GIANT: Scalable Creation of a Web-scale Ontology

Author: Adomavicius Gediminas
Brin Sergey
Cordeiro Mário
Devlin Jacob
Doddington George R
Fader Anthony
Frantzi Katerina
Grishman Ralph
Ji Heng
Koo Terry
McClosky David
Mihalcea Rada
Pasca Marius
Pawar Sachin
Ritter Alan
Sha Lei
Smirnova Alisa
Witten Ian H
Zhang Ziqi
Publication venue: Association for Computing Machinery (ACM)
Publication date: 05/04/2020

Understanding what online users may pay attention to is key to content recommendation and search services. These services will benefit from a highly structured and web-scale ontology of entities, concepts, events, topics and categories. While existing knowledge bases and taxonomies embody a large volume of entities and categories, we argue that they fail to discover properly grained concepts, events and topics in the language style of the online population. Nor is a logically structured ontology maintained among these notions. In this paper, we present GIANT, a mechanism to construct a user-centered, web-scale, structured ontology containing a large number of natural language phrases that conform to user attention at various granularities, mined from a vast volume of web documents and search click graphs. Various types of edges are also constructed to maintain a hierarchy in the ontology. We present the graph-neural-network-based techniques used in GIANT and evaluate the proposed methods against a variety of baselines. GIANT has produced the Attention Ontology, which has been deployed in various Tencent applications involving over a billion users. Online A/B testing performed on Tencent QQ Browser shows that the Attention Ontology can significantly improve click-through rates in news recommendation.

Comment: Accepted as full paper by SIGMOD 202

arXiv.org e-Print Archive

Crossref

Nominalization and Alternations in Biomedical Language

Author: Adam Meyers
BarbaraH Partee
Ben Goertzel
Beth Levin
Carol Friedman
CharlesJ Fillmore
Christiane Fellbaum
DeborahA Dahl
Douglas Biber
George Dunham
George Hripcsak
Gondy Leroy
James Pustejovsky
Jin-Dong Kim
JM Ko
John Lehrberger
Jonathan Schuman
K. Bretonnel Cohen
Karin Verspoor
KBretonnel Cohen
Laurie Bauer
Lawrence Hunter
Leroy Gondy
Lynette Hirschman
M Narayanaswamy
Malka Rappaport-Hovav
Maria Koptjevskaja-Tamm
Martha Palmer
MartinF Porter
Michael Johnston
Naomi Sager
ParantuK Shah
PhilipV Ogren
Pierre Zweigenbaum
Ralph Grishman
Randolph Quirk
Richard Kittredge
Richard Tzong-Han Tsai
Robert P. Futrelle
RobertB Lees
Ron Artstein
Sameer Pradhan
Seth Kulick
T Ono
Thomas Herbst
Thomas Roeper
ThomasC Rindflesch
TimothyW Finin
Tony McEnery
Tuangthong Wattarujeekrit
Wen-Chi Chou
X Yuan
Yacov Kogan
Yuka Tateisi
Zellig Harris
Zheng Ping Jiang
ZZ Hu
Publication venue: Public Library of Science
Publication date: 01/01/2008

Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of "stimulate" in "FSH stimulates follicular development" and "follicular development is stimulated by FSH". The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts. Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle. Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedica

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central