Large Margin Nearest Neighbor Embedding for Knowledge Representation
The traditional way of storing facts in triplets ({\it head\_entity, relation,
tail\_entity}), abbreviated as ({\it h, r, t}), makes knowledge intuitive to
display and easy for humans to acquire, but hard for AI machines to compute
with or reason over. Inspired by the success of applying {\it Distributed
Representations} to AI-related fields, recent studies aim to represent each
entity and relation with a unique low-dimensional embedding, in contrast to
the symbolic and atomic framework of displaying knowledge in triplets. In
this way, knowledge computing and reasoning can be greatly facilitated
by means of a simple {\it vector calculation}, i.e. {\bf h} + {\bf r} \approx {\bf t}. We thus contribute an effective model to learn better embeddings
satisfying this formula by pulling the positive tail entities together and
close to {\bf h} + {\bf r} ({\it Nearest Neighbor}), while simultaneously
pushing the negatives away from the positives by keeping a {\it Large Margin}.
We also design a corresponding learning algorithm, based on {\it Stochastic
Gradient Descent}, that efficiently finds the optimal solution in an iterative
fashion. Quantitative experiments illustrate that our approach achieves
state-of-the-art performance compared with several recent methods on benchmark
datasets for two classical applications, i.e. {\it Link prediction} and
{\it Triplet classification}. Moreover, we analyze the parameter complexities
of all the evaluated models, and the analysis indicates that our model needs
fewer computational resources while outperforming the other methods.
Comment: arXiv admin note: text overlap with arXiv:1503.0815
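
To make the learning objective concrete, the following is a minimal sketch (in PyTorch, not the authors' released code) of a margin-based translational embedding loss of the kind described above: observed triplets are scored by how close {\bf t} lies to {\bf h} + {\bf r}, corrupted triplets must sit at least a fixed margin farther away, and the embeddings are updated by stochastic gradient descent. The entity/relation counts, embedding dimension, margin, and learning rate are illustrative assumptions.

    # Minimal sketch of a large-margin translational embedding objective,
    # i.e. encourage h + r to be close to t for observed triplets while
    # pushing corrupted (negative) tails away by at least a fixed margin.
    import torch
    import torch.nn.functional as F

    n_entities, n_relations, dim, margin = 10_000, 100, 50, 1.0  # assumed sizes

    ent = torch.nn.Embedding(n_entities, dim)
    rel = torch.nn.Embedding(n_relations, dim)
    opt = torch.optim.SGD(list(ent.parameters()) + list(rel.parameters()), lr=0.01)

    def score(h, r, t):
        # Distance of t from h + r; smaller means a more plausible triplet.
        return (ent(h) + rel(r) - ent(t)).norm(p=2, dim=-1)

    def train_step(h, r, t, t_neg):
        # Margin-based ranking loss: positive tails should be closer to
        # h + r than corrupted tails by at least `margin`; one SGD update.
        loss = F.relu(margin + score(h, r, t) - score(h, r, t_neg)).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Example: one SGD step on a toy batch of (h, r, t) with corrupted tails.
    h = torch.randint(0, n_entities, (128,))
    r = torch.randint(0, n_relations, (128,))
    t = torch.randint(0, n_entities, (128,))
    t_neg = torch.randint(0, n_entities, (128,))
    train_step(h, r, t, t_neg)
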
Who, What, When, Where, Why? Comparing Multiple Approaches to the Cross-Lingual 5W Task
Cross-lingual tasks are especially difficult due to the compounding effect of errors in language processing and errors in machine translation (MT). In this paper, we present an error analysis of a new cross-lingual task: the 5W task, a sentence-level understanding task which seeks to return the English 5W's (Who, What, When, Where and Why) corresponding to a Chinese sentence. We analyze systems that we developed, identifying specific problems in language processing and MT that cause errors. The best cross-lingual 5W system was still 19% worse than the best monolingual 5W system, which shows that MT significantly degrades sentence-level understanding. Neither source-language nor target-language analysis was able to circumvent problems in MT, although each approach had advantages relative to the other. A detailed error analysis across multiple systems suggests directions for future research on the problem.
GIANT: Scalable Creation of a Web-scale Ontology
Understanding what online users may pay attention to is key to content
recommendation and search services. These services will benefit from a highly
structured and web-scale ontology of entities, concepts, events, topics and
categories. While existing knowledge bases and taxonomies embody a large volume
of entities and categories, we argue that they fail to discover properly
grained concepts, events and topics in the language style of the online population.
Neither is a logically structured ontology maintained among these notions. In
this paper, we present GIANT, a mechanism to construct a user-centered,
web-scale, structured ontology, containing a large number of natural language
phrases conforming to user attentions at various granularities, mined from a
vast volume of web documents and search click graphs. Various types of edges
are also constructed to maintain a hierarchy in the ontology. We present our
graph-neural-network-based techniques used in GIANT, and evaluate the proposed
methods as compared to a variety of baselines. GIANT has produced the Attention
Ontology, which has been deployed in various Tencent applications involving
over a billion users. Online A/B testing performed on Tencent QQ Browser shows
that Attention Ontology can significantly improve click-through rates in news
recommendation.
Comment: Accepted as a full paper by SIGMOD 202
Nominalization and Alternations in Biomedical Language
Background: This paper presents data on alternations in the argument structure of common domain-specific verbs and their associated verbal nominalizations in the PennBioIE corpus. Alternation is the term in theoretical linguistics for variations in the surface syntactic form of verbs, e.g. the different forms of stimulate in FSH stimulates follicular development and follicular development is stimulated by FSH. The data is used to assess the implications of alternations for biomedical text mining systems and to test the fit of the sublanguage model to biomedical texts.
Methodology/Principal Findings: We examined 1,872 tokens of the ten most common domain-specific verbs or their zero-related nouns in the PennBioIE corpus and labelled them for the presence or absence of three alternations. We then annotated the arguments of 746 tokens of the nominalizations related to these verbs and counted alternations related to the presence or absence of arguments and to the syntactic position of non-absent arguments. We found that alternations are quite common both for verbs and for nominalizations. We also found a previously undescribed alternation involving an adjectival present participle.
Conclusions/Significance: We found that even in this semantically restricted domain, alternations are quite common, and alternations involving nominalizations are exceptionally diverse. Nonetheless, the sublanguage model applies to biomedical language.
- …