4,204 research outputs found

A Generative Model of Words and Relationships from Multiple Sources

Authors: Hyland, Stephanie L.; Karaletsos, Theofanis; Rätsch, Gunnar
Publication date: 03/12/2015

Neural language models are a powerful tool to embed words into semantic vector spaces. However, learning such models generally relies on the availability of abundant and diverse training examples. In highly specialised domains this requirement may not be met due to difficulties in obtaining a large corpus, or the limited range of expression in average use. Such domains may encode prior knowledge about entities in a knowledge base or ontology. We propose a generative model which integrates evidence from diverse data sources, enabling the sharing of semantic information. We achieve this by generalising the concept of co-occurrence from distributional semantics to include other relationships between entities or words, which we model as affine transformations on the embedding space. We demonstrate the effectiveness of this approach by outperforming recent models on a link prediction task and demonstrating its ability to profit from partially or fully unobserved data training labels. We further demonstrate the usefulness of learning from different data sources with overlapping vocabularies.

Comment: 8 pages, 5 figures; incorporated feedback from reviewers; to appear in Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence 201
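
As a rough illustration of the idea in the abstract above, a relationship can be treated as an affine map applied to a source embedding before it is compared with a target embedding. The sketch below is not the authors' implementation: the cosine scoring, the function names, the toy words, and the 2-d vectors are all assumptions made for illustration.

```python
import math

def affine(matrix, offset, vec):
    """Apply the affine map x -> matrix * x + offset using plain lists."""
    return [sum(m * x for m, x in zip(row, vec)) + b
            for row, b in zip(matrix, offset)]

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def score(source, relation, target):
    """Score a (source, relation, target) triple.

    Assumption: the relation is a (matrix, offset) pair, and a higher
    cosine between the transformed source and the target means the
    triple is more plausible.
    """
    matrix, offset = relation
    return cosine(affine(matrix, offset, source), target)

# Toy 2-d embeddings, made up for illustration only.
embeddings = {
    "aspirin": [1.0, 0.0],
    "headache": [0.0, 1.0],
    "fever": [1.0, 0.0],
}

# Hypothetical relations: "treats" rotates by 90 degrees; the identity
# map stands in for plain co-occurrence (no transformation at all).
treats = ([[0.0, -1.0], [1.0, 0.0]], [0.0, 0.0])
identity = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])

score_headache = score(embeddings["aspirin"], treats, embeddings["headache"])
score_fever = score(embeddings["aspirin"], treats, embeddings["fever"])
score_cooccur = score(embeddings["aspirin"], identity, embeddings["fever"])
```

With the identity relation the score reduces to an ordinary similarity between two embeddings, which is one way to read "generalising the concept of co-occurrence": other relations simply swap in a different affine map.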

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

Ontology Driven Web Extraction from Semi-structured and Unstructured Data for B2B Market Analysis

Authors: Darlington, John; Imtiaz, Hazzaz; Zuo, Landong
Publication date: 01/09/2009

The Market Blended Insight project aims to improve UK business-to-business marketing performance using semantic web technologies. In this project, we are implementing an ontology-driven web extraction and translation framework to supplement our backend triple store of UK companies, people and geographical information. The framework handles both semi-structured data and unstructured text on the web, annotating the extracted data and then translating it according to the backend schema.
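
The translation step described above, from extracted fields to triples that fit a backend schema, might look roughly like the following. This is a minimal sketch under stated assumptions: the `ex:` predicates, the `SCHEMA_MAP` table, the `to_triples` helper, and the sample record are all hypothetical and do not come from the project itself.

```python
# Hypothetical mapping from extracted field names to schema predicates.
SCHEMA_MAP = {
    "name": "ex:companyName",
    "city": "ex:locatedIn",
    "director": "ex:hasDirector",
}

def to_triples(subject, record, schema_map=SCHEMA_MAP):
    """Translate one extracted record into (subject, predicate, object) triples.

    Fields with no schema mapping, or with an empty value, are skipped.
    """
    return [
        (subject, schema_map[field], value)
        for field, value in record.items()
        if field in schema_map and value
    ]

# Made-up record, as it might come out of a web extraction step.
record = {
    "name": "Example Ltd",
    "city": "Southampton",
    "director": "",       # empty value: dropped
    "sector": "retail",   # not in the schema map: dropped
}

triples = to_triples("ex:company/1", record)
for triple in triples:
    print(triple)
```

Keeping the field-to-predicate mapping in a single table means that a change to the backend schema only touches the table, not the extraction code.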

Southampton (e-Prints Soton)