    Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs

    Many popular knowledge graphs such as Freebase, YAGO or DBpedia maintain a list of non-discrete attributes for each entity. Intuitively, these attributes, such as height, price or population count, are able to richly characterize entities in knowledge graphs. This additional source of information may help to alleviate the inherent sparsity and incompleteness problems that are prevalent in knowledge graphs. Unfortunately, many state-of-the-art relational learning models ignore this information because non-discrete data types are difficult to handle in inherently binary knowledge graphs. In this paper, we propose a novel multi-task neural network approach for both encoding and predicting non-discrete attribute information in a relational setting. Specifically, we train a neural network for triplet prediction along with a separate network for attribute value regression. Via multi-task learning, we are able to learn representations of entities, relations and attributes that encode information about both tasks. Moreover, such attributes are central to many predictive tasks not only as an information source but also as a prediction target. Therefore, models that are able to encode, incorporate and predict such information in a relational learning context are highly attractive as well. We show that our approach outperforms many state-of-the-art methods for the tasks of relational triplet classification and attribute value prediction. Comment: Accepted at CIKM 201
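
A minimal sketch of the multi-task idea, assuming a DistMult-style trilinear score in place of the triplet network and a linear regression head in place of the attribute network (the abstract does not specify either architecture). What it shows is the shared entity table `E`: both losses read from it, so training on their weighted sum would push information from both tasks into the same entity representations. The sizes, the trade-off weight `alpha`, and all function names are hypothetical, and no training step is included.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
n_entities, n_relations, n_attributes, dim = 5, 3, 2, 4

# Embedding tables. E is shared by both tasks.
E = rng.normal(scale=0.1, size=(n_entities, dim))
R = rng.normal(scale=0.1, size=(n_relations, dim))
A = rng.normal(scale=0.1, size=(n_attributes, dim))


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def triplet_prob(h, r, t):
    # Assumed DistMult-style score: probability that (h, r, t) holds.
    return sigmoid(np.sum(E[h] * R[r] * E[t]))


def attribute_pred(e, a):
    # Assumed linear head: predicted (normalized) value of attribute a for entity e.
    return float(np.dot(E[e], A[a]))


def multitask_loss(triplets, labels, attr_facts, alpha=0.5):
    """Return (total, bce, mse) for one batch covering both tasks."""
    eps = 1e-9  # guards against log(0)
    probs = np.array([triplet_prob(h, r, t) for h, r, t in triplets])
    y = np.asarray(labels, dtype=float)
    # Triplet classification: binary cross-entropy.
    bce = -np.mean(y * np.log(probs + eps) + (1 - y) * np.log(1 - probs + eps))
    # Attribute regression: mean squared error.
    preds = np.array([attribute_pred(e, a) for e, a, _ in attr_facts])
    targets = np.array([v for _, _, v in attr_facts], dtype=float)
    mse = np.mean((preds - targets) ** 2)
    # Weighted sum of the two task losses.
    return bce + alpha * mse, bce, mse


# Usage: two labelled triplets and two (entity, attribute, value) facts.
total, bce, mse = multitask_loss(
    triplets=[(0, 1, 2), (3, 0, 4)],
    labels=[1, 0],
    attr_facts=[(0, 0, 0.7), (2, 1, 0.1)],
)
print(total, bce, mse)
```

Weighting the regression term with `alpha` is one common way to balance tasks whose losses live on different scales; the value 0.5 is arbitrary here.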

    Linear Feature Extractors Based on Mutual Information

    This paper presents and evaluates two linear feature extractors based on mutual information. These feature extractors consider general dependencies between features and class labels, as opposed to well-known linear methods such as PCA, which does not consider class labels, and LDA, which uses only simple low-order dependencies. As evidenced by several simulations on high-dimensional data sets, the proposed techniques provide superior feature extraction and better dimensionality reduction while having similar computational requirements.
    1. Introduction
    The capabilities of a classifier are ultimately limited by the quality of the features in each input vector. In particular, when the measurement space is high-dimensional but the number of samples is limited, one is faced with the "curse of dimensionality" problem during training [3]. Feature extraction is often used to alleviate this problem. Although linear feature extractors are ultimately less flexible than the more general non-linear ...
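
As an illustration of scoring a linear projection by its dependence on the class label, the sketch below estimates the mutual information between a projected feature and the labels with a histogram, then keeps the best of a set of random unit directions. The histogram estimator, the random search, and the names `mutual_information` and `best_direction` are assumptions for illustration; they are not the two extractors the paper proposes.

```python
import numpy as np


def mutual_information(feature, labels, bins=10):
    """Histogram estimate of I(feature; label) in nats."""
    # Discretize the continuous feature into equal-width bins 0..bins-1.
    edges = np.histogram_bin_edges(feature, bins=bins)
    f_idx = np.digitize(feature, edges[1:-1])
    classes, y_idx = np.unique(labels, return_inverse=True)
    # Joint distribution p(bin, class) from counts.
    joint = np.zeros((bins, len(classes)))
    np.add.at(joint, (f_idx, y_idx), 1)
    joint /= joint.sum()
    # Product of the marginals p(bin) * p(class).
    indep = joint.sum(axis=1, keepdims=True) @ joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log(joint[nz] / indep[nz])))


def best_direction(X, y, n_candidates=200, seed=0):
    """Random search over unit vectors w for the projection X @ w with highest MI."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(n_candidates, X.shape[1]))
    W /= np.linalg.norm(W, axis=1, keepdims=True)
    scores = [mutual_information(X @ w, y) for w in W]
    i = int(np.argmax(scores))
    return W[i], scores[i]


# Synthetic data: the label shifts feature 0; feature 1 is pure noise.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=500)
X = np.column_stack([3.0 * y + rng.normal(size=500), rng.normal(size=500)])
w, mi = best_direction(X, y)
print(w, mi)  # w should point mostly along the first axis
```

Unlike PCA, this criterion uses the labels; unlike LDA, it does not assume the classes differ only in their means.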

    A system for automatic personalized tracking of scientific literature on the web

    We introduce a system, developed as part of the CiteSeer digital library project, for automatically tracking scientific literature that is relevant to a user's research interests. Unlike previous systems that use simple keyword matching, CiteSeer is able to track and recommend topically relevant papers even when keyword-based query profiles fail. This is made possible through the use of a heterogeneous profile to represent user interests. These profiles combine several representations, including content-based relatedness measures. The CiteSeer tracking system is well integrated into the search and browsing facilities of CiteSeer and gives users great flexibility in tuning a profile to better match their interests. The software for this system is available, and a sample database is online as a public service.
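
A rough sketch of a heterogeneous profile score, assuming the profile holds a keyword list plus a set of papers the user has marked as relevant, and assuming TF-IDF cosine similarity as the content-based relatedness measure. The equal weights `w_kw` and `w_content` and all function names are hypothetical. The content component is what lets a paper score above zero when none of the profile keywords appear in it.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with smoothed idf = log((1+n)/(1+df)) + 1."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(w for toks in tokenized for w in set(toks))
    return [
        {w: c * (math.log((1 + n) / (1 + df[w])) + 1) for w, c in Counter(toks).items()}
        for toks in tokenized
    ]


def cosine(u, v):
    dot = sum(x * v.get(w, 0.0) for w, x in u.items())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def profile_score(candidate, keywords, liked, w_kw=0.5, w_content=0.5):
    """Blend a keyword match with content similarity to papers the user liked."""
    toks = set(tokenize(candidate))
    # Keyword component: fraction of profile keywords found in the candidate.
    kw = sum(k.lower() in toks for k in keywords) / len(keywords) if keywords else 0.0
    # Content component: best TF-IDF cosine to any liked paper.
    vecs = tfidf_vectors(liked + [candidate])
    content = max((cosine(vecs[-1], v) for v in vecs[:-1]), default=0.0)
    return w_kw * kw + w_content * content


liked = ["Autonomous citation indexing of scientific literature on the Web"]
print(profile_score("Automatic citation indexing for Web literature", ["citation"], liked))
print(profile_score("Indexing scientific literature on the Web", ["CiteSeer"], liked))  # no keyword hit
print(profile_score("Protein folding simulation", ["citation"], liked))  # 0.0
```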

    CiteSeer: An Automatic Citation Indexing System

    We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g., PostScript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g., the ISI citation indexes), including literature retrieval by following citation links (e.g., by providing a list of papers that cite a given paper), the evaluation and ranking of papers, authors, journals, etc. based on the number of citations, and the identification of research trends. CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost ...
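
One simple way to illustrate grouping citations to the same paper written in different formats: normalize each citation string to a set of lowercase tokens, drop very common words, and cluster greedily by Jaccard similarity. The stopword list, the 0.5 threshold, the placeholder author names, and the function names are assumptions; the abstract does not say how CiteSeer performs this matching.

```python
import re

# Hypothetical stopword list: words that carry no identifying information.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "pp", "vol", "proc"}


def normalize(citation):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", citation.lower())
    return {w for w in words if w not in STOPWORDS}


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def group_citations(citations, threshold=0.5):
    """Greedy single pass: join the first group whose representative is
    similar enough, otherwise start a new group."""
    groups = []  # list of (representative token set, member citations)
    for c in citations:
        toks = normalize(c)
        for rep, members in groups:
            if jaccard(toks, rep) >= threshold:
                members.append(c)
                break
        else:
            groups.append((toks, [c]))
    return [members for _, members in groups]


cites = [
    "Author, A. CiteSeer: An automatic citation indexing system. Digital Libraries 98.",
    'A. Author, "CiteSeer: an automatic citation indexing system," in Proc. Digital Libraries, 1998.',
    "B. Writer. Linear feature extractors based on mutual information.",
]
for group in group_citations(cites):
    print(len(group), group[0][:40])
```

The first two strings share 8 of 10 distinct tokens (Jaccard 0.8), so they land in one group despite different ordering and punctuation.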

    Discovering relevant scientific literature on the web

    ... boon to scientific publication. It lets researchers disseminate their reports faster and at lower cost than ever before, greatly increasing the number and diversity of easily available publications. At the same time, however, the acceleration of publication has increased the perceived information overload for researchers attempting to keep abreast of relevant research in rapidly advancing fields. Scientific literature on the Web makes up a massive, noisy, disorganized database. Unlike large, single-source databases such as a corporate customer database, the Web database draws from many sources, each with its own organization. Also, owing to its diversity, most records in this database are irrelevant to an individual researcher. Furthermore, the database is constantly growing in content and changing in organization. All these characteristics make the Web a difficult domain for knowledge discovery. To quickly and easily gather useful knowledge from such a database, users need the help of an information-filtering system that automatically extracts only relevant records as they appear in a stream of incoming records. To this end, we have developed the CiteSeer digital library system. CiteSeer, a custom-digital-library generator, perform ...

    CiteSeer: An Autonomous Web Agent for Automatic Retrieval and Identification of Interesting Publications

    Published research papers available on the World Wide Web (WWW or Web) are often poorly organized, often exist as non-text documents (e.g., PostScript), and increase in number daily. Significant amounts of time and effort are commonly needed to find interesting and relevant publications on the Web. We have developed a Web-based information agent that assists the user in performing a scientific literature search. Given a set of keywords, the agent uses Web search engines and heuristics to locate and download papers. The papers are parsed to extract information features, such as the abstract and individually identified citations, which are placed into an SQL database. The agent's Web interface can be used to find relevant papers in the database using keyword searches or by navigating the links between papers formed by the citations. Links to both "citing" and "cited" publications can be followed. In addition to simple browsing and keyword searches, the agent can find papers that are similar to a given paper using word information and by analyzing common citations made by the papers.
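
The abstract's idea of finding similar papers by analyzing common citations can be sketched as counting shared references, a measure usually called bibliographic coupling. Whether the agent uses exactly this count, and how it combines it with word information, is an assumption; the dictionary layout, the paper ids, and the function name are hypothetical.

```python
def coupled_papers(refs, paper_id):
    """Rank other papers by the number of references they share with paper_id."""
    target = refs[paper_id]
    scores = {
        other: len(target & cited)
        for other, cited in refs.items()
        if other != paper_id
    }
    # Keep papers with at least one shared reference, highest overlap first;
    # ties are broken by id so the order is deterministic.
    return sorted(
        ((p, s) for p, s in scores.items() if s > 0),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical paper ids mapped to the sets of works they cite.
refs = {
    "p1": {"r1", "r2", "r3"},
    "p2": {"r2", "r3", "r4"},
    "p3": {"r3"},
    "p4": {"r9"},
}
print(coupled_papers(refs, "p1"))  # [('p2', 2), ('p3', 1)]
```

Raw counts favour papers with long reference lists; dividing by the size of the union of the two sets is a common way to normalize for that.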

    CiteSeer: An Automatic Citation Indexing System

    We present CiteSeer: an autonomous citation indexing system which indexes academic literature in electronic format (e.g., PostScript files on the Web). CiteSeer understands how to parse citations, identify citations to the same paper in different formats, and identify the context of citations in the body of articles. CiteSeer provides most of the advantages of traditional (manually constructed) citation indexes (e.g., the ISI citation indexes), including literature retrieval by following citation links (e.g., by providing a list of papers that cite a given paper), the evaluation and ranking of papers, authors, journals, etc. based on the number of citations, and the identification of research trends. CiteSeer has many advantages over traditional citation indexes, including the ability to create more up-to-date databases which are not limited to a preselected set of journals or restricted by journal publication delays, completely autonomous operation with a corresponding reduction in cost, and powerful interactive browsing of the literature using the context of citations. Given a particular paper of interest, CiteSeer can display the context of how the paper is cited in subsequent publications. This context may contain a brief summary of the paper, another author's response to the paper, or subsequent work which builds upon the original article. CiteSeer allows papers to be located by keyword search or by citation links. Papers related to a given paper can be located using common citation information or word vector similarity. CiteSeer will soon be available for public use.
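
Displaying the context of how a paper is cited can be illustrated by extracting the sentence around each occurrence of a citation marker in a citing article. The naive regex sentence splitter, the `window` parameter, the sample text, and the function name are assumptions for illustration, not CiteSeer's actual parser.

```python
import re


def citation_contexts(text, marker, window=1):
    """Return the sentence containing each occurrence of `marker`,
    joined with up to `window` neighbouring sentences on each side."""
    # Naive split after '.', '?' or '!' followed by whitespace.
    sentences = re.split(r"(?<=[.?!])\s+", text.strip())
    contexts = []
    for i, sentence in enumerate(sentences):
        if marker in sentence:
            lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
            contexts.append(" ".join(sentences[lo:hi]))
    return contexts


article = (
    "Citation indexes are useful. Prior work built one manually [1]. "
    "We extend the approach of [2] to the Web. Results follow."
)
print(citation_contexts(article, "[2]", window=0))
# ['We extend the approach of [2] to the Web.']
```

A larger `window` captures more of the surrounding argument (for example, another author's response to the cited work) at the cost of including unrelated sentences.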