14 research outputs found

    Multi-label collective classification in multi-attribute multi-relational network data

    Abstract: Classical machine learning techniques assume the data to be i.i.d., but real-world data is inherently relational and can generally be represented using graphs or some variant of a graph representation. The importance of modeling relational data is evident from its increasing presence in many domains: telecom networks, the WWW, social networks, organizational networks, images, protein sequences, etc. This field has recently been receiving a lot of attention in various communities under different themes, depending on the problem addressed and the nature of the solution proposed. Collective classification is one such popular approach, which involves the use of a local classifier that embeds the node’s own attributes and its neighbors’ information in a feature vector and classifies the nodes in an iterative procedure. Despite its increasing popularity, not much attention has been paid to datasets with multiple attributes and multi-relational (MAMR) networks under multi-label scenarios. In MAMR data, nodes can be represented using multiple types of attributes (attribute views), and there are multiple link types between the nodes. For example, in Twitter, users can be represented using their tweets, URLs shared, hashtags, and list memberships, and different Twitter users can be connected using follower, followed-by, and re-tweet links. Secondly, in many networks, nodes are associated with more than one label. For instance, Twitter users can be tagged with one or more labels from a set L, where L contains various movie genres that a user might like. Motivated by this, we propose a learning technique for multi-label collective classification using multiple attribute views on multi-relational network data, which captures complex label correlations within and across attribute/relationship types. We empirically evaluate our proposed approach on Twitter and MovieLens datasets, and we show that it performs better than the state-of-the-art approaches.
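The iterative procedure mentioned in the abstract can be illustrated with a small sketch. This is not the authors' method: it assumes a single attribute view, a single link type, one independent nearest-centroid classifier per label, and a hypothetical toy graph. It models no label correlations and only shows the general shape of collective classification. All function names and data below are illustrative assumptions.

```python
from math import fsum


def neighbor_features(node, adjacency, current, label_set):
    """Fraction of a node's neighbors that currently carry each label."""
    nbrs = adjacency.get(node, [])
    if not nbrs:
        return [0.0] * len(label_set)
    return [sum(lab in current[n] for n in nbrs) / len(nbrs) for lab in label_set]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [fsum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(u, v):
    """Squared Euclidean distance."""
    return fsum((a - b) ** 2 for a, b in zip(u, v))


def collective_classify(attrs, adjacency, known, label_set, max_iters=10):
    """Toy iterative collective classification (assumed setup, see lead-in).

    attrs:     node -> list of attribute values (one attribute view)
    adjacency: node -> list of neighbor nodes (one link type)
    known:     node -> set of labels for the labeled nodes
    """
    current = {n: set(known.get(n, ())) for n in attrs}
    unlabeled = [n for n in attrs if n not in known]

    for _ in range(max_iters):
        # Feature vector = own attributes + neighbor label fractions.
        feats = {
            n: list(attrs[n]) + neighbor_features(n, adjacency, current, label_set)
            for n in attrs
        }
        predicted = {n: set() for n in unlabeled}
        for lab in label_set:
            pos = [feats[n] for n in known if lab in known[n]]
            neg = [feats[n] for n in known if lab not in known[n]]
            if not pos or not neg:
                continue  # no examples of one class; label stays unassigned
            c_pos, c_neg = centroid(pos), centroid(neg)
            for n in unlabeled:
                if sq_dist(feats[n], c_pos) < sq_dist(feats[n], c_neg):
                    predicted[n].add(lab)
        if all(predicted[n] == current[n] for n in unlabeled):
            break  # predictions are stable
        for n in unlabeled:
            current[n] = predicted[n]

    return {n: current[n] for n in unlabeled}


# Hypothetical toy graph: two triangles joined by the edge 2-3.
attrs = {0: [1.0], 1: [0.9], 2: [0.8], 3: [0.0], 4: [0.1], 5: [0.2]}
adjacency = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
known = {0: {"action"}, 1: {"action"}, 3: {"drama"}, 4: {"drama"}}

result = collective_classify(attrs, adjacency, known, ["action", "drama"])
print(result)
```

Extending the sketch toward the MAMR setting would mean concatenating or separately modeling several attribute views and computing neighbor features per link type; how the paper combines them is not stated in the abstract.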

    A Natural Language Processing System for Extracting Evidence of Drug Repurposing from Scientific Publications

    More than 200 generic drugs approved by the U.S. Food and Drug Administration for non-cancer indications have shown promise for treating cancer. Due to their long history of safe patient use, low cost, and widespread availability, repurposing of these drugs represents a major opportunity to rapidly improve outcomes for cancer patients and reduce healthcare costs. In many cases, there is already evidence of efficacy for cancer, but manually extracting such evidence from the scientific literature is intractable. In this emerging applications paper, we introduce a system to automate the extraction of evidence for non-cancer generic drugs from PubMed abstracts. Our primary contribution is to define the natural language processing pipeline required to obtain such evidence, comprising the following modules: querying, filtering, cancer type entity extraction, therapeutic association classification, and study type classification. Using the subject matter expertise on our team, we create our own datasets for these specialized domain-specific tasks. We obtain promising performance in each of the modules by utilizing modern language processing techniques and plan to treat them as baseline approaches for future improvement of individual components.