104 research outputs found

    Overlap Matrix Completion for Predicting Drug-Associated Indications

    Identifying potential drug-associated indications is critical for both approved and novel drugs in drug repositioning. Current computational methods predict drug-disease associations from drug similarity and disease similarity. As more reliable drug- or disease-related information becomes available and is integrated, prediction precision can be improved further. However, it remains challenging to effectively incorporate multiple types of prior information, representing different characteristics of drugs and diseases, to identify promising drug-disease associations. In this study, we propose overlap matrix completion (OMC) for bilayer networks (OMC2) and for tri-layer networks (OMC3) to predict potential drug-associated indications. OMC efficiently exploits the underlying low-rank structure of the drug-disease association matrices. In OMC2, we first construct one bilayer network from the drug side and one from the disease side, and obtain their corresponding block adjacency matrices. We then propose the OMC2 algorithm to fill in the missing entries of these two adjacency matrices and predict the scores of unknown drug-disease pairs. We further extend OMC2 to OMC3 to handle tri-layer networks. Computational experiments on various datasets indicate that our OMC methods can effectively predict potential drug-disease associations. Compared with other state-of-the-art approaches, our methods yield higher prediction accuracy in 10-fold cross-validation and de novo experiments. In addition, case studies confirm the effectiveness of our methods in identifying promising indications for existing drugs in practical applications.
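
    To make the idea of completing a block adjacency matrix concrete, here is a minimal sketch. It is not the OMC2 algorithm from the paper: it uses a generic iterative soft-thresholded SVD, and the block layout, the function name `soft_impute`, the threshold `tau`, and the toy similarity and association values are all assumptions made for illustration.

```python
import numpy as np

def soft_impute(M, mask, tau=0.5, n_iter=100):
    """Fill entries where mask is False with a low-rank estimate.

    Generic iterative soft-thresholded SVD (an assumption for illustration),
    not the OMC2 algorithm described in the abstract.
    """
    X = np.where(mask, M, 0.0)                # unknown entries start at zero
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        s = np.maximum(s - tau, 0.0)          # shrink singular values
        low_rank = (U * s) @ Vt               # low-rank reconstruction
        X = np.where(mask, M, low_rank)       # keep observed entries fixed
    return X

# Toy, made-up data: 3 drugs and 2 diseases.
S_drug = np.array([[1.0, 0.8, 0.1],
                   [0.8, 1.0, 0.2],
                   [0.1, 0.2, 1.0]])
S_dis = np.array([[1.0, 0.3],
                  [0.3, 1.0]])
A = np.array([[1.0, 0.0],
              [0.0, 0.0],
              [0.0, 1.0]])                    # 1 = known association

# Hypothetical block layout: similarities on the diagonal, associations off it.
block = np.block([[S_drug, A], [A.T, S_dis]])

# Similarity blocks are fully observed; only known associations are observed.
assoc_known = A > 0
mask = np.block([[np.ones_like(S_drug, dtype=bool), assoc_known],
                 [assoc_known.T, np.ones_like(S_dis, dtype=bool)]])

completed = soft_impute(block, mask)
scores = completed[:3, 3:]                    # predicted drug-disease scores
```

    In this sketch, the rows of `scores` correspond to drugs and the columns to diseases; entries that were unknown in `A` now hold the low-rank estimates, which could be ranked to suggest candidate indications.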

    Intelligent Fusion of Structural and Citation-Based Evidence for Text Classification

    This paper investigates how citation-based information and structural content (e.g., title, abstract) can be combined to improve the classification of text documents into predefined categories. We evaluate eight similarity measures, five derived from the citation structure of the collection and three derived from the structural content, and determine how they can be fused to improve classification effectiveness. To discover the best fusion framework, we apply Genetic Programming (GP) techniques. Our empirical experiments using documents from the ACM digital library and the ACM classification scheme show that we can discover similarity functions that work better than any evidence in isolation and whose combined performance through simple majority voting is comparable to that of Support Vector Machine classifiers.
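
    The simple majority voting mentioned above can be sketched as follows. The abstract does not specify the tie-breaking rule, so the rule used here (the tied label that appears first wins), the function name `majority_vote`, and the example category labels are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted most often.

    Ties go to the tied label that appears first in `predictions`
    (an assumed rule; the abstract does not state one).
    """
    if not predictions:
        raise ValueError("predictions must not be empty")
    counts = Counter(predictions)
    top = max(counts.values())
    for label in predictions:
        if counts[label] == top:
            return label

# Hypothetical votes from three discovered similarity functions for one document.
votes = ["H.3", "H.3", "I.2"]
winner = majority_vote(votes)
```

    Here each element of `votes` stands for the category assigned by one similarity-based classifier, and `winner` is the category assigned to the document.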

    A New Web Search Engine with Learning Hierarchy

    Most existing web search engines (such as Google and Bing) are keyword-based. Typically, after the user issues a query with keywords, the search engine returns a flat list of results. When the query is related to a topic, keyword matching alone may not accurately retrieve the whole set of webpages on that topic. On the other hand, there exists another type of search system, particularly on e-commerce websites, where the user can search within the categories of different faceted hierarchies (e.g., product types and price ranges). Is it possible to integrate the two types of search systems and build a web search engine with a topic hierarchy? The main difficulty is how to classify the vast number of webpages on the Internet into the topic hierarchy. In this thesis, we leverage machine learning techniques to automatically classify webpages into the categories in our hierarchy, and then use the classification results to build the new search engine SEE. The experimental results demonstrate that SEE achieves better search results than a traditional keyword-based search engine for most queries, particularly when the query is related to a topic. We also conduct a small-scale usability study, which further verifies that SEE is a promising search engine. To further improve SEE, we also propose a new active learning framework with several novel strategies for hierarchical classification.

    Introducing linked open data in graph-based recommender systems

    Thanks to the recent spread of the Linked Open Data (LOD) initiative, a huge amount of machine-readable knowledge encoded as RDF statements is now available in the so-called LOD cloud. Accordingly, considerable effort is being devoted to investigating to what extent such information can be exploited to develop new knowledge-based services or to improve the effectiveness of knowledge-intensive platforms such as Recommender Systems (RS). To this end, in this article we study the impact of the exogenous knowledge coming from the LOD cloud on the overall performance of a graph-based recommendation framework. Specifically, we propose a methodology to automatically feed a graph-based RS with features gathered from the LOD cloud, and we analyze the impact of several widespread feature selection techniques in such recommendation settings. The experimental evaluation, performed on three state-of-the-art datasets, yielded several outcomes. First, information extracted from the LOD cloud can significantly improve the performance of a graph-based RS. Next, experiments showed a clear correlation between the choice of the feature selection technique and the ability of the algorithm to maximize specific evaluation metrics, such as accuracy or diversity of the recommendations. Moreover, our graph-based algorithm fed with LOD-based features was able to outperform several baselines, such as collaborative filtering and matrix factorization.

    Healthcare data mining from multi-source data

    The "big data" challenge is changing the way we acquire, store, analyse, and draw conclusions from data. How to effectively and efficiently "mine" data from possibly multiple sources and extract useful information is a critical question. Increasing research attention has been drawn to healthcare data mining, with the ultimate goal of improving the quality of care. The human body is complex, and so too is the data collected in treating it. Data noise, which is often introduced by the collection process, makes building data mining models a challenging task. This thesis focuses on the classification tasks of mining healthcare data, with the goal of improving the effectiveness of health risk prediction. In particular, we developed algorithms to address issues identified from real healthcare data, such as feature extraction, heterogeneity, label uncertainty, and large unlabeled data. The three main contributions of this research are as follows. First, we developed a new health index called the Personal Health Index (PHI) that scores a person's health status based on the examination records of a given population. Second, we identified the key characteristics of the real datasets and the issues associated with the data. Third, we developed classification algorithms to cope with those issues, particularly the label uncertainty and large unlabeled data issues. This research takes one step forward towards scoring personal health based on mining increasingly large health records. In particular, it pioneers the mining of GHE data and tackles the associated challenges. We anticipate that in the near future, more robust data-mining-based health scoring systems will be available to help healthcare professionals understand people's health status and thus improve the quality of care.