67 research outputs found

    DeepWalk: Online Learning of Social Representations

    We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real-world applications such as network classification and anomaly detection. Comment: 10 pages, 5 figures, 4 tables
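    The idea of treating truncated random walks as sentences can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy graph, the function names, and the parameters (`walks_per_vertex`, `walk_length`, `window`) are assumptions made for illustration. The sketch only generates walks and the (centre, context) pairs that a skip-gram style language model could be trained on; the embedding step itself is left out.

```python
import random


def truncated_random_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Start `walks_per_vertex` uniform random walks from every vertex.

    `adjacency` maps each vertex to a list of its neighbours. Each walk is
    truncated at `walk_length` vertices (or earlier at a dead end).
    """
    rng = random.Random(seed)
    vertices = list(adjacency)
    walks = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a random order per pass
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adjacency[walk[-1]]
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def context_pairs(walks, window):
    """Treat each walk as a sentence and emit (centre, context) vertex pairs."""
    pairs = []
    for walk in walks:
        for i, centre in enumerate(walk):
            lo = max(0, i - window)
            hi = min(len(walk), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((centre, walk[j]))
    return pairs


# Hypothetical toy graph, not taken from the paper's data sets.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}

walks = truncated_random_walks(toy_graph, walks_per_vertex=2, walk_length=5)
pairs = context_pairs(walks, window=2)
```

    Because each walk depends only on the local neighbourhood of its current vertex, walks from different start vertices can be generated independently, which is consistent with the abstract's remark that the method is trivially parallelizable.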

    The effect of interfirm financial transactions on the credit risk of small and medium-sized enterprises

    © 2019 The Authors. Despite the recognized importance of interfirm financial links in determining a company's performance, only a few studies have incorporated proxies for interfirm links in credit risk models, and none of these use real financial transactions. We estimate a credit risk model for small and medium-sized enterprises, augmented with information on observed interfirm financial transactions. We exploit a novel data set on about 60000 companies based in the UK and their financial transactions over the years 2015 and 2016. We develop several network-augmented credit risk models and compare their prediction performance with that of a conventional credit risk model that includes only a set of financial ratios. We find that augmenting a default risk model with information on the transaction network makes a significant contribution to increasing the default prediction power of risk models built specifically for small and medium-sized enterprises. Our results may help bankers and credit scoring agencies to improve the credit scoring of these companies, ultimately reducing their propensity to apply excessive lending restrictions. Engineering and Physical Sciences Research Council (grant EP/L021250/1)

    From Popularity Prediction to Ranking Online News

    News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly enjoyed by mobile users and spread massively through online social platforms. As a result, there is an increased interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications for the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise attention that an item will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.

    Lifted graphical models: a survey

    Lifted graphical models provide a language for expressing dependencies between different types of entities, their attributes, and their diverse relations, as well as techniques for probabilistic reasoning in such multi-relational domains. In this survey, we review a general form for a lifted graphical model, a par-factor graph, and show how a number of existing statistical relational representations map to this formalism. We discuss inference algorithms, including lifted inference algorithms, that efficiently compute the answers to probabilistic queries over such models. We also review work in learning lifted graphical models from data. There is a growing need for statistical relational models (whether they go by that name or another), as we are inundated with data that is a mix of structured and unstructured, with entities and relations extracted in a noisy manner from text, and with the need to reason effectively with this data. We hope that this synthesis of ideas from many different research groups will provide an accessible starting point for new researchers in this expanding field.

    Beyond tissueInfo: functional prediction using tissue expression profile similarity searches

    We present and validate tissue expression profile similarity searches (TEPSS), a computational approach to identify transcripts that share similar tissue expression profiles to one or more transcripts in a group of interest. We evaluated TEPSS for its ability to discriminate between pairs of transcripts coding for interacting proteins and non-interacting pairs. We found that ordering protein–protein pairs by TEPSS score produces sets significantly enriched in reported pairs of interacting proteins [interacting versus non-interacting pairs, Odds-ratio (OR) = 157.57, 95% confidence interval (CI) (36.81–375.51) at 1% coverage, employing a large dataset of about 50 000 human protein interactions]. When used with multiple transcripts as input, we find that TEPSS can predict non-obvious members of the cytosolic ribosome. We used TEPSS to predict S-nitrosylation (SNO) protein targets from a set of brain proteins that undergo SNO upon exposure to physiological levels of S-nitrosoglutathione in vitro. While some of the top TEPSS predictions have been validated independently, several of the strongest SNO TEPSS predictions await experimental validation. Our data indicate that TEPSS is an effective and flexible approach to functional prediction. Since the approach does not use sequence similarity, we expect that TEPSS will be useful for various gene discovery applications. TEPSS programs and data are distributed at http://icb.med.cornell.edu/crt/tepss/index.xml

    Moderated Class membership Interchange in Iterative Multi relational Graph Classifier

    Organizing information resources into classes helps significantly when searching the massive volumes of online documents available through the Web or other information sources such as electronic mail, digital libraries, and corporate databases. Existing classification methods are often based only on a document's own content, i.e. its attributes. Considering relations in the web document space brings better results. We adopt multi-relational classification that interconnects attribute-based classifiers with iterative optimization based on relational heterogeneous graph structures, so that different types of instances and various relation types can be classified together. We establish a moderated class membership spreading mechanism in multi-relational graphs and compare the impact of various levels of regulation in a collective inference classifier. Experiments on large-scale graphs derived from the MAPEKUS research project data set (web portals of scientific libraries) demonstrate that moderated class membership spreading significantly increases the accuracy of the relational classifier (by up to 10%) and protects instances with heterophilic neighborhoods from being misclassified.