49 research outputs found

    DeepWalk: Online Learning of Social Representations

    We present DeepWalk, a novel approach for learning latent representations of vertices in a network. These latent representations encode social relations in a continuous vector space, which is easily exploited by statistical models. DeepWalk generalizes recent advancements in language modeling and unsupervised feature learning (or deep learning) from sequences of words to graphs. DeepWalk uses local information obtained from truncated random walks to learn latent representations by treating walks as the equivalent of sentences. We demonstrate DeepWalk's latent representations on several multi-label network classification tasks for social networks such as BlogCatalog, Flickr, and YouTube. Our results show that DeepWalk outperforms challenging baselines which are allowed a global view of the network, especially in the presence of missing information. DeepWalk's representations can provide F1 scores up to 10% higher than competing methods when labeled data is sparse. In some experiments, DeepWalk's representations are able to outperform all baseline methods while using 60% less training data. DeepWalk is also scalable. It is an online learning algorithm which builds useful incremental results, and is trivially parallelizable. These qualities make it suitable for a broad class of real world applications such as network classification, and anomaly detection. Comment: 10 pages, 5 figures, 4 tables
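
    The walk-generation step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the parameters `walks_per_vertex` and `walk_length`, and the toy graph are all assumptions made for illustration. It only shows how truncated random walks turn a graph into "sentences" of vertex ids; the embedding step, where a language model is trained on those sentences, is left out.

```python
import random


def truncated_random_walks(adjacency, walks_per_vertex, walk_length, seed=0):
    """Generate fixed-length random walks from every vertex.

    adjacency: dict mapping a vertex to a list of its neighbors (hypothetical format).
    Each walk is a list of vertices and plays the role of one sentence.
    """
    rng = random.Random(seed)
    vertices = list(adjacency)
    walks = []
    for _ in range(walks_per_vertex):
        rng.shuffle(vertices)  # visit start vertices in a fresh random order each pass
        for start in vertices:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))  # uniform step to a neighbor
            walks.append(walk)
    return walks


# Toy undirected graph, made up for this example.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = truncated_random_walks(toy_graph, walks_per_vertex=2, walk_length=5)
for walk in walks:
    print(" ".join(walk))
```

    Each printed line is one walk. A word-embedding model trained on such lines would then assign every vertex a vector, which is the sense in which walks are treated as sentences.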

    The effect of interfirm financial transactions on the credit risk of small and medium-sized enterprises

    © 2019 The Authors. Despite the recognized importance of interfirm financial links in determining a company's performance, only a few studies have incorporated proxies for interfirm links in credit risk models, and none of these use real financial transactions. We estimate a credit risk model for small and medium-sized enterprises, augmented with information on observed interfirm financial transactions. We exploit a novel data set on about 60,000 companies based in the UK and their financial transactions over the years 2015 and 2016. We develop several network-augmented credit risk models and compare their prediction performance with that of a conventional credit risk model that includes only a set of financial ratios. We find that augmenting a default risk model with information on the transaction network makes a significant contribution to increasing the default prediction power of risk models built specifically for small and medium-sized enterprises. Our results may help bankers and credit scoring agencies to improve the credit scoring of these companies, ultimately reducing their propensity to apply excessive lending restrictions. Funding: Engineering and Physical Sciences Research Council (grant EP/L021250/1)

    From Popularity Prediction to Ranking Online News

    News articles are an engaging type of online content that captures the attention of a significant number of Internet users. They are particularly enjoyed by mobile users and spread massively through online social platforms. As a result, there is increased interest in discovering the articles that will become popular among users. This objective falls under the broad scope of content popularity prediction and has direct implications for the development of new services for online advertisement and content distribution. In this paper, we address the problem of predicting the popularity of news articles based on user comments. We formulate the prediction task as a ranking problem, where the goal is not to infer the precise attention that a piece of content will receive but to accurately rank articles based on their predicted popularity. Using data obtained from two important news sites in France and the Netherlands, we analyze the ranking effectiveness of two prediction models. Our results indicate that popularity prediction methods are adequate solutions for this ranking task and could be considered a valuable alternative for automatic online news ranking.
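
    The difference between predicting exact popularity and ranking by predicted popularity can be shown with a small sketch. The abstract does not name its evaluation measure, so the top-k overlap used here is an assumption, and the article ids and comment counts are invented for illustration.

```python
def rank_by_score(scores):
    """Return article ids ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def precision_at_k(predicted, actual, k):
    """Fraction of the predicted top-k articles that are also in the true top-k."""
    top_predicted = set(rank_by_score(predicted)[:k])
    top_actual = set(rank_by_score(actual)[:k])
    return len(top_predicted & top_actual) / k


# Hypothetical predicted and observed comment counts per article.
predicted = {"a1": 120.0, "a2": 15.0, "a3": 60.0, "a4": 5.0}
actual = {"a1": 300, "a2": 40, "a3": 35, "a4": 2}

print(rank_by_score(predicted))                # ['a1', 'a3', 'a2', 'a4']
print(rank_by_score(actual))                   # ['a1', 'a2', 'a3', 'a4']
print(precision_at_k(predicted, actual, k=2))  # 0.5
print(precision_at_k(predicted, actual, k=3))  # 1.0
```

    The predicted counts are far from the observed ones, yet the ordering is mostly right. Under a ranking formulation only the ordering is scored.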

    Moderated Class membership Interchange in Iterative Multi relational Graph Classifier

    Organizing information resources into classes helps significantly when searching the massive volumes of online documents available through the Web or other information sources such as electronic mail, digital libraries, and corporate databases. Existing classification methods are often based only on a document's own content, i.e. its attributes. Considering relations in the web document space brings better results. We adopt multi-relational classification that interconnects attribute-based classifiers with iterative optimization based on relational heterogeneous graph structures, so that different types of instances and various relation types can be classified together. We establish a moderated class membership spreading mechanism in multi-relational graphs and compare the impact of various levels of regulation in a collective inference classifier. Experiments based on large-scale graphs originating from the MAPEKUS research project data set (web portals of scientific libraries) demonstrate that moderated class membership spreading significantly increases the accuracy of the relational classifier (by up to 10%) and protects instances with heterophilic neighborhoods from being misclassified.

    Classification with Pedigree and its Applicability to Record Linkage

    Real-world data is virtually never noise-free. Current methods for handling noise do so either by removing noisy instances or by trying to clean noisy attributes. Neither of these deals directly with the issue of noise, and in fact removing a noisy instance is not a viable option in many real systems. In this paper, we consider the problem of noise in the context of record linkage, a frequent problem in text mining. We present a new method for dealing with data sources that have noisy attributes which reflect the pedigree of that source. Our method, which assumes that training data is clean and that noise is only present in the test set, is an extension of decision trees which directly handles noise at classification time by changing how it walks through the tree at the various nodes, similar to how current trees handle missing values. We test the efficacy of our method on the IMDb movie database, where we classify whether pairs of records refer to the same person. Our results clearly show that we dramatically improve performance by handling pedigree directly at classification time.
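
    The abstract compares its approach to the way decision trees handle missing values. The sketch below illustrates only that analogy, in the C4.5 style where a record with a missing attribute is sent down every branch and the results are weighted by branch frequency. It is not the paper's pedigree method, which the abstract does not spell out. The tree layout, the attribute name `same_birth_year`, and the branch weights are hypothetical.

```python
def classify(node, record):
    """Return a dict mapping class label to probability for one record."""
    if "label" in node:  # leaf node: a single class with certainty
        return {node["label"]: 1.0}
    value = record.get(node["attribute"])
    if value is not None:  # attribute present: follow the matching branch
        return classify(node["children"][value], record)
    # Attribute missing: follow every branch and blend the results,
    # weighting each branch by the share of training data it received.
    combined = {}
    for branch_value, weight in node["weights"].items():
        branch_result = classify(node["children"][branch_value], record)
        for label, probability in branch_result.items():
            combined[label] = combined.get(label, 0.0) + weight * probability
    return combined


# Hypothetical one-split tree for deciding whether two records match.
tree = {
    "attribute": "same_birth_year",
    "weights": {True: 0.25, False: 0.75},
    "children": {
        True: {"label": "match"},
        False: {"label": "non-match"},
    },
}

print(classify(tree, {"same_birth_year": True}))  # {'match': 1.0}
print(classify(tree, {}))  # {'match': 0.25, 'non-match': 0.75}
```

    A noise-aware variant would change how a node is traversed when an attribute value is present but unreliable. How the paper does this is not stated in the abstract.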