Active Discovery of Network Roles for Predicting the Classes of Network Nodes
Nodes in real world networks often have class labels, or underlying
attributes, that are related to the way in which they connect to other nodes.
Sometimes this relationship is simple: for instance, nodes of the same class
may be more likely to be connected. In other cases, however, this is not true,
and the way that nodes link in a network exhibits a different, more complex
relationship to their attributes. Here, we consider networks in which we know
how the nodes are connected, but we do not know the class labels of the nodes
or how class labels relate to the network links. We wish to identify the best
subset of nodes to label in order to learn this relationship between node
attributes and network links. We can then use this discovered relationship to
accurately predict the class labels of the rest of the network nodes.
We present a model that identifies groups of nodes with similar link
patterns, which we call network roles, using a generative blockmodel. The model
then predicts labels by learning the mapping from network roles to class labels
using a maximum margin classifier. We choose a subset of nodes to label
according to an iterative margin-based active learning strategy. By integrating
the discovery of network roles with the classifier optimisation, the active
learning process can adapt the network roles to better represent the network
for node classification. We demonstrate the model by exploring a selection of
real world networks, including a marine food web and a network of English
words. We show that, in contrast to other network classifiers, this model
achieves good classification accuracy for a range of networks with different
relationships between class labels and network links.
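The margin-based query step can be illustrated with a minimal sketch. It assumes a binary linear classifier has already been trained on each node's role-membership vector; the feature values, weights and the helper `select_query_node` are hypothetical, not the authors' model. The node picked for labelling is the unlabelled one closest to the decision boundary, i.e. the one the current classifier is least certain about.

```python
from typing import Dict, List, Sequence, Set


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear classifier score w . x + b (its sign gives the predicted class)."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def select_query_node(
    features: Dict[str, List[float]],
    labelled: Set[str],
    w: Sequence[float],
    b: float,
) -> str:
    """Hypothetical helper: return the unlabelled node with the smallest |w . x + b|."""
    candidates = [n for n in features if n not in labelled]
    return min(candidates, key=lambda n: abs(decision_value(w, b, features[n])))


# Made-up role-membership vectors for four nodes (two roles each).
features = {
    "a": [0.9, 0.1],
    "b": [0.2, 0.8],
    "c": [0.55, 0.45],
    "d": [0.1, 0.9],
}
w, b = [1.0, -1.0], 0.0
print(select_query_node(features, labelled={"a"}, w=w, b=b))  # → c
```

In a full active learning loop the queried node would be labelled, the classifier (and, in the paper's integrated approach, the roles) re-fitted, and the selection repeated.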
Supervised Blockmodelling
Collective classification models attempt to improve classification
performance by taking into account the class labels of related instances.
However, they tend not to learn patterns of interactions between classes and/or
make the assumption that instances of the same class link to each other
(assortativity assumption). Blockmodels provide a solution to these issues,
being capable of modelling assortative and disassortative interactions, and
learning the pattern of interactions in the form of a summary network. The
Supervised Blockmodel provides good classification performance using link
structure alone, whilst simultaneously providing an interpretable summary of
network interactions to allow a better understanding of the data. This work
explores three variants of supervised blockmodels of varying complexity and
tests them on four structurally different real world networks. Comment: Workshop on Collective Learning and Inference on Structured Data 201
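The idea of a blockmodel's summary network can be illustrated with a minimal sketch, assuming the class labels are already known: for each pair of classes it measures how densely their members link. This plain count is not the paper's supervised blockmodel (which is a learned model); the function `block_densities` and the toy network are hypothetical. It shows how a disassortative pattern, with links running mainly between classes, appears in a block matrix but would be missed under the assortativity assumption.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def block_densities(
    edges: List[Tuple[str, str]], labels: Dict[str, str]
) -> Dict[Tuple[str, str], float]:
    """For each unordered pair of classes, return the fraction of possible
    node pairs between those classes that are actually linked (undirected)."""
    edge_set = {frozenset(e) for e in edges}
    linked: Dict[Tuple[str, str], int] = {}
    possible: Dict[Tuple[str, str], int] = {}
    for u, v in combinations(sorted(labels), 2):
        key = tuple(sorted((labels[u], labels[v])))
        possible[key] = possible.get(key, 0) + 1
        if frozenset((u, v)) in edge_set:
            linked[key] = linked.get(key, 0) + 1
    return {key: linked.get(key, 0) / n for key, n in possible.items()}


# Made-up food-web-like network: links only run between predators and prey.
labels = {"p1": "predator", "p2": "predator", "q1": "prey", "q2": "prey"}
edges = [("p1", "q1"), ("p1", "q2"), ("p2", "q1")]
for key, density in sorted(block_densities(edges, labels).items()):
    print(key, density)
```

The within-class blocks come out at 0.0 and the predator-prey block at 0.75, a summary that an assortative classifier would not represent.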
Customer churn prediction in telecom using machine learning and social network analysis in big data platform
Customer churn is a major problem and one of the most important concerns for
large companies. Because churn directly affects revenue, especially in the
telecom sector, companies are seeking ways to predict which customers are
likely to churn. Identifying the factors that increase churn is therefore
important for taking the actions needed to reduce it. The main contribution of
our work is a churn prediction model that helps telecom operators identify the
customers most likely to churn. The model uses machine learning techniques on a
big data platform and introduces a new approach to feature engineering and
selection. To measure the model's performance, the standard Area Under the
Curve (AUC) measure is adopted; the AUC obtained is 93.3%. Another main
contribution is the use of the customers' social network in the prediction
model by extracting Social Network Analysis (SNA) features. Adding the SNA
features raised the model's AUC from 84% to 93.3%. The model was prepared and
tested in a Spark environment on a large dataset created by transforming big
raw data provided by the SyriaTel telecom company. The dataset contained all
customers' information over 9 months and was used to train, test, and evaluate
the system at SyriaTel. Four algorithms were tested: Decision Tree, Random
Forest, Gradient Boosted Machine Tree "GBM" and Extreme Gradient Boosting
"XGBOOST". The best results were obtained with XGBOOST, which was therefore
used for classification in this churn prediction model. Comment: 24 pages, 14 figures. PDF https://rdcu.be/budK
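A minimal sketch of how a single SNA feature might be built from a call graph and scored with AUC. The feature (share of a customer's call partners who have already churned), the helper names and the toy data are all hypothetical; the paper's actual SNA features and its Spark/XGBOOST pipeline are not reproduced here. AUC is computed as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as one half.

```python
from typing import List, Set, Tuple


def churned_neighbour_fraction(
    calls: List[Tuple[str, str]], churned: Set[str], customer: str
) -> float:
    """Hypothetical SNA feature: share of a customer's call partners who churned."""
    partners = {v for u, v in calls if u == customer} | {u for u, v in calls if v == customer}
    partners.discard(customer)
    if not partners:
        return 0.0
    return len(partners & churned) / len(partners)


def auc(scores: List[float], labels: List[int]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up call records; x1 and x2 are customers who churned earlier.
calls = [("c1", "x1"), ("c1", "x2"), ("c2", "x1"), ("c2", "c3"), ("c3", "c4"), ("c4", "c1")]
past_churned = {"x1", "x2"}
customers = ["c1", "c2", "c3", "c4"]
future_churn = [1, 0, 1, 0]  # made-up outcomes to evaluate against
scores = [churned_neighbour_fraction(calls, past_churned, c) for c in customers]
print(auc(scores, future_churn))  # → 0.625
```

The pairwise form is O(positives × negatives) and is meant only to make the metric's definition concrete; a rank-based formula scales better to a dataset of SyriaTel's size.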
A multilayer network approach for guiding drug repositioning in neglected diseases
Drug development for neglected diseases has historically been hampered by a lack of market incentives. The advent of public domain resources containing chemical information from high-throughput screenings is changing the landscape of drug discovery for these diseases. In this work we took advantage of data from extensively studied organisms like human, mouse, E. coli and yeast, among others, to develop a novel integrative network model to prioritize and identify candidate drug targets in neglected pathogen proteomes, and bioactive drug-like molecules. We modeled genomic (proteins) and chemical (bioactive compounds) data as a multilayer weighted network graph that takes advantage of bioactivity data across 221 species, chemical similarities between 1.7 × 10⁵ compounds and several functional relations among 1.67 × 10⁵ proteins. These relations comprised orthology, sharing of protein domains, and shared participation in defined biochemical pathways. We showcase the application of this network graph to prioritizing new candidate targets, based on the information available in the graph for known compound-target associations. We validated this strategy by performing a cross-validation procedure for known mouse and Trypanosoma cruzi targets and showed that our approach outperforms classic alignment-based approaches. Our model also provides additional flexibility, as two different network definitions can be considered, each yielding qualitatively different but sensible candidate targets. We also showcase the application of the network to suggest targets for orphan compounds that are active against Plasmodium falciparum in high-throughput screens. In this case our approach provided a reduced prioritization list of target proteins for the query molecules and showed the ability to propose new testable hypotheses for each compound.
Moreover, we found that some predictions highlighted by our network model were supported by independent experimental validations as found post-facto in the literature.
Fil: Berenstein, Ariel José. Fundación Instituto Leloir; Argentina. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Física; Argentina
Fil: Magariños, María Paula. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentina
Fil: Chernomoretz, Ariel. Fundación Instituto Leloir; Argentina. Universidad de Buenos Aires. Facultad de Ingeniería. Departamento de Física; Argentina
Fil: Fernandez Aguero, Maria Jose. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - La Plata. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús). Universidad Nacional de San Martín. Instituto de Investigaciones Biotecnológicas. Instituto de Investigaciones Biotecnológicas "Dr. Raúl Alfonsín" (sede Chascomús); Argentin
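Prioritizing targets from a multilayer graph can be illustrated with a minimal sketch. It uses random walk with restart, a generic network-proximity technique swapped in here for illustration; it is not claimed to be the authors' scoring method. It also assumes the layers (compound similarity, compound-target bioactivity, protein-protein relations such as orthology or shared domains) are simply merged into one undirected weighted graph. All node names and weights are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]


def random_walk_with_restart(
    edges: List[Edge], seed: str, restart: float = 0.3, iters: int = 100
) -> Dict[str, float]:
    """Score every node by its steady-state visiting probability for a walker
    that starts at `seed` and jumps back to it with probability `restart`."""
    # Merge all layers into one undirected weighted adjacency map (assumption).
    nbrs: Dict[str, Dict[str, float]] = {}
    for u, v, w in edges:
        nbrs.setdefault(u, {})
        nbrs.setdefault(v, {})
        nbrs[u][v] = nbrs[u].get(v, 0.0) + w
        nbrs[v][u] = nbrs[v].get(u, 0.0) + w
    if seed not in nbrs:
        raise KeyError(f"seed {seed!r} is not in the graph")
    p = {n: 1.0 if n == seed else 0.0 for n in nbrs}
    for _ in range(iters):
        # Restart mass goes back to the seed; the rest follows edges in
        # proportion to their weights.
        nxt = {n: restart if n == seed else 0.0 for n in nbrs}
        for u, out in nbrs.items():
            total = sum(out.values())
            for v, w in out.items():
                nxt[v] += (1.0 - restart) * p[u] * w / total
        p = nxt
    return p


# Hypothetical toy graph spanning several layers.
edges = [
    ("query", "cpdA", 0.8),           # chemical similarity layer
    ("cpdA", "hsTarget", 1.0),        # bioactivity layer
    ("hsTarget", "pathogenY", 1.0),   # orthology layer
    ("pathogenZ", "protW", 0.5),      # shared-domain layer
    ("protW", "cpdB", 1.0),           # bioactivity layer
]
scores = random_walk_with_restart(edges, seed="query")
ranked = sorted((n for n in scores if n.startswith("pathogen")),
                key=lambda n: scores[n], reverse=True)
print(ranked)  # → ['pathogenY', 'pathogenZ']
```

The pathogen protein linked to the query compound through a similar compound's known target ranks first, while the protein in a disconnected part of the graph scores zero.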
Predicting Network Attacks Using Ontology-Driven Inference
Graph knowledge models and ontologies are very powerful modeling and
reasoning tools. We propose an effective approach to modeling network attacks
and predicting them, a task that plays an important role in security management.
The goals of this study are twofold. First, we model network attacks, their
prerequisites and consequences using knowledge representation methods in order
to support description logic reasoning and inference over attack domain
concepts. Second, we propose an ontology-based system that predicts potential
attacks by inference over observed information provided by sensory inputs. We
generate our ontology and evaluate the corresponding methods using the CAPEC,
CWE, and CVE hierarchical datasets. Experimental results show significant
capability improvements compared to traditional hierarchical and relational
models. The proposed method also reduces false alarms and improves intrusion
detection effectiveness. Comment: 9 page
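The prerequisite/consequence reasoning can be illustrated with a minimal sketch. It uses simple forward chaining over rules as a stand-in for the paper's ontology and description logic reasoner, and the attack patterns and fact names below are hypothetical, not entries from CAPEC, CWE or CVE. An attack is predicted once all its prerequisites are observed or inferred; its consequences then become new facts that can enable further attacks.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class AttackPattern:
    name: str
    prerequisites: FrozenSet[str]
    consequences: FrozenSet[str]


def predict_attacks(patterns: List[AttackPattern], observed: Set[str]) -> List[str]:
    """Forward-chain until no new attack becomes possible; return attacks in
    the order they were inferred."""
    facts = set(observed)
    predicted: List[str] = []
    changed = True
    while changed:
        changed = False
        for p in patterns:
            if p.name not in predicted and p.prerequisites <= facts:
                predicted.append(p.name)
                facts |= p.consequences  # consequences enable later attacks
                changed = True
    return predicted


# Hypothetical attack patterns.
patterns = [
    AttackPattern("sql_injection", frozenset({"web_form_unsanitised"}),
                  frozenset({"db_read_access"})),
    AttackPattern("credential_dump", frozenset({"db_read_access"}),
                  frozenset({"valid_credentials"})),
    AttackPattern("lateral_movement", frozenset({"valid_credentials", "smb_open"}),
                  frozenset({"host_compromise"})),
]
print(predict_attacks(patterns, {"web_form_unsanitised"}))
# → ['sql_injection', 'credential_dump']
```

Adding the observation `smb_open` would also make `lateral_movement` predictable, which is the kind of multi-step prediction that chaining prerequisites and consequences allows.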
Discovering Functional Communities in Dynamical Networks
Many networks are important because they are substrates for dynamical
systems, and their pattern of functional connectivity can itself be dynamic --
they can functionally reorganize, even if their underlying anatomical structure
remains fixed. However, the recent rapid progress in discovering the community
structure of networks has overwhelmingly focused on that constant anatomical
connectivity. In this paper, we lay out the problem of discovering functional
communities, and describe an approach to doing so. This method combines recent
work on measuring information sharing across stochastic networks with an
existing and successful community-discovery algorithm for weighted networks. We
illustrate it with an application to a large biophysical model of the
transition from beta to gamma rhythms in the hippocampus. Comment: 18 pages, 4 figures, Springer "Lecture Notes in Computer Science"
style. Forthcoming in the proceedings of the workshop "Statistical Network
Analysis: Models, Issues and New Directions", at ICML 2006. Version 2: small
clarifications, typo corrections, added referenc
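The first step, turning node dynamics into a weighted "functional" network, can be illustrated with a minimal sketch. It assumes each node's activity has been discretised into a sequence of symbols, and it uses a plug-in mutual information estimate as a stand-in for the paper's information-sharing measure, which may differ. The names and spike trains are hypothetical. The resulting pair weights would then be passed to a weighted community-discovery algorithm, which is not reproduced here.

```python
from collections import Counter
from itertools import combinations
from math import log2
from typing import Dict, Sequence, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Plug-in estimate (in bits) of I(X;Y) from paired discrete samples."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("series must be non-empty and of equal length")
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    return sum(
        (c / n) * log2((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def functional_network(series: Dict[str, Sequence[int]]) -> Dict[Tuple[str, str], float]:
    """Weight each node pair by the mutual information of their activity."""
    return {
        (u, v): mutual_information(series[u], series[v])
        for u, v in combinations(sorted(series), 2)
    }


# Made-up binary spike trains: n1 and n2 fire together, n3 is independent of both.
series = {
    "n1": [1, 0, 1, 0, 1, 0, 1, 0],
    "n2": [1, 0, 1, 0, 1, 0, 1, 0],
    "n3": [1, 1, 0, 0, 1, 1, 0, 0],
}
for pair, weight in functional_network(series).items():
    print(pair, weight)
```

Because the weights come from the dynamics rather than the wiring, re-estimating them over different time windows can reveal the functional reorganization the abstract describes, even on a fixed anatomical graph.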
Transforming Graph Representations for Statistical Relational Learning
Relational data representations have become an increasingly important topic
due to the recent proliferation of network datasets (e.g., social, biological,
information networks) and a corresponding increase in the application of
statistical relational learning (SRL) algorithms to these domains. In this
article, we examine a range of representation issues for graph-based relational
data. Since the choice of relational data representation for the nodes, links,
and features can dramatically affect the capabilities of SRL algorithms, we
survey approaches and opportunities for relational representation
transformation designed to improve the performance of these algorithms. This
leads us to introduce an intuitive taxonomy for data representation
transformations in relational domains that incorporates link transformation and
node transformation as symmetric representation tasks. In particular, the
transformation tasks for both nodes and links include (i) predicting their
existence, (ii) predicting their label or type, (iii) estimating their weight
or importance, and (iv) systematically constructing their relevant features. We
motivate our taxonomy through detailed examples and use it to survey and
compare competing approaches for each of these tasks. We also discuss general
conditions for transforming links, nodes, and features. Finally, we highlight
challenges that remain to be addressed.
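Two of the taxonomy's tasks can be illustrated with a minimal sketch: link transformation (i), proposing links that are absent but likely to exist, and node feature construction (iv), deriving a new node attribute from the neighbourhood. The Jaccard neighbourhood score and neighbour-mean aggregation are common simple choices picked here for illustration; the helper names, threshold and toy graph are hypothetical and not taken from the survey.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def adjacency(edges: List[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets built from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj: Dict[str, Set[str]], u: str, v: str) -> float:
    """Overlap of two nodes' neighbourhoods, between 0 and 1."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def predict_links(adj: Dict[str, Set[str]], threshold: float) -> List[Tuple[str, str]]:
    """Link transformation (i): propose absent links scoring at least `threshold`."""
    return [
        (u, v)
        for u, v in combinations(sorted(adj), 2)
        if v not in adj[u] and jaccard(adj, u, v) >= threshold
    ]


def neighbour_mean(adj: Dict[str, Set[str]], attr: Dict[str, float]) -> Dict[str, float]:
    """Feature construction (iv): mean of a numeric attribute over each node's neighbours."""
    return {n: sum(attr[m] for m in nbrs) / len(nbrs) for n, nbrs in adj.items()}


# Made-up 4-cycle a-b-d-c-a with one numeric attribute per node.
adj = adjacency([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
attr = {"a": 0.0, "b": 1.0, "c": 3.0, "d": 6.0}
print(predict_links(adj, threshold=0.5))  # → [('a', 'd'), ('b', 'c')]
print(neighbour_mean(adj, attr))
```

Either output can be fed back into an SRL algorithm as a transformed representation: the proposed links as extra edges, the neighbour means as an extra node feature.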