59 research outputs found
An agent-driven semantical identifier using radial basis neural networks and reinforcement learning
Due to the wide availability of documents in digital form, and the growing
possibility of deception tied to the nature of digital documents and the way
they are spread, the authorship attribution problem has steadily increased in
relevance. Nowadays, authorship attribution, for both information retrieval and
analysis, has gained great importance in the context of security, trust and
copyright preservation. This work proposes an innovative multi-agent driven
machine learning technique that has been developed for authorship attribution.
By means of a preprocessing step for word grouping and a time-period related
analysis of the common lexicon, we determine a bias reference level for the
recurrence frequency of the words within the analysed texts, and then train a
Radial Basis Neural Network (RBPNN)-based classifier to identify the correct author. The
main advantage of the proposed approach lies in the generality of the semantic
analysis, which can be applied to different contexts and lexical domains,
without requiring any modification. Moreover, the proposed system is able to
incorporate an external input, meant to tune the classifier, and then
self-adjust by means of continuous learning reinforcement.
Comment: Published in: Proceedings of the XV Workshop "Dagli Oggetti agli
Agenti" (WOA 2014), Catania, Italy, September 25-26, 201
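The pipeline described in this abstract (word recurrence frequencies compared against a reference level, then fed to a radial basis classifier) can be illustrated with a minimal sketch. The abstract does not give the actual feature definition or network layout, so the function names, the whitespace tokenisation, and the single Gaussian unit below are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def relative_frequencies(text, vocabulary):
    """Relative frequency of each vocabulary word in the text (assumed feature)."""
    words = text.lower().split()  # naive whitespace tokenisation, an assumption
    counts = Counter(words)
    total = len(words)
    return [counts[w] / total if total else 0.0 for w in vocabulary]


def deviation_from_reference(freqs, reference):
    """Difference between observed frequencies and a reference (bias) level."""
    return [f - r for f, r in zip(freqs, reference)]


def gaussian_rbf(x, centre, width):
    """Activation of one Gaussian radial basis unit centred on `centre`."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, centre))
    return math.exp(-squared_distance / (2.0 * width ** 2))


vocabulary = ["the", "cat"]
freqs = relative_frequencies("the cat and the dog", vocabulary)
features = deviation_from_reference(freqs, [0.3, 0.1])  # hypothetical reference level
activation = gaussian_rbf(features, [0.0, 0.0], width=0.5)
```

A full classifier would combine many such units, typically one group per candidate author, and pick the author with the strongest pooled activation.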
The Infinite Degree Corrected Stochastic Block Model
In stochastic blockmodels, which are among the most prominent statistical
models for cluster analysis of complex networks, clusters are defined as groups
of nodes with statistically similar link probabilities within and between
groups. A recent extension by Karrer and Newman incorporates a node degree
correction to model degree heterogeneity within each group. Although this
demonstrably leads to better performance on several networks, it is not obvious
whether modelling node degree is always appropriate or necessary. We formulate
the degree corrected stochastic blockmodel as a non-parametric Bayesian model,
incorporating a parameter to control the amount of degree correction, which can
then be inferred from data. Additionally, our formulation yields principled
ways of inferring the number of groups as well as predicting missing links in
the network, which can be used to quantify the model's predictive performance.
On synthetic data we demonstrate that including the degree correction yields
better performance both on recovering the true group structure and predicting
missing links when degree heterogeneity is present, whereas performance is on
par for data with no degree heterogeneity within clusters. On seven real
networks (with no ground truth group structure available) we show that
predictive performance is about equal whether or not degree correction is
included; however, for some networks significantly fewer clusters are
discovered when correcting for degree, indicating that the data can be more
compactly explained by clusters of heterogeneous degree nodes.
Comment: Originally presented at the Complex Networks workshop NIPS 201
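As a rough illustration of what a degree correction does, the sketch below computes expected edge counts in the parametric Karrer and Newman style, where the rate between nodes i and j is theta_i * theta_j * omega[g_i][g_j]. This is not the non-parametric Bayesian formulation of the abstract; the variable names, the toy values, and the choice to ignore self-loops are assumptions.

```python
def expected_edges(theta, groups, omega):
    """Expected edge count for every node pair under a degree-corrected blockmodel.

    theta[i]    : degree propensity of node i (all ones gives the plain blockmodel)
    groups[i]   : group index of node i
    omega[r][s] : base edge rate between groups r and s
    """
    n = len(theta)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # self-loops are ignored in this sketch
                lam[i][j] = theta[i] * theta[j] * omega[groups[i]][groups[j]]
    return lam


theta = [1.0, 3.0, 1.0, 1.0]      # node 1 is a hub inside group 0
groups = [0, 0, 1, 1]
omega = [[0.5, 0.1], [0.1, 0.5]]  # denser within groups than between them

lam = expected_edges(theta, groups, omega)
expected_degree = [sum(row) for row in lam]
```

Nodes 0 and 1 share a group, yet node 1 has the larger expected degree because of its larger theta; this is the within-group degree heterogeneity that the correction is meant to capture.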
Holographic Embeddings of Knowledge Graphs
Learning embeddings of entities and relations is an efficient and versatile
method to perform machine learning on relational data such as knowledge graphs.
In this work, we propose holographic embeddings (HolE) to learn compositional
vector space representations of entire knowledge graphs. The proposed method is
related to holographic models of associative memory in that it employs circular
correlation to create compositional representations. By using correlation as
the compositional operator HolE can capture rich interactions but
simultaneously remains efficient to compute, easy to train, and scalable to
very large datasets. In extensive experiments we show that holographic
embeddings are able to outperform state-of-the-art methods for link prediction
in knowledge graphs and relational learning benchmark datasets.
Comment: To appear in AAAI-1
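The circular correlation mentioned in this abstract can be written out directly. The sketch below uses the definition [a * b]_k = sum_i a_i * b_((i + k) mod d) and scores a triple by passing the dot product of the relation vector with the correlated entity vectors through a sigmoid. The function names and toy vectors are illustrative assumptions, and the O(d^2) loop is chosen for readability; an FFT-based version would be the efficient choice.

```python
import math


def circular_correlation(a, b):
    """Circular correlation: result[k] = sum_i a[i] * b[(i + k) % d]."""
    d = len(a)
    if len(b) != d:
        raise ValueError("vectors must have the same dimension")
    return [sum(a[i] * b[(i + k) % d] for i in range(d)) for k in range(d)]


def triple_score(e_subject, relation, e_object):
    """Sigmoid of the relation vector dotted with the correlated entity pair."""
    composed = circular_correlation(e_subject, e_object)
    x = sum(r * c for r, c in zip(relation, composed))
    return 1.0 / (1.0 + math.exp(-x))


a = [1.0, 0.0, 0.0]
b = [0.0, 1.0, 0.0]
forward = circular_correlation(a, b)   # [0.0, 1.0, 0.0]
backward = circular_correlation(b, a)  # [0.0, 0.0, 1.0]
```

The two results differ because circular correlation is not commutative, which lets the composed vector distinguish (subject, object) from (object, subject) and so represent asymmetric relations.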
Bayesian non parametric inference of discrete valued networks
We present a nonparametric Bayesian inference strategy to automatically infer the number of classes during the clustering process of a discrete valued random network. Our methodology is related to Dirichlet process mixture models, and inference is performed using a blocked Gibbs sampling procedure. Using simulated data, we show that our approach improves over competitive variational inference clustering methods.
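Blocked Gibbs samplers for Dirichlet process mixtures are commonly built on a truncated stick-breaking representation of the class weights. The abstract does not state which construction is used, so the sketch below is an assumption: it draws stick fractions from Beta(1, alpha) and converts them into mixture weights, with the last class taking whatever remains of the stick.

```python
import random


def stick_breaking_weights(fractions):
    """Convert K-1 stick fractions into K mixture weights that sum to one."""
    weights = []
    remaining = 1.0
    for frac in fractions:
        if not 0.0 <= frac <= 1.0:
            raise ValueError("each fraction must lie in [0, 1]")
        weights.append(remaining * frac)
        remaining *= 1.0 - frac
    weights.append(remaining)  # the final class takes the rest of the stick
    return weights


def sample_stick_fractions(alpha, truncation, rng):
    """Draw truncation-1 fractions from Beta(1, alpha); alpha must be positive."""
    return [rng.betavariate(1.0, alpha) for _ in range(truncation - 1)]


rng = random.Random(0)  # fixed seed, only so that the sketch is repeatable
weights = stick_breaking_weights(sample_stick_fractions(1.0, 10, rng))
```

A small alpha concentrates the weight on a few classes, while a large alpha spreads it over many; classes that receive no nodes during sampling are effectively unused, which is how the number of classes is inferred rather than fixed in advance.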
Hierarchical relational models for document networks
We develop the relational topic model (RTM), a hierarchical model of both
network structure and node attributes. We focus on document networks, where the
attributes of each document are its words, that is, discrete observations taken
from a fixed vocabulary. For each pair of documents, the RTM models their link
as a binary random variable that is conditioned on their contents. The model
can be used to summarize a network of documents, predict links between them,
and predict words within them. We derive efficient inference and estimation
algorithms based on variational methods that take advantage of sparsity and
scale with the number of links. We evaluate the predictive performance of the
RTM for large networks of scientific abstracts, web documents, and
geographically tagged news.
Comment: Published in at http://dx.doi.org/10.1214/09-AOAS309 the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org
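The idea of a link modelled as a binary random variable conditioned on document contents can be sketched with a logistic link function over topic proportions. The abstract does not give the functional form, so the sigmoid, the element-wise product of topic proportions, and the parameter names eta and nu below are assumptions for illustration.

```python
import math


def link_probability(topics_a, topics_b, eta, nu):
    """Probability of a link given two documents' topic proportions.

    topics_a, topics_b : topic proportions of each document (each sums to one)
    eta                : per-topic weights (hypothetical parameter)
    nu                 : intercept (hypothetical parameter)
    """
    x = sum(e * a * b for e, a, b in zip(eta, topics_a, topics_b)) + nu
    return 1.0 / (1.0 + math.exp(-x))


doc_a = [0.9, 0.1]
doc_b = [0.8, 0.2]  # mostly the same topic as doc_a
doc_c = [0.1, 0.9]  # mostly the other topic

eta, nu = [5.0, 5.0], -2.0
p_similar = link_probability(doc_a, doc_b, eta, nu)
p_different = link_probability(doc_a, doc_c, eta, nu)
```

Documents that concentrate on the same topics receive a higher link probability than documents about different topics, which is what allows such a model to predict links from words.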