The Libra Toolkit for Probabilistic Models
The Libra Toolkit is a collection of algorithms for learning and inference
with discrete probabilistic models, including Bayesian networks, Markov
networks, dependency networks, and sum-product networks. Compared to other
toolkits, Libra places a greater emphasis on learning the structure of
tractable models in which exact inference is efficient. It also includes a
variety of algorithms for learning graphical models in which inference is
potentially intractable, and for performing exact and approximate inference.
Libra is released under a 2-clause BSD license to encourage broad use in
academia and industry.
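As a minimal illustration of the kind of exact inference that is efficient in tractable models, the sketch below computes a posterior by enumeration in a two-node Bayesian network. This is a generic Python example with made-up parameters, not Libra's API (Libra itself ships as standalone command-line tools).

```python
# Illustrative sketch (not Libra's API): exact inference by enumeration
# in a tiny Bayesian network Rain -> WetGrass. All numbers are toy values.

P_rain = {True: 0.2, False: 0.8}
P_wet_given_rain = {True:  {True: 0.9, False: 0.1},
                    False: {True: 0.2, False: 0.8}}

def p_rain_given_wet(wet=True):
    """P(Rain | WetGrass=wet) by summing the joint over Rain and normalizing."""
    joint = {r: P_rain[r] * P_wet_given_rain[r][wet] for r in (True, False)}
    z = sum(joint.values())            # marginal P(WetGrass=wet)
    return {r: joint[r] / z for r in joint}

posterior = p_rain_given_wet(True)     # posterior over Rain given wet grass
```

In a tractable model every such query reduces to a polynomial number of these sum-product steps; in general graphical models the analogous summation is exponential in the treewidth.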
Learning Markov networks with context-specific independences
Learning the Markov network structure from data is a problem that has
received considerable attention in machine learning, and in many other
application fields. This work focuses on a particular approach for this
purpose called independence-based learning. This approach guarantees learning
the correct structure efficiently whenever the data are sufficient for
representing the underlying distribution. An important limitation of this
approach, however, is that the learned structures are encoded as undirected
graphs, and graphs cannot encode some types of independence relations, such as
context-specific independences. These are conditional independences that hold
only for a certain assignment of the conditioning set, in contrast to ordinary
conditional independences, which must hold for all assignments. In this work
we present CSPC, an independence-based algorithm that learns structures
encoding context-specific independences and represents them in a log-linear
model instead of a graph. The central idea
of CSPC is combining the theoretical guarantees provided by the
independence-based approach with the benefits of representing complex
structures by using features in a log-linear model. We present experiments in a
synthetic case, showing that CSPC is more accurate than state-of-the-art
independence-based algorithms when the underlying distribution contains
context-specific independences.
Comment: 8 pages, 6 figures
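The notion of a context-specific independence can be made concrete with a toy distribution. The tables below are invented for illustration and are not produced by CSPC: X is independent of Y in the context Z = 0, but dependent in the context Z = 1, so no single undirected graph captures both facts.

```python
# Toy context-specific independence (made-up numbers, not CSPC output).
# P[z][x][y] stores P(X=x, Y=y | Z=z) for binary X, Y, Z.
P = {
    0: {0: {0: 0.42, 1: 0.28}, 1: {0: 0.18, 1: 0.12}},  # factorizes: 0.7/0.3 times 0.6/0.4
    1: {0: {0: 0.50, 1: 0.10}, 1: {0: 0.10, 1: 0.30}},  # correlated
}

def independent_given(z, tol=1e-9):
    """Check whether P(x, y | Z=z) factorizes as P(x | z) * P(y | z)."""
    px = {x: sum(P[z][x].values()) for x in (0, 1)}       # marginal of X given z
    py = {y: P[z][0][y] + P[z][1][y] for y in (0, 1)}     # marginal of Y given z
    return all(abs(P[z][x][y] - px[x] * py[y]) <= tol
               for x in (0, 1) for y in (0, 1))
```

A log-linear model can represent this with features that fire only when Z = 1, whereas a graph must include the edge X-Y unconditionally.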
ProSper -- A Python Library for Probabilistic Sparse Coding with Non-Standard Priors and Superpositions
ProSper is a Python library containing probabilistic algorithms to learn
dictionaries. Given a set of data points, the implemented algorithms seek to
learn the elementary components that have generated the data. The library
widens the scope of dictionary learning approaches beyond implementations of
standard approaches such as ICA, NMF or standard L1 sparse coding. The
implemented algorithms are especially well-suited in cases when data consist of
components that combine non-linearly and/or for data requiring flexible prior
distributions. Furthermore, the implemented algorithms go beyond standard
approaches by inferring prior and noise parameters of the data, and they
provide rich a posteriori approximations for inference. The library is designed
to be extendable and it currently includes: Binary Sparse Coding (BSC), Ternary
Sparse Coding (TSC), Discrete Sparse Coding (DSC), Maximal Causes Analysis
(MCA), Maximum Magnitude Causes Analysis (MMCA), and Gaussian Sparse Coding
(GSC, a recent spike-and-slab sparse coding approach). The algorithms are
scalable due to a combination of variational approximations and
parallelization. Implementations of all algorithms allow for parallel execution
on multiple CPUs and multiple machines for medium to large-scale applications.
Typical large-scale runs of the algorithms can use hundreds of CPUs to learn
hundreds of dictionary elements from data with tens of millions of
floating-point numbers such that models with several hundred thousand
parameters can be optimized. The library is designed to have minimal
dependencies and to be easy to use. It targets users of dictionary learning
algorithms and machine learning researchers.
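The generative model behind several of these algorithms can be sketched for the binary case: each data point is a linear superposition of a few dictionary columns, selected by a sparse binary latent code, plus Gaussian noise. The following is an illustration of the BSC model itself with made-up dimensions, not ProSper's API:

```python
# Generative sketch of binary sparse coding (BSC); illustrative only,
# not ProSper's API. Dimensions and parameters are invented.
import random

random.seed(0)

D, H = 4, 3                      # observed dimension, number of dictionary elements
W = [[random.gauss(0, 1) for _ in range(H)] for _ in range(D)]   # dictionary
pi = 0.3                         # prior probability that a latent unit is "on"

def sample_point(sigma=0.1):
    """Draw one data point: sparse binary code s, then y = W s + noise."""
    s = [1 if random.random() < pi else 0 for _ in range(H)]
    y = [sum(W[d][h] * s[h] for h in range(H)) + random.gauss(0, sigma)
         for d in range(D)]
    return s, y

data = [sample_point() for _ in range(100)]
```

Learning inverts this process: given only the `y` vectors, the algorithms infer the dictionary `W` together with prior and noise parameters such as `pi` and `sigma`.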
The Grow-Shrink strategy for learning Markov network structures constrained by context-specific independences
Markov networks are models for compactly representing complex probability
distributions. They are composed by a structure and a set of numerical weights.
The structure qualitatively describes independences in the distribution, which
can be exploited to factorize the distribution into a set of compact functions.
A key application for learning structures from data is to automatically
discover knowledge. In practice, structure learning algorithms focused on
"knowledge discovery" present a limitation: they use a coarse-grained
representation of the structure. As a result, this representation cannot
describe context-specific independences. Very recently, an algorithm called
CSPC was designed to overcome this limitation, but it has a high computational
complexity. This work mitigates that downside by presenting CSGS, an
algorithm that uses the Grow-Shrink strategy to avoid unnecessary
computations. In an empirical evaluation, the structures learned by CSGS
achieve competitive accuracies at lower computational cost than those
obtained by CSPC.
Comment: 12 pages, 8 figures. This work was presented at IBERAMIA 201
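The Grow-Shrink strategy itself can be sketched schematically: a grow phase adds every variable that appears dependent given the current candidate set, then a shrink phase removes false positives. The code below is a generic sketch, not the authors' implementation; the `dependent` test is stubbed with a toy oracle where a real implementation would run a statistical independence test on data.

```python
# Schematic grow-shrink sketch (not the CSGS authors' code).
# `dependent(x, y, cond)` stands in for a statistical independence test;
# here it is stubbed with a hand-written toy dependency structure.

TRUE_NEIGHBORS = {"X": {"A", "B"}, "A": {"X"}, "B": {"X"}, "C": set()}

def dependent(x, y, cond):
    # Toy oracle: y depends on x iff they are neighbors; real code tests data.
    return y in TRUE_NEIGHBORS[x]

def grow_shrink(x, variables):
    blanket = set()
    # Grow: add any variable dependent on x given the current blanket.
    for y in variables:
        if y != x and dependent(x, y, blanket):
            blanket.add(y)
    # Shrink: drop variables independent of x given the rest of the blanket.
    for y in list(blanket):
        if not dependent(x, y, blanket - {y}):
            blanket.discard(y)
    return blanket
```

The shrink phase is what keeps the strategy sound despite the greedy grow phase: variables admitted early on weak evidence are re-tested against the richer conditioning set and removed.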
Public Perceptions of Facebook’s Libra Digital Currency Initiative: Text Mining on Twitter
Large corporations in the financial and technology sectors are increasingly interested in digital currencies, and central bank digital currencies are being actively researched around the globe. In this study, we analyzed the public discourse conducted through the social media platform Twitter concerning Facebook’s Libra digital currency initiative. Text mining of tweets posted during the one-month period around the official announcement of the digital currency project revealed that the majority of the public holds a neutral sentiment toward the proposed digital currency. However, those with positive attitudes outnumbered those perceiving the digital currency initiative as negative, and the negative sentiment mainly stemmed from anger and anxiety. Through topic modeling using latent Dirichlet allocation, we identified eight themes in the public discourse related to Facebook Libra. The study provides an early exploratory assessment of factors facilitating and hindering user adoption of one of the most important practical applications of blockchain technology.
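The headline sentiment finding (a neutral majority, with positives outnumbering negatives) can be illustrated with a toy tally. The counts below are invented for illustration and are not the study's data:

```python
# Toy sentiment tally mirroring the shape of the reported finding;
# the counts are invented, not taken from the study's tweet corpus.
from collections import Counter

labels = ["neutral"] * 55 + ["positive"] * 27 + ["negative"] * 18
counts = Counter(labels)

majority = counts.most_common(1)[0][0]                     # dominant sentiment
positive_outnumbers_negative = counts["positive"] > counts["negative"]
```

In the study itself each label would come from a sentiment classifier applied to a tweet, with the negative class further decomposed into emotions such as anger and anxiety.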
POISED: Spotting Twitter Spam Off the Beaten Paths
Cybercriminals have found in online social networks a propitious medium to
spread spam and malicious content. Existing techniques for detecting spam
include predicting the trustworthiness of accounts and analyzing the content of
these messages. However, advanced attackers can still successfully evade these
defenses.
Online social networks bring people who have personal connections or share
common interests to form communities. In this paper, we first show that users
within a networked community share some topics of interest. Moreover, content
shared on these social networks tends to propagate according to the interests of
people. Dissemination paths may emerge where some communities post similar
messages, based on the interests of those communities. Spam and other malicious
content, on the other hand, follow different spreading patterns.
In this paper, we follow this insight and present POISED, a system that
leverages the differences in propagation between benign and malicious messages
on social networks to identify spam and other unwanted content. We test our
system on a dataset of 1.3M tweets collected from 64K users, and we show that
our approach is effective in detecting malicious messages, reaching 91%
precision and 93% recall. We also show that POISED's detection is more
comprehensive than previous systems, by comparing it to three state-of-the-art
spam detection systems that have been proposed by the research community in the
past. POISED significantly outperforms each of these systems. Moreover, through
simulations, we show how POISED is effective in the early detection of spam
messages and how it is resilient against two well-known adversarial machine
learning attacks.
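For reference, the reported detection quality can be reconstructed from hypothetical confusion counts; the counts below are chosen only so that the two rates come out to the paper's 91% precision and 93% recall, and are not from the paper.

```python
# Hypothetical confusion counts reproducing the reported rates
# (91% precision, 93% recall); the counts themselves are invented.
tp, fp, fn = 93, 9, 7

precision = tp / (tp + fp)    # fraction of flagged messages that are spam
recall = tp / (tp + fn)       # fraction of spam messages that were flagged
f1 = 2 * precision * recall / (precision + recall)
```

Precision and recall trade off against each other, so reporting both (rather than accuracy alone) is what makes the comparison against the three baseline systems meaningful on an imbalanced spam corpus.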