12,891 research outputs found
On Empirical Entropy
We propose a compression-based version of the empirical entropy of a finite
string over a finite alphabet. Whereas previously one considers the naked
entropy of (possibly higher order) Markov processes, we consider the sum of the
description of the random variable involved plus the entropy it induces. We
assume only that the distribution involved is computable. To test the new
notion we compare the Normalized Information Distance (the similarity metric)
with a related measure based on Mutual Information in Shannon's framework. This
way the similarities and differences of the last two concepts are exposed.
Comment: 14 pages, LaTeX
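The Normalized Information Distance mentioned in the abstract is defined via Kolmogorov complexity and is therefore uncomputable; in practice it is approximated by the normalized compression distance (NCD), which substitutes a real compressor for Kolmogorov complexity. The following is a minimal sketch of that standard approximation using zlib, not the paper's own construction:

```python
import zlib

def c(data: bytes) -> int:
    """Approximate description length by compressed size."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a computable stand-in for NID:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog" * 20
b_ = b"the quick brown fox jumps over the lazy cat" * 20
r = b"q8#zk2!m@pl0x" * 60  # structurally unrelated string

# similar strings share compressible structure, so their NCD is smaller
print(ncd(a, b_), ncd(a, r))
```

Because zlib only approximates the ideal compressor, NCD values can slightly exceed 1 and the measure is only a heuristic proxy for the similarity metric discussed in the paper.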
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.
Comment: 28 pages, 9 figures
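A feature-based representation of the kind surveyed here maps each time series to a fixed-length vector of interpretable summary statistics. The sketch below uses three illustrative features (mean, standard deviation, lag-1 autocorrelation); these are assumptions for the example, not a canonical feature set from the survey:

```python
import math
import random

def features(ts):
    """Map a time series to a small, interpretable feature vector:
    mean, standard deviation, and lag-1 autocorrelation."""
    n = len(ts)
    mean = sum(ts) / n
    var = sum((x - mean) ** 2 for x in ts) / n
    std = math.sqrt(var)
    # lag-1 autocorrelation: high for smooth/periodic series, near 0 for noise
    ac1 = (sum((ts[i] - mean) * (ts[i + 1] - mean) for i in range(n - 1))
           / (n * var)) if var > 0 else 0.0
    return {"mean": mean, "std": std, "acf1": ac1}

sine = [math.sin(0.1 * i) for i in range(200)]   # smooth, periodic
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]  # i.i.d. noise

print(features(sine)["acf1"], features(noise)["acf1"])
```

Even this tiny vector separates qualitatively different dynamics: the sine wave has lag-1 autocorrelation near 1, while white noise sits near 0, which is the kind of interpretable structural insight feature-based representations aim to provide.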
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
A Hybrid Approach to Privacy-Preserving Federated Learning
Federated learning facilitates the collaborative training of models without
the sharing of raw data. However, recent attacks demonstrate that simply
maintaining data locality during training processes does not provide sufficient
privacy guarantees. Rather, we need a federated learning system capable of
preventing inference over both the messages exchanged during training and the
final trained model while ensuring the resulting model also has acceptable
predictive accuracy. Existing federated learning approaches either use secure
multiparty computation (SMC) which is vulnerable to inference or differential
privacy which can lead to low accuracy given a large number of parties with
relatively small amounts of data each. In this paper, we present an alternative
approach that utilizes both differential privacy and SMC to balance these
trade-offs. Combining differential privacy with secure multiparty computation
enables us to reduce the growth of noise injection as the number of parties
increases without sacrificing privacy while maintaining a pre-defined rate of
trust. Our system is therefore a scalable approach that protects against
inference threats and produces models with high accuracy. Additionally, our
system can be used to train a variety of machine learning models, which we
validate with experimental results on 3 different machine learning algorithms.
Our experiments demonstrate that our approach outperforms state-of-the-art
solutions.
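The key accounting idea in combining differential privacy with secure aggregation is that when only the sum of updates is revealed, the per-party noise variances add up, so each party can inject only a fraction of the total noise a trusted aggregator would add. The sketch below illustrates just that noise accounting; it is not the paper's protocol, the secure aggregation step is replaced by a plain sum, and all names and parameter values are hypothetical:

```python
import math
import random

def party_update(grad, sigma_total, n_honest):
    """Each party adds Gaussian noise with std sigma_total / sqrt(n_honest),
    so the SUM of per-party noise has std sigma_total -- matching what a
    single trusted aggregator would inject, with far less noise per party
    than pure local differential privacy."""
    scale = sigma_total / math.sqrt(n_honest)
    return [g + random.gauss(0.0, scale) for g in grad]

def aggregate(updates):
    """Stand-in for secure aggregation: only the averaged sum is revealed."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

random.seed(0)
n_parties, trust = 10, 0.6            # assume at least 60% of parties honest
n_honest = math.ceil(trust * n_parties)
true_grad = [1.0, -2.0, 0.5]
updates = [party_update(true_grad, sigma_total=1.0, n_honest=n_honest)
           for _ in range(n_parties)]
agg = aggregate(updates)
print(agg)  # close to true_grad: injected noise averages out
```

The `trust` parameter mirrors the abstract's "pre-defined rate of trust": the fewer parties assumed honest, the more noise each must add, trading accuracy for robustness to collusion.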
Determining the different categories of buyers based on Jaynes' information principle
Purpose: The article aims to reduce the volume of statistical data necessary to determine the structure of buyers. Correct clustering of clients is important for the successful activity of both commercial and non-profit organizations, and a large number of studies are devoted to this issue. Their main mathematical apparatus is statistical methods, with buyer polls as input data. Polls are labor-consuming and often annoy buyers, so the problem of determining the structure (the various categories) of buyers by mathematical methods that require only a small number of such polls is relevant.
Design/Methodology/Approach: The approach offered in this report is based on Jaynes' information principle (the principle of maximum entropy). Jaynes' idea is as follows. Consider a system whose states cannot be calculated or measured experimentally, but where each state has a certain measurable implication whose average value is known (or can be determined) from the statistical data. Then the most objective probabilities of the states are those that maximize Shannon's entropy under the restrictions imposed by the information about the average implications of the states.
Findings: In this work, the task of determining the percentage of buyers in a computer shop from the average check is set and solved, provided that the average check for each concrete category of buyers is known. The input data for the calculation are these average checks. Determining these values requires much less statistical data than directly determining the relative number of buyers in each category.
Practical Implications: The results are of particular interest to marketing experts.
Originality/Value: The article deals with the practical situation in which there are initially only three different groups of customers. For this case, the problem of maximizing entropy under the given constraints reduces to the problem of solving a system of three equations, of which only one is nonlinear. This is a completely new result.
peer-reviewed
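The setup described in the abstract (three categories with known per-category average checks, plus a known store-wide average check) can be sketched numerically. Maximizing Shannon's entropy under the two linear constraints gives shares proportional to exp(-λ·aᵢ), leaving a single nonlinear equation for λ, which the code below solves by bisection. All numeric values are illustrative assumptions, not data from the article:

```python
import math

def maxent_shares(avg_checks, overall_avg, lo=-5.0, hi=5.0, tol=1e-12):
    """Max-entropy category shares p_i subject to sum(p_i) = 1 and
    sum(p_i * a_i) = overall_avg. The Lagrange conditions give
    p_i proportional to exp(-lam * a_i); only the equation fixing
    lam is nonlinear, and it is solved here by bisection."""
    a0 = min(avg_checks)  # shift weights for numerical stability

    def weights(lam):
        return [math.exp(-lam * (a - a0)) for a in avg_checks]

    def mean_at(lam):
        w = weights(lam)
        return sum(wi * a for wi, a in zip(w, avg_checks)) / sum(w)

    # mean_at(lam) is decreasing in lam, so bisection brackets the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > overall_avg:
            lo = mid
        else:
            hi = mid
    w = weights(0.5 * (lo + hi))
    z = sum(w)
    return [wi / z for wi in w]

# Hypothetical average checks for three buyer categories and an
# observed store-wide average check of 60 (illustrative numbers only).
shares = maxent_shares([30.0, 70.0, 150.0], 60.0)
print([round(s, 3) for s in shares])  # lighter spenders get the largest share
```

Note how little input the method needs: three per-category averages and one overall average replace a full poll of the customer base, which is exactly the data-reduction claim of the article.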