12,891 research outputs found
On Empirical Entropy
We propose a compression-based version of the empirical entropy of a finite
string over a finite alphabet. Whereas previously one considers the naked
entropy of (possibly higher order) Markov processes, we consider the sum of the
description of the random variable involved plus the entropy it induces. We
assume only that the distribution involved is computable. To test the new
notion we compare the Normalized Information Distance (the similarity metric)
with a related measure based on Mutual Information in Shannon's framework. This
way the similarities and differences of the last two concepts are exposed.
Comment: 14 pages, LaTeX
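The Normalized Information Distance mentioned in the abstract is defined via Kolmogorov complexity and is therefore uncomputable; in practice it is approximated by the normalized compression distance (NCD), which substitutes a real compressor for Kolmogorov complexity. The following is a minimal sketch of that standard approximation using zlib, not the paper's own construction:

```python
import zlib

def c(data: bytes) -> int:
    """Approximate description length by compressed size."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance, a computable stand-in for NID:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = c(x), c(y), c(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"the quick brown fox jumps over the lazy dog" * 20
b_ = b"the quick brown fox jumps over the lazy cat" * 20
r = b"q8#zk2!m@pl0x" * 60  # structurally unrelated string

# similar strings share compressible structure, so their NCD is smaller
print(ncd(a, b_), ncd(a, r))
```

Because zlib only approximates the ideal compressor, NCD values can slightly exceed 1 and the measure is only a heuristic proxy for the similarity metric discussed in the paper.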
Feature-based time-series analysis
This work presents an introduction to feature-based time-series analysis. The
time series as a data type is first described, along with an overview of the
interdisciplinary time-series analysis literature. I then summarize the range
of feature-based representations for time series that have been developed to
aid interpretable insights into time-series structure. Particular emphasis is
given to emerging research that facilitates wide comparison of feature-based
representations that allow us to understand the properties of a time-series
dataset that make it suited to a particular feature-based representation or
analysis algorithm. The future of time-series analysis is likely to embrace
approaches that exploit machine learning methods to partially automate human
learning to aid understanding of the complex dynamical patterns in the time
series we measure from the world.
Comment: 28 pages, 9 figures
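A feature-based representation of the kind surveyed here maps each time series to a fixed-length vector of interpretable summary statistics. The sketch below uses three illustrative features (mean, standard deviation, lag-1 autocorrelation); these are assumptions for the example, not a canonical feature set from the survey:

```python
import math
import random

def features(ts):
    """Map a time series to a small, interpretable feature vector:
    mean, standard deviation, and lag-1 autocorrelation."""
    n = len(ts)
    mean = sum(ts) / n
    var = sum((x - mean) ** 2 for x in ts) / n
    std = math.sqrt(var)
    # lag-1 autocorrelation: high for smooth/periodic series, near 0 for noise
    ac1 = (sum((ts[i] - mean) * (ts[i + 1] - mean) for i in range(n - 1))
           / (n * var)) if var > 0 else 0.0
    return {"mean": mean, "std": std, "acf1": ac1}

sine = [math.sin(0.1 * i) for i in range(200)]   # smooth, periodic
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(200)]  # i.i.d. noise

print(features(sine)["acf1"], features(noise)["acf1"])
```

Even this tiny vector separates qualitatively different dynamics: the sine wave has lag-1 autocorrelation near 1, while white noise sits near 0, which is the kind of interpretable structural insight feature-based representations aim to provide.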
Learning Language from a Large (Unannotated) Corpus
A novel approach to the fully automated, unsupervised extraction of
dependency grammars and associated syntax-to-semantic-relationship mappings
from large text corpora is described. The suggested approach builds on the
authors' prior work with the Link Grammar, RelEx and OpenCog systems, as well
as on a number of prior papers and approaches from the statistical language
learning literature. If successful, this approach would enable the mining of
all the information needed to power a natural language comprehension and
generation system, directly from a large, unannotated corpus.
Comment: 29 pages, 5 figures, research proposal
A Hybrid Approach to Privacy-Preserving Federated Learning
Federated learning facilitates the collaborative training of models without
the sharing of raw data. However, recent attacks demonstrate that simply
maintaining data locality during training processes does not provide sufficient
privacy guarantees. Rather, we need a federated learning system capable of
preventing inference over both the messages exchanged during training and the
final trained model while ensuring the resulting model also has acceptable
predictive accuracy. Existing federated learning approaches either use secure
multiparty computation (SMC) which is vulnerable to inference or differential
privacy which can lead to low accuracy given a large number of parties with
relatively small amounts of data each. In this paper, we present an alternative
approach that utilizes both differential privacy and SMC to balance these
trade-offs. Combining differential privacy with secure multiparty computation
enables us to reduce the growth of noise injection as the number of parties
increases without sacrificing privacy while maintaining a pre-defined rate of
trust. Our system is therefore a scalable approach that protects against
inference threats and produces models with high accuracy. Additionally, our
system can be used to train a variety of machine learning models, which we
validate with experimental results on 3 different machine learning algorithms.
Our experiments demonstrate that our approach outperforms state-of-the-art
solutions.
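The key accounting idea in combining differential privacy with secure aggregation is that when only the sum of updates is revealed, the per-party noise variances add up, so each party can inject only a fraction of the total noise a trusted aggregator would add. The sketch below illustrates just that noise accounting; it is not the paper's protocol, the secure aggregation step is replaced by a plain sum, and all names and parameter values are hypothetical:

```python
import math
import random

def party_update(grad, sigma_total, n_honest):
    """Each party adds Gaussian noise with std sigma_total / sqrt(n_honest),
    so the SUM of per-party noise has std sigma_total -- matching what a
    single trusted aggregator would inject, with far less noise per party
    than pure local differential privacy."""
    scale = sigma_total / math.sqrt(n_honest)
    return [g + random.gauss(0.0, scale) for g in grad]

def aggregate(updates):
    """Stand-in for secure aggregation: only the averaged sum is revealed."""
    n = len(updates)
    return [sum(u[i] for u in updates) / n for i in range(len(updates[0]))]

random.seed(0)
n_parties, trust = 10, 0.6            # assume at least 60% of parties honest
n_honest = math.ceil(trust * n_parties)
true_grad = [1.0, -2.0, 0.5]
updates = [party_update(true_grad, sigma_total=1.0, n_honest=n_honest)
           for _ in range(n_parties)]
agg = aggregate(updates)
print(agg)  # close to true_grad: injected noise averages out
```

The `trust` parameter mirrors the abstract's "pre-defined rate of trust": the fewer parties assumed honest, the more noise each must add, trading accuracy for robustness to collusion.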
Determining the different categories of buyers based on Jaynes' information principle
Purpose: The article aims to reduce the volume of statistical data necessary to determine the structure of buyers. Correct clustering of clients is important for the successful activity of both commercial and non-profit organizations, and a large number of studies are devoted to this issue. Their main mathematical apparatus is statistical methods, with buyer polls as input data. Polls are labor-consuming and often annoy buyers, so the problem of determining the structure (the various categories) of buyers by mathematical methods that require only a small number of such polls is relevant.
Design/Methodology/Approach: The approach offered in this report is based on Jaynes' information principle (the principle of maximum entropy). Jaynes' idea is as follows. Consider a system whose states cannot be calculated or measured experimentally, but where each state has a certain measurable implication whose average value is known (or can be determined) from the statistical data. Then the most objective probabilities of the states are those that maximize Shannon's entropy under the restrictions imposed by the information about the average implications of the states.
Findings: In this work, the task of determining the percentage of buyers in a computer shop from the average check is set and solved, provided that the average check for each concrete category of buyers is known. The input data for the calculation are these average checks. Determining these values requires much less statistical data than directly determining the relative number of buyers in each category.
Practical Implications: The results are of particular interest to marketing experts.
Originality/Value: The article deals with the practical situation in which there are initially only three different groups of customers. For this case, the problem of maximizing entropy under the given constraints reduces to the problem of solving a system of three equations, of which only one is nonlinear. This is a completely new result.
peer-reviewed
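The setup described in the abstract (three categories with known per-category average checks, plus a known store-wide average check) can be sketched numerically. Maximizing Shannon's entropy under the two linear constraints gives shares proportional to exp(-λ·aᵢ), leaving a single nonlinear equation for λ, which the code below solves by bisection. All numeric values are illustrative assumptions, not data from the article:

```python
import math

def maxent_shares(avg_checks, overall_avg, lo=-5.0, hi=5.0, tol=1e-12):
    """Max-entropy category shares p_i subject to sum(p_i) = 1 and
    sum(p_i * a_i) = overall_avg. The Lagrange conditions give
    p_i proportional to exp(-lam * a_i); only the equation fixing
    lam is nonlinear, and it is solved here by bisection."""
    a0 = min(avg_checks)  # shift weights for numerical stability

    def weights(lam):
        return [math.exp(-lam * (a - a0)) for a in avg_checks]

    def mean_at(lam):
        w = weights(lam)
        return sum(wi * a for wi, a in zip(w, avg_checks)) / sum(w)

    # mean_at(lam) is decreasing in lam, so bisection brackets the root
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_at(mid) > overall_avg:
            lo = mid
        else:
            hi = mid
    w = weights(0.5 * (lo + hi))
    z = sum(w)
    return [wi / z for wi in w]

# Hypothetical average checks for three buyer categories and an
# observed store-wide average check of 60 (illustrative numbers only).
shares = maxent_shares([30.0, 70.0, 150.0], 60.0)
print([round(s, 3) for s in shares])  # lighter spenders get the largest share
```

Note how little input the method needs: three per-category averages and one overall average replace a full poll of the customer base, which is exactly the data-reduction claim of the article.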