A Survey and Taxonomy of Graph Sampling
Graph sampling is a technique to pick a subset of vertices and/or edges from
the original graph. It has a wide spectrum of applications, e.g. surveying hidden
populations in sociology [54], visualizing social graphs [29], scaling down the
Internet AS graph [27], graph sparsification [8], etc. In some scenarios, the whole
graph is known and the purpose of sampling is to obtain a smaller graph. In
other scenarios, the graph is unknown and sampling is regarded as a way to
explore the graph. Commonly used techniques are Vertex Sampling, Edge Sampling
and Traversal Based Sampling. We provide a taxonomy of different graph sampling
objectives and graph sampling approaches. The relations between these
approaches are formally argued and a general framework to bridge theoretical
analysis and practical implementation is provided. Although smaller in
size, a sampled graph may remain similar to the original graph in certain respects. We are
particularly interested in what graph properties are preserved given a sampling
procedure. If some properties are preserved, we can estimate them on the
sampled graphs, which gives a way to construct efficient estimators. If an
algorithm relies on the preserved properties, we can expect it to give
similar outputs on the original and sampled graphs. This leads to a systematic way
to accelerate a class of graph algorithms. In this survey, we discuss both
classical textbook properties and some advanced properties. The landscape
is tabulated, revealing many gaps in this field. Some
theoretical studies are collected in this survey and simple extensions are
made. Most previous numerical evaluations have been ad hoc, i.e., each
evaluates different types of graphs, different sets of properties, and different
sampling algorithms. A systematic and neutral evaluation is needed to shed
light on further graph sampling studies.
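The two simplest families the abstract names, Vertex Sampling and Edge Sampling, can be sketched as follows. This is an illustrative toy (function names, the example graph, and parameters are invented for this sketch), not code from the survey:

```python
import random

def vertex_sampling(edges, vertices, k, rng):
    # Vertex Sampling: pick k vertices uniformly at random,
    # then keep only the edges induced between sampled vertices.
    kept = set(rng.sample(sorted(vertices), k))
    return kept, [(u, v) for (u, v) in edges if u in kept and v in kept]

def edge_sampling(edges, k, rng):
    # Edge Sampling: pick k edges uniformly at random,
    # then keep the endpoints of the sampled edges.
    sampled = rng.sample(edges, k)
    kept = {v for e in sampled for v in e}
    return kept, sampled

# A small example graph: a 4-cycle plus one chord.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
vertices = {0, 1, 2, 3}
rng = random.Random(42)

vs_nodes, vs_edges = vertex_sampling(edges, vertices, 3, rng)
es_nodes, es_edges = edge_sampling(edges, 2, rng)
```

Traversal Based Sampling differs in that it explores an unknown graph by following edges (e.g. random walks) rather than drawing uniformly from a known vertex or edge set.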
A traffic classification method using machine learning algorithm
Applying concepts of attack investigation in the IT industry, this work designs
a traffic classification method that combines data mining techniques with
machine learning algorithms to classify traffic as normal or malicious. This
classification helps in learning about the unknown attacks faced by the IT
industry. Traffic classification is not a new concept; plenty of work has been
done to classify network traffic for heterogeneous applications. Existing
techniques (payload-based, port-based, and statistical-based) have their own
pros and cons, which are discussed later in this work, but classification using
machine learning techniques is still an open field to explore and has provided
very promising results so far.
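As a minimal sketch of the statistical-based approach mentioned above, one can classify flows by their summary statistics with a nearest-centroid rule. The features, numbers, and labels here are hypothetical, invented purely to illustrate the idea:

```python
def centroid(flows):
    # Mean of each statistical feature across a set of training flows.
    n = len(flows)
    return tuple(sum(f[i] for f in flows) / n for i in range(len(flows[0])))

def classify(flow, centroids):
    # Nearest-centroid rule on flow statistics: assign the label whose
    # centroid is closest in squared Euclidean distance.
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(centroids, key=lambda label: dist2(flow, centroids[label]))

# Hypothetical training flows: (mean packet size in bytes, packets per second).
normal = [(520.0, 12.0), (480.0, 10.0), (550.0, 14.0)]
malicious = [(90.0, 800.0), (110.0, 950.0), (100.0, 870.0)]
centroids = {"normal": centroid(normal), "malicious": centroid(malicious)}

# A new flow with small packets at a high rate resembles the malicious class.
label = classify((105.0, 900.0), centroids)
```

Real systems would use richer features and a trained model (e.g. decision trees or SVMs), but the pattern, statistics in, label out, is the same.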
Generalized Approximate Survey Propagation for High-Dimensional Estimation
In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal
that is observed through a linear transform followed by a component-wise,
possibly nonlinear and noisy, channel. In the Bayesian optimal setting,
Generalized Approximate Message Passing (GAMP) is known to achieve optimal
performance for GLE. However, its performance can significantly degrade
whenever there is a mismatch between the assumed and the true generative model,
a situation frequently encountered in practice. In this paper, we propose a new
algorithm, named Generalized Approximate Survey Propagation (GASP), for solving
GLE in the presence of prior or model mis-specifications. As a prototypical
example, we consider the phase retrieval problem, where we show that GASP
outperforms the corresponding GAMP, reducing the reconstruction threshold and,
for certain choices of its parameters, approaching Bayesian optimal
performance. Furthermore, we present a set of State Evolution equations that
exactly characterize the dynamics of GASP in the high-dimensional limit.
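To make the GLE setup concrete: the signal x is observed as y = φ(Ax + noise), where φ is the component-wise channel; phase retrieval corresponds to φ(z) = |z|. The sketch below only generates such an instance (it does not implement GASP itself), and all names and parameter choices are illustrative:

```python
import random

def gle_observations(A, x, channel, noise_std, rng):
    # Generic GLE measurement: y_mu = channel((A x)_mu + w_mu),
    # with Gaussian noise w_mu of standard deviation noise_std.
    y = []
    for row in A:
        z = sum(a * xi for a, xi in zip(row, x)) + rng.gauss(0.0, noise_std)
        y.append(channel(z))
    return y

rng = random.Random(0)
n, m = 4, 8
x = [rng.gauss(0.0, 1.0) for _ in range(n)]            # hidden signal
A = [[rng.gauss(0.0, 1.0 / n ** 0.5) for _ in range(n)]
     for _ in range(m)]                                # random linear transform

# Phase retrieval channel: only the magnitude of each measurement is observed.
y = gle_observations(A, x, abs, 0.01, rng)
```

Recovering x from such magnitude-only observations is exactly the regime where the paper reports GASP outperforming GAMP under model mismatch.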
Estimating Discrete Markov Models From Various Incomplete Data Schemes
The parameters of a discrete stationary Markov model are transition
probabilities between states. Traditionally, data consist of sequences of
observed states for a given number of individuals over the whole observation
period. In such a case, the estimation of transition probabilities is
straightforwardly made by counting one-step moves from a given state to
another. In many real-life problems, however, the inference is much more
difficult as state sequences are not fully observed, namely the state of each
individual is known only for some given values of the time variable. A review
of the problem is given, focusing on Monte Carlo Markov Chain (MCMC) algorithms
to perform Bayesian inference and evaluate posterior distributions of the
transition probabilities in this missing-data framework. Leaning on the
dependence between the rows of the transition matrix, an adaptive MCMC
mechanism accelerating the classical Metropolis-Hastings algorithm is then
proposed and empirically studied. Comment: 26 pages; preprint accepted on 20th February 2012 for publication in
Computational Statistics and Data Analysis (please cite the journal's paper).
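The fully observed case described above, estimating transition probabilities by counting one-step moves, can be sketched directly (the function name and example sequences are illustrative, not from the paper):

```python
from collections import Counter

def estimate_transitions(sequences, states):
    # Count one-step moves s -> t across all fully observed sequences,
    # then normalise each row to obtain transition probabilities.
    counts = Counter()
    for seq in sequences:
        for s, t in zip(seq, seq[1:]):
            counts[(s, t)] += 1
    P = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states)
        P[s] = {t: counts[(s, t)] / row_total if row_total else 0.0
                for t in states}
    return P

# Three observed state sequences over the two-state space {A, B}.
sequences = ["AABAB", "BBAA", "ABBB"]
P = estimate_transitions(sequences, "AB")
```

When the sequences are only partially observed, this simple counting is no longer available; that is the missing-data regime where the paper turns to MCMC for Bayesian inference.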
Simulation optimization: A review of algorithms and applications
Simulation Optimization (SO) refers to the optimization of an objective
function subject to constraints, both of which can be evaluated through a
stochastic simulation. To address specific features of a particular
simulation---discrete or continuous decisions, expensive or cheap simulations,
single or multiple outputs, homogeneous or heterogeneous noise---various
algorithms have been proposed in the literature. As one can imagine, there
exist several competing algorithms for each of these classes of problems. This
document emphasizes the difficulties in simulation optimization as compared to
mathematical programming, makes reference to state-of-the-art algorithms in the
field, examines and contrasts the different approaches used, reviews some of
the diverse applications that have been tackled by these methods, and
speculates on future directions in the field.
A survey of discrete methods in (algebraic) statistics for networks
Sampling algorithms, hypergraph degree sequences, and polytopes play a
crucial role in statistical analysis of network data. This article offers a
brief overview of open problems in this area of discrete mathematics from the
point of view of a particular family of statistical models for networks called
exponential random graph models. The problems and underlying constructions are
also related to well-known concepts in commutative algebra and graph-theoretic
concepts in computer science. We outline a few lines of recent work that
highlight the natural connection between these fields and unify them into some
open problems. While these problems are often relevant in discrete mathematics
in their own right, the emphasis here is on statistical relevance with the hope
that these lines of research do not remain disjoint. Suggested specific open
problems and general research questions should advance algebraic statistics
theory as well as applied statistical tools for rigorous statistical analysis
of networks. Comment: Revised for clarity, minor updates, added example, upon suggestions
of people mentioned in the acknowledgements section.
Transfer Learning, Soft Distance-Based Bias, and the Hierarchical BOA
An automated technique has recently been proposed to transfer learning in the
hierarchical Bayesian optimization algorithm (hBOA) based on distance-based
statistics. The technique enables practitioners to improve hBOA efficiency by
collecting statistics from probabilistic models obtained in previous hBOA runs
and using the obtained statistics to bias future hBOA runs on similar problems.
The purpose of this paper is threefold: (1) test the technique on several
classes of NP-complete problems, including MAXSAT, spin glasses and minimum
vertex cover; (2) demonstrate that the technique is effective even when
previous runs were done on problems of different size; (3) provide empirical
evidence that combining transfer learning with other efficiency enhancement
techniques can often yield nearly multiplicative speedups. Comment: Accepted at Parallel Problem Solving from Nature (PPSN XII), 10
pages. arXiv admin note: substantial text overlap with arXiv:1201.224
Mixture Models and Networks -- Overview of Stochastic Blockmodelling
Mixture models are probabilistic models aimed at uncovering and representing
latent subgroups within a population. In the realm of network data analysis,
the latent subgroups of nodes are typically identified by their connectivity
behaviour, with nodes behaving similarly belonging to the same community. In
this context, mixture modelling is pursued through stochastic blockmodelling.
We consider stochastic blockmodels and some of their variants and extensions
from a mixture modelling perspective. We also survey some of the main classes
of estimation methods available, and propose an alternative approach. In
addition to the discussion of inferential properties and estimating procedures,
we focus on the application of the models to several real-world network
datasets, showcasing the advantages and pitfalls of different approaches. Comment: 23 pages, 5 figures.
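The generative side of a stochastic blockmodel is simple to state: each node belongs to a latent block, and an edge between two nodes appears independently with a probability that depends only on their blocks. A minimal sketch, with invented block sizes and connectivity matrix:

```python
import random

def sample_sbm(block_sizes, B, rng):
    # Stochastic blockmodel: node i lies in block b(i); the undirected edge
    # (i, j) appears independently with probability B[b(i)][b(j)].
    blocks = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(blocks)
    edges = set()
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < B[blocks[i]][blocks[j]]:
                edges.add((i, j))
    return blocks, edges

rng = random.Random(1)
# Two communities of 5 nodes: dense within (0.8), sparse between (0.05).
blocks, edges = sample_sbm([5, 5], [[0.8, 0.05], [0.05, 0.8]], rng)
```

Inference inverts this process: given only the edges, recover the block memberships and the connectivity matrix, which is where the estimation methods surveyed in the paper come in.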
Thompson Sampling for Dynamic Pricing
In this paper we apply active learning algorithms for dynamic pricing in a
prominent e-commerce website. Dynamic pricing involves changing the price of
items on a regular basis, and uses the feedback from the pricing decisions to
update prices of the items. Most popular approaches to dynamic pricing use a
passive learning approach, where the algorithm uses historical data to learn
various parameters of the pricing problem, and uses the updated parameters to
generate a new set of prices. We show that one can use active learning
algorithms such as Thompson sampling to more efficiently learn the underlying
parameters in a pricing problem. We apply our algorithms to a real e-commerce
system and show that the algorithms indeed improve revenue compared to pricing
algorithms that use passive learning.
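A minimal sketch of Thompson sampling for pricing, assuming a Beta-Bernoulli model of conversion at each candidate price: sample a conversion rate from each price's posterior, charge the price with the highest sampled expected revenue, and update the posterior with the observed sale. The prices and demand rates below are hypothetical, not from the paper's e-commerce system:

```python
import random

def thompson_price(prices, successes, failures, rng):
    # Sample a conversion rate for each price from its Beta(s+1, f+1)
    # posterior and pick the price maximising sampled rate * price.
    def sampled_revenue(p):
        theta = rng.betavariate(successes[p] + 1, failures[p] + 1)
        return theta * p
    return max(prices, key=sampled_revenue)

rng = random.Random(7)
prices = [10.0, 12.0, 15.0]
true_rate = {10.0: 0.5, 12.0: 0.55, 15.0: 0.2}   # hypothetical demand curve

successes = {p: 0 for p in prices}
failures = {p: 0 for p in prices}
revenue = 0.0
for _ in range(2000):
    p = thompson_price(prices, successes, failures, rng)
    sold = rng.random() < true_rate[p]           # simulated customer decision
    successes[p] += sold
    failures[p] += not sold
    revenue += p * sold
```

Because prices are chosen by posterior sampling rather than from a fixed historical fit, the algorithm keeps exploring uncertain prices while exploiting the apparent best one, which is the active-learning advantage the paper reports.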