10 research outputs found

    Constructing practical Fuzzy Extractors using QIM

    Get PDF
    Fuzzy extractors are a powerful tool for extracting randomness from noisy data. A fuzzy extractor can extract randomness only from discrete source data, while in practice source data is often continuous. Using quantizers to transform continuous data into discrete data is a common solution. However, as far as we know, no study has examined the effect of the quantization strategy on the performance of fuzzy extractors. We construct the encoding and decoding functions of a fuzzy extractor using quantization index modulation (QIM) and express the properties of this fuzzy extractor in terms of the parameters of the QIM used. We present and analyze an optimal (in the sense of embedding rate) two-dimensional construction. Our 6-hexagonal tiling construction offers $(\log_2 6)/2 - 1 \approx 0.3$ extra bits per dimension of the space compared to the known square-quantization-based fuzzy extractor.
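    As a sanity check on the claimed gain, the arithmetic can be reproduced directly; a minimal sketch in Python (the 1 bit per dimension baseline for square quantization is read off the abstract's formula):

```python
import math

# Embedding rate of the 6-hexagonal tiling QIM:
# 6 distinguishable tiles spread over 2 dimensions.
hex_rate = math.log2(6) / 2

# Square-quantization baseline implied by the abstract's formula
# (log2(6)/2 - 1): 1 bit per dimension.
square_rate = 1.0

print(f"hexagonal rate: {hex_rate:.3f} bits/dim")
print(f"extra bits:     {hex_rate - square_rate:.3f} bits/dim")  # ~0.29
```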

    Stein's method and Poisson process approximation for a class of Wasserstein metrics

    Full text link
    Based on Stein's method, we derive upper bounds for Poisson process approximation in the $L_1$-Wasserstein metric $d_2^{(p)}$, which is based on a slightly adapted $L_p$-Wasserstein metric between point measures. For the case $p=1$, this construction yields the metric $d_2$ introduced in [Barbour and Brown, Stochastic Process. Appl. 43 (1992) 9--31], for which Poisson process approximation is well studied in the literature. We demonstrate the usefulness of the extension to general $p$ by showing that $d_2^{(p)}$-bounds control differences between expectations of certain $p$th-order average statistics of point processes. To illustrate the bounds obtained for Poisson process approximation, we consider the structure of 2-runs and the hard core model as concrete examples.
    Comment: Published at http://dx.doi.org/10.3150/08-BEJ161 in Bernoulli (http://isi.cbs.nl/bernoulli/) by the International Statistical Institute/Bernoulli Society (http://isi.cbs.nl/BS/bshome.htm).
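    On two point measures with the same number of atoms, an $L_p$-Wasserstein-type distance reduces to an optimal matching of the atoms. A minimal illustrative sketch (this shows only the matching core; the paper's $d_2^{(p)}$ is a slightly adapted, bounded variant that also handles point measures of unequal size):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def lp_wasserstein(xs, ys, p=1):
    """L_p-Wasserstein distance between two point measures with
    equally many atoms, via an optimal matching of the atoms."""
    # cost[i, j] = |x_i - y_j|^p with Euclidean ground distance
    cost = np.linalg.norm(xs[:, None, :] - ys[None, :, :], axis=-1) ** p
    rows, cols = linear_sum_assignment(cost)       # optimal matching
    return cost[rows, cols].mean() ** (1.0 / p)    # (avg p-th power)^(1/p)

rng = np.random.default_rng(0)
xs, ys = rng.random((10, 2)), rng.random((10, 2))
print(lp_wasserstein(xs, ys, p=1))
print(lp_wasserstein(xs, ys, p=2))  # larger p weights big displacements more
```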

    Information theoretic bounds for Compressed Sensing

    Full text link
    In this paper we derive information-theoretic performance bounds for sensing and reconstruction of sparse phenomena from noisy projections. We consider two settings: output noise models, where the noise enters after the projection, and input noise models, where the noise enters before the projection. We consider two types of distortion for reconstruction: support errors and mean-squared errors. Our goal is to relate the number of measurements, $m$, and the SNR, to signal sparsity, $k$, distortion level, $d$, and signal dimension, $n$. We consider support errors in a worst-case setting. We employ different variations of Fano's inequality to derive necessary conditions on the number of measurements and SNR required for exact reconstruction. To derive sufficient conditions we develop new insights on max-likelihood analysis based on a novel superposition property. In particular, this property implies that small support errors are the dominant error events. Consequently, our ML analysis does not suffer the conservatism of the union bound and leads to a tighter analysis of max-likelihood. These results provide order-wise tight bounds. For output noise models we show that asymptotically an SNR of $\Theta(\log(n))$ together with $\Theta(k \log(n/k))$ measurements is necessary and sufficient for exact support recovery. Furthermore, if a small fraction of support errors can be tolerated, a constant SNR turns out to be sufficient in the linear sparsity regime. In contrast, for input noise models we show that support recovery fails if the number of measurements scales as $o(n\log(n)/\mathrm{SNR})$, implying poor compression performance for such cases. We also consider the Bayesian setup and characterize tradeoffs between mean-squared distortion and the number of measurements using rate-distortion theory.
    Comment: 30 pages, 2 figures, submitted to IEEE Trans. on I
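    The distinction between the two noise placements is easy to write down. A minimal sketch of the two measurement models (Gaussian sensing matrix and noise assumed purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 256, 8, 64          # signal dimension, sparsity, measurements

# k-sparse signal on a random support
x = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x[support] = rng.standard_normal(k)

A = rng.standard_normal((m, n)) / np.sqrt(m)   # random projection
sigma = 0.1                                    # noise level

# Output noise model: noise enters AFTER the projection.
y_out = A @ x + sigma * rng.standard_normal(m)

# Input noise model: noise enters BEFORE the projection.
y_in = A @ (x + sigma * rng.standard_normal(n))
```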

    Representations and metrics for machine learning and the analysis of high-dimensional data

    Get PDF
    In the current information age, massive amounts of data are gathered at a rate prohibiting their effective structuring, analysis, and conversion into useful knowledge. This information overload is manifested both in large numbers of data objects recorded in data sets and in large numbers of attributes, also known as high dimensionality. This dissertation deals with problems originating from the high dimensionality of data representation, referred to as the “curse of dimensionality,” in the context of machine learning, data mining, and information retrieval. The described research follows two angles: studying the behavior of (dis)similarity metrics with increasing dimensionality, and exploring feature-selection methods, primarily with regard to document representation schemes for text classification. The main results of the dissertation, relevant to the first research angle, include theoretical insights into the concentration behavior of cosine similarity and a detailed analysis of the phenomenon of hubness, which refers to the tendency of some points in a data set to become hubs by being included in unexpectedly many k-nearest-neighbor lists of other points. The mechanisms behind the phenomenon are studied in detail, from both a theoretical and an empirical perspective: hubness is linked with the (intrinsic) dimensionality of data, its interaction with the cluster structure of data and with the information provided by class labels is described, and its interplay with well-known algorithms for classification, semi-supervised learning, clustering, and outlier detection is demonstrated, with special consideration given to time-series classification and information retrieval. Results pertaining to the second research angle include quantification of the interaction between various transformations of high-dimensional document representations and feature selection, in the context of text classification.
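    The k-occurrence counts behind hubness are simple to compute: for each point, count how many k-nearest-neighbor lists of other points it appears in. A minimal sketch (illustrative only, using scikit-learn; the skewness of the resulting N_k distribution is one common summary of hubness, not necessarily the dissertation's exact measure):

```python
import numpy as np
from scipy.stats import skew
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 100))   # high-dimensional data set

k = 10
# Ask for k+1 neighbors, since each point is its own nearest neighbor.
_, idx = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
knn_lists = idx[:, 1:]                 # drop self

# N_k(x): number of k-NN lists of other points that include x.
N_k = np.bincount(knn_lists.ravel(), minlength=len(X))

# A heavily right-skewed N_k distribution indicates hubs: points that
# appear in unexpectedly many neighbor lists.
print("skewness of N_k:", skew(N_k))
```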
