543 research outputs found
Graph-based techniques for compression and reconstruction of sparse sources
The main goal of this thesis is to develop lossless compression schemes for analog and binary sources. All the considered compression schemes have as common feature that the encoder can be represented by a graph, so they can be studied employing tools from modern coding theory.
In particular, this thesis is focused on two compression problems: the group testing and the noiseless compressed sensing problems. Although both problems may seem unrelated, in the thesis they are shown to be very close. Furthermore, group testing has the same mathematical formulation as non-linear binary source compression schemes that use the OR operator. In this thesis, the similarities between these problems are exploited.
The group testing problem is aimed at identifying the defective subjects of a population with as few tests as possible. Group testing schemes can be divided into two groups: adaptive and non-adaptive group testing schemes. The former schemes generate tests sequentially and exploit the partial decoding results to attempt to reduce the overall number of tests required to label all members of the population, whereas non-adaptive schemes perform all the test in parallel and attempt to label as many subjects as possible.
Our contributions to the group testing problem are both theoretical and practical. We propose a novel adaptive scheme aimed to efficiently perform the testing process. Furthermore, we develop tools to predict the performance of both adaptive and non-adaptive schemes when the number of subjects to be tested is large. These tools allow to characterize the performance of adaptive and non-adaptive group testing schemes without simulating them.
The goal of the noiseless compressed sensing problem is to retrieve a signal from its lineal projection version in a lower-dimensional space. This can be done only whenever the amount of null components of the original signal is large enough. Compressed sensing deals with the design of sampling schemes and reconstruction algorithms that manage to reconstruct the original signal vector with as few samples as possible.
In this thesis we pose the compressed sensing problem within a probabilistic framework, as opposed to the classical compression sensing formulation. Recent results in the state of the art show that this approach is more efficient than the classical one.
Our contributions to noiseless compressed sensing are both theoretical and practical. We deduce a necessary and sufficient matrix design condition to guarantee that the reconstruction is lossless. Regarding the design of practical schemes, we propose two novel reconstruction algorithms based on message passing over the sparse representation of the matrix, one of them with very low computational complexity.El objetivo principal de la tesis es el desarrollo de esquemas de compresión sin pérdidas para fuentes analógicas y binarias. Los esquemas analizados tienen en común la representación del compresor mediante un grafo; esto ha permitido emplear en su estudio las herramientas de codificación modernas. Más concretamente la tesis estudia dos problemas de compresión en particular: el diseño de experimentos de testeo comprimido de poblaciones (de sangre, de presencia de elementos contaminantes, secuenciado de ADN, etcétera) y el muestreo comprimido de señales reales en ausencia de ruido. A pesar de que a primera vista parezcan problemas totalmente diferentes, en la tesis mostramos que están muy relacionados. Adicionalmente, el problema de testeo comprimido de poblaciones tiene una formulación matemática idéntica a los códigos de compresión binarios no lineales basados en puertas OR. En la tesis se explotan las similitudes entre todos estos problemas. Existen dos aproximaciones al testeo de poblaciones: el testeo adaptativo y el no adaptativo. El primero realiza los test de forma secuencial y explota los resultados parciales de estos para intentar reducir el número total de test necesarios, mientras que el segundo hace todos los test en bloque e intenta extraer el máximo de datos posibles de los test. Nuestras contribuciones al problema de testeo comprimido han sido tanto teóricas como prácticas. Hemos propuesto un nuevo esquema adaptativo para realizar eficientemente el proceso de testeo. Además hemos desarrollado herramientas que permiten predecir el comportamiento tanto de los esquemas adaptativos como de los esquemas no adaptativos cuando el número de sujetos a testear es elevado. Estas herramientas permiten anticipar las prestaciones de los esquemas de testeo sin necesidad de simularlos. El objetivo del muestreo comprimido es recuperar una señal a partir de su proyección lineal en un espacio de menor dimensión. Esto sólo es posible si se asume que la señal original tiene muchas componentes que son cero. El problema versa sobre el diseño de matrices y algoritmos de reconstrucción que permitan implementar esquemas de muestreo y reconstrucción con un número mínimo de muestras. A diferencia de la formulación clásica de muestreo comprimido, en esta tesis se ha empleado un modelado probabilístico de la señal. Referencias recientes en la literatura demuestran que este enfoque permite conseguir esquemas de compresión y descompresión más eficientes. Nuestras contribuciones en el campo de muestreo comprimido de fuentes analógicas dispersas han sido también teóricas y prácticas. Por un lado, la deducción de la condición necesaria y suficiente que debe garantizar la matriz de muestreo para garantizar que se puede reconstruir unívocamente la secuencia de fuente. Por otro lado, hemos propuesto dos algoritmos, uno de ellos de baja complejidad computacional, que permiten reconstruir la señal original basados en paso de mensajes entre los nodos de la representación gráfica de la matriz de proyección.Postprint (published version
Regularity Properties and Pathologies of Position-Space Renormalization-Group Transformations
We reconsider the conceptual foundations of the renormalization-group (RG)
formalism, and prove some rigorous theorems on the regularity properties and
possible pathologies of the RG map. Regarding regularity, we show that the RG
map, defined on a suitable space of interactions (= formal Hamiltonians), is
always single-valued and Lipschitz continuous on its domain of definition. This
rules out a recently proposed scenario for the RG description of first-order
phase transitions. On the pathological side, we make rigorous some arguments of
Griffiths, Pearce and Israel, and prove in several cases that the renormalized
measure is not a Gibbs measure for any reasonable interaction. This means that
the RG map is ill-defined, and that the conventional RG description of
first-order phase transitions is not universally valid. For decimation or
Kadanoff transformations applied to the Ising model in dimension ,
these pathologies occur in a full neighborhood of the low-temperature part of the first-order
phase-transition surface. For block-averaging transformations applied to the
Ising model in dimension , the pathologies occur at low temperatures
for arbitrary magnetic-field strength. Pathologies may also occur in the
critical region for Ising models in dimension . We discuss in detail
the distinction between Gibbsian and non-Gibbsian measures, and give a rather
complete catalogue of the known examples. Finally, we discuss the heuristic and
numerical evidence on RG pathologies in the light of our rigorous theorems.Comment: 273 pages including 14 figures, Postscript, See also
ftp.scri.fsu.edu:hep-lat/papers/9210/9210032.ps.
Kernel methods in machine learning
We review machine learning methods employing positive definite kernels. These
methods formulate learning and estimation problems in a reproducing kernel
Hilbert space (RKHS) of functions defined on the data domain, expanded in terms
of a kernel. Working in linear spaces of function has the benefit of
facilitating the construction and analysis of learning algorithms while at the
same time allowing large classes of functions. The latter include nonlinear
functions as well as functions defined on nonvectorial data. We cover a wide
range of methods, ranging from binary classifiers to sophisticated methods for
estimation with structured data.Comment: Published in at http://dx.doi.org/10.1214/009053607000000677 the
Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org
A precise bare simulation approach to the minimization of some distances. Foundations
In information theory -- as well as in the adjacent fields of statistics,
machine learning, artificial intelligence, signal processing and pattern
recognition -- many flexibilizations of the omnipresent Kullback-Leibler
information distance (relative entropy) and of the closely related Shannon
entropy have become frequently used tools. To tackle corresponding constrained
minimization (respectively maximization) problems by a newly developed
dimension-free bare (pure) simulation method, is the main goal of this paper.
Almost no assumptions (like convexity) on the set of constraints are needed,
within our discrete setup of arbitrary dimension, and our method is precise
(i.e., converges in the limit). As a side effect, we also derive an innovative
way of constructing new useful distances/divergences. To illustrate the core of
our approach, we present numerous examples. The potential for widespread
applicability is indicated, too; in particular, we deliver many recent
references for uses of the involved distances/divergences and entropies in
various different research fields (which may also serve as an interdisciplinary
interface)
Distributional Robust Batch Contextual Bandits
Policy learning using historical observational data is an important problem
that has found widespread applications. Examples include selecting offers,
prices, advertisements to send to customers, as well as selecting which
medication to prescribe to a patient. However, existing literature rests on the
crucial assumption that the future environment where the learned policy will be
deployed is the same as the past environment that has generated the data--an
assumption that is often false or too coarse an approximation. In this paper,
we lift this assumption and aim to learn a distributional robust policy with
incomplete (bandit) observational data. We propose a novel learning algorithm
that is able to learn a robust policy to adversarial perturbations and unknown
covariate shifts. We first present a policy evaluation procedure in the
ambiguous environment and then give a performance guarantee based on the theory
of uniform convergence. Additionally, we also give a heuristic algorithm to
solve the distributional robust policy learning problems efficiently.Comment: The short version has been accepted in ICML 202
Machine Learning
Machine Learning can be defined in various ways related to a scientific domain concerned with the design and development of theoretical and implementation tools that allow building systems with some Human Like intelligent behavior. Machine learning addresses more specifically the ability to improve automatically through experience
- …