88 research outputs found
Binary Independent Component Analysis with OR Mixtures
Independent component analysis (ICA) is a computational method for separating
a multivariate signal into subcomponents assuming the mutual statistical
independence of the non-Gaussian source signals. The classical Independent
Components Analysis (ICA) framework usually assumes linear combinations of
independent sources over the field of real-valued numbers R. In this paper, we
investigate binary ICA for OR mixtures (bICA), which can find applications in
many domains including medical diagnosis, multi-cluster assignment, Internet
tomography and network resource management. We prove that bICA is uniquely
identifiable under the disjunctive generation model, and propose a
deterministic iterative algorithm to determine the distribution of the latent
random variables and the mixing matrix. The inverse problem of
inferring the values of the latent variables is also considered, along with noisy
measurements. We conduct an extensive simulation study to verify the
effectiveness of the proposed algorithm and present examples of real-world
applications where bICA can be applied.
Comment: Manuscript submitted to IEEE Transactions on Signal Processing
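The disjunctive (OR) generation model named in this abstract can be illustrated with a minimal sketch. It assumes each observation is the Boolean OR of the independent binary sources selected by a binary mixing matrix. The names `or_mixture`, `sample_sources`, `G`, `p`, `S`, and `X`, and the particular matrix and probabilities, are hypothetical choices for illustration and are not taken from the paper. The sketch only generates data; it does not implement the paper's inference algorithm.

```python
import numpy as np

def or_mixture(G, S):
    """Boolean OR mixing: X[i, t] = OR over j of (G[i, j] AND S[j, t])."""
    G = np.asarray(G, dtype=bool)
    S = np.asarray(S, dtype=bool)
    # The integer product counts the active sources feeding each observation;
    # a count above zero means the OR is true.
    return (G.astype(int) @ S.astype(int)) > 0

def sample_sources(p, n_samples, seed=0):
    """Draw independent Bernoulli sources; row j is active with probability p[j]."""
    rng = np.random.default_rng(seed)
    p = np.asarray(p, dtype=float)
    return rng.random((p.size, n_samples)) < p[:, None]

# Hypothetical example: 3 latent sources observed through 4 binary sensors.
G = np.array([[1, 0, 0],
              [1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]], dtype=bool)
p = [0.2, 0.5, 0.1]          # assumed activation probabilities
S = sample_sources(p, n_samples=1000)
X = or_mixture(G, S)         # shape (4, 1000), dtype bool
```

In this setting the identification task is to recover `G` and `p` from `X` alone, and the inverse problem is to recover `S` given `X` and `G`.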
Independent Component Analysis for Binary Data
Independent Component Analysis (ICA) aims to separate the observed signals into their underlying independent components responsible for generating the observations. Most research in ICA has focused on continuous signals, while the methodology for binary and discrete signals is less developed. Yet, binary observations are equally present in various fields and applications, such as causal discovery, signal processing, and bioinformatics. In the last decade, Boolean OR and XOR mixtures have been shown to be identifiable by ICA, but such models suffer from limited expressivity, calling for new methods to solve the problem.
In this thesis, "Independent Component Analysis for Binary Data", we estimate the mixing matrix of ICA from binary observations and an additionally observed auxiliary variable by employing a linear model inspired by the Identifiable Variational Autoencoder (iVAE), which exploits the non-stationarity of the data. The model is optimized with a gradient-based algorithm that uses second-order optimization with limited memory, resulting in a training time on the order of seconds for the particular case studies.
We investigate which conditions can lead to the reconstruction of the mixing matrix, concluding that the method is able to identify the mixing matrix when the number of observed variables is greater than the number of sources. In such cases, the linear binary iVAE can reconstruct the mixing matrix up to order and scale indeterminacies, which are considered in the evaluation with the Mean Cosine Similarity Score. Furthermore, the model can reconstruct the mixing matrix even under a limited sample size. Therefore, this work demonstrates the potential for applications in real-world data and also offers a possibility to study and formalize identifiability in future work.
In summary, the most important contributions of this thesis are the empirical study of the conditions that enable the mixing matrix reconstruction using the binary iVAE, and the empirical results on the performance and efficiency of the model. The latter was achieved through a new combination of existing methods, including modifications and simplifications of a linear binary iVAE model and the optimization of such a model under limited computational resources.
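An evaluation that tolerates order and scale indeterminacies can be sketched as follows. This is one plausible reading of a mean cosine similarity score, not the thesis's own definition: columns are compared by absolute cosine similarity (so that rescaling, including sign flips, does not matter), and the best column permutation is found by exhaustive search, which is only practical for a small number of sources. The function name `mean_cosine_similarity` and the example matrices are hypothetical, and all columns are assumed to be non-zero.

```python
from itertools import permutations

import numpy as np

def mean_cosine_similarity(A_true, A_est):
    """Mean |cosine| between matched columns, maximized over column permutations."""
    A_true = np.asarray(A_true, dtype=float)
    A_est = np.asarray(A_est, dtype=float)
    # Normalize columns so that any non-zero rescaling of a column is ignored.
    U = A_true / np.linalg.norm(A_true, axis=0, keepdims=True)
    V = A_est / np.linalg.norm(A_est, axis=0, keepdims=True)
    C = np.abs(U.T @ V)      # C[i, j] = |cos| between true column i and estimated column j
    k = C.shape[0]
    # Exhaustive search over matchings; fine for small k only.
    best = max(sum(C[i, perm[i]] for i in range(k))
               for perm in permutations(range(k)))
    return best / k

# Hypothetical check: a permuted and rescaled copy should score (numerically) 1.
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 3))
A_shuffled = A[:, [2, 0, 1]] * np.array([3.0, -0.5, 2.0])
score = mean_cosine_similarity(A, A_shuffled)
```

For larger numbers of sources, the exhaustive search could be replaced by an assignment solver; that choice is left open here.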
Infinite Divisibility of Information
We study an information analogue of infinitely divisible probability
distributions, where the i.i.d. sum is replaced by the joint distribution of an
i.i.d. sequence. A random variable $X$ is called informationally infinitely
divisible if, for any $n$, there exists an i.i.d. sequence of random
variables $Z_1, \ldots, Z_n$ that contains the same information as $X$, i.e.,
there exists an injective function $f$ such that $X = f(Z_1, \ldots, Z_n)$.
While no informationally infinitely divisible discrete random
variable exists, we show that any discrete random variable $X$ has a bounded
multiplicative gap to infinite divisibility, that is, if we remove the
injectivity requirement on $f$, then there exist i.i.d. $Z_1, \ldots, Z_n$
and $f$ satisfying $X = f(Z_1, \ldots, Z_n)$, and the entropy satisfies
[…]. We also study a new class of discrete
probability distributions, called spectral infinitely divisible distributions,
where we can remove the multiplicative gap. Furthermore, we study the
case where $X = (Y_1, \ldots, Y_m)$ is itself an i.i.d. sequence, for
which the multiplicative gap can be replaced by […].
This means that as $m$ increases, $(Y_1, \ldots, Y_m)$ becomes closer to
being spectral infinitely divisible in a uniform manner. This can be regarded
as an information analogue of Kolmogorov's uniform theorem. Applications of our
result include independent component analysis, distributed storage with a
secrecy constraint, and distributed random number generation.
Comment: 22 pages
Codage réseau pour des applications multimédias avancées
Network coding is a paradigm that allows an efficient use of the capacity of communication networks. It maximizes the throughput in a multi-hop multicast communication and reduces the delay. In this thesis, we focus our attention on the integration of the network coding framework into multimedia applications, and in particular into advanced systems that provide enhanced video services to the users. Our contributions concern several instances of advanced multimedia communications: an efficient framework for transmission of a live stream making joint use of network coding and multiple description coding; a novel transmission strategy for lossy wireless networks that guarantees a trade-off between loss resilience and short delay based on a rate-distortion optimized scheduling of the video frames, which we also extended to the case of interactive multi-view streaming; a distributed social caching system that, using network coding in conjunction with knowledge of the users' preferences in terms of views, is able to select a replication scheme that provides high video quality by accessing only other members of the social group, without incurring the access cost associated with a connection to a central server and without exchanging large tables of metadata to keep track of the replicated parts; and, finally, a study on using blind source separation techniques to reduce the overhead incurred by network coding schemes based on error-detecting techniques such as parity coding and message digest generation. All our contributions are aimed at using network coding to enhance the quality of video transmission in terms of perceived distortion and delay.
On streaming approximation algorithms for constraint satisfaction problems
In this thesis, we explore streaming algorithms for approximating constraint
satisfaction problems (CSPs). The setup is roughly the following: A computer
has limited memory space, sees a long "stream" of local constraints on a set of
variables, and tries to estimate how many of the constraints may be
simultaneously satisfied. The past ten years have seen a number of works in
this area, and this thesis includes both expository material and novel
contributions. Throughout, we emphasize connections to the broader theories of
CSPs, approximability, and streaming models, and highlight interesting open
problems.
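The streaming setup described above can be made concrete with a deliberately trivial baseline for Max-Cut. It is not one of the thesis's algorithms. The sketch keeps a single counter while reading the stream of edge constraints and reports half the number of edges; since every graph with m edges has a cut of size at least m/2 and at most m, this estimate is always within a factor of 2 of the optimum. The function names and the example graph are hypothetical, and the brute-force check is included only to verify the bound on a tiny instance.

```python
from itertools import product

def streaming_maxcut_estimate(edge_stream):
    """One pass, one counter: report m/2, where m is the number of edges seen."""
    m = 0
    for _ in edge_stream:   # each constraint is seen once and then discarded
        m += 1
    return m / 2

def exact_maxcut(n, edges):
    """Brute-force Max-Cut over all 2^n assignments; for tiny sanity checks only."""
    best = 0
    for side in product((0, 1), repeat=n):
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best

# Hypothetical instance: a 4-cycle on vertices 0..3 plus the chord (0, 2).
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
estimate = streaming_maxcut_estimate(iter(edges))
optimum = exact_maxcut(4, edges)
```

The estimator never stores an edge, which is what makes it a streaming algorithm; the questions studied in this area concern how much better than such a baseline one can do within a given space budget.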
The first part of our thesis is expository: We present aspects of previous
works that completely characterize the approximability of specific CSPs like
Max-Cut and Max-Dicut with […]-space streaming algorithms (on
$n$-variable instances), while characterizing the approximability of all CSPs
in […] space in the special case of "composable" (i.e., sketching)
algorithms, and of a particular subclass of CSPs with linear-space streaming
algorithms.
In the second part of the thesis, we present two of our own joint works. We
begin with a work with Madhu Sudan and Santhoshini Velusamy in which we prove
linear-space streaming approximation-resistance for all ordering CSPs (OCSPs),
which are "CSP-like" problems maximizing over sets of permutations. Next, we
present joint work with Joanna Boyland, Michael Hwang, Tarun Prasad, and
Santhoshini Velusamy in which we investigate the […]-space streaming
approximability of symmetric Boolean CSPs with negations. We give explicit
[…]-space sketching approximability ratios for several families of CSPs,
including Max-AND; develop simpler optimal sketching approximation
algorithms for threshold predicates; and show that previous lower bounds fail
to characterize the […]-space streaming approximability of Max-AND.
Comment: Harvard College senior thesis; 119 pages plus references; abstract
shortened for arXiv; formatted with Dissertate template (feel free to copy!);
exposits papers arXiv:2105.01782 (APPROX 2021) and arXiv:2112.06319 (APPROX
2022)