88 research outputs found

    Binary Independent Component Analysis with OR Mixtures

    Independent component analysis (ICA) is a computational method for separating a multivariate signal into subcomponents, assuming the mutual statistical independence of the non-Gaussian source signals. The classical ICA framework usually assumes linear combinations of independent sources over the field of real numbers $\mathbb{R}$. In this paper, we investigate binary ICA for OR mixtures (bICA), which has applications in many domains, including medical diagnosis, multi-cluster assignment, Internet tomography, and network resource management. We prove that bICA is uniquely identifiable under the disjunctive generation model, and propose a deterministic iterative algorithm to determine the distribution of the latent random variables and the mixing matrix. The inverse problem of inferring the values of the latent variables is also considered, along with noisy measurements. We conduct an extensive simulation study to verify the effectiveness of the proposed algorithm and present examples of real-world applications where bICA can be applied. Comment: Manuscript submitted to IEEE Transactions on Signal Processing
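As a rough illustration of the disjunctive (OR) generation model mentioned in the abstract above, the sketch below draws independent Bernoulli sources and mixes them with a binary matrix. The names `or_mixture`, `sample_observations`, `probs`, and `mixing` are hypothetical and not taken from the paper; the code only simulates data and does not implement the paper's inference algorithm.

```python
import random


def or_mixture(sources, mixing):
    """Mix binary sources with a binary matrix under Boolean OR.

    Observation i is 1 if any source j is active (sources[j] == 1) and
    connected to observation i (mixing[i][j] == 1).
    """
    return [int(any(g and y for g, y in zip(row, sources))) for row in mixing]


def sample_observations(probs, mixing, n_samples, seed=0):
    """Draw independent Bernoulli sources and return their OR mixtures.

    probs[j] is the (assumed) activation probability of source j.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        sources = [int(rng.random() < p) for p in probs]
        samples.append(or_mixture(sources, mixing))
    return samples


# Hypothetical example: 2 sources observed through 3 binary sensors.
mixing = [[1, 0],
          [0, 1],
          [1, 1]]
observations = sample_observations([0.3, 0.6], mixing, n_samples=5)
```

Each row of `observations` is one binary measurement vector; the third sensor fires whenever either source is active.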

    Independent Component Analysis for Binary Data

    Independent Component Analysis (ICA) aims to separate the observed signals into their underlying independent components responsible for generating the observations. Most research in ICA has focused on continuous signals, while the methodology for binary and discrete signals is less developed. Yet, binary observations are equally present in various fields and applications, such as causal discovery, signal processing, and bioinformatics. In the last decade, Boolean OR and XOR mixtures have been shown to be identifiable by ICA, but such models suffer from limited expressivity, calling for new methods to solve the problem. In this thesis, "Independent Component Analysis for Binary Data", we estimate the mixing matrix of ICA from binary observations and an additionally observed auxiliary variable by employing a linear model inspired by the Identifiable Variational Autoencoder (iVAE), which exploits the non-stationarity of the data. The model is optimized with a gradient-based algorithm that uses second-order optimization with limited memory, resulting in a training time on the order of seconds for the cases studied. We investigate which conditions can lead to the reconstruction of the mixing matrix, concluding that the method is able to identify the mixing matrix when the number of observed variables is greater than the number of sources. In such cases, the linear binary iVAE can reconstruct the mixing matrix up to order and scale indeterminacies, which are considered in the evaluation with the Mean Cosine Similarity Score. Furthermore, the model can reconstruct the mixing matrix even under a limited sample size. Therefore, this work demonstrates the potential for applications in real-world data and also offers a possibility to study and formalize identifiability in future work.
In summary, the most important contributions of this thesis are the empirical study of the conditions that enable the mixing matrix reconstruction using the binary iVAE, and the empirical results on the performance and efficiency of the model. The latter was achieved through a new combination of existing methods, including modifications and simplifications of a linear binary iVAE model and the optimization of such a model under limited computational resources.
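The abstract above evaluates reconstruction "up to order and scale indeterminacies" with a Mean Cosine Similarity Score but does not spell out its definition. The sketch below is one plausible reading, stated as an assumption: columns of the true and estimated mixing matrices are matched by the best permutation, and the absolute cosine similarity makes the score insensitive to column scale and sign. The function names are hypothetical, and the brute-force search over permutations is only practical for a handful of sources.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def columns(matrix):
    """Return the columns of a row-major matrix as lists."""
    return [list(col) for col in zip(*matrix)]


def mean_cosine_similarity(true_mixing, est_mixing):
    """Best mean |cosine| over all column matchings (assumed definition).

    Both matrices must have the same number of columns (sources).
    """
    true_cols = columns(true_mixing)
    est_cols = columns(est_mixing)
    best = 0.0
    for perm in permutations(range(len(est_cols))):
        score = sum(abs(cosine(t, est_cols[j]))
                    for t, j in zip(true_cols, perm)) / len(true_cols)
        best = max(best, score)
    return best


# Hypothetical check: the estimate is the true matrix with its columns
# swapped and rescaled, so the score should be (numerically) 1.
true_mixing = [[1, 0], [0, 1], [1, 1]]
est_mixing = [[0, 2], [3, 0], [3, 2]]
score = mean_cosine_similarity(true_mixing, est_mixing)
```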

    Infinite Divisibility of Information

    We study an information analogue of infinitely divisible probability distributions, where the i.i.d. sum is replaced by the joint distribution of an i.i.d. sequence. A random variable $X$ is called informationally infinitely divisible if, for any $n\ge1$, there exists an i.i.d. sequence of random variables $Z_{1},\ldots,Z_{n}$ that contains the same information as $X$, i.e., there exists an injective function $f$ such that $X=f(Z_{1},\ldots,Z_{n})$. While there does not exist an informationally infinitely divisible discrete random variable, we show that any discrete random variable $X$ has a bounded multiplicative gap to infinite divisibility, that is, if we remove the injectivity requirement on $f$, then there exist i.i.d. $Z_{1},\ldots,Z_{n}$ and $f$ satisfying $X=f(Z_{1},\ldots,Z_{n})$, and the entropy satisfies $H(X)/n\le H(Z_{1})\le1.59H(X)/n+2.43$. We also study a new class of discrete probability distributions, called spectral infinitely divisible distributions, where we can remove the multiplicative gap $1.59$. Furthermore, we study the case where $X=(Y_{1},\ldots,Y_{m})$ is itself an i.i.d. sequence, $m\ge2$, for which the multiplicative gap $1.59$ can be replaced by $1+5\sqrt{(\log m)/m}$. This means that as $m$ increases, $(Y_{1},\ldots,Y_{m})$ becomes closer to being spectral infinitely divisible in a uniform manner. This can be regarded as an information analogue of Kolmogorov's uniform theorem. Applications of our result include independent component analysis, distributed storage with a secrecy constraint, and distributed random number generation. Comment: 22 pages
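To make the entropy bound quoted in the abstract above concrete, the sketch below evaluates both sides of $H(X)/n\le H(Z_{1})\le1.59H(X)/n+2.43$ for a given distribution. It assumes entropy is measured in bits, which the abstract does not state, and the function names are hypothetical. It only computes the interval in which $H(Z_{1})$ must lie; it does not construct the pieces $Z_{1},\ldots,Z_{n}$.

```python
import math


def entropy_bits(pmf):
    """Shannon entropy in bits (assumed unit) of a probability mass function."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def piece_entropy_bounds(pmf, n):
    """Lower and upper bounds on H(Z_1) when X is split into n i.i.d. pieces."""
    h = entropy_bits(pmf)
    return h / n, 1.59 * h / n + 2.43


# Hypothetical example: X uniform on 4 values (H(X) = 2 bits), n = 2 pieces.
lower, upper = piece_entropy_bounds([0.25, 0.25, 0.25, 0.25], n=2)
```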

    Codage réseau pour des applications multimédias avancées

    Network coding is a paradigm that allows an efficient use of the capacity of communication networks. It maximizes the throughput in a multi-hop multicast communication and reduces the delay. In this thesis, we focus on integrating the network coding framework into multimedia applications, and in particular into advanced systems that provide enhanced video services to users. Our contributions concern several instances of advanced multimedia communications: an efficient framework for the transmission of a live stream that makes joint use of network coding and multiple description coding; a novel transmission strategy for lossy wireless networks that guarantees a trade-off between loss resilience and short delay, based on a rate-distortion optimized scheduling of the video frames, which we also extended to the case of interactive multi-view streaming; a distributed social caching system that, using network coding together with knowledge of the users' preferences in terms of views, is able to select a replication scheme that provides high video quality by accessing only other members of the social group, without incurring the access cost of a connection to a central server and without exchanging large tables of metadata to keep track of the replicated parts; and, finally, a study on using blind source separation techniques to reduce the overhead incurred by network coding schemes, based on error-detecting techniques such as parity coding and message digest generation. All our contributions aim to use network coding to enhance the quality of video transmission in terms of perceived distortion and delay.

    On streaming approximation algorithms for constraint satisfaction problems

    In this thesis, we explore streaming algorithms for approximating constraint satisfaction problems (CSPs). The setup is roughly the following: A computer has limited memory space, sees a long "stream" of local constraints on a set of variables, and tries to estimate how many of the constraints may be simultaneously satisfied. The past ten years have seen a number of works in this area, and this thesis includes both expository material and novel contributions. Throughout, we emphasize connections to the broader theories of CSPs, approximability, and streaming models, and highlight interesting open problems. The first part of our thesis is expository: We present aspects of previous works that completely characterize the approximability of specific CSPs like Max-Cut and Max-Dicut with $\sqrt{n}$-space streaming algorithms (on $n$-variable instances), while characterizing the approximability of all CSPs in $\sqrt{n}$ space in the special case of "composable" (i.e., sketching) algorithms, and of a particular subclass of CSPs with linear-space streaming algorithms. In the second part of the thesis, we present two of our own joint works. We begin with a work with Madhu Sudan and Santhoshini Velusamy in which we prove linear-space streaming approximation-resistance for all ordering CSPs (OCSPs), which are "CSP-like" problems maximizing over sets of permutations. Next, we present joint work with Joanna Boyland, Michael Hwang, Tarun Prasad, and Santhoshini Velusamy in which we investigate the $\sqrt{n}$-space streaming approximability of symmetric Boolean CSPs with negations.
We give explicit $\sqrt{n}$-space sketching approximability ratios for several families of CSPs, including Max-$k$AND; develop simpler optimal sketching approximation algorithms for threshold predicates; and show that previous lower bounds fail to characterize the $\sqrt{n}$-space streaming approximability of Max-$3$AND. Comment: Harvard College senior thesis; 119 pages plus references; abstract shortened for arXiv; formatted with Dissertate template (feel free to copy!); exposits papers arXiv:2105.01782 (APPROX 2021) and arXiv:2112.06319 (APPROX 2022)
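The streaming setup described in the abstract above can be illustrated with the simplest possible baseline for Max-Cut, which is not one of the thesis's algorithms: every graph with $m$ edges has a cut containing at least $m/2$ of them, so a single pass that only counts edges yields an estimate within a factor of 2 of the optimum while storing just one counter. The function names below are hypothetical, and the brute-force solver is included only to check the estimate on a tiny instance.

```python
from itertools import product


def trivial_maxcut_estimate(edge_stream):
    """One pass, one counter: return m / 2, a lower bound on the Max-Cut value."""
    m = 0
    for _ in edge_stream:
        m += 1
    return m / 2


def exact_maxcut(n_vertices, edges):
    """Exhaustive Max-Cut for tiny graphs (exponential time, for checking only)."""
    best = 0
    for side in product((0, 1), repeat=n_vertices):
        cut = sum(1 for u, v in edges if side[u] != side[v])
        best = max(best, cut)
    return best


# Hypothetical instance: a triangle on vertices 0, 1, 2.
triangle = [(0, 1), (1, 2), (0, 2)]
estimate = trivial_maxcut_estimate(iter(triangle))
optimum = exact_maxcut(3, triangle)
```

On the triangle the estimate is 1.5 and the true optimum is 2, consistent with the factor-2 guarantee.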