18 research outputs found

    Identifiability for Blind Source Separation of Multiple Finite Alphabet Linear Mixtures

    Under weak assumptions, we give a complete combinatorial characterization of identifiability for linear mixtures of finite alphabet sources with unknown mixing weights and unknown source signals but a known alphabet. This is based on a detailed treatment of the case of a single linear mixture. Notably, our identifiability analysis also applies when the number of sources is unknown. We provide necessary and sufficient conditions for identifiability and give a simple sufficient criterion, together with an explicit construction that determines the weights and the source signals for deterministic data by taking advantage of the hierarchical structure within the possible mixture values. We show that the probability of identifiability is related to the distribution of a hitting time and converges exponentially fast to one when the underlying sources come from a discrete Markov process. Finally, we explore our theoretical results in a simulation study. Our work extends and clarifies the scope of scenarios for which blind source separation becomes meaningful.

    Interview, Building Trust in Medical AI Algorithms with Veridical Data Science

    Finite Alphabet Blind Separation

    This thesis considers a particular blind source separation problem in which the sources are assumed to take values only in a known finite set, called the alphabet. More precisely, one observes M linear mixtures of m signals (sources) taking values in the known finite alphabet. The aim in this model is to identify the unknown mixing weights and sources, including the number of sources, from noisy observations of the mixture. Finite Alphabet Blind Separation (FABS) occurs in many applications, for instance in digital communications with mixtures of multilevel pulse amplitude modulated digital signals. The main motivation for this thesis, however, comes from cancer genetics, where one aims to infer copy number aberrations of different clones in a tumor. In the first part of this thesis, we provide necessary and sufficient identifiability conditions and obtain exact recovery within a neighborhood of the mixture. In the second part, we study FABS for a single mixture (M = 1) within a change-point regression setting with Gaussian error. We provide uniformly honest lower confidence bounds and estimators with exponential convergence rates for the number of source components. With this at hand, we obtain consistent estimators with optimal convergence rates (up to log-factors) and asymptotically uniform honest confidence statements for the weights and the sources. We explore our procedure with a data example from cancer genetics. In the third part, we consider multivariate FABS, where several mixtures (M > 1) are observed. For Gaussian error, we show that the least squares estimator (LSE) attains the minimax rates, both for the prediction and for the estimation error. As computation of the LSE is not feasible, an efficient algorithm is proposed. Simulations suggest that this algorithm approximates the LSE well.
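
    The observation model described in this abstract can be illustrated with a minimal sketch. It is not the thesis's estimator or its identifiability criterion: the function names, the example weights, the binary alphabet, and the noise level are all assumptions chosen for illustration. The sketch builds M linear mixtures of m finite alphabet sources, adds Gaussian noise, and checks one simple property of a single mixture, namely whether distinct alphabet combinations produce distinct mixture values.

```python
import itertools
import random


def mix(weights, sources):
    """Noiseless mixtures: Y[k][j] = sum_i weights[k][i] * sources[i][j].

    weights is an M x m list of lists, sources is an m x n list of lists.
    """
    n = len(sources[0])
    return [
        [sum(w * src[j] for w, src in zip(row, sources)) for j in range(n)]
        for row in weights
    ]


def mixture_values(weight_row, alphabet):
    """Map every alphabet combination to the value it yields in one mixture."""
    m = len(weight_row)
    return {
        combo: sum(w * a for w, a in zip(weight_row, combo))
        for combo in itertools.product(alphabet, repeat=m)
    }


def values_are_distinct(weight_row, alphabet, tol=1e-9):
    """True if no two alphabet combinations give the same mixture value."""
    vals = sorted(mixture_values(weight_row, alphabet).values())
    return all(b - a > tol for a, b in zip(vals, vals[1:]))


# Hypothetical example: M = 1 mixture of m = 3 binary sources.
alphabet = (0, 1)
weights = [[0.1, 0.3, 0.6]]
sources = [[0, 1, 1, 0], [1, 1, 0, 0], [0, 0, 1, 1]]

clean = mix(weights, sources)

# Gaussian observation noise with an assumed standard deviation of 0.02.
rng = random.Random(0)
noisy = [[y + rng.gauss(0.0, 0.02) for y in row] for row in clean]
```

    With the weights (0.1, 0.3, 0.6), every binary source combination gives a different mixture value, so each noiseless observation points to exactly one combination. With weights such as (0.2, 0.3, 0.5), the combinations (1, 1, 0) and (0, 0, 1) both give 0.5, so the sources cannot be told apart from that mixture alone.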

    Opportunities and Challenges for AI-Based Analysis of RWD in Pharmaceutical R&D: A Practical Perspective

    Real world data (RWD) have become an important tool in pharmaceutical research and development. Generated every time patients interact with the healthcare system, when diagnoses are developed and medical interventions are selected, RWD are massive and in many regards typical big data. The use of artificial intelligence (AI) to analyze RWD seems an obvious choice. It promises new insights into medical need, drivers of diseases, and new opportunities for pharmacological interventions. In practice, however, RWD analyses are challenging. The distributed generation of data, under sub-optimally standardized conditions in a healthcare transaction that is patient-oriented but not information-maximizing, leads to a high level of sparseness and uncontrolled biases. We discuss why this needs to be addressed independently of the type of analysis approach. While classical statistical analysis and modeling approaches provide a rigorous framework for handling bias and sparseness, AI methods are not necessarily suitable when applied naively. Special precautions need to be taken, from the choice of method to the interpretation of results, to prevent potentially harmful fallacies. The conscious use of prior medical subject matter expertise may also be required. Based on typical application examples, we illustrate challenges and methodological considerations.

    Provable Boolean Interaction Recovery from Tree Ensemble obtained via Random Forests

    Random Forests (RF) are at the cutting edge of supervised machine learning in terms of prediction performance, especially in genomics. Iterative Random Forests (iRF) use a tree ensemble from iteratively modified RF to obtain predictive and stable non-linear high-order Boolean interactions of features. They have shown great promise for high-order biological interaction discovery, which is central to advancing functional genomics and precision medicine. However, theoretical studies of how tree-based methods discover high-order feature interactions are missing. In this paper, to enable such theoretical studies, we first introduce a novel discontinuous nonlinear regression model, called the Locally Spiky Sparse (LSS) model, which is inspired by the thresholding behavior in many biological processes. Specifically, the LSS model assumes that the regression function is a linear combination of piece-wise constant Boolean interaction terms. We define a quantity called depth-weighted prevalence (DWP) for a set of signed features S and a given RF tree ensemble. We prove that, with high probability under the LSS model, the DWP of S attains a universal upper bound that does not involve any model coefficients if and only if S corresponds to a union of Boolean interactions in the LSS model. As a consequence, we show that RF yields consistent interaction discovery under the LSS model. Simulation results show that DWP can recover the interactions under the LSS model even when some assumptions, such as the uniformity assumption, are violated.
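
    To make the phrase "linear combination of piece-wise constant Boolean interaction terms" concrete, here is a minimal sketch of a regression function of that shape. The abstract does not give the exact parameterization, so the thresholding form (each signed feature compares one coordinate with a threshold), the function name `lss_regression`, and all coefficients and thresholds below are assumptions for illustration only. The sketch does not implement DWP or any part of RF or iRF.

```python
def lss_regression(x, intercept, interactions):
    """Evaluate a regression function built from Boolean interaction terms.

    interactions is a list of (coefficient, signed_features) pairs, where
    signed_features is a list of (feature_index, threshold, sign) triples.
    A triple with sign > 0 is satisfied when x[feature_index] > threshold,
    and one with sign <= 0 is satisfied when x[feature_index] <= threshold.
    An interaction term equals 1 only if all of its triples are satisfied.
    """
    total = intercept
    for coefficient, signed_features in interactions:
        active = all(
            (x[k] > t) if sign > 0 else (x[k] <= t)
            for k, t, sign in signed_features
        )
        if active:
            total += coefficient
    return total


# Hypothetical model with two interactions on three features:
#   2.0 * 1(x0 > 0.5 and x1 > 0.5)  -  1.0 * 1(x2 <= 0.3)
interactions = [
    (2.0, [(0, 0.5, +1), (1, 0.5, +1)]),
    (-1.0, [(2, 0.3, -1)]),
]

both_active = lss_regression((0.9, 0.8, 0.1), 0.0, interactions)
none_active = lss_regression((0.9, 0.2, 0.9), 0.0, interactions)
```

    The function is constant on each region cut out by the thresholds and jumps only when a point crosses one, which is the discontinuous, thresholding behavior the abstract attributes to the LSS model.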