
    Sparse Probit Linear Mixed Model

    Linear Mixed Models (LMMs) are important tools in statistical genetics. When used for feature selection, they allow us to find a sparse set of genetic traits that best predict a continuous phenotype of interest, while simultaneously correcting for various confounding factors such as age, ethnicity, and population structure. Formulated as models for linear regression, LMMs have been restricted to continuous phenotypes. We introduce the Sparse Probit Linear Mixed Model (Probit-LMM), in which we generalize the LMM modeling paradigm to binary phenotypes. As a technical challenge, the model no longer possesses a closed-form likelihood function. In this paper, we present a scalable approximate inference algorithm that lets us fit the model to high-dimensional data sets. We show on three real-world examples from different domains that, in the setup of binary labels, our algorithm leads to better prediction accuracies and also selects features that show less correlation with the confounding factors. Comment: Published version, 21 pages, 6 figures.
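    As a hedged illustration of the modeling setup, the sketch below simulates a binary phenotype from a sparse effect vector plus correlated confounding noise, and estimates the marginal likelihood (which has no closed form) by plain Monte Carlo. All names (X, K, w, nu) and the estimator itself are illustrative assumptions, not the paper's actual inference algorithm.

```python
# Hedged sketch of a Probit-LMM-style setup. Every name (X, K, w, nu)
# and the Monte Carlo estimator are illustrative assumptions, not the
# paper's actual model-fitting or inference code.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, d = 200, 50                       # samples, genetic features

X = rng.standard_normal((n, d))      # genotype matrix
K = X @ X.T / d                      # toy confounder/kinship covariance

w = np.zeros(d)
w[:5] = 1.0                          # sparse true effect vector

nu = 0.5                             # strength of correlated (confounding) noise
# Latent liability: sparse signal + correlated noise + unit probit noise
eps = rng.multivariate_normal(np.zeros(n), nu * K + np.eye(n))
y = (X @ w + eps > 0).astype(int)    # binary phenotype

def mc_log_likelihood(w, nu, n_mc=200):
    """Monte Carlo estimate of the marginal log-likelihood, which has no
    closed form: integrate the probit likelihood over the correlated
    random effect u ~ N(0, nu * K)."""
    L = np.linalg.cholesky(nu * K + 1e-6 * np.eye(n))
    logp = np.empty(n_mc)
    for s in range(n_mc):
        u = L @ rng.standard_normal(n)            # draw correlated effect
        p = norm.cdf(X @ w + u)                   # probit link
        logp[s] = np.sum(np.log(np.where(y == 1, p, 1 - p) + 1e-12))
    m = logp.max()                                # log-mean-exp for stability
    return m + np.log(np.mean(np.exp(logp - m)))

print(mc_log_likelihood(w, nu))
```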

    On concavity of TAP free energy in the SK model

    We analyse the Hessian of the Thouless-Anderson-Palmer (TAP) free energy for the Sherrington-Kirkpatrick model, below the de Almeida-Thouless (AT) line, evaluated at Bolthausen's approximate solutions of the TAP equations. We show that while its empirical spectrum weakly converges to a measure with negative support, positive outlier eigenvalues occur for some (β, h) below the AT line. In this sense, the TAP free energy may lose concavity in the order parameter of the theory, i.e. the random spin-magnetisations, even below the AT line. Possible interpretations of these findings within Plefka's expansion of the Gibbs potential are not definitive and include the following: i) higher-order terms should not be neglected even if Plefka's first convergence criterion (yielding, in infinite volume, the AT line) is satisfied; ii) Plefka's first convergence criterion (hence the AT line) is necessary yet hardly sufficient; or iii) Bolthausen's magnetizations do not approximate the TAP solutions sufficiently well up to the AT line. Comment: 29 pages, 1 figure.
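    For orientation, the display below records one common parameterization of the TAP free energy and the AT condition. Sign and scaling conventions vary across the literature, so this is a representative form, not necessarily the exact functional analysed in the paper.

```latex
% One common normalization of the TAP free energy for the SK model;
% signs and scalings differ across references.
\[
-\beta F_{\mathrm{TAP}}(m)
  = \frac{\beta}{2}\sum_{i \neq j} g_{ij}\, m_i m_j
  + \beta h \sum_{i} m_i
  - \sum_{i} I(m_i)
  + \frac{N\beta^2}{4}\bigl(1 - q(m)\bigr)^2,
\qquad
q(m) = \frac{1}{N}\sum_{i} m_i^2,
\]
where $I(m) = \tfrac{1+m}{2}\log\tfrac{1+m}{2} + \tfrac{1-m}{2}\log\tfrac{1-m}{2}$.
% The AT line is characterized by the stability condition
\[
\beta^2\, \mathbb{E}\!\left[\operatorname{sech}^4\!\bigl(\beta(h + \sqrt{q}\,Z)\bigr)\right] = 1,
\qquad Z \sim \mathcal{N}(0,1),
\]
% with q the solution of the replica-symmetric fixed-point equation
\[
q = \mathbb{E}\!\left[\tanh^2\!\bigl(\beta(h + \sqrt{q}\,Z)\bigr)\right].
\]
```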

    AMP algorithms and Stein's method: Understanding TAP equations with a new method

    We propose a new iterative construction of solutions of the classical TAP equations for the Sherrington-Kirkpatrick model, i.e. with finite-size Onsager correction. The algorithm can be started at an arbitrary point and converges up to the AT line. The analysis relies on a novel treatment of mean-field algorithms through Stein's method. As such, the approach also yields weak convergence of the effective fields towards Gaussians at all temperatures, and can be applied, with appropriate modifications, to all models where TAP-like equations and a Stein operator are available. Comment: 38 pages.
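    A generic AMP-style iteration of this kind, with the finite-size Onsager correction, can be sketched as follows. This is a textbook-flavored illustration, not Bolthausen's exact construction; the initialization and stopping rule are simplifying assumptions.

```python
# Generic AMP-style iteration for the SK TAP equations with the
# finite-size Onsager correction. Not Bolthausen's exact scheme;
# initialization and the stopping rule are simplifying assumptions.
import numpy as np

rng = np.random.default_rng(1)
N = 1000
beta, h = 0.5, 0.3                        # a high-temperature point

G = rng.standard_normal((N, N)) / np.sqrt(N)
G = (G + G.T) / np.sqrt(2.0)              # symmetric Gaussian couplings
np.fill_diagonal(G, 0.0)

m_prev = np.zeros(N)
m = np.full(N, np.tanh(beta * h))         # arbitrary starting point

for t in range(200):
    q = np.mean(m ** 2)
    # Effective field with the Onsager correction -beta * (1 - q) * m_prev
    field = G @ m + h - beta * (1.0 - q) * m_prev
    m_prev, m = m, np.tanh(beta * field)
    if np.linalg.norm(m - m_prev) / np.sqrt(N) < 1e-10:
        break

print(f"stopped after {t} steps, overlap q = {np.mean(m ** 2):.4f}")
```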

    Raising the Bar in Graph-level Anomaly Detection

    Graph-level anomaly detection has become a critical topic in diverse areas, such as financial fraud detection and detecting anomalous activities in social networks. While most research has focused on anomaly detection for visual data such as images, where high detection accuracies have been obtained, existing deep learning approaches for graphs currently show considerably worse performance. This paper raises the bar on graph-level anomaly detection, i.e., the task of detecting abnormal graphs in a set of graphs. By drawing on ideas from self-supervised learning and transformation learning, we present a new deep learning approach that significantly improves existing deep one-class approaches by fixing some of their known problems, including hypersphere collapse and performance flip. Experiments on nine real-world data sets involving nine techniques reveal that our method achieves an average performance improvement of 11.8% AUC compared to the best existing approach. Comment: To appear in IJCAI-ECAI 2022.
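    The "hypersphere collapse" problem mentioned above refers to deep one-class objectives degenerating to a constant embedding. A standard mitigation, known from Deep SVDD, is to fix the hypersphere center and use bias-free layers; the sketch below illustrates that idea with a placeholder MLP encoder, not the paper's graph architecture.

```python
# Minimal sketch of a deep one-class (Deep SVDD-style) objective with a
# standard guard against hypersphere collapse: a fixed, non-trainable
# center and bias-free layers. The MLP encoder and random inputs are
# placeholders, not the paper's graph architecture or data.
import torch
import torch.nn as nn

torch.manual_seed(0)
encoder = nn.Sequential(                  # bias-free layers remove the
    nn.Linear(32, 64, bias=False),        # trivial constant-output solution
    nn.ReLU(),
    nn.Linear(64, 16, bias=False),
)

x = torch.randn(256, 32)                  # placeholder "graph" inputs

with torch.no_grad():                     # fix center c once; never optimized
    c = encoder(x).mean(dim=0)

opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)
for step in range(100):
    z = encoder(x)
    loss = ((z - c) ** 2).sum(dim=1).mean()   # mean squared distance to c
    opt.zero_grad()
    loss.backward()
    opt.step()

with torch.no_grad():                     # anomaly score: distance to center
    scores = ((encoder(x) - c) ** 2).sum(dim=1)
print(scores[:5])
```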

    Dissecting tumor cell heterogeneity in 3D cell culture systems by combining imaging and next generation sequencing technologies

    Three-dimensional (3D) in vitro cell culture systems have advanced the modeling of cellular processes in health and disease by reflecting physiological characteristics and architectural features of in vivo tissues. As a result, representative patient-derived 3D culture systems are emerging as advanced pre-clinical tumor models to support individualized therapy decisions. Besides the progress that has been achieved in molecular and pathological analyses towards personalized treatments, a remaining problem in both primary lesions and in vitro cultures is our limited understanding of functional tumor cell heterogeneity. This phenomenon is increasingly recognized as a key driver of tumor progression and treatment resistance. Recent technological advances in next generation sequencing (NGS) have enabled unbiased identification of gene expression in low-input samples and single cells (scRNA-seq), thereby providing the basis to reveal cellular subtypes and drivers of cell state transitions. However, these methods generally require dissociation of tissues into single-cell suspensions, which consequently leads to the loss of multicellular context. Thus, a direct or indirect combination of gene expression profiling with in situ microscopy is necessary for single-cell analyses to precisely understand the association between complex cellular phenotypes and their underlying genetic programs. In this thesis, I will present two complementary strategies based on combinations of NGS and microscopy to dissect tumor cell heterogeneity in 3D culture systems. First, I will describe the development and application of the new method ‘pheno-seq’ for integrated high-throughput imaging and transcriptomic profiling of clonal tumor spheroids derived from models of breast and colorectal cancer (CRC). With this approach, we revealed characteristic gene expression associated with heterogeneous invasive and proliferative behavior, identified transcriptional regulators that are missed by scRNA-seq, linked visual phenotypes and their associated transcriptional signatures to inhibitor response, and inferred single-cell regulatory states by deconvolution. Second, by applying scRNA-seq to 12 patient-derived CRC spheroid cultures, we identified shared expression programs that relate to intestinal lineages and revealed metabolic signatures that are linked to cancer cell differentiation. In addition, we validated and complemented the sequencing results by quantitative microscopy using live dyes and multiplexed RNA fluorescence in situ hybridization, thereby revealing metabolic compartmentalization and potential cell-cell interactions. Taken together, we believe that our approaches provide a framework for translational research to dissect heterogeneous transcriptional programs in 3D cell culture systems, which will pave the way for a deeper understanding of functional tumor cell heterogeneity.
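    For readers unfamiliar with the scRNA-seq side of the workflow, the sketch below shows a generic clustering pipeline (using scanpy) of the kind used to reveal cellular subtypes. It is not the pheno-seq method itself, and the synthetic count matrix is purely illustrative.

```python
# Generic scRNA-seq clustering sketch (scanpy) of the kind used to
# reveal cellular subtypes. This is NOT the pheno-seq method itself,
# and the synthetic count matrix below is purely illustrative.
import numpy as np
import scanpy as sc
from anndata import AnnData

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(500, 2000)).astype(np.float32)
adata = AnnData(counts)                         # cells x genes

sc.pp.normalize_total(adata, target_sum=1e4)    # library-size normalization
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=500)
adata = adata[:, adata.var.highly_variable].copy()

sc.pp.pca(adata, n_comps=30)
sc.pp.neighbors(adata)                          # kNN graph in PCA space
sc.tl.leiden(adata)                             # graph-based clustering
print(adata.obs["leiden"].value_counts())       # putative subpopulations
```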

    Deriving Event Logs from Legacy Software Systems

    The modernization of legacy software systems is one of the key challenges in the software industry and requires comprehensive system analysis. In this context, process mining has proven useful for understanding the (business) processes implemented by a legacy software system. However, process mining algorithms are highly dependent on both the quality and existence of suitable event logs. In many scenarios, existing software systems (e.g., legacy applications) do not leverage process engines capable of producing such high-quality event logs, which hampers the application of process mining algorithms. Deriving suitable event log data from legacy software systems therefore constitutes a relevant task that fosters data-driven analysis approaches, including process mining, data-based process documentation, and process-centric software migration. This paper presents an approach for deriving event logs from legacy software systems by combining knowledge from source code and corresponding database operations. The goal is to identify relevant business objects as well as to document user and software interactions with them in an event log suitable for process mining.
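    A minimal sketch of the final step, turning recorded database operations into an event log with the standard XES attribute names, might look as follows. The table and column names are invented placeholders; the paper's mapping from source code to business objects is considerably more involved.

```python
# Hedged sketch: turning recorded database operations into an event log
# with the standard XES attribute names. Table and column names are
# invented placeholders, not the paper's actual extraction logic.
import pandas as pd

# Rows as they might be extracted from the DB layer of a legacy system:
db_ops = pd.DataFrame([
    {"order_id": 1, "operation": "INSERT Order",   "ts": "2023-01-02 09:00"},
    {"order_id": 1, "operation": "UPDATE Order",   "ts": "2023-01-02 09:15"},
    {"order_id": 1, "operation": "INSERT Invoice", "ts": "2023-01-03 10:00"},
    {"order_id": 2, "operation": "INSERT Order",   "ts": "2023-01-02 11:00"},
])

# Map to the XES attribute names expected by process-mining tools:
log = pd.DataFrame({
    "case:concept:name": db_ops["order_id"].astype(str),   # case identifier
    "concept:name":      db_ops["operation"],              # activity label
    "time:timestamp":    pd.to_datetime(db_ops["ts"]),     # event timestamp
}).sort_values(["case:concept:name", "time:timestamp"])

print(log)
# From here the frame can be handed to a process-mining library, e.g.:
#   import pm4py
#   net, im, fm = pm4py.discover_petri_net_inductive(log)
```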

    Weaker Than You Think: A Critical Look at Weakly Supervised Learning

    Weakly supervised learning is a popular approach for training machine learning models in low-resource settings. Instead of requesting high-quality yet costly human annotations, it allows training models with noisy annotations obtained from various weak sources. Recently, many sophisticated approaches have been proposed for robust training under label noise, reporting impressive results. In this paper, we revisit the setup of these approaches and find that their benefits are significantly overestimated. Specifically, we find that the success of existing weakly supervised learning approaches heavily relies on the availability of clean validation samples which, as we show, can be leveraged much more efficiently by simply training on them. After using these clean labels in training, the advantages of the sophisticated approaches are mostly wiped out. This remains true even when reducing the size of the available clean data to just five samples per class, making these approaches impractical. To understand the true value of weakly supervised learning, we thoroughly analyse diverse NLP datasets and tasks to ascertain when and why weakly supervised approaches work, and provide recommendations for future research. Comment: ACL 2023.
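    The paper's central finding can be illustrated with a toy experiment: train one model on many noisy (weak) labels and another on just five clean labels per class, then compare. The sketch below uses a synthetic dataset and a simple linear classifier, which is only a stand-in for the NLP setups studied in the paper.

```python
# Toy illustration of the paper's main finding: a handful of clean
# labels, used directly for training, can rival many noisy weak labels.
# The synthetic dataset and linear model are stand-ins for the NLP
# benchmarks and models studied in the paper.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=3000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# "Weak supervision": flip 30% of the training labels at random
noisy = y_tr.copy()
flip = rng.random(len(noisy)) < 0.3
noisy[flip] = 1 - noisy[flip]
weak_model = LogisticRegression().fit(X_tr, noisy)

# Clean-data baseline: only 5 clean samples per class
idx = np.concatenate([np.where(y_tr == c)[0][:5] for c in (0, 1)])
clean_model = LogisticRegression().fit(X_tr[idx], y_tr[idx])

print("many noisy labels :", weak_model.score(X_te, y_te))
print("5 clean per class :", clean_model.score(X_te, y_te))
```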