
    Deep Clustering: A Comprehensive Survey

    Cluster analysis plays an indispensable role in machine learning and data mining. Learning a good data representation is crucial for clustering algorithms. Recently, deep clustering, which can learn clustering-friendly representations using deep neural networks, has been broadly applied in a wide range of clustering tasks. Existing surveys of deep clustering mainly focus on single-view settings and network architectures, ignoring the complex application scenarios of clustering. To address this issue, in this paper we provide a comprehensive survey of deep clustering from the perspective of data sources. With different data sources and initial conditions, we systematically distinguish the clustering methods in terms of methodology, prior knowledge, and architecture. Concretely, deep clustering methods are introduced according to four categories, i.e., traditional single-view deep clustering, semi-supervised deep clustering, deep multi-view clustering, and deep transfer clustering. Finally, we discuss the open challenges and potential future opportunities in different fields of deep clustering.
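
    The survey's core premise, learning a representation and a clustering jointly or in stages, can be illustrated with a minimal single-view sketch: pretrain an autoencoder for reconstruction, then cluster its latent codes. This is a generic DEC-style two-stage baseline, not any specific method from the survey; the layer sizes and the KMeans step are illustrative assumptions.

```python
# Minimal two-stage deep clustering sketch: an autoencoder learns a
# clustering-friendly latent space, then KMeans partitions the codes.
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class AE(nn.Module):
    def __init__(self, d_in, d_latent=10):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 256), nn.ReLU(),
                                 nn.Linear(256, d_latent))
        self.dec = nn.Sequential(nn.Linear(d_latent, 256), nn.ReLU(),
                                 nn.Linear(256, d_in))
    def forward(self, x):
        z = self.enc(x)
        return z, self.dec(z)

def deep_cluster(X, n_clusters, epochs=50):
    model = AE(X.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    X_t = torch.as_tensor(X, dtype=torch.float32)
    for _ in range(epochs):                       # reconstruction pretraining
        _, x_hat = model(X_t)
        loss = ((x_hat - X_t) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    with torch.no_grad():
        z, _ = model(X_t)                         # cluster the latent codes
    return KMeans(n_clusters, n_init=10).fit_predict(z.numpy())
```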

    Data Visualization, Dimensionality Reduction, and Data Alignment via Manifold Learning

    The high dimensionality of modern data introduces significant challenges in descriptive and exploratory data analysis. These challenges gave rise to extensive work on dimensionality reduction and manifold learning, aiming to provide low-dimensional representations that preserve or uncover intrinsic patterns and structures in the data. In this thesis, we expand the current literature in manifold learning by developing two methods, DIG (Dynamical Information Geometry) and GRAE (Geometry Regularized Autoencoders). DIG is a method capable of finding low-dimensional representations of high-frequency multivariate time series data, especially suited for visualization. GRAE is a general framework which splices the well-established machinery of kernel manifold learning methods, which recover a sensible geometry, into the parametric structure of autoencoders. Manifold learning can also be useful to study data collected from different measurement instruments, conditions, or protocols of the same underlying system. In such cases the data is acquired in a multi-domain representation. The last two chapters of this thesis are devoted to two new methods capable of aligning multi-domain data, leveraging their geometric structure alongside limited common information. First, we present DTA (Diffusion Transport Alignment), a semi-supervised manifold alignment method that exploits prior one-to-one correspondence knowledge between distinct data views and finds an aligned common representation. Finally, we introduce MALI (Manifold Alignment with Label Information), which drops the one-to-one prior correspondence assumption, since in many scenarios such information cannot be provided, either due to the nature of the experimental design or because acquiring it is extremely costly. Instead, MALI only needs side information in the form of discrete labels/classes present in both domains.
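
    GRAE's central mechanism, anchoring an autoencoder's latent space to a geometry recovered by a kernel manifold learning method, admits a compact sketch. The Isomap target and the weight lam below are illustrative stand-ins, not the thesis's exact choices:

```python
# Sketch of a geometry-regularized autoencoder in the spirit of GRAE:
# the latent code z is pulled toward a precomputed manifold embedding E.
import torch
import torch.nn as nn
from sklearn.manifold import Isomap  # stand-in for any kernel manifold method

def train_grae(X, d_latent=2, lam=10.0, epochs=100):
    E = torch.as_tensor(Isomap(n_components=d_latent).fit_transform(X),
                        dtype=torch.float32)       # geometric target embedding
    X_t = torch.as_tensor(X, dtype=torch.float32)
    enc = nn.Sequential(nn.Linear(X.shape[1], 128), nn.ReLU(),
                        nn.Linear(128, d_latent))
    dec = nn.Sequential(nn.Linear(d_latent, 128), nn.ReLU(),
                        nn.Linear(128, X.shape[1]))
    opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()),
                           lr=1e-3)
    for _ in range(epochs):
        z = enc(X_t)
        recon = ((dec(z) - X_t) ** 2).mean()       # autoencoder term
        geo = ((z - E) ** 2).mean()                # geometry regularizer
        loss = recon + lam * geo
        opt.zero_grad(); loss.backward(); opt.step()
    return enc, dec
```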

    Hybrid Gromov-Wasserstein Embedding for Capsule Learning

    Capsule networks (CapsNets) aim to parse images into a hierarchy of objects, parts, and their relations using a two-step process involving part-whole transformation and hierarchical component routing. However, this hierarchical relationship modeling is computationally expensive, which has limited the wider use of CapsNets despite their potential advantages. Current CapsNet models primarily focus on comparing their performance with capsule baselines, falling short of the proficiency of deep CNN variants on intricate tasks. To address this limitation, we present an efficient approach for learning capsules that surpasses canonical baseline models and even demonstrates superior performance compared to high-performing convolutional models. Our contribution is twofold: first, we introduce a group of subcapsules onto which an input vector is projected. Second, we present the Hybrid Gromov-Wasserstein framework, which first quantifies the dissimilarity between the input and the components modeled by the subcapsules, and then determines their degree of alignment through optimal transport. This mechanism capitalizes on new insights into defining the alignment between the input and subcapsules based on the similarity of their respective component distributions. The approach enhances CapsNets' capacity to learn from intricate, high-dimensional data while retaining their interpretability and hierarchical structure. Our proposed model offers two distinct advantages: (i) its lightweight nature facilitates the application of capsules to more intricate vision tasks, including object detection; (ii) it outperforms baseline approaches in these demanding tasks.
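
    To make the routing idea concrete, here is a hedged sketch of scoring subcapsules by a Gromov-Wasserstein dissimilarity between component distributions and converting the scores into alignment weights. It uses the POT library; the shapes, the temperature beta, and the function names are illustrative assumptions, not the paper's exact formulation:

```python
# Score each subcapsule by the Gromov-Wasserstein dissimilarity between the
# input's component geometry and the subcapsule's component geometry, then
# turn the costs into soft routing weights.
import numpy as np
import ot  # POT: Python Optimal Transport

def routing_weights(input_parts, subcapsule_parts, beta=5.0):
    # input_parts: (n, d) features of the input's components
    # subcapsule_parts: list of (m_k, d) component sets, one per subcapsule
    C_in = ot.dist(input_parts, input_parts)       # intra-input distances
    p = np.full(len(input_parts), 1.0 / len(input_parts))
    costs = []
    for parts in subcapsule_parts:
        C_k = ot.dist(parts, parts)                # intra-subcapsule distances
        q = np.full(len(parts), 1.0 / len(parts))
        costs.append(ot.gromov.gromov_wasserstein2(C_in, C_k, p, q))
    costs = np.asarray(costs)
    w = np.exp(-beta * costs)                      # lower cost -> higher weight
    return w / w.sum()
```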

    Graph Priors, Optimal Transport, and Deep Learning in Biomedical Discovery

    Recent advances in biomedical data collection allow the acquisition of massive datasets measuring thousands of features in thousands to millions of individual cells. This data has the potential to advance our understanding of biological mechanisms at a previously impossible resolution. However, there are few methods to understand data of this scale and type. While neural networks have made tremendous progress on supervised learning problems, much work remains to make them useful for discovery in data whose supervision is difficult to represent. The flexibility and expressiveness of neural networks is sometimes a hindrance in these less supervised domains, as is the case when extracting knowledge from biomedical data. One type of prior knowledge that is more common in biological data comes in the form of geometric constraints. In this thesis, we aim to leverage this geometric knowledge to create scalable and interpretable models to understand this data. Encoding geometric priors into neural network and graph models allows us to characterize the models’ solutions as they relate to the fields of graph signal processing and optimal transport. These links allow us to understand and interpret this datatype. We divide this work into three sections. The first borrows concepts from graph signal processing to construct more interpretable and performant neural networks by constraining and structuring the architecture. The second borrows from the theory of optimal transport to perform anomaly detection and trajectory inference efficiently and with theoretical guarantees. The third examines how to compare distributions over an underlying manifold, which can be used to understand how different perturbations or conditions relate. For this we design an efficient approximation of optimal transport based on diffusion over a joint cell graph. Together, these works utilize our prior understanding of the data geometry to create more useful models of the data. We apply these methods to molecular graphs, images, single-cell sequencing, and health record data.
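
    The third section's key primitive, comparing distributions on a shared cell graph via diffusion, can be sketched as below. This follows the general multiscale-diffusion recipe the abstract describes; the dyadic scale weights and the number of scales are illustrative assumptions:

```python
# Approximate the transport distance between two distributions mu, nu on the
# same graph by diffusing both and summing weighted multiscale differences.
import numpy as np
import scipy.sparse as sp

def diffusion_distance(A, mu, nu, n_scales=4):
    # A: sparse adjacency of the joint cell graph; mu, nu: probability vectors
    deg = np.asarray(A.sum(axis=1)).ravel()
    P = sp.diags(1.0 / deg) @ A                    # random-walk operator
    dist, u, v = 0.0, mu.copy(), nu.copy()
    for k in range(n_scales):
        u, v = P.T @ u, P.T @ v                    # one diffusion step each
        dist += 2.0 ** (-k) * np.abs(u - v).sum()  # weighted L1 at this scale
    return dist
```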

    Scaling Attributed Network Embedding to Massive Graphs

    Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v in G to a compact vector X_v, which can be used in downstream machine learning tasks. Ideally, X_v should capture node v's affinity to each attribute, considering not only v's own attribute associations but also those of nodes connected to v along edges in G. It is challenging to obtain high-utility embeddings that enable accurate predictions; scaling effective ANE computation to massive graphs with millions of nodes pushes the difficulty of the problem to a whole new level. Existing solutions largely fail on such graphs, leading to prohibitive costs, low-quality embeddings, or both. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs that achieves state-of-the-art result quality on multiple benchmark datasets, measured by the accuracy of three common prediction tasks: attribute inference, link prediction, and node classification. PANE obtains high scalability and effectiveness through three main algorithmic designs. First, it formulates the learning objective based on a novel random walk model for attributed networks. The resulting optimization task is still challenging on large graphs. Second, PANE includes a highly efficient solver for this optimization problem, whose key module is a carefully designed initialization of the embeddings that drastically reduces the number of iterations required to converge. Finally, PANE utilizes multi-core CPUs through non-trivial parallelization of the solver, which achieves scalability while retaining the high quality of the resulting embeddings. Extensive experiments comparing 10 existing approaches on 8 real datasets demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.
    Comment: 16 pages. PVLDB 2021. Volume 14, Issue
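
    As a rough illustration of the first two design points, the sketch below builds node-to-attribute affinities from an alpha-discounted random walk and factorizes them for an embedding warm start. This is only a schematic reading of the abstract: PANE's actual objective, solver, and parallelization are more involved, and the discount alpha, walk length T, and helper names are assumptions:

```python
# Alpha-discounted multi-hop node-to-attribute affinities, factorized with a
# truncated SVD as an embedding initialization.
import numpy as np
from scipy.sparse.linalg import svds

def rw_attribute_affinity(P, R, alpha=0.15, T=10):
    # P: (n, n) row-stochastic transition matrix; R: (n, d) row-normalized
    # node-attribute matrix.
    F, walk = np.zeros_like(R), R.copy()
    for t in range(T):
        F += alpha * (1 - alpha) ** t * walk       # discounted t-hop affinity
        walk = P @ walk
    return F

def init_embeddings(F, k=32):
    U, s, Vt = svds(F, k=k)                        # truncated SVD warm start
    return U * np.sqrt(s), Vt.T * np.sqrt(s)       # node, attribute embeddings
```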

    Post-authorship attribution using regularized deep neural network

    Post-authorship attribution is a scientific process of using stylometric features to identify the genuine writer of an online text snippet such as an email, blog, forum post, or chat log. It has useful applications in manifold domains, for instance, in a verification process to proactively detect misogynistic, misandrist, xenophobic, and abusive posts on the internet or social networks. The process assumes that texts can be characterized by sequences of words that capture the functional and content characteristics of a writer. However, defining an appropriate characterization of text that captures the unique writing style of an author is a complex endeavor in computational linguistics. Moreover, posts are typically short texts with obfuscating vocabularies that might impact the accuracy of authorship attribution. These vocabularies include idioms, onomatopoeias, homophones, phonemes, synonyms, acronyms, anaphora, and polysemy. The regularized deep neural network (RDNN) method is introduced in this paper to circumvent the intrinsic challenges of post-authorship attribution. It is based on a convolutional neural network, a bidirectional long short-term memory encoder, and a distributed highway network. The convolutional network extracts lexical stylometric features, which are fed into the bidirectional encoder to obtain a syntactic feature-vector representation. The feature vector is then supplied as input to the distributed highway network for regularization, minimizing the network-generalization error. The regularized feature vector is ultimately passed to the bidirectional decoder to learn the writing style of an author. The feature-classification layer consists of a fully connected network and a softmax function to make the prediction. The RDNN method was tested against thirteen state-of-the-art methods on four benchmark datasets to validate its performance. Experimental results demonstrate the effectiveness of the method compared to the existing state-of-the-art methods on three datasets, with comparable results on the fourth.
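
    The pipeline described above (convolutional feature extractor, bidirectional LSTM encoder, highway regularization, softmax classification) can be sketched as follows in PyTorch. Layer sizes are illustrative and the decoder stage is omitted, so this is a schematic of the paper's architecture rather than its exact configuration:

```python
# Schematic RDNN-style classifier: CNN over token embeddings -> BiLSTM
# encoder -> highway layer -> fully connected softmax head.
import torch
import torch.nn as nn

class Highway(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.h, self.t = nn.Linear(d, d), nn.Linear(d, d)
    def forward(self, x):
        g = torch.sigmoid(self.t(x))                # transform gate
        return g * torch.relu(self.h(x)) + (1 - g) * x

class RDNNSketch(nn.Module):
    def __init__(self, vocab_size, n_authors, d=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)
        self.conv = nn.Conv1d(d, d, kernel_size=3, padding=1)  # lexical features
        self.bilstm = nn.LSTM(d, d, batch_first=True, bidirectional=True)
        self.highway = Highway(2 * d)               # regularization stage
        self.out = nn.Linear(2 * d, n_authors)      # softmax classification head
    def forward(self, tokens):                      # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)
        x = torch.relu(self.conv(x)).transpose(1, 2)
        h, _ = self.bilstm(x)
        return self.out(self.highway(h.mean(dim=1)))
```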

    Normed Spaces for Graph Embedding

    We would like to extend our thanks to Anna Wienhard, Maria Beatrice Pozetti, Ullrich Koethe, Federico López, and Steve Trettel for many interesting conversations and valuable insights. We extend our sincere gratitude to the reviewers for their insightful comments and suggestions, which have significantly enhanced the quality of this paper. J.M. Riestenberg was supported by the RTG 2229 “Asymptotic Invariants and Limits of Groups and Spaces” and by the DFG under Project-ID 338644254 - SPP2026. Wei Zhao was supported by the Klaus Tschira Foundation and a Young Marsilius Fellowship at Heidelberg University. Peer reviewed.

    Robustness and Interpretability of Neural Networks’ Predictions under Adversarial Attacks

    Deep Neural Networks (DNNs) are powerful predictive models, exceeding human capabilities in a variety of tasks. They learn complex and flexible decision systems from the available data and achieve exceptional performance in multiple machine learning fields, spanning from applications in artificial intelligence, such as image, speech, and text recognition, to the more traditional sciences, including medicine, physics, and biology. Despite these outstanding achievements, high performance and high predictive accuracy are not sufficient for real-world applications, especially in safety-critical settings, where the usage of DNNs is severely limited by their black-box nature. There is an increasing need to understand how predictions are made, to provide uncertainty estimates, to guarantee robustness to malicious attacks, and to prevent unwanted behaviours. State-of-the-art DNNs are vulnerable to small perturbations in the input data, known as adversarial attacks: maliciously crafted manipulations of the inputs that are perceptually indistinguishable from the original samples but are capable of fooling the model into incorrect predictions. In this work, we prove that such brittleness is related to the geometry of the data manifold and is therefore likely to be an intrinsic feature of DNNs’ predictions. This negative result suggests a possible direction for overcoming the limitation: we study the geometry of adversarial attacks in the large-data, overparameterized limit for Bayesian Neural Networks and prove that, in this limit, they are immune to gradient-based adversarial attacks. Furthermore, we propose training techniques to improve the adversarial robustness of deterministic architectures. In particular, we observe experimentally that ensembles of NNs trained on random projections of the original inputs into lower-dimensional spaces are more resilient to attacks. Next, we focus on the interpretability of NNs’ predictions in the setting of saliency-based explanations. We analyze the stability of explanations under adversarial attacks on the inputs and prove that, in the large-data and overparameterized limit, Bayesian interpretations are more stable than those provided by deterministic networks. We validate this behaviour in multiple experimental settings in the finite-data regime. Finally, we introduce the concept of adversarial perturbations of amino acid sequences for protein Language Models (LMs). Deep Learning models for protein structure prediction, such as AlphaFold2, leverage Transformer architectures and their attention mechanism to capture structural and functional properties of amino acid sequences. Despite the high accuracy of their predictions, biologically small perturbations of the input sequences, or even single-point mutations, can lead to substantially different 3D structures. At the same time, protein LMs are insensitive to mutations that induce misfolding or dysfunction (e.g. missense mutations): predictions of the 3D coordinates do not reveal the structure-disruptive effect of these mutations. There is therefore an evident inconsistency between the biological importance of mutations and the resulting change in structural prediction. Inspired by this problem, we introduce adversarial perturbations of protein sequences in the continuous embedding spaces of protein LMs. Our method relies on attention scores to detect the most vulnerable amino acid positions in the input sequences. The resulting adversarial mutations are biologically diverse from their reference sequences and are able to significantly alter the predicted 3D structures.
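
    Of the defences discussed above, the random-projection ensemble is the most self-contained; a minimal sketch follows. Each member trains on a fixed Gaussian projection of the input and predictions are averaged; the projection dimension, ensemble size, and member architecture are illustrative assumptions:

```python
# Random-projection ensemble: each member sees a different fixed
# low-dimensional projection of the input; predictions are averaged.
import torch
import torch.nn as nn

class ProjectedNet(nn.Module):
    def __init__(self, d_in, d_proj, n_classes):
        super().__init__()
        # Fixed Gaussian random projection (a buffer, so it is never trained).
        self.register_buffer("R", torch.randn(d_in, d_proj) / d_proj ** 0.5)
        self.net = nn.Sequential(nn.Linear(d_proj, 128), nn.ReLU(),
                                 nn.Linear(128, n_classes))
    def forward(self, x):
        return self.net(x @ self.R)

def ensemble_predict(members, x):
    # Average the members' softmax outputs; an attacker must now fool many
    # different projections of the same input at once.
    probs = torch.stack([m(x).softmax(dim=-1) for m in members])
    return probs.mean(dim=0)
```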