
    Improvement the Community Detection with Graph Autoencoder in Social Network Using Correlation-Based Feature Selection Method

    Background: In this paper, we aim to improve community detection methods using a Graph Autoencoder. Community detection is a crucial step in understanding the purpose and composition of social networks. Materials and Methods: We propose a Community Detection framework using the Graph Autoencoder (CDGAE) model, which combines node features with the network topology as input. CDGAE uses a centrality-measurement-based strategy to handle featureless datasets by providing artificial attributes for their nodes. The performance of the model was improved by applying feature selection to the node features. The basic innovation of CDGAE is that the number of communities, counted using the Bethe Hessian matrix, is added to the bottleneck layer of the graph autoencoder (GAE) structure, so that communities are extracted directly without any clustering algorithm. Results: According to the experimental findings, adding artificial features to the dataset's nodes improves performance. Additionally, community detection results were much better with the feature selection method and a deeper model. The experiments show that our approach outperforms existing algorithms. Conclusion: In this study, we propose a community detection framework using a graph autoencoder (CDMEC). To take advantage of the GAE's ability to combine node features with the network topology, we add node features to featureless graph nodes using centrality measures. Applying feature selection to the node features improved the model's performance significantly by eliminating data noise. Additionally, including the number of communities in the bottleneck layer of the GAE structure allowed us to do away with clustering algorithms, which helped reduce time complexity. Deepening the model also improved community detection. Because social media platforms are dynamic
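To make the idea of artificial node attributes concrete, here is a minimal sketch of building a feature vector for each node of a featureless graph from centrality measures. The abstract does not say which centralities CDGAE uses, so the choice of degree and closeness centrality, the function name `centrality_features`, and the toy path graph are all assumptions made for illustration.

```python
from collections import deque

def centrality_features(adj):
    """Return {node: [degree centrality, closeness centrality]}.

    `adj` maps each node of an undirected graph to the set of its neighbours.
    The two centralities are illustrative choices, not the paper's stated ones.
    """
    n = len(adj)
    features = {}
    for source in adj:
        # Breadth-first search gives shortest-path distances from `source`.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())
        # Degree centrality: fraction of the other nodes that are neighbours.
        degree = len(adj[source]) / max(n - 1, 1)
        # Closeness: reachable nodes divided by summed distance (0 if isolated).
        closeness = (len(dist) - 1) / total if total > 0 else 0.0
        features[source] = [degree, closeness]
    return features

# Toy featureless graph: the path 0 - 1 - 2 - 3.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
X = centrality_features(path)
```

Each row of `X` could then serve as the attribute vector of a node, alongside the adjacency structure, in any model that expects node features.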

    Deep Learning in Social Networks for Overlappering Community Detection

    A community in a network is a collection of nodes that are tightly connected to one another. In network analysis, identifying the community structure is a crucial task, particularly for exposing connections between certain nodes. Numerous methodologies for overlapping community detection have been described in the literature. Many scholars have recently focused on network embedding and feature learning techniques for node clustering. These techniques map the network into a representation space with fewer dimensions. In this paper, a deep neural network-based model for learning graph representations is presented, in which stacked auto-encoders learn a nonlinear embedding of the original graph. To extract overlapping communities, an AEOCDSN algorithm is used. The efficiency of the proposed model is examined through experiments on real-world datasets of various sizes and accepted standards. According to the empirical findings, the method outperforms various well-known community detection techniques

    Graph Summarization

    The continuous and rapid growth of highly interconnected datasets, which are both voluminous and complex, calls for the development of adequate processing and analytical techniques. One method for condensing and simplifying such datasets is graph summarization. It denotes a series of application-specific algorithms designed to transform graphs into more compact representations while preserving structural patterns, query answers, or specific property distributions. As this problem is common to several areas studying graph topologies, different approaches, such as clustering, compression, sampling, or influence detection, have been proposed, primarily based on statistical and optimization methods. The focus of our chapter is to pinpoint the main graph summarization methods, but especially to focus on the most recent approaches and novel research trends on this topic, not yet covered by previous surveys. Comment: To appear in the Encyclopedia of Big Data Technologie

    Data-Efficient Machine Learning with Focus on Transfer Learning

    Machine learning (ML) has attracted a significant amount of attention from the artificial intelligence community. ML has shown state-of-the-art performance in various fields, such as signal processing, healthcare systems, and natural language processing (NLP). However, most conventional ML algorithms suffer from three significant difficulties: 1) insufficient high-quality training data, 2) costly training processes, and 3) domain discrepancy. Therefore, it is important to develop solutions for these problems, so that the future of ML will be more sustainable. Recently, a new concept, data-efficient machine learning (DEML), has been proposed to deal with the current bottlenecks of ML. Moreover, transfer learning (TL) has been considered an effective solution to the three shortcomings of conventional ML. Furthermore, TL is one of the most active areas in DEML. Over the past ten years, significant progress has been made in TL. In this dissertation, I propose to address the three problems by developing a software-oriented framework and TL algorithms. Firstly, I introduce a DEML framework and an evaluation system. Moreover, I present two novel TL algorithms and applications to real-world problems. Furthermore, I will first present the first well-defined DEML framework and introduce how it can address the challenges in ML. After that, I will give an updated overview of the state of the art and open challenges in TL. I will then introduce two novel algorithms for two of the most challenging TL topics: distant domain TL and cross-modality TL (image-text). A detailed algorithm introduction and preliminary results on real-world applications (Covid-19 diagnosis and image classification) will be presented. Then, I will discuss the current trends in TL algorithms and real-world applications. Lastly, I will present the conclusion and future research directions

    DATGAN: Integrating expert knowledge into deep learning for population synthesis

    Agent-based simulations and activity-based models used to analyse nationwide transport networks require detailed synthetic populations. These applications are becoming more and more complex and thus require more precise synthetic data. However, standard statistical techniques such as Iterative Proportional Fitting (IPF) or Gibbs sampling fail to provide data with a high enough standard, e.g. these techniques fail to generate rare combinations of attributes, also known as sampling zeros in the literature. Researchers have, thus, been investigating new deep learning techniques such as Generative Adversarial Networks (GANs) for population synthesis. These methods have already shown great success in other fields. However, one fundamental limitation is that GANs are data-driven techniques, and it is thus not possible to integrate expert knowledge in the data generation process. This can lead to the following issues: lack of representativity in the generated data, the introduction of bias, and the possibility of overfitting the sample’s noise. To address these limitations, we present the Directed Acyclic Tabular GAN (DATGAN) to integrate expert knowledge in deep learning models for synthetic populations. This approach allows the interactions between variables to be specified explicitly using a Directed Acyclic Graph (DAG). The DAG is then converted to a network of modified Long Short-Term Memory (LSTM) cells. Two types of multi-input LSTM cells have been developed to allow such structure in the generator. The DATGAN is then tested on the Chicago travel survey dataset. We show that our model outperforms state-of-the-art methods on Machine Learning efficacy and statistical metrics
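The sampling-zero limitation of Iterative Proportional Fitting mentioned above can be seen in a small worked example. The sketch below is a generic two-dimensional IPF, not code from the DATGAN paper; the function name `ipf`, the seed table, and the target marginals are hypothetical values chosen for illustration.

```python
def ipf(seed, row_targets, col_targets, iterations=50):
    """Iterative Proportional Fitting on a 2-D table given as a list of lists."""
    table = [row[:] for row in seed]
    for _ in range(iterations):
        # Scale each row so that it sums to its target marginal.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [x * target / s for x in table[i]]
        # Scale each column so that it sums to its target marginal.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
    return table

# Hypothetical seed sample in which one attribute combination was never observed.
seed = [[10.0, 5.0],
        [8.0, 0.0]]
fitted = ipf(seed, row_targets=[60.0, 40.0], col_targets=[70.0, 30.0])
```

Because IPF only rescales existing cells, the combination that is zero in the seed stays zero in `fitted` however the marginals are set, which is the behaviour the abstract refers to as sampling zeros.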

    Unsupervised detection of decoupled subspaces: many-body scars and beyond

    Highly excited eigenstates of quantum many-body systems are typically featureless thermal states. Some systems, however, possess a small number of special, low-entanglement eigenstates known as quantum scars. We introduce a quantum-inspired machine learning platform based on a Quantum Variational Autoencoder (QVAE) that detects families of scar states in spectra of many-body systems. Unlike a classical autoencoder, QVAE performs a parametrized unitary operation, allowing us to compress a single eigenstate into a smaller number of qubits. We demonstrate that the autoencoder trained on a scar state is able to detect the whole family of scar states sharing common features with the input state. We identify families of quantum many-body scars in the PXP model beyond the $\mathbb{Z}_2$ and $\mathbb{Z}_3$ families and find dynamically decoupled subspaces in the Hilbert space of a disordered, interacting spin ladder model. The possibility of an automatic detection of subspaces of scar states opens new pathways in studies of models with a weak breakdown of ergodicity and fragmented Hilbert spaces. Comment: Author accepted manuscript to be published in PR

    Identifying and Disentangling Interleaved Activities of Daily Living from Sensor Data

    Activity discovery (AD) refers to the unsupervised extraction of structured activity data from a stream of sensor readings in a real-world or virtual environment. Activity discovery is part of the broader topic of activity recognition, which has potential uses in fields as varied as social work and elder care, psychology and intrusion detection. Since activity recognition datasets are both hard to come by, and very time consuming to label, the development of reliable activity discovery systems could be of significant utility to the researchers and developers working in the field, as well as to the wider machine learning community. This thesis focuses on the investigation of activity discovery systems that can deal with interleaving, which refers to the phenomenon of continuous switching between multiple high-level activities over a short period of time. This is a common characteristic of the real-world datastreams that activity discovery systems have to deal with, but it is one that is unfortunately often left unaddressed in the existing literature. As part of the research presented in this thesis, the fact that activities exist at multiple levels of abstraction is highlighted. A single activity is often a constituent element of a larger, more complex activity, and in turn has constituents of its own that are activities. Thus this investigation necessarily considers activity discovery systems that can find these hierarchies. The primary contribution of this thesis is the development and evaluation of an activity discovery system that is capable of identifying interleaved activities in sequential data. Starting from a baseline system implemented using a topic model, novel approaches are proposed making use of modern language models taken from the field of natural language processing, before moving on to more advanced language modelling that can handle complex, interleaved data. 
As well as the identification of activities, the thesis also proposes the abstraction of activities into larger, more complex activities. This allows for the construction of hierarchies of activities that reflect the complex inherent structure of activities present in real-world datasets more closely than other approaches. The thesis also discusses a number of important issues relating to the evaluation of activity discovery systems, and examines how existing evaluation metrics may at times be misleading. This includes highlighting the existence of differing abstraction issues in activity discovery evaluation, and suggestions for how this problem can be mitigated. Finally, alternative evaluation metrics are investigated. Naturally, this dissertation does not fully solve the problem of activity discovery, and work remains to be done. However, a number of the most pressing issues that affect real-world activity discovery systems are tackled head-on, and it is shown that useful progress can indeed be made on them. This work aims to benefit systems that are as “clean slate” as possible, and hence incorporate no domain-specific knowledge. This is perhaps somewhat of an artificial handicap to impose in this problem domain, but it does have the advantage of making this work applicable to as broad a range of domains as possible

    The Infinity Mirror Test for Graph Models

    Graph models, like other machine learning models, have implicit and explicit biases built-in, which often impact performance in nontrivial ways. The model's faithfulness is often measured by comparing the newly generated graph against the source graph using any number or combination of graph properties. Differences in the size or topology of the generated graph therefore indicate a loss in the model. Yet, in many systems, errors encoded in loss functions are subtle and not well understood. In the present work, we introduce the Infinity Mirror test for analyzing the robustness of graph models. This straightforward stress test works by repeatedly fitting a model to its own outputs. A hypothetically perfect graph model would have no deviation from the source graph; however, the model's implicit biases and assumptions are exaggerated by the Infinity Mirror test, exposing potential issues that were previously obscured. Through an analysis of thousands of experiments on synthetic and real-world graphs, we show that several conventional graph models degenerate in exciting and informative ways. We believe that the observed degenerative patterns are clues to the future development of better graph models. Comment: This was submitted to IEEE TKDE 2020, 12 pages and 8 figure
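The fit-then-generate loop behind the Infinity Mirror test can be sketched in a few lines. The example below is an illustration only: it assumes an Erdős–Rényi G(n, p) model as the graph model, which the abstract does not name, and the function names `fit_density`, `generate`, and `infinity_mirror` are hypothetical.

```python
import random

def fit_density(n, edges):
    """'Fit' an Erdos-Renyi model by estimating the edge probability p."""
    possible = n * (n - 1) // 2
    return len(edges) / possible if possible else 0.0

def generate(n, p, rng):
    """Sample an undirected G(n, p) graph as a set of (u, v) edges with u < v."""
    return {(u, v) for u in range(n) for v in range(u + 1, n) if rng.random() < p}

def infinity_mirror(n, edges, generations, seed=0):
    """Repeatedly fit the model to its own output and return the chain of graphs."""
    rng = random.Random(seed)
    chain = [set(edges)]
    for _ in range(generations):
        p = fit_density(n, chain[-1])      # fit to the previous generation
        chain.append(generate(n, p, rng))  # generate the next generation
    return chain

# Source graph: a ring of 20 nodes.
n = 20
ring = {(min(i, (i + 1) % n), max(i, (i + 1) % n)) for i in range(n)}
chain = infinity_mirror(n, ring, generations=10)
edge_counts = [len(g) for g in chain]
```

Comparing a property such as `edge_counts`, or a topological statistic, across generations shows how far each successive graph drifts from the source. A model that only preserves density, as in this sketch, loses the ring structure in the first generation.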