492 research outputs found

    Improvement the Community Detection with Graph Autoencoder in Social Network Using Correlation-Based Feature Selection Method

    Background: In this paper, we aim to improve community detection methods using a Graph Autoencoder. Community detection is a crucial stage in comprehending the purpose and composition of social networks. Materials and Methods: We propose a Community Detection framework using the Graph Autoencoder (CDGAE) model, in which we combine node features with the network topology as input to our method. CDGAE uses a centrality measurement-based strategy to deal with featureless datasets by providing artificial attributes to their nodes. The performance of the model was improved by applying feature selection to the node features. The basic innovation of CDGAE is adding the number of communities, counted using the Bethe Hessian matrix, to the bottleneck layer of the graph autoencoder (GAE) structure, so that communities are extracted directly without using any clustering algorithm. Results: According to experimental findings, adding artificial features to the dataset's nodes improves performance.
Additionally, community detection results were much better with the feature selection method and a deeper model. Experimental evidence shows that our approach outperforms existing algorithms. Conclusion: In this study, we suggest a community detection framework using graph autoencoder (CDMEC). To take advantage of GAE's ability to combine node features with the network topology, we add node features to the featureless graph nodes using centrality measurement. Applying feature selection to the node features significantly improved the performance of the model by eliminating data noise. Additionally, including the number of communities in the bottleneck layer of the GAE structure allowed us to do away with clustering algorithms, which helped decrease the time complexity. Deepening the model also improved community detection. Because social media platforms are dynamic

    Deep Learning for Community Detection: Progress, Challenges and Opportunities

    As communities represent similar opinions, similar functions, similar purposes, etc., community detection is an important and extremely useful tool in both scientific inquiry and data analytics. However, the classic methods of community detection, such as spectral clustering and statistical inference, are falling by the wayside as deep learning techniques demonstrate an increasing capacity to handle high-dimensional graph data with impressive performance. Thus, a survey of current progress in community detection through deep learning is timely. Structured into three broad research streams in this domain (deep neural networks, deep graph embedding, and graph neural networks), this article summarizes the contributions of the various frameworks, models, and algorithms in each stream along with the current challenges that remain unsolved and the future research opportunities yet to be explored. Comment: Accepted Paper in the 29th International Joint Conference on Artificial Intelligence (IJCAI 20), Survey Trac

    Ensemble Feature Learning-Based Event Classification for Cyber-Physical Security of the Smart Grid

    Power grids are transforming into the cyber-physical smart grid with increasing two-way communications and abundant data flows. Despite the efficiency and reliability promised by this transformation, the growing threats and incidences of cyber attacks targeting the physical power systems have exposed severe vulnerabilities. To tackle such vulnerabilities, intrusion detection systems (IDS) are proposed to monitor threats for the cyber-physical security of electrical power and energy systems in the smart grid with increasing machine-to-machine communication. However, the multi-sourced, correlated, and often noisy data, which record various concurrent cyber and physical events, pose significant challenges for an IDS that must accurately distinguish inadvertent events from malicious ones. Hence, in this research, ensemble learning-based feature learning and classification for the cyber-physical smart grid are designed and implemented. The contributions of this research are (i) the design, implementation and evaluation of an ensemble learning-based attack classifier using extreme gradient boosting (XGBoost) to effectively detect and identify attack threats from the heterogeneous cyber-physical information in the smart grid; (ii) the design, implementation and evaluation of a stacked denoising autoencoder (SDAE) to extract a highly representative feature space that allows reconstruction of a noise-free input from noise-corrupted perturbations; (iii) the design, implementation and evaluation of novel ensemble learning-based feature extractors that combine multiple autoencoder (AE) feature extractors and random forest base classifiers, so as to enable accurate reconstruction of each feature and reliable classification against malicious events. The simulation results validate the usefulness of the ensemble learning approach in detecting malicious events in the cyber-physical smart grid

    Development of a Reference Design for Intrusion Detection Using Neural Networks for a Smart Inverter

    The purpose of this thesis is to develop a reference design for a base level implementation of an intrusion detection module using artificial neural networks that is deployed onto an inverter and runs on live data for cybersecurity purposes, leveraging the latest deep learning algorithms and tools. Cybersecurity in the smart grid industry focuses on maintaining optimal standards of security in the system and a key component of this is being able to detect cyberattacks. Although researchers and engineers aim to design such devices with embedded security, attacks can and do still occur. The foundation for eventually mitigating these attacks and achieving more robust security is to identify them reliably. Thus, a high-fidelity intrusion detection system (IDS) capable of identifying a variety of attacks must be implemented. This thesis provides an implementation of a behavior-based intrusion detection system that uses a recurrent artificial neural network deployed on hardware to detect cyberattacks in real time. Leveraging the growing power of artificial intelligence, the strength of this approach is that given enough data, it is capable of learning to identify highly complex patterns in the data that may even go undetected by humans. By intelligently identifying malicious activity at the fundamental behavior level, the IDS remains robust against new methods of attack. This work details the process of collecting and simulating data, selecting the particular algorithm, training the neural network, deploying the neural network onto hardware, and then being able to easily update the deployed model with a newly trained one. The full system is designed with a focus on modularity, such that it can be easily adapted to perform well on different use cases, different hardware, and fulfill changing requirements. 
The neural network behavior-based IDS is found to be a very powerful method capable of learning highly complex patterns and identifying intrusion from different types of attacks using a single unified algorithm, achieving up to 98% detection accuracy in distinguishing between normal and anomalous behavior. Due to the ubiquitous nature of this approach, the pipeline developed here can be applied in the future to build in more and more sophisticated detection abilities depending on the desired use case. The intrusion detection module is implemented in an ARM processor that exists at the communication layer of the inverter. There are four main components described in this thesis that explain the process of deploying an artificial neural network intrusion detection algorithm onto the inverter: 1) monitoring and collecting data through a front-end web-based graphical user interface that interacts with a Digital Signal Processor that is connected to power electronics, 2) simulating various malicious datasets based on attack vectors that violate the Confidentiality-Integrity-Availability security model, 3) training and testing the neural network to ensure that it successfully identifies normal behavior and malicious behavior with a high degree of accuracy, and lastly 4) deploying the machine learning algorithm onto the hardware and having it successfully classify the behavior as normal or malicious with the data feeding into the model running in real time. The results from the experimental setup are analyzed, a conclusion is drawn from the work, and future work and optimizations are discussed

    Sparse Similarity and Network Navigability for Markov Clustering Enhancement

    Markov clustering (MCL) is an effective unsupervised pattern recognition algorithm for data clustering in high-dimensional feature space that simulates stochastic flows on a network of sample similarities to detect the structural organization of clusters in the data. However, it presents two main drawbacks: (1) its community detection performance in complex networks falls well short of state-of-the-art methods such as Infomap and Louvain, and (2) it has never been generalized to deal with data nonlinearity. In this work both aspects, although closely related, are treated as separate issues and addressed as such. Regarding community detection, a field under the umbrella of network science, the crucial issue is to convert the unweighted network topology into a ‘smart enough’ pre-weighted connectivity that adequately steers the stochastic flow procedure behind Markov clustering. Here a conceptual innovation is introduced and discussed, focusing on how to leverage notions of network latent geometry to design similarity measures for pre-weighting the adjacency matrix used in Markov clustering community detection. The results demonstrate that the proposed strategy improves Markov clustering significantly, to the extent that it is often close to the performance of current state-of-the-art methods for community detection. These findings hold for both synthetic ‘realistic’ networks (with known ground-truth communities) and real networks (with community metadata), even when the real network connectivity is corrupted by noise artificially induced by missing or spurious links. Regarding the nonlinearity aspect, the development of algorithms for unsupervised pattern recognition by nonlinear clustering is a notable problem in data science.
Minimum Curvilinearity (MC) is a principle that approximates nonlinear sample distances in the high-dimensional feature space by curvilinear distances, which are computed as transversal paths over their minimum spanning tree, and then stored in a kernel. Here, a nonlinear MCL algorithm termed MC-MCL is proposed, which is the first nonlinear kernel extension of MCL and exploits Minimum Curvilinearity to enhance the performance of MCL in real and synthetic high-dimensional data with underlying nonlinear patterns. Furthermore, improvements in the design of the so-called MC-kernel, obtained by applying base modifications to better approximate the hidden geometry of the data, have been evaluated with positive outcomes. Thus, different nonlinear MCL versions are compared with baseline and state-of-the-art clustering methods, including DBSCAN, K-means, affinity propagation, density peaks, and deep clustering. As a result, the design of a suitable nonlinear kernel provides a valuable framework to estimate nonlinear distances when the kernel is applied in combination with MCL. Indeed, nonlinear MCL variants outperform classical MCL and even state-of-the-art clustering algorithms on different nonlinear datasets. This dissertation discusses the enhancements and the generalized understanding of how network geometry plays a fundamental role in designing algorithms based on network navigability
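
    For readers unfamiliar with the baseline that this dissertation enhances, the stochastic flow simulation behind classical Markov clustering can be sketched as alternating expansion (squaring the column-stochastic matrix) and inflation (element-wise powering followed by renormalization). The sketch below is a plain, unweighted MCL under stated assumptions, not the pre-weighted or MC-kernel variants proposed in the work; the function names, the added self-loops, and the default parameters are illustrative choices, not taken from the source.

```python
def normalize_columns(matrix):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Dense product of two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def markov_clustering(adjacency, inflation=2.0, max_iterations=100, tol=1e-9):
    """Cluster an undirected graph with classical MCL (illustrative sketch).

    adjacency: square 0/1 (or weighted) matrix as a list of lists.
    Returns a sorted list of clusters, each a sorted list of node indices.
    """
    n = len(adjacency)
    # Assumption: add self-loops, a common convention that stabilizes the flow.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    m = normalize_columns(m)

    for _ in range(max_iterations):
        expanded = matmul(m, m)  # expansion: flow spreads along two-step walks
        inflated = normalize_columns(
            [[value ** inflation for value in row] for row in expanded]
        )  # inflation: strong flows are boosted, weak flows fade
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break

    # Rows that retain flow belong to attractor nodes; the columns with
    # non-negligible mass in such a row are the members of that cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return sorted(sorted(cluster) for cluster in clusters)


if __name__ == "__main__":
    # Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
    two_triangles = [
        [0, 1, 1, 0, 0, 0],
        [1, 0, 1, 0, 0, 0],
        [1, 1, 0, 1, 0, 0],
        [0, 0, 1, 0, 1, 1],
        [0, 0, 0, 1, 0, 1],
        [0, 0, 0, 1, 1, 0],
    ]
    print(markov_clustering(two_triangles))
```

    In this sketch, pre-weighting of the kind the abstract describes would correspond to replacing the 0/1 entries of `adjacency` with similarity scores before the first normalization; how those scores are computed is outside what the abstract specifies.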

    Neural correlates of post-traumatic brain injury (TBI) attention deficits in children

    Traumatic brain injury (TBI) in children is a major public health concern worldwide. Attention deficits are among the most common neurocognitive and behavioral consequences in children post-TBI; they have significant negative impacts on educational and social outcomes and compromise quality of life. However, there is a paucity of evidence to guide the optimal treatment of attention deficit related symptoms in children post-TBI, owing to a lack of understanding of their neurobiological substrate. Thus, it is critical to understand the neural mechanisms associated with TBI-induced attention deficits in children so that more refined and tailored strategies can be developed for diagnoses and long-term treatments and interventions. This dissertation is the first study to investigate neurobiological substrates associated with post-TBI attention deficits in children using both anatomical and functional neuroimaging data. The goals of this project are to discover quantitatively measurable markers using diffusion tensor imaging (DTI), structural magnetic resonance imaging (MRI), and functional MRI (fMRI) techniques, and to further identify the most robust neuroimaging features for predicting severe post-TBI attention deficits in children, using machine learning and deep learning techniques. A total of 53 children with TBI and 55 controls aged 9 to 17 were recruited. The results show that the systems-level topological properties in left frontal regions, parietal regions, and medial occipitotemporal regions in structural and functional brain networks are significantly associated with inattentive and/or hyperactive/impulsive symptoms in children post-TBI. Semi-supervised deep learning modeling further confirms the significant contributions of these brain features in the prediction of elevated attention deficits in children post-TBI.
The findings of this project provide valuable foundations for future research on developing neural markers for TBI-induced attention deficits in children, which may significantly assist the development of more effective and individualized diagnostic and treatment strategies

    Knowledge discovery with recommenders for big data management in science and engineering communities

    Recent science and engineering research tasks are increasingly becoming data-intensive and use workflows to automate integration and analysis of voluminous data to test hypotheses. Particularly, bold scientific advances in areas of neuroscience and bioinformatics necessitate access to multiple data archives, heterogeneous software and computing resources, and multi-site interdisciplinary expertise. Datasets are evolving, and new tools are continuously invented for achieving new state-of-the-art performance. Principled cyber and software automation approaches to data-intensive analytics using systematic integration of cyberinfrastructure (CI) technologies and knowledge discovery driven algorithms will significantly enhance research and interdisciplinary collaborations in science and engineering. In this thesis, we demonstrate a novel recommender approach to discover latent knowledge patterns from both the infrastructure perspective (i.e., measurement recommender) and the applications perspective (i.e., topic recommender and scholar recommender). In the infrastructure perspective, we identify and diagnose network-wide anomaly events to address performance bottlenecks by proposing a novel measurement recommender scheme. In cases where there is a lack of ground truth in networking performance monitoring (e.g., perfSONAR deployments), it is hard to pinpoint the root cause in a multi-domain context. To solve this problem, we define a "social plane" concept that relies on recommendation schemes to share diagnosis knowledge or work collaboratively. Our solution makes it easier for network operators and application users to quickly and effectively troubleshoot performance bottlenecks on wide-area network backbones. To evaluate our "measurement recommender", we use both real and synthetic datasets.
The results show our measurement recommender scheme has high performance in terms of precision, recall, and accuracy, as well as efficiency in terms of the time taken for large-volume measurement trace analysis. In the application perspective, our goal is to shorten time to knowledge discovery and adapt prior domain knowledge for computational and data-intensive communities. To achieve this goal, we design a novel topic recommender that leverages a domain-specific topic model (DSTM) algorithm to help scientists find the relevant tools or datasets for their applications. The DSTM is a probabilistic graphical model that extends Latent Dirichlet Allocation (LDA) and uses the Markov chain Monte Carlo (MCMC) algorithm to infer latent patterns within a specific domain in an unsupervised manner. We evaluate our scheme on large collections of data (i.e., publications, tools, datasets) from the bioinformatics and neuroscience domains. Our experimental results using the perplexity metric show that our model has better generalization performance within a domain for discovering highly specific latent topics. Lastly, to enhance collaborations among scholars to generate new knowledge, it is necessary to identify scholars with specific research interests or cross-domain expertise. We propose a "ScholarFinder" model to quantify expert knowledge based on publications and funding records using a deep generative model. Our model embeds scholars' knowledge in order to recommend suitable scholars to perform multi-disciplinary tasks. We evaluate our model against state-of-the-art baseline models (e.g., XGBoost, DNN), and experimental results show that our ScholarFinder model outperforms state-of-the-art models in terms of precision, recall, F1-score, and accuracy. Includes bibliographical references (pages 113-124)