520,017 research outputs found

    Clones in Graphs

    Finding structural similarities in graph data, like social networks, is a far-ranging task in data mining and knowledge discovery. A (conceptually) simple reduction would be to compute the automorphism group of a graph. However, this approach is ineffective in data mining since real world data does not exhibit enough structural regularity. Here we step in with a novel approach based on mappings that preserve the maximal cliques. For this we exploit the well known correspondence between bipartite graphs and the data structure formal context (G, M, I) from Formal Concept Analysis. From there we utilize the notion of clone items. The investigation of these is still an open problem to which we add new insights with this work. Furthermore, we produce a substantial experimental investigation of real world data. We conclude with demonstrating the generalization of clone items to permutations. Comment: 11 pages, 2 figures, 1 tabl
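
    The notion of clone items can be illustrated with a minimal sketch. It assumes the definition commonly used in the Formal Concept Analysis literature, which the abstract itself does not spell out: two attributes are clones if swapping them maps every intent (closed attribute set) of the context to another intent. The toy context and all function names below are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def intents(context, attributes):
    """Return all intents of a formal context given as {object: set of attributes}.

    The intents are exactly the intersections of object rows, plus the full
    attribute set (the empty intersection).
    """
    closed = {frozenset(attributes)}
    for row in context.values():
        row = frozenset(row)
        # Intersect the new row with every intent found so far.
        closed |= {row & c for c in closed}
    return closed


def swap(attribute_set, a, b):
    """Exchange attributes a and b inside a set."""
    swapped = set(attribute_set)
    if (a in attribute_set) != (b in attribute_set):
        swapped ^= {a, b}
    return frozenset(swapped)


def clone_pairs(context, attributes):
    """List attribute pairs whose swap maps every intent to an intent (assumed definition)."""
    closed = intents(context, attributes)
    return [
        (a, b)
        for a, b in combinations(sorted(attributes), 2)
        if all(swap(c, a, b) in closed for c in closed)
    ]


# Hypothetical toy context: three objects over attributes a, b, c.
toy_attributes = {"a", "b", "c"}
toy_context = {
    "g1": {"a", "c"},
    "g2": {"b", "c"},
    "g3": {"c"},
}
print(clone_pairs(toy_context, toy_attributes))
```

    In this toy context the attributes a and b play symmetric roles, so they are reported as clones, while c is not a clone of either. Enumerating all intents is exponential in general, so this brute-force check is only meant for small examples.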

    Sequential Patterns Post-processing for Structural Relation Patterns Mining

    Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential occurrence of items across ordered transactions over time. It has been extensively studied in the literature, and there exists a diversity of algorithms. However, more complex structural patterns are often hidden behind sequences. This article begins with the introduction of a model for the representation of sequential patterns, the Sequential Patterns Graph, which motivates the search for new structural relation patterns. An integrative framework for the discovery of these patterns, Postsequential Patterns Mining, is then described which underpins the postprocessing of sequential patterns. A corresponding data-mining method based on sequential patterns postprocessing is proposed and shown to be effective in the search for concurrent patterns. From experiments conducted on three component algorithms, it is demonstrated that sequential patterns-based concurrent patterns mining provides an efficient method for structural knowledge discover

    In-silico Predictive Mutagenicity Model Generation Using Supervised Learning Approaches

    With the advent of High Throughput Screening techniques, it is feasible to filter possible leads from a mammoth chemical space that can act against a particular target and inhibit its action. Virtual screening complements the in-vitro assays which are costly and time consuming. This process is used to sort biologically active molecules by utilizing the structural and chemical information of the compounds and the target proteins in order to screen potential hits. Various data mining and machine learning tools utilize Molecular Descriptors through the knowledge discovery process using classifier algorithms that classify the potentially active hits for the drug development process.

    Constraint-Based Causal Discovery using Partial Ancestral Graphs in the presence of Cycles

    While feedback loops are known to play important roles in many complex systems, their existence is ignored in a large part of the causal discovery literature, as systems are typically assumed to be acyclic from the outset. When applying causal discovery algorithms designed for the acyclic setting on data generated by a system that involves feedback, one would not expect to obtain correct results. In this work, we show that, surprisingly, the output of the Fast Causal Inference (FCI) algorithm is correct if it is applied to observational data generated by a system that involves feedback. More specifically, we prove that for observational data generated by a simple and σ-faithful Structural Causal Model (SCM), FCI is sound and complete, and can be used to consistently estimate (i) the presence and absence of causal relations, (ii) the presence and absence of direct causal relations, (iii) the absence of confounders, and (iv) the absence of specific cycles in the causal graph of the SCM. We extend these results to constraint-based causal discovery algorithms that exploit certain forms of background knowledge, including the causally sufficient setting (e.g., the PC algorithm) and the Joint Causal Inference setting (e.g., the FCI-JCI algorithm). Comment: Major revision. To appear in Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 202

    A data science approach to pattern discovery in complex structures with applications in bioinformatics

    Pattern discovery aims to find interesting, non-trivial, implicit, previously unknown and potentially useful patterns in data. This dissertation presents a data science approach for discovering patterns or motifs from complex structures, particularly complex RNA structures. RNA secondary and tertiary structure motifs are very important in biological molecules, which play multiple vital roles in cells. A lot of work has been done on RNA motif annotation. However, pattern discovery in RNA structure is less studied. In the first part of this dissertation, an ab initio algorithm, named DiscoverR, is introduced for pattern discovery in RNA secondary structures. This algorithm works by representing RNA secondary structures as ordered labeled trees and performs tree pattern discovery using a quadratic time dynamic programming algorithm. The algorithm is able to identify and extract the largest common substructures from two RNA molecules of different sizes, without prior knowledge of locations and topologies of these substructures. One application of DiscoverR is to locate the RNA structural elements in genomes. Experimental results show that this tool complements the currently used approaches for mining conserved structural RNAs in the human genome. DiscoverR can also be extended to find repeated regions in an RNA secondary structure. Specifically, this extended method is used to detect structural repeats in the 3'-untranslated region of a protein kinase gene

    Transcribing Content from Structural Images with Spotlight Mechanism

    Transcribing content from structural images, e.g., writing notes from music scores, is a challenging task as not only the content objects should be recognized, but the internal structure should also be preserved. Existing image recognition methods mainly work on images with simple content (e.g., text lines with characters), but are not capable of identifying ones with more complex content (e.g., structured symbols), which often follow a fine-grained grammar. To this end, in this paper, we propose a hierarchical Spotlight Transcribing Network (STN) framework followed by a two-stage "where-to-what" solution. Specifically, we first decide "where-to-look" through a novel spotlight mechanism to focus on different areas of the original image following its structure. Then, we decide "what-to-write" by developing a GRU based network with the spotlight areas for transcribing the content accordingly. Moreover, we propose two implementations on the basis of STN, i.e., STNM and STNR, where the spotlight movement follows the Markov property and Recurrent modeling, respectively. We also design a reinforcement method to refine the framework by self-improving the spotlight mechanism. We conduct extensive experiments on many structural image datasets, where the results clearly demonstrate the effectiveness of the STN framework. Comment: Accepted by KDD2018 Research Track. In proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD'18

    MODERATION EFFECT OF SOCIAL MEDIA TO ENTREPRENEURIAL DISCOVERY AND CREATION, ANTECEDENT’S PRIOR KNOWLEDGE AND ENTREPRENEURIAL ALERTNESS. AN EMPIRICAL STUDY

    This study aimed to analyze and evaluate the discovery and creation of opportunities in entrepreneurship as moderated by social media. Two variables were used as antecedents, namely Prior Knowledge and Entrepreneurial Alertness. The research used structural equation modeling with the Partial Least Squares (PLS) method. A final set of 294 data records from entrepreneurs who interacted using social media was processed. All 5 proposed hypotheses were significantly positive. Prior Knowledge had a significant impact on Entrepreneurial Alertness, and social media had a significant impact in moderating the effect of Prior Knowledge and Entrepreneurial Alertness on finding and creating opportunities. The novelty of this research is that it explores the concept of entrepreneurship in social media, and very few researchers conduct research in this fiel

    Causality and independence in systems of equations

    The technique of causal ordering is used to study causal and probabilistic aspects implied by model equations. Causal discovery algorithms are used to learn causal and dependence structure from data. In this thesis, 'Causality and independence in systems of equations', we explore the relationship between causal ordering and the output of causal discovery algorithms. By combining these techniques, we bridge the gap between the world of dynamical systems at equilibrium and literature regarding causal methods for static systems. In a nutshell, this gives new insights about models with feedback and an improved understanding of observed phenomena in certain (biological) systems. Based on our ideas, we outline a novel approach towards causal discovery for dynamical systems at equilibrium. This work was inspired by a desire to understand why the output of causal discovery algorithms sometimes appears to be at odds with expert knowledge. We were particularly interested in explaining apparent reversals of causal directions when causal discovery methods are applied to protein expression data. We propose the presence of a perfectly adapting feedback mechanism or unknown measurement error as possible explanations for these apparent reversals. We develop conditions for the detection of perfect adaptation from model equations or from data and background knowledge. This can be used to reason about the existence of feedback mechanisms using only partial observations of a system, resulting in additional criteria for data-driven selection of causal models. This line of research was made possible by novel interpretations and extensions of the causal ordering algorithm. Additionally, we challenge a key assumption in many causal discovery algorithms: that the underlying system can be modelled by the well-known class of structural causal models. To overcome the limitations of these models in capturing the causal semantics of dynamical systems at equilibrium, we propose a generalization that we call causal constraints models. Looking beyond standard causal modelling frameworks allows us to further explore the relationship between dynamical models at equilibrium and methods for causal discovery on equilibrium data

    Selective Metal Cation Capture by Soft Anionic Metal-Organic Frameworks via Drastic Single-Crystal-to-Single-Crystal Transformations

    In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work our model does not impose a prior on the rate at which documents are added to the corpus nor does it adopt the Markovian assumption which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, and splitting and merging. The power of the proposed framework is demonstrated on the medical literature corpus concerned with the autism spectrum disorder (ASD) - an increasingly important research subject of significant social and healthcare importance. In addition to the collected ASD literature corpus which we will make freely available, our contributions also include two free online tools we built as aids to ASD researchers. These can be used for semantically meaningful navigation and searching, as well as knowledge discovery from this large and rapidly growing corpus of literature. Comment: In Proc. Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), 201
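
    Step (iii), the temporal similarity graph, can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' method: topics are represented as hypothetical word-weight dictionaries, cosine similarity with a fixed threshold stands in for whatever similarity measure the paper uses, and the topic discovery of step (ii) is taken as already done.

```python
import math
from collections import Counter


def cosine(p, q):
    """Cosine similarity between two sparse word-weight dictionaries."""
    dot = sum(weight * q.get(word, 0.0) for word, weight in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def similarity_graph(epochs, threshold=0.5):
    """Link topics in consecutive epochs whose similarity reaches the threshold.

    Nodes are (epoch index, topic index); the threshold value is an assumption.
    """
    edges = []
    for t in range(len(epochs) - 1):
        for i, p in enumerate(epochs[t]):
            for j, q in enumerate(epochs[t + 1]):
                if cosine(p, q) >= threshold:
                    edges.append(((t, i), (t + 1, j)))
    return edges


def topic_events(epochs, edges):
    """Label each topic node with the structural changes visible in the graph."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    events = {}
    for t, topics in enumerate(epochs):
        for i in range(len(topics)):
            node = (t, i)
            labels = []
            if t > 0 and in_degree[node] == 0:
                labels.append("emergence")
            if t < len(epochs) - 1 and out_degree[node] == 0:
                labels.append("disappearance")
            if out_degree[node] > 1:
                labels.append("split")
            if in_degree[node] > 1:
                labels.append("merge")
            events[node] = labels
    return events


# Hypothetical two-epoch example: two topics merge, and one new topic appears.
example_epochs = [
    [{"a": 1.0}, {"b": 1.0}],
    [{"a": 0.7, "b": 0.7}, {"c": 1.0}],
]
example_edges = similarity_graph(example_epochs)
example_events = topic_events(example_epochs, example_edges)
print(example_events)
```

    Reading changes off node degrees keeps the sketch short; a node with exactly one incoming and one outgoing edge corresponds to plain evolution and receives no label here.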

    Knowledge Discovery from Vibration Measurements

    The framework as well as the particular algorithms of the pattern recognition process are widely adopted in structural health monitoring (SHM). However, as a part of the overall process of knowledge discovery from databases (KDD), the results of pattern recognition are only changes and patterns of changes of data features. In this paper, based on the similarity between KDD and SHM and considering the particularity of SHM problems, a four-step framework of SHM is proposed which extends the final goal of SHM from detecting damage to extracting knowledge to facilitate decision making. The purposes and proper methods of each step of this framework are discussed. To demonstrate the proposed SHM framework, a specific SHM method which is composed of second order structural parameter identification, statistical control chart analysis, and system reliability analysis is then presented. To examine the performance of this SHM method, real sensor data measured from a lab-size steel bridge model structure are used. The developed four-step framework of SHM has the potential to clarify the process of SHM to facilitate the further development of SHM techniques
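
    The statistical control chart step mentioned above can be illustrated with a minimal sketch of a Shewhart-style chart. The abstract does not say which chart type or limits the paper uses, so the 3-sigma limits, the function names and the numbers below are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Lower and upper control limits: baseline mean -/+ k sample standard deviations."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center + k * spread


def out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Hypothetical identified structural parameter (e.g. a stiffness estimate)
# from a healthy baseline period, followed by new monitoring values.
baseline_values = [10.0, 10.2, 9.8, 10.1, 9.9]
new_values = [10.0, 10.3, 9.2, 10.6]

limits = control_limits(baseline_values)
flagged = out_of_control(new_values, limits)
print(limits, flagged)
```

    Points flagged this way only indicate a change in the monitored feature; in the framework described above, interpreting that change is left to the later reliability analysis and decision-making steps.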