Clones in Graphs
Finding structural similarities in graph data, like social networks, is a
far-ranging task in data mining and knowledge discovery. A (conceptually)
simple reduction would be to compute the automorphism group of a graph.
However, this approach is ineffective in data mining since real world data does
not exhibit enough structural regularity. Here we step in with a novel approach
based on mappings that preserve the maximal cliques. For this we exploit the
well known correspondence between bipartite graphs and the data structure
formal context from Formal Concept Analysis. From there we utilize
the notion of clone items. The investigation of these is still an open problem
to which we add new insights with this work. Furthermore, we produce a
substantial experimental investigation of real world data. We conclude by
demonstrating the generalization of clone items to permutations. Comment: 11 pages, 2 figures, 1 table
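The idea of a vertex mapping that preserves the maximal cliques can be illustrated with a minimal sketch. This is not the authors' method: it assumes an undirected graph stored as an adjacency dictionary, enumerates maximal cliques with the plain Bron-Kerbosch recursion, and tests only a single transposition of two vertices. The names `maximal_cliques` and `swap_preserves_cliques` are hypothetical.

```python
def maximal_cliques(adj):
    """Enumerate maximal cliques with Bron-Kerbosch (no pivoting).

    adj maps each vertex to the set of its neighbours (undirected graph).
    """
    cliques = set()

    def expand(r, p, x):
        # r: current clique, p: candidates, x: already processed vertices
        if not p and not x:
            cliques.add(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


def swap_preserves_cliques(adj, a, b):
    """Return True if exchanging a and b maps the set of maximal cliques onto itself."""
    swap = {a: b, b: a}
    cliques = maximal_cliques(adj)
    image = {frozenset(swap.get(v, v) for v in c) for c in cliques}
    return image == cliques


# Illustrative data (assumption): a star with centre 0 and leaves 1, 2, 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(swap_preserves_cliques(star, 1, 2))  # two leaves: True
print(swap_preserves_cliques(star, 0, 1))  # centre and a leaf: False
```

Exhaustive enumeration of maximal cliques is exponential in the worst case, so this sketch is only suitable for small graphs.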
Sequential Patterns Post-processing for Structural Relation Patterns Mining
Sequential patterns mining is an important data-mining technique used to identify frequently observed sequential
occurrences of items across ordered transactions over time. It has been extensively studied in the literature, and there
exists a diversity of algorithms. However, more complex structural patterns are often hidden behind sequences.
This article begins with the introduction of a model for the representation of sequential patterns, the Sequential
Patterns Graph, which motivates the search for new structural relation patterns. An integrative framework for
the discovery of these patterns, Postsequential Patterns Mining, is then described, which underpins the postprocessing
of sequential patterns. A corresponding data-mining method based on sequential patterns postprocessing
is proposed and shown to be effective in the search for concurrent patterns. From experiments conducted on three
component algorithms, it is demonstrated that sequential patterns-based concurrent patterns mining provides
an efficient method for structural knowledge discovery.
In-silico Predictive Mutagenicity Model Generation Using Supervised Learning Approaches
With the advent of High Throughput Screening techniques, it is feasible to filter possible leads from a mammoth chemical space that can act against a particular target and inhibit its action. Virtual screening complements the in-vitro assays which are costly and time consuming. This process is used to sort biologically active molecules by utilizing the structural and chemical information of the compounds and the target proteins in order to screen potential hits. Various data mining and machine learning tools utilize Molecular Descriptors through the knowledge discovery process using classifier algorithms that classify the potentially active hits for the drug development process.

Constraint-Based Causal Discovery using Partial Ancestral Graphs in the presence of Cycles
While feedback loops are known to play important roles in many complex
systems, their existence is ignored in a large part of the causal discovery
literature, as systems are typically assumed to be acyclic from the outset.
When applying causal discovery algorithms designed for the acyclic setting on
data generated by a system that involves feedback, one would not expect to
obtain correct results. In this work, we show that---surprisingly---the output
of the Fast Causal Inference (FCI) algorithm is correct if it is applied to
observational data generated by a system that involves feedback. More
specifically, we prove that for observational data generated by a simple and
σ-faithful Structural Causal Model (SCM), FCI is sound and complete, and
can be used to consistently estimate (i) the presence and absence of causal
relations, (ii) the presence and absence of direct causal relations, (iii) the
absence of confounders, and (iv) the absence of specific cycles in the causal
graph of the SCM. We extend these results to constraint-based causal discovery
algorithms that exploit certain forms of background knowledge, including the
causally sufficient setting (e.g., the PC algorithm) and the Joint Causal
Inference setting (e.g., the FCI-JCI algorithm). Comment: Major revision. To appear in Proceedings of the 36th Conference on
Uncertainty in Artificial Intelligence (UAI), PMLR volume 124, 202
A data science approach to pattern discovery in complex structures with applications in bioinformatics
Pattern discovery aims to find interesting, non-trivial, implicit, previously unknown and potentially useful patterns in data. This dissertation presents a data science approach for discovering patterns or motifs from complex structures, particularly complex RNA structures. RNA secondary and tertiary structure motifs are very important in biological molecules, which play multiple vital roles in cells. A lot of work has been done on RNA motif annotation. However, pattern discovery in RNA structure is less studied. In the first part of this dissertation, an ab initio algorithm, named DiscoverR, is introduced for pattern discovery in RNA secondary structures. This algorithm works by representing RNA secondary structures as ordered labeled trees and performs tree pattern discovery using a quadratic time dynamic programming algorithm. The algorithm is able to identify and extract the largest common substructures from two RNA molecules of different sizes, without prior knowledge of locations and topologies of these substructures.
One application of DiscoverR is to locate the RNA structural elements in genomes. Experimental results show that this tool complements the currently used approaches for mining conserved structural RNAs in the human genome. DiscoverR can also be extended to find repeated regions in an RNA secondary structure. Specifically, this extended method is used to detect structural repeats in the 3'-untranslated region of a protein kinase gene
MODERATION EFFECT OF SOCIAL MEDIA TO ENTREPRENEURIAL DISCOVERY AND CREATION, ANTECEDENT’S PRIOR KNOWLEDGE AND ENTREPRENEURIAL ALERTNESS. AN EMPIRICAL STUDY
This study aimed to analyze and evaluate the discovery and creation of opportunities in entrepreneurship as moderated by social media. Two variables were used as antecedents, namely Prior Knowledge and Entrepreneurial Alertness. The research used structural equation modeling with Partial Least Squares (PLS). The final dataset comprised 294 records from entrepreneurs who interacted using social media. All 5 proposed hypotheses were supported, with significant positive effects. Prior Knowledge had a significant impact on Entrepreneurial Alertness, and social media significantly moderated the effect of Prior Knowledge and Entrepreneurial Alertness on finding and creating opportunities. The novelty of this research lies in exploring the concept of entrepreneurship in social media; very few researchers conduct research in this field.
Transcribing Content from Structural Images with Spotlight Mechanism
Transcribing content from structural images, e.g., writing notes from music
scores, is a challenging task as not only the content objects should be
recognized, but the internal structure should also be preserved. Existing image
recognition methods mainly work on images with simple content (e.g., text lines
with characters), but are not capable of identifying ones with more complex
content (e.g., structured symbols), which often follow a fine-grained grammar.
To this end, in this paper, we propose a hierarchical Spotlight Transcribing
Network (STN) framework followed by a two-stage "where-to-what" solution.
Specifically, we first decide "where-to-look" through a novel spotlight
mechanism to focus on different areas of the original image following its
structure. Then, we decide "what-to-write" by developing a GRU based network
with the spotlight areas for transcribing the content accordingly. Moreover, we
propose two implementations on the basis of STN, i.e., STNM and STNR, where the
spotlight movement follows the Markov property and Recurrent modeling,
respectively. We also design a reinforcement method to refine the framework by
self-improving the spotlight mechanism. We conduct extensive experiments on
many structural image datasets, where the results clearly demonstrate the
effectiveness of STN framework. Comment: Accepted by KDD2018 Research Track. In proceedings of the 24th ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining
(KDD'18)
Causality and independence in systems of equations
The technique of causal ordering is used to study causal and probabilistic aspects implied by model equations. Causal discovery algorithms are used to learn causal and dependence structure from data. In this thesis, 'Causality and independence in systems of equations', we explore the relationship between causal ordering and the output of causal discovery algorithms. By combining these techniques, we bridge the gap between the world of dynamical systems at equilibrium and literature regarding causal methods for static systems. In a nutshell, this gives new insights about models with feedback and an improved understanding of observed phenomena in certain (biological) systems. Based on our ideas, we outline a novel approach towards causal discovery for dynamical systems at equilibrium. This work was inspired by a desire to understand why the output of causal discovery algorithms sometimes appears to be at odds with expert knowledge. We were particularly interested in explaining apparent reversals of causal directions when causal discovery methods are applied to protein expression data. We propose the presence of a perfectly adapting feedback mechanism or unknown measurement error as possible explanations for these apparent reversals. We develop conditions for the detection of perfect adaptation from model equations or from data and background knowledge. This can be used to reason about the existence of feedback mechanisms using only partial observations of a system, resulting in additional criteria for data-driven selection of causal models. This line of research was made possible by novel interpretations and extensions of the causal ordering algorithm. Additionally, we challenge a key assumption in many causal discovery algorithms: that the underlying system can be modelled by the well-known class of structural causal models.
To overcome the limitations of these models in capturing the causal semantics of dynamical systems at equilibrium, we propose a generalization that we call causal constraints models. Looking beyond standard causal modelling frameworks allows us to further explore the relationship between dynamical models at equilibrium and methods for causal discovery on equilibrium data
Knowledge Discovery from Vibration Measurements
Pattern recognition, both as a framework and through its particular algorithms, is widely adopted in structural health monitoring (SHM). However, within the overall process of knowledge discovery from databases (KDD), pattern recognition yields only changes, and patterns of changes, in data features. In this paper, based on the similarity between KDD and SHM and considering the particularities of SHM problems, a four-step SHM framework is proposed that extends the final goal of SHM from detecting damage to extracting knowledge that facilitates decision making. The purpose of each step and the methods suited to it are discussed. To demonstrate the proposed framework, a specific SHM method is presented that is composed of second-order structural parameter identification, statistical control chart analysis, and system reliability analysis. To examine the performance of this method, real sensor data measured from a lab-size steel bridge model structure are used. The four-step framework has the potential to clarify the SHM process and to facilitate the further development of SHM techniques.
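The abstract does not say which control chart is used, so the following is only a minimal sketch of one common choice, a Shewhart-style chart: limits are set at the baseline mean plus or minus k sample standard deviations, and later values outside the limits are flagged. The function names, the value k = 3 and the sample numbers are assumptions made for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Lower and upper limits: baseline mean -/+ k sample standard deviations."""
    centre = mean(baseline)
    spread = stdev(baseline)
    return centre - k * spread, centre + k * spread


def out_of_control(samples, limits):
    """Indices of samples that fall outside the control limits."""
    lower, upper = limits
    return [i for i, x in enumerate(samples) if x < lower or x > upper]


# Illustrative values (assumption): an identified structural parameter
# from the undamaged state, followed by later measurements.
baseline = [10.0, 10.2, 9.8, 10.1, 9.9]
limits = control_limits(baseline)
print(out_of_control([10.0, 10.3, 11.0, 9.4], limits))  # [2, 3]
```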
Selective Metal Cation Capture by Soft Anionic Metal-Organic Frameworks via Drastic Single-Crystal-to-Single-Crystal Transformations
In this paper we describe a novel framework for the discovery of the topical
content of a data corpus, and the tracking of its complex structural changes
across the temporal dimension. In contrast to previous work our model does not
impose a prior on the rate at which documents are added to the corpus nor does
it adopt the Markovian assumption which overly restricts the type of changes
that the model can capture. Our key technical contribution is a framework based
on (i) discretization of time into epochs, (ii) epoch-wise topic discovery
using a hierarchical Dirichlet process-based model, and (iii) a temporal
similarity graph which allows for the modelling of complex topic changes:
emergence and disappearance, evolution, and splitting and merging. The power of
the proposed framework is demonstrated on the medical literature corpus
concerned with the autism spectrum disorder (ASD) - an increasingly important
research subject of significant social and healthcare importance. In addition
to the collected ASD literature corpus which we will make freely available, our
contributions also include two free online tools we built as aids to ASD
researchers. These can be used for semantically meaningful navigation and
searching, as well as knowledge discovery from this large and rapidly growing
corpus of literature. Comment: In Proc. Pacific-Asia Conference on Knowledge Discovery and Data
Mining (PAKDD), 201
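The temporal similarity graph described in the abstract above can be illustrated with a rough sketch. It does not reproduce the paper's model: topics are assumed to be given as sets of top words per epoch, similarity is taken to be the Jaccard index, and the threshold of 0.3 is arbitrary. The hierarchical Dirichlet process step is omitted entirely, and all function names are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two word sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_epochs(prev_topics, next_topics, threshold=0.3):
    """Edges (i, j) between sufficiently similar topics of consecutive epochs."""
    return [
        (i, j)
        for i, p in enumerate(prev_topics)
        for j, n in enumerate(next_topics)
        if jaccard(p, n) >= threshold
    ]


def classify(prev_topics, next_topics, edges):
    """Label topic changes by the in-degree and out-degree of each topic."""
    out_deg = [sum(1 for i, _ in edges if i == k) for k in range(len(prev_topics))]
    in_deg = [sum(1 for _, j in edges if j == k) for k in range(len(next_topics))]
    return {
        "disappeared": [k for k, d in enumerate(out_deg) if d == 0],
        "split": [k for k, d in enumerate(out_deg) if d > 1],
        "emerged": [k for k, d in enumerate(in_deg) if d == 0],
        "merged": [k for k, d in enumerate(in_deg) if d > 1],
    }


# Illustrative topics (assumption): top words per topic in two epochs.
prev_topics = [{"gene", "mutation", "risk"}, {"therapy", "behaviour", "child"}]
next_topics = [
    {"gene", "mutation", "variant"},
    {"gene", "risk", "family"},
    {"diet", "gut", "microbiome"},
]
edges = link_epochs(prev_topics, next_topics)
print(edges)  # [(0, 0), (0, 1)]
print(classify(prev_topics, next_topics, edges))
```

In this toy example the first topic splits into two, the second disappears, and the third topic of the later epoch emerges with no predecessor.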