Causal Graph Discovery For Hydrological Time Series Knowledge Discovery
Causal inference, or causal relationship discovery, is an important task in hydrological studies: it helps explain the causes of abnormal hydrological phenomena such as drought and flood, which in turn improves our ability to predict and respond to natural disasters. Unlike generic causality studies, where discovering the causal relation is sufficient, prediction and modeling of extreme hydrological situations require not only a causal graph that reveals the contributing factors but also the lead time from each cause to its effect. Lead time is the time difference between the occurrence of a cause and the occurrence of its effect. Although causal relationship discovery has been a major topic in many scientific problems, the majority of the work has focused on the validity of such relationships and provides no information on cause-effect lead time. Such insight is critical for hydrological modeling and prediction, where lead time information is needed to know how long it takes different factors to affect extreme situations such as flood or drought. The most commonly used computational algorithms for causality discovery can be categorized as regression approaches or Bayesian approaches. Regression-based approaches, such as Granger's causality, assume linear causality and first-order causal relationships. Bayesian approaches, such as the PC algorithm based on Pearl's definition of causality, have exponential runtime complexity, which makes them difficult to apply to hydrological systems with a large number of variables. Furthermore, no existing approach incorporates the lead time concept into the discovery of causal relationships. In this paper, we propose a new approach, mutual information causal (MI-Causal), for causal relationship discovery, which combines the advantages of existing approaches and overcomes their limitations to meet hydrological needs.
The experimental results from both synthetic and real time hydrological data show that our proposed method outperforms regression approaches and Bayesian-based approaches.
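The lead time idea in the abstract above can be illustrated with a small sketch. This is not the MI-Causal algorithm itself, whose details the abstract does not give; it only assumes that one plausible way to estimate a lead time is to shift the candidate cause by each lag and pick the lag with the highest mutual information with the effect. The function names, the histogram estimator, and the `bins` and `max_lag` parameters are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins=8):
    """Plug-in (histogram) estimate of I(X;Y) in nats.

    Assumption: equal-width binning; a real study would likely use a
    more careful estimator.
    """
    def discretize(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width for constant series
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = discretize(xs), discretize(ys)
    n = len(bx)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # sum over observed cells of p(a,b) * log(p(a,b) / (p(a) * p(b)))
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def estimate_lead_time(cause, effect, max_lag):
    """Return (best_lag, scores): the lag in 1..max_lag at which the
    cause, shifted forward in time, shares the most information with
    the effect. Both series must have the same length."""
    scores = {}
    for lag in range(1, max_lag + 1):
        # pair cause[t] with effect[t + lag]
        scores[lag] = mutual_information(cause[:-lag], effect[lag:])
    best_lag = max(scores, key=scores.get)
    return best_lag, scores
```

For two equally sampled series, say hypothetical rainfall and streamflow records, `estimate_lead_time(rainfall, streamflow, max_lag=10)` returns the lag with the strongest dependence together with the score for every lag. Unlike a linear regression test, the mutual information score can also pick up nonlinear dependence.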
Exploring Technical Phrase Frames from Research Paper Titles
This paper proposes a method for exploring technical phrase frames by extracting word n-grams that match our information needs and interests from research paper titles. Technical phrase frames, the outcome of our method, are phrases with wildcards for which any technical term may be substituted. Our method first extracts word trigrams from research paper titles and constructs a co-occurrence graph of the trigrams. Simply applying the PageRank algorithm to the co-occurrence graph already ranks trigrams that can be regarded as technical key phrases near the top by PageRank score. In contrast, our method assigns weights to the edges of the co-occurrence graph based on the Jaccard similarity between trigrams and then applies the weighted PageRank algorithm. The results are markedly different and more interesting. While the top-ranked trigrams obtained by unweighted PageRank have only a self-contained meaning, those obtained by our method are technical phrase frames, i.e., word sequences that form a complete technical phrase only after a technical word (or words) is placed before and/or after them. We claim that our method is a useful tool for discovering important phraseological patterns, which can expand query keywords to improve information retrieval performance and can also serve as candidate phrasings in technical writing to make our research papers attractive.
29th IEEE International Conference on Advanced Information Networking and Applications Workshops, WAINA 2015; Gwangju; South Korea; 25 March 2015 through 27 March 201
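The pipeline described in this abstract (trigram extraction, co-occurrence graph, Jaccard edge weights, weighted PageRank) can be sketched roughly as follows. Several details are assumptions, since the abstract does not specify them: two trigrams are taken to co-occur when they appear in the same title, Jaccard similarity is computed over the word sets of the two trigrams, and the damping factor and iteration count are conventional defaults. All function names are hypothetical.

```python
from itertools import combinations


def trigrams(title):
    """Word trigrams of a title, lowercased and split on whitespace."""
    words = title.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]


def jaccard(a, b):
    """Jaccard similarity of the word sets of two trigrams (assumption)."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def build_graph(titles):
    """Undirected co-occurrence graph: trigrams from the same title are
    linked, with the Jaccard similarity as the edge weight."""
    graph = {}
    for title in titles:
        grams = sorted(set(trigrams(title)))
        for g in grams:
            graph.setdefault(g, {})
        for a, b in combinations(grams, 2):
            w = jaccard(a, b)
            if w > 0:  # zero-weight edges carry no rank, so skip them
                graph[a][b] = w
                graph[b][a] = w
    return graph


def weighted_pagerank(graph, damping=0.85, iterations=50):
    """Power iteration in which each node passes its rank to neighbours
    in proportion to edge weight."""
    if not graph:
        return {}
    nodes = list(graph)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_weight = {v: sum(graph[v].values()) for v in nodes}
    for _ in range(iterations):
        # rank held by nodes without weighted edges is spread uniformly
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0)
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v, w in graph[u].items():
                new_rank[v] += damping * rank[u] * w / out_weight[u]
        rank = new_rank
    return rank
```

Calling `weighted_pagerank(build_graph(titles))` on a list of title strings and sorting the result by score gives a ranked list of trigrams. Setting every edge weight to 1 instead of the Jaccard value recovers the unweighted baseline that the abstract compares against.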
Mining Patterns and Networks from Sequence Data
Sequence data are ubiquitous in diverse domains such as bioinformatics, computational neuroscience, and user behavior analysis. As a result, many critical applications require extracting knowledge from sequences at multiple levels. For example, mining frequent patterns is the central goal of motif discovery in biological sequences, while in computational neuroscience, one essential task is to infer causal networks from neural event sequences (spike trains). Although pattern and network mining tools for sequence data are widely applied, they face new challenges posed by modern instruments. That is, as large-scale and high-resolution sequence data become available, we need new methods with better efficiency and higher accuracy. In this dissertation, we propose several approaches that improve existing pattern and network mining tools to meet these challenges in terms of efficiency and accuracy. The first problem is how to scale existing motif discovery algorithms. Our work on motif discovery focuses on the challenge of discovering motifs from a large number of short sequences, which none of the existing motif finding algorithms can handle. We propose an anchor-based clustering algorithm that significantly improves the scalability of existing motif finding algorithms without losing accuracy. In particular, our algorithm reduces the running time of a very popular motif finding algorithm, MEME, from weeks to a few minutes with even better accuracy. In another work, we study the problem of how to accurately infer a functional network from neural recordings (spike trains), which is an essential task in many real-world applications such as diagnosing neurodegenerative diseases. We introduce a statistical tool that can be used to accurately identify inhibitory causal relations from spike trains.
While most existing work focuses on characterizing the statistics of neural spike trains, we show that it is crucial to make predictions about the response of neurons to changes. More importantly, our results are validated by real biological experiments with a novel instrument, which makes this work the first of its kind. Furthermore, while most existing methods focus on learning functional networks from purely observational data, we propose an active learning framework that intelligently generates and utilizes interventional data. We demonstrate that, by using interventional data selected with the active learning models we propose, the accuracy of the inferred functional network can be substantially improved with the same amount of training data.