21 research outputs found

    Structure Discovery in Bayesian Networks: Algorithms and Applications

    Bayesian networks are a class of probabilistic graphical models that have been widely used for probabilistic inference and causal modeling. A Bayesian network provides a compact, flexible, and interpretable representation of a joint probability distribution. When the network structure is unknown but observational data are at hand, one can try to learn the network structure from the data. This is called structure discovery. Structure discovery in Bayesian networks comprises several interesting problem variants. In the optimal Bayesian network learning problem (we call this structure learning), one aims to find a Bayesian network that best explains the data and then uses this optimal Bayesian network for predictions or inferences. In others, we are interested in finding the local structural features that are highly probable (we call this structure discovery). Both structure learning and structure discovery are considered very hard because existing approaches to these problems require highly intensive computations. In this dissertation, we develop algorithms to achieve more accurate, efficient and scalable structure discovery in Bayesian networks and demonstrate these algorithms in applications of systems biology and educational data mining. Specifically, this study is conducted in five directions. First, we propose a novel heuristic algorithm for Bayesian network structure learning that takes advantage of the idea of curriculum learning and learns Bayesian network structures by stages. We prove theoretical advantages of our algorithm and also empirically show that it outperforms the state-of-the-art heuristic approach in learning Bayesian network structures. Second, we develop an algorithm to efficiently enumerate the k-best equivalence classes of Bayesian networks, where Bayesian networks in the same equivalence class are equally expressive in terms of representing probability distributions. We demonstrate our algorithm in the task of Bayesian model averaging. Our approach goes beyond the maximum-a-posteriori (MAP) model by listing the most likely network structures and their relative likelihood and therefore has important applications in causal structure discovery. Third, we study how parallelism can be used to tackle the exponential time and space complexity of exact Bayesian structure discovery. We consider the problem of computing the exact posterior probabilities of directed edges in Bayesian networks. We present a parallel algorithm capable of computing the exact posterior probabilities of all possible directed edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways. Fourth, we develop novel algorithms for computing the exact posterior probabilities of ancestor relations in Bayesian networks. The existing algorithm assumes an order-modular prior over Bayesian networks that does not respect Markov equivalence. Our algorithm allows a uniform prior and respects Markov equivalence. We apply our algorithm to a biological data set for discovering protein signaling pathways. Finally, we introduce Combined student Modeling and prerequisite Discovery (COMMAND), a novel algorithm for jointly inferring a prerequisite graph and a student model from student performance data. COMMAND learns the skill prerequisite relations as a Bayesian network, which is capable of modeling the global prerequisite structure and capturing the conditional independence between skills. Our experiments on simulations and real student data suggest that COMMAND is better than prior methods in the literature. COMMAND is useful for designing intelligent tutoring systems that assess student knowledge or that offer remediation interventions to students.

    Computing Posterior Probabilities of Ancestor Relations in Bayesian Networks

    In this paper we develop a dynamic programming algorithm to compute the exact posterior probabilities of ancestor relations in Bayesian networks. The previous dynamic programming (DP) algorithm by Parviainen and Koivisto (2011) evaluates all possible ancestor relations in time $O(n3^n)$ and space $O(3^n)$. However, their algorithm assumes an order-modular prior over DAGs that does not respect Markov equivalence. The resulting posteriors are biased towards DAGs consistent with more linear orders. To adhere to a uniform prior, we develop a new DP algorithm that computes the exact posteriors of all possible ancestor relations in time $O(n5^{n-1})$ and space $O(3^n)$.
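The bias towards DAGs consistent with more linear orders can be illustrated with a small brute-force count. The sketch below is not the paper's DP algorithm. It assumes the simplest reading of an order-modular prior, in which every compatible (order, DAG) pair receives equal weight, so a DAG's prior mass grows with its number of consistent linear orders. The function name `count_linear_orders` and the three-node graphs are hypothetical illustrations.

```python
from itertools import permutations


def count_linear_orders(nodes, edges):
    """Count the linear orders (topological orderings) consistent with a DAG."""
    count = 0
    for order in permutations(nodes):
        position = {v: i for i, v in enumerate(order)}
        # An order is consistent if every parent precedes its child.
        if all(position[u] < position[v] for u, v in edges):
            count += 1
    return count


nodes = ["A", "B", "C"]
chain = [("A", "B"), ("B", "C")]  # A -> B -> C
fork = [("B", "A"), ("B", "C")]   # A <- B -> C, Markov equivalent to the chain
empty = []                        # no edges

for name, edges in [("chain", chain), ("fork", fork), ("empty", empty)]:
    print(name, count_linear_orders(nodes, edges))
```

The chain and the fork share the same skeleton and have no v-structures, so they are Markov equivalent, yet they are consistent with different numbers of linear orders. Under the stated assumption they would receive different prior weight, which is the sense in which such a prior does not respect Markov equivalence.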

    A Parallel Algorithm for Exact Bayesian Structure Discovery in Bayesian Networks

    Exact Bayesian structure discovery in Bayesian networks requires exponential time and space. Using dynamic programming (DP), the fastest known sequential algorithm computes the exact posterior probabilities of structural features in $O(2(d+1)n2^n)$ time and space, if the number of nodes (variables) in the Bayesian network is $n$ and the in-degree (the number of parents) per node is bounded by a constant $d$. Here we present a parallel algorithm capable of computing the exact posterior probabilities for all $n(n-1)$ edges with optimal parallel space efficiency and nearly optimal parallel time efficiency. That is, if $p=2^k$ processors are used, the run-time reduces to $O(5(d+1)n2^{n-k}+k(n-k)^d)$ and the space usage becomes $O(n2^{n-k})$ per processor. Our algorithm is based on the observation that the subproblems in the sequential DP algorithm constitute an $n$-$D$ hypercube. We carefully coordinate the computation of correlated DP procedures so that a large amount of data exchange is suppressed. Further, we develop parallel techniques for two variants of the well-known zeta transform, which have applications outside the context of Bayesian networks. We demonstrate the capability of our algorithm on datasets with up to 33 variables and its scalability on up to 2048 processors. We apply our algorithm to a biological data set for discovering the yeast pheromone response pathways. Comment: 32 pages, 12 figures
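For readers unfamiliar with the zeta transform mentioned above, here is a minimal sequential sketch over the subset lattice. It is the standard textbook form, not either of the two variants or the parallel techniques developed in the paper. Subsets of $n$ variables are encoded as bitmasks, and the names `zeta_transform` and `zeta_naive` are illustrative.

```python
def zeta_transform(f, n):
    """Return g with g[S] = sum of f[T] over all subsets T of S (bitmask encoding)."""
    g = list(f)
    for i in range(n):
        bit = 1 << i
        for mask in range(1 << n):
            if mask & bit:
                # Fold in the neighbour that differs only in dimension i.
                g[mask] += g[mask ^ bit]
    return g


def zeta_naive(f, n):
    """Brute-force reference: enumerate every (S, T) pair with T a subset of S."""
    size = 1 << n
    return [sum(f[t] for t in range(size) if t & s == t) for s in range(size)]


n = 4
f = [(7 * s + 3) % 11 for s in range(1 << n)]  # arbitrary illustrative values
assert zeta_transform(f, n) == zeta_naive(f, n)
```

Each of the $2^n$ subsets is a vertex of an $n$-dimensional hypercube, and pass `i` of the outer loop moves values along dimension `i` only. The fast version therefore uses on the order of $n2^n$ additions, whereas the naive version examines all $4^n$ pairs.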

    Spatial motif discovery in papain-like cysteine protease family

    Spatial motifs, which are amino acid packing patterns, occur frequently within a set of proteins with some common specific functions and features. In this study, we report the application of a novel frequent subgraph mining algorithm to retrieve conserved spatial motifs from protein 3D structures of the papain-like cysteine protease (PCP) family. Each of the frequent spatial motifs we identified was found to be highly specific to the PCP family, as measured by P-value < $10^{-49}$. We also showed that the combination of these family-specific motifs can discriminate between the PCP family members and the background (a nonredundant subset of PDB) with very good sensitivity and predictive accuracy. These spatial motifs were found to cover either structurally important or functionally important sites, such as the catalytic dyad and the hydrophobic pocket that determines the substrate specificity. A PROSITE-like consensus sequence pattern, assembled by mapping these structural motifs to the sequence level, identifies the PCP sequences in the Swiss-Prot database with 100% precision and good recall. These results suggest that structurally and functionally specific amino acid packing patterns or motifs can be discovered by computational and statistical geometry analysis of protein structures and used to annotate protein structures and sequences.

    Opinion Mining from Online Reviews: Consumer Satisfaction Analysis with B&B Hotels

    Given the enormous growth and significant impact of user-generated content in online hotel reviews, this study aims to mine the determinants of consumer satisfaction with B&Bs and to build a hierarchical structure of these determinants. Content analysis was conducted based on the consumer review data from two well-known hotel booking websites. Ten determinants of customer satisfaction were identified. The interpretive structural modeling (ISM) technique was then used to develop a five-level hierarchical structural model based on these determinants to illustrate the influencing paths. Finally, the cross-impact matrix multiplication applied to classification (MICMAC) technique was used to analyze the driver and dependence power for each determinant. This study has the potential to make significant contributions from both the theoretical and practical perspectives in this research area.

    GPR35 acts a dual role and therapeutic target in inflammation

    GPR35 is a G protein-coupled receptor with notable involvement in modulating inflammatory responses. Although the precise role of GPR35 in inflammation is not yet fully understood, studies have suggested that it may have both pro- and anti-inflammatory effects depending on the specific cellular environment. Some studies have shown that GPR35 activation can stimulate the production of pro-inflammatory cytokines and facilitate the movement of immune cells towards inflammatory tissues or infected areas. Conversely, other investigations have suggested that GPR35 may possess anti-inflammatory properties in the gastrointestinal tract, liver and certain other tissues by curbing the generation of inflammatory mediators and promoting the differentiation of regulatory T cells. The intricate role of GPR35 in inflammation underscores the need for more in-depth research to understand its functional mechanisms and its potential significance as a therapeutic target for inflammatory diseases. The purpose of this review is to examine the pro-inflammatory and anti-inflammatory roles of GPR35 together, thus illuminating both facets of this complex issue.

    321 Tb/s E/S/C/L-band Transmission with E-band Bismuth-Doped Fiber Amplifier and Optical Processor

    Using a newly developed bismuth-doped fiber amplifier operating across the E-band and a multi-port optical processor, we investigate wideband E/S/C/L-band transmission with signal bandwidths up to 27.8 THz and distances up to 200 km. Dense wavelength-division multiplexed (DWDM) transmission is enabled by using thulium-, erbium- and bismuth-doped fiber amplifiers together with distributed Raman amplification. For 50 km transmission, we transmit a wideband DWDM signal comprising 1097 channels covering 212.3 nm (27.8 THz) from 1410.8 nm to 1623.1 nm for a record single-mode fiber (SMF) data-rate of 321 Tb/s (301 Tb/s after decoding), an increase of 25% on the previous record data-rate. We further show single-span transmission at 100 km and 150 km before recording 270.9 Tb/s (258.1 Tb/s after decoding) for 200 km transmission over 2 amplified spans. These results show the potential of E-band transmission to increase the information-carrying capability of optical fibers and open the door to multi-band fiber networks built on already deployed fibers.

    Finding the k-best Equivalence Classes of Bayesian Network Structures for Model Averaging

    In this paper we develop an algorithm to find the k-best equivalence classes of Bayesian networks. Our algorithm is capable of finding many more best DAGs than the previous algorithm that directly finds the k-best DAGs (Tian, He and Ram 2010). We demonstrate our algorithm in the task of Bayesian model averaging. Empirical results show that our algorithm significantly outperforms the k-best DAG algorithm in both time and space to achieve the same quality of approximation. Our algorithm goes beyond the maximum-a-posteriori (MAP) model by listing the most likely network structures and their relative likelihood and therefore has important applications in causal structure discovery.
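As a rough illustration of model averaging over a list of best structures, the sketch below approximates the posterior probability of each directed edge by renormalizing the scores of the listed structures. It averages over individual DAGs with made-up log scores. It does not enumerate equivalence classes, which would additionally require accounting for the member DAGs of each class, and the function name `edge_posteriors` and all scores are hypothetical.

```python
import math


def edge_posteriors(scored_dags):
    """Approximate P(edge | data) from a list of (log_score, edge_set) pairs."""
    # Subtract the best log score before exponentiating for numerical stability.
    top = max(score for score, _ in scored_dags)
    weights = [math.exp(score - top) for score, _ in scored_dags]
    total = sum(weights)
    posteriors = {}
    for weight, (_, edges) in zip(weights, scored_dags):
        for edge in edges:
            posteriors[edge] = posteriors.get(edge, 0.0) + weight / total
    return posteriors


# Hypothetical k = 3 best structures with made-up log scores.
k_best = [
    (-100.0, {("A", "B"), ("B", "C")}),
    (-100.7, {("A", "B")}),
    (-102.3, {("B", "C"), ("A", "C")}),
]
for edge, prob in sorted(edge_posteriors(k_best).items()):
    print(edge, round(prob, 3))
```

The approximation treats the listed structures as if they carried all of the posterior mass, so its quality depends on how much of the true posterior the k best structures actually cover.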