718 research outputs found

    Gene Regulatory Network Reconstruction Using Dynamic Bayesian Networks

    High-content technologies such as DNA microarrays can provide a system-scale overview of how genes interact with each other in a network context. Various mathematical methods and computational approaches have been proposed to reconstruct gene regulatory networks (GRNs), including Boolean networks, information theory, differential equations and Bayesian networks. GRN reconstruction faces huge intrinsic challenges on both experimental and theoretical fronts, because the inputs and outputs of the molecular processes are unclear and the underlying principles are unknown or too complex. In this work, we focused on improving the accuracy and speed of GRN reconstruction with a Dynamic Bayesian Network (DBN) based method. A commonly used structure-learning algorithm is based on REVEAL (Reverse Engineering Algorithm). However, this method has some limitations when it is used for reconstructing GRNs. For instance, the two-stage temporal Bayes network (2TBN) cannot be well recovered by application of REVEAL; it has low accuracy and speed for high-dimensional networks that have more than a hundred nodes; and it cannot even accomplish the task of reconstructing a network with 400 nodes. We implemented an algorithm for DBN structure learning with Friedman's score function to replace REVEAL, tested it on the reconstruction of both synthetic networks and real yeast networks, and compared it with REVEAL in the absence or presence of a preprocessed network generated by Zou and Conzen's algorithm. The new score metric improved the precision and recall of GRN reconstruction. Networks of gene interactions were reconstructed using the DBN approach and were analyzed to identify the mechanism of chemical-induced reversible neurotoxicity in earthworms, with tools curating relevant genes from the non-model organism's pathways to model organism pathways.
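
The precision and recall mentioned in this abstract can be computed by comparing the set of predicted regulatory edges with a reference network. The following is a minimal sketch, assuming that edges are directed (regulator, target) pairs; the gene names, the example networks and the function name are hypothetical and are not taken from the work itself.

```python
def precision_recall(predicted_edges, true_edges):
    """Return (precision, recall) of predicted directed edges.

    Each edge is a (regulator, target) tuple. Direction matters:
    ("A", "B") and ("B", "A") count as different edges.
    """
    predicted = set(predicted_edges)
    truth = set(true_edges)
    true_positives = len(predicted & truth)
    # Guard against empty sets so the ratios are always defined.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical reference network and reconstruction result.
reference = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
reconstructed = [("A", "B"), ("B", "C"), ("D", "A")]

precision, recall = precision_recall(reconstructed, reference)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Two of the three reconstructed edges appear in the reference network, and two of the four reference edges are recovered, so precision is 2/3 and recall is 1/2 in this toy case.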

    Network-based analysis of gene expression data

    The methods of molecular biology for the quantitative measurement of gene expression have undergone a rapid development in the past two decades. High-throughput assays with the microarray and RNA-seq technology now enable whole-genome studies in which several thousands of genes can be measured at a time. However, this has also imposed serious challenges on data storage and analysis, which are the subject of the young, but rapidly developing field of computational biology. To explain observations made on such a large scale requires suitable and accordingly scaled models of gene regulation. Detailed models, as available for single genes, need to be extended and assembled in larger networks of regulatory interactions between genes and gene products. Incorporation of such networks into methods for data analysis is crucial to identify molecular mechanisms that are drivers of the observed expression. As methods for this purpose emerge in parallel to each other and without knowing the standard of truth, results need to be critically checked in a competitive setup and in the context of the available rich literature corpus. This work is centered on and contributes to the following subjects, each of which represents an important and distinct research topic in the field of computational biology: (i) construction of realistic gene regulatory network models; (ii) detection of subnetworks that are significantly altered in the data under investigation; and (iii) systematic biological interpretation of detected subnetworks. For the construction of regulatory networks, I review existing methods with a focus on curation and inference approaches. I first describe how literature curation can be used to construct a regulatory network for a specific process, using the well-studied diauxic shift in yeast as an example. In particular, I address the question of how a detailed understanding, as available for the regulation of single genes, can be scaled up to the level of larger systems. 
I subsequently inspect methods for large-scale network inference showing that they are significantly skewed towards master regulators. A recalibration strategy is introduced and applied, yielding an improved genome-wide regulatory network for yeast. To detect significantly altered subnetworks, I introduce GGEA as a method for network-based enrichment analysis. The key idea is to score regulatory interactions within functional gene sets for consistency with the observed expression. Compared to other recently published methods, GGEA yields results that consistently and coherently align expression changes with known regulation types and that are thus easier to explain. I also suggest and discuss several significant enhancements to the original method that improve its applicability, outcome and runtime. For the systematic detection and interpretation of subnetworks, I have developed the EnrichmentBrowser software package. It implements several state-of-the-art methods besides GGEA, and allows results to be combined and explored across methods. As part of the Bioconductor repository, the package provides unified access to the different methods and, thus, greatly simplifies the usage for biologists. Extensions to this framework that support the automation of biological interpretation routines are also presented. In conclusion, this work contributes substantially to the research field of network-based analysis of gene expression data with respect to regulatory network construction, subnetwork detection, and their biological interpretation. This also includes recent developments as well as areas of ongoing research, which are discussed in the context of current and future questions arising from the new generation of genomic data.

    +microstate: A MATLAB toolbox for brain microstate analysis in sensor and cortical EEG/MEG

    +microstate is a MATLAB toolbox for brain functional microstate analysis. It builds upon previous EEG microstate literature and toolboxes by including algorithms for source-space microstate analysis. +microstate includes code for performing individual- and group-level brain microstate analysis in resting-state and task-based data including event-related potentials/fields. Functions are included to visualise and perform statistical analysis of microstate sequences, including novel advanced statistical approaches such as statistical testing for associated functional connectivity patterns, cluster-permutation topographic ANOVAs, and analysis of microstate probabilities in response to stimuli. Additionally, code for simulating microstate sequences and their associated M/EEG data is included in the toolbox, which can be used to generate artificial data with ground truth microstates and to validate the methodology. +microstate integrates with widely used toolboxes for M/EEG processing including FieldTrip, SPM, LORETA/sLORETA, EEGLAB, and Brainstorm to aid with accessibility, and includes wrappers for pre-existing toolboxes for brain-state estimation such as Hidden Markov modelling (HMM-MAR) and independent component analysis (FastICA) to aid with direct comparison with these techniques. In this paper, we first introduce +microstate before subsequently performing example analyses using open access datasets to demonstrate and validate the methodology. MATLAB live scripts for each of these analyses are included in +microstate, to act as a tutorial and to aid with reproduction of the results presented in this manuscript.

    Multivariate Models and Algorithms for Systems Biology

    Rapid advances in high-throughput data acquisition technologies, such as microarrays and next-generation sequencing, have enabled scientists to interrogate the expression levels of tens of thousands of genes simultaneously. However, challenges remain in developing effective computational methods for analyzing data generated from such platforms. In this dissertation, we address some of these challenges. We divide our work into two parts. In the first part, we present a suite of multivariate approaches for a reliable discovery of gene clusters, often interpreted as pathway components, from molecular profiling data with replicated measurements. We translate our goal into learning an optimal correlation structure from replicated complete and incomplete measurements. In the second part, we focus on the reconstruction of signal transduction mechanisms in the signaling pathway components. We propose gene set based approaches for inferring the structure of a signaling pathway. First, we present a constrained multivariate Gaussian model, referred to as the informed-case model, for estimating the correlation structure from replicated and complete molecular profiling data. The informed-case model generalizes the previously known blind-case model by accommodating prior knowledge of replication mechanisms. Second, we generalize the blind-case model by designing a two-component mixture model. Our idea is to strike an optimal balance between a fully constrained correlation structure and an unconstrained one. Third, we develop an Expectation-Maximization algorithm to infer the underlying correlation structure from replicated molecular profiling data with missing (incomplete) measurements. We utilize our correlation estimators for clustering real-world replicated complete and incomplete molecular profiling data sets. The above three components constitute the first part of the dissertation. 
For the structural inference of signaling pathways, we hypothesize a directed signal pathway structure as an ensemble of overlapping and linear signal transduction events. We then propose two algorithms to reverse engineer the underlying signaling pathway structure using unordered gene sets corresponding to signal transduction events. Throughout we treat gene sets as variables and the associated gene orderings as random. The first algorithm has been developed under the Gibbs sampling framework and the second algorithm utilizes the framework of simulated annealing. Finally, we summarize our findings and discuss possible future directions.

    Neural Representations and Decoding with Optimized Kernel Density Estimates

    In in-vivo neurophysiology, firing rates from single neurons are traditionally presented in the form of spike counts or peri-stimulus time histograms which are accumulated and averaged across many presumably identical trials. These histograms may on the one hand provide only noisy representations of the true underlying spiking activity, or on the other hand do not enable single trial resolution. Kernel density estimates (KDEs), weighted moving averages with Gaussian kernels centered around spike times, act as low-pass filters averaging out rapid changes in the firing frequency. Optimized KDEs with the width of the Gaussians (bandwidth) determined through cross-validation or bootstrapping reflect more accurately the underlying spiking activity and also allow for single trial resolution. By analyzing both simulations and multiple single-unit recordings from the prefrontal cortex (PFC) of behaving rats, we found that optimized bandwidth estimates obtained through unbiased cross-validation (UCV) are an information-rich measure that is applicable to more problems than firing rate estimation. Optimized bandwidth estimates provide a characteristic value for the temporal spiking structure of single units and can be modeled as a function of the temporal precision within spiking patterns accounting for the signal-to-noise ratio in simulated data. The distribution of optimized bandwidth estimates of PFC units and their joint distribution with further spike train metrics allow groups of cells with distinct spiking properties to be segregated. Additionally, optimized KDEs obtained with UCV-based bandwidths perform reliably, matching or outperforming non-optimized KDEs when decoding behavioral events during the task. 
Moreover, when applied to analyze mechanisms of encoding and internal processing during self-paced cognitive tasks, optimized KDEs facilitate across-trial comparisons of firing activity during trials varying in length, enable the identification of neuronal ensembles encoding task-related events and can unfold population dynamics displaying the underlying neural process.
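
The bandwidth selection described in this abstract can be sketched as follows. This is a minimal illustration, assuming a Gaussian kernel and a plain grid search over candidate bandwidths; the spike times, the candidate grid and all function names are hypothetical and are not taken from the work itself, which may use a different optimizer.

```python
import math


def gaussian_kernel(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def gaussian_kernel_conv(u):
    """Convolution of the standard normal density with itself, i.e. N(0, 2)."""
    return math.exp(-0.25 * u * u) / math.sqrt(4.0 * math.pi)


def ucv_score(spikes, h):
    """Unbiased (least-squares) cross-validation criterion for bandwidth h.

    First term: integral of the squared density estimate.
    Second term: twice the mean leave-one-out estimate at the data points.
    Lower values indicate a better bandwidth.
    """
    n = len(spikes)
    integral_sq = 0.0
    leave_one_out = 0.0
    for i in range(n):
        for j in range(n):
            u = (spikes[i] - spikes[j]) / h
            integral_sq += gaussian_kernel_conv(u)
            if i != j:
                leave_one_out += gaussian_kernel(u)
    return integral_sq / (n * n * h) - 2.0 * leave_one_out / (n * (n - 1) * h)


def select_bandwidth(spikes, candidates):
    """Pick the candidate bandwidth with the lowest UCV score."""
    return min(candidates, key=lambda h: ucv_score(spikes, h))


def kde_rate(spikes, t, h):
    """Firing-rate estimate at time t: a sum of Gaussians centered on spikes.

    The sum is not divided by the number of spikes, so the estimate
    integrates to the spike count (spikes per unit time).
    """
    return sum(gaussian_kernel((t - s) / h) for s in spikes) / h


# Hypothetical spike times (seconds) from a single trial.
spike_times = [0.10, 0.12, 0.15, 0.40, 0.42, 0.45, 0.80, 0.83]
bandwidths = [0.005, 0.01, 0.02, 0.05, 0.1, 0.2]

best_h = select_bandwidth(spike_times, bandwidths)
print("selected bandwidth:", best_h)
print("rate at t=0.42:", kde_rate(spike_times, 0.42, best_h))
```

The double loop is quadratic in the number of spikes, which is acceptable for a single trial but would need a faster formulation for long recordings.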

    Can otolith microchemistry be used to identify spawning stocks and characterize the life history of Hickory Shad (Alosa mediocris)?

    Highly migratory and diadromous fishes present an array of challenges to fisheries managers. This is particularly true when stocks extend across management borders and occupy multiple management jurisdictions. It is valuable for managers to be able to separate fish populations into subpopulations and spawning populations because it allows them to allocate sustainable harvest regulations. One popular method of achieving this is the use of otolith microchemistry. Otoliths are paired mechanosensory structures found in the teleost inner ear that help with hearing, balance, and environmental orientation. The same biogeochemical properties that allow otoliths to serve their respective physiological functions allow researchers to make quantitative inferences concerning population diversity, movements, and various other aspects of life history. Here, the species of interest is the Hickory Shad (Alosa mediocris (Mitchill 1814)). The Hickory Shad is an anadromous clupeid found in Atlantic coastal systems. During the spring, Hickory Shad migrate inshore to spawn in freshwater where they are a popular target of a multimillion-dollar sport fishery. While the Hickory Shad represents a significant economic asset, little is known about its life history. The current management strategy for Hickory Shad groups it with three other native anadromous clupeids, and management decisions rely on the assumption that Hickory Shad and American Shad (Alosa sapidissima) may have similar life histories. However, the extent to which this represents the actual life history is unknown. For instance, it is assumed that like American Shad, Hickory Shad exhibit natal homing. Yet expression of natal homing has never been confirmed in Hickory Shad. The overall goal of this study was to determine if otolith microchemistry could be used to discriminate spawning stocks of Hickory Shad. Hickory Shad were captured in 26 locations within 18 major rivers along the known spawning range. 
LA-ICP-MS was used to quantify seven elements (Mg, Mn, Cu, Zn, Sr, Ba, and Pb) along a continuous transect that ran from the ventral to dorsal edge through the otolith core, resulting in a time-resolved model of the environmental exposure history of each fish. Hickory Shad captured in the same locations frequently had similar element profiles distinct from other capture locations, which immediately suggested natal homing. To test this hypothesis quantitatively, a combination of Bayesian inference and unsupervised learning techniques was used to estimate the natal river element signature of each fish and determine if it was similar in Hickory Shad captured in the same location. A hidden Markov model was fit to the strontium profile of each otolith to identify the initial transition between the natal freshwater river and euryhaline environments, and average element ratios within the first regime were assumed to be estimates of each natal river element signature. Since the origin of each natal river signature was unknown, a Gaussian mixture model was used to estimate the number of mixture distributions (i.e., clusters) present in the data and assign Hickory Shad to each cluster under a probabilistic paradigm. In most cases, between 50% and 100% of Hickory Shad captured in the same location were assigned to the same cluster, indicating that they had similar natal watershed element signatures. A chi-square test confirmed that there was a significant relationship between capture location and cluster assignment (p < 0.01). These results provide the first piece of evidence that Hickory Shad do exhibit natal homing, and provide an important inferential baseline for further characterization of the rate of natal homing. 
While these results provided strong evidence that Hickory Shad exhibit natal homing, they were not able to quantify the spatial extent of natal homing and straying, which in this case would require knowledge of the spatiotemporal variability of element ratios in the spawning rivers. Therefore, the second objective of this study was to quantify element variation in each capture location. Water chemistry data were not available for the capture locations in this study, so elements deposited on the edge of Hickory Shad otoliths were used as a proxy. Element ratios on the ventral otolith edge (~30 µm of absolute distance) were compared across capture locations. Hickory Shad captured in five locations had distinct ratios of one or two elements, and these differences were minute. Based on knowledge from previous literature and several empirical observations, I concluded that the edge of Hickory Shad otoliths did not reflect the ambient element ratios of the capture location, which was likely a function of a rapid spawning migration. Overall, the results of this study suggest that otolith microchemistry may be a valuable tool for identifying spawning stocks of Hickory Shad, but the information it can provide may be constrained by a number of physiological, ecological, and life history traits that need further refinement. Our results provide strong evidence that Hickory Shad exhibit natal homing, and element signatures that are incorporated early in life may be useful for further characterizing the rate of natal homing and straying. Results also suggest that otolith element signatures produced later in life may not provide accurate descriptions of environmental exposure histories, and otolith regions produced beyond the first year of life may not be as useful for stock discrimination.
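
The test of association between capture location and cluster assignment reported in this abstract can be illustrated with a small contingency table. This is a sketch only, assuming Pearson's chi-square statistic on a table of counts; the counts, the two-location and two-cluster layout and the function names are hypothetical and are not data from the study.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table[i][j]` is the number of fish captured at location i that were
    assigned to cluster j. Every row and column total must be non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if location and cluster were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def majority_fraction(row):
    """Share of fish from one location that fall in its most common cluster."""
    return max(row) / sum(row)


# Hypothetical counts: rows are capture locations, columns are clusters.
counts = [
    [18, 2],
    [3, 17],
]

statistic, dof = chi_square_statistic(counts)
print(f"chi-square={statistic:.3f} dof={dof}")
for location, row in enumerate(counts):
    print(f"location {location}: majority fraction {majority_fraction(row):.2f}")

# With 1 degree of freedom, the critical value at the 0.01 level is about 6.635,
# so a statistic above that value corresponds to p < 0.01.
```

A larger table, such as one covering all capture locations and clusters, works the same way; only the degrees of freedom and the matching critical value change.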