8,677 research outputs found
Bioinformatics tools in predictive ecology: Applications to fisheries
This article is made available throught the Brunel Open Access Publishing Fund - Copygith @ 2012 Tucker et al.There has been a huge effort in the advancement of analytical techniques for molecular biological data over the past decade. This has led to many novel algorithms that are specialized to deal with data associated with biological phenomena, such as gene expression and protein interactions. In contrast, ecological data analysis has remained focused to some degree on off-the-shelf statistical techniques though this is starting to change with the adoption of state-of-the-art methods, where few assumptions can be made about the data and a more explorative approach is required, for example, through the use of Bayesian networks. In this paper, some novel bioinformatics tools for microarray data are discussed along with their ‘crossover potential’ with an application to fisheries data. In particular, a focus is made on the development of models that identify functionally equivalent species in different fish communities with the aim of predicting functional collapse
Stochastic dynamic modeling of short gene expression time-series data
Copyright [2008] IEEE. This material is posted here with permission of the IEEE. Such permission of the IEEE does not in any way imply IEEE endorsement of any of Brunel University's products or services. Internal or personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution must be obtained from the IEEE by writing to [email protected]. By choosing to view this document, you agree to all provisions of the copyright laws protecting it.In this paper, the expectation maximization (EM) algorithm is applied for modeling the gene regulatory network from gene time-series data. The gene regulatory network is viewed as a stochastic dynamic model, which consists of the noisy gene measurement from microarray and the gene regulation first-order autoregressive (AR) stochastic dynamic process. By using the EM algorithm, both the model parameters and the actual values of the gene expression levels can be identified simultaneously. Moreover, the algorithm can deal with the sparse parameter identification and the noisy data in an efficient way. It is also shown that the EM algorithm can handle the microarray gene expression data with large number of variables but a small number of observations. The gene expression stochastic dynamic models for four real-world gene expression data sets are constructed to demonstrate the advantages of the introduced algorithm. Several indices are proposed to evaluate the models of inferred gene regulatory networks, and the relevant biological properties are discussed
Evolutionary model type selection for global surrogate modeling
Due to the scale and computational complexity of currently used simulation codes, global surrogate (metamodels) models have become indispensable tools for exploring and understanding the design space. Due to their compact formulation they are cheap to evaluate and thus readily facilitate visualization, design space exploration, rapid prototyping, and sensitivity analysis. They can also be used as accurate building blocks in design packages or larger simulation environments. Consequently, there is great interest in techniques that facilitate the construction of such approximation models while minimizing the computational cost and maximizing model accuracy. Many surrogate model types exist ( Support Vector Machines, Kriging, Neural Networks, etc.) but no type is optimal in all circumstances. Nor is there any hard theory available that can help make this choice. In this paper we present an automatic approach to the model type selection problem. We describe an adaptive global surrogate modeling environment with adaptive sampling, driven by speciated evolution. Different model types are evolved cooperatively using a Genetic Algorithm ( heterogeneous evolution) and compete to approximate the iteratively selected data. In this way the optimal model type and complexity for a given data set or simulation code can be dynamically determined. Its utility and performance is demonstrated on a number of problems where it outperforms traditional sequential execution of each model type
Bayesian network structure learning using characteristic properties of permutation representations with applications to prostate cancer treatment.
Over the last decades, Bayesian Networks (BNs) have become an increasingly popular technique to model data under presence of uncertainty. BNs are probabilistic models that represent relationships between variables by means of a node structure and a set of parameters. Learning efficiently the structure that models a particular dataset is a NP-hard task that requires substantial computational efforts to be successful. Although there exist many families of techniques for this purpose, this thesis focuses on the study and improvement of search and score methods such as Evolutionary Algorithms (EAs). In the domain of BN structure learning, previous work has investigated the use of permutations to represent variable orderings within EAs. In this thesis, the characteristic properties of permutation representations are analysed and used in order to enhance BN structure learning. The thesis assesses well-established algorithms to provide a detailed analysis of the difficulty of learning BN structures using permutation representations. Using selected benchmarks, rugged and plateaued fitness landscapes are identified that result in a loss of population diversity throughout the search. The thesis proposes two approaches to handle the loss of diversity. First, the benefits of introducing the Island Model (IM) paradigm are studied, showing that diversity loss can be significantly reduced. Second, a novel agent-based metaheuristic is presented in which evolution is based on the use of several mutation operators and the definition of a distance metric in permutation spaces. The latter approach shows that diversity can be maintained throughout the search while exploring efficiently the solution space. In addition, the use of IM is investigated in the context of distributed data, a common property of real-world problems. Experiments prove that privacy can be preserved while learning BNs of high quality. Finally, using UK-wide data related to prostate cancer patients, the thesis assesses the general suitability of BNs alongside the proposed learning approaches for medical data modeling. Following comparisons with tools currently used in clinical settings and with alternative classifiers, it is shown that BNs can improve the predictive power of prostate cancer staging tools, a major concern in the field of urology
A hybrid algorithm for Bayesian network structure learning with application to multi-label learning
We present a novel hybrid algorithm for Bayesian network structure learning,
called H2PC. It first reconstructs the skeleton of a Bayesian network and then
performs a Bayesian-scoring greedy hill-climbing search to orient the edges.
The algorithm is based on divide-and-conquer constraint-based subroutines to
learn the local structure around a target variable. We conduct two series of
experimental comparisons of H2PC against Max-Min Hill-Climbing (MMHC), which is
currently the most powerful state-of-the-art algorithm for Bayesian network
structure learning. First, we use eight well-known Bayesian network benchmarks
with various data sizes to assess the quality of the learned structure returned
by the algorithms. Our extensive experiments show that H2PC outperforms MMHC in
terms of goodness of fit to new data and quality of the network structure with
respect to the true dependence structure of the data. Second, we investigate
H2PC's ability to solve the multi-label learning problem. We provide
theoretical results to characterize and identify graphically the so-called
minimal label powersets that appear as irreducible factors in the joint
distribution under the faithfulness condition. The multi-label learning problem
is then decomposed into a series of multi-class classification problems, where
each multi-class variable encodes a label powerset. H2PC is shown to compare
favorably to MMHC in terms of global classification accuracy over ten
multi-label data sets covering different application domains. Overall, our
experiments support the conclusions that local structural learning with H2PC in
the form of local neighborhood induction is a theoretically well-motivated and
empirically effective learning framework that is well suited to multi-label
learning. The source code (in R) of H2PC as well as all data sets used for the
empirical tests are publicly available.Comment: arXiv admin note: text overlap with arXiv:1101.5184 by other author
Design of Evolutionary Methods Applied to the Learning of Bayesian Network Structures
Bayesian Network, Ahmed Rebai (Ed.), ISBN: 978-953-307-124-4, pp. 13-38
Epidemiological Prediction using Deep Learning
Department of Mathematical SciencesAccurate and real-time epidemic disease prediction plays a significant role in the health system and is of great importance for policy making, vaccine distribution and disease control. From the SIR model by Mckendrick and Kermack in the early 1900s, researchers have developed a various mathematical model to forecast the spread of disease. With all attempt, however, the epidemic prediction has always been an ongoing scientific issue due to the limitation that the current model lacks flexibility or shows poor performance. Owing to the temporal and spatial aspect of epidemiological data, the problem fits into the category of time-series forecasting. To capture both aspects of the data, this paper proposes a combination of recent Deep Leaning
models and applies the model to ILI (influenza like illness) data in the United States. Specifically, the graph convolutional network (GCN) model is used to capture the geographical feature of the U.S. regions and the gated recurrent unit (GRU) model is used to capture the temporal dynamics of ILI. The result was compared with the Deep Learning model proposed by other researchers, demonstrating the proposed model outperforms the previous methods.clos
- …